A little update: I’ve chosen https://www.exoscale.com , a Swiss provider where you can get all sorts of GPUs, including V100s (after contacting support, if you don’t want to buy a complete month).
I did a first experiment with the old dataset a while ago: one epoch took 3 hours on a 1080 Ti (32 GB RAM). That seems rather long, since train.tsv covers much less than the full 35 h of that dataset, and the new Common Voice release has more than doubled the available data to 83 hours. Would this be quicker on a V100 or P100?
My parameters so far are:
python3 DeepSpeech.py --train_files …/eo/clips/train.csv --dev_files …/eo/clips/dev.csv --test_files …/eo/clips/test.csv --automatic_mixed_precision --train_batch_size 16 --epochs 7
What can I do to optimize this? How useful is --use_cudnn_rnn ?
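For reference, here is a sketch of what a tuned invocation might look like. The larger batch size, the --use_cudnn_rnn flag, and the feature-cache path are assumptions on my part (whether they apply depends on your DeepSpeech version and GPU memory), not tested settings:

```shell
# Hypothetical tuned run -- verify flag names against your DeepSpeech version.
# --use_cudnn_rnn: uses cuDNN's fused RNN kernels instead of the generic
#   TensorFlow LSTM, which is typically much faster on NVIDIA GPUs.
# --train_batch_size 32: assumed to fit on a V100 (16/32 GB); reduce if you
#   hit out-of-memory errors.
# --feature_cache: caches computed audio features after the first epoch so
#   later epochs skip feature extraction (path is an example).
python3 DeepSpeech.py \
  --train_files …/eo/clips/train.csv \
  --dev_files …/eo/clips/dev.csv \
  --test_files …/eo/clips/test.csv \
  --automatic_mixed_precision \
  --use_cudnn_rnn \
  --feature_cache /tmp/eo_feature_cache \
  --train_batch_size 32 \
  --epochs 7
```

With mixed precision already enabled, the batch size and cuDNN RNN are usually the two knobs with the biggest impact on epoch time; raising --dev_batch_size and --test_batch_size similarly may also help, since evaluation uses no backpropagation and fits larger batches.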