Step, epoch, hardware, weird duration

Hello,
I am using DeepSpeech version 0.7.4. I don’t want to use transfer learning, I use the DeepSpeech English dataset, and these are my hyperparameters:

python3 DeepSpeech.py --train_files data/CV/en/clips/dev.csv --dev_files data/CV/en/clips/dev.csv --test_files data/CV/en/clips/test.csv --checkpoint_dir data/tmpTestFolder --export_dir data/tmpTestExport --n_hidden 2048 --epochs 100 --dropout_rate 0.40 --lm_alpha 0.75 --lm_beta 1.85 --learning_rate 0.00001 --automatic_mixed_precision --train_cudnn

I have 32 GB RAM, a 512 GB SSD, and a GTX 1080 Ti GPU.

I have 3 questions now:
1) Each epoch takes too long and has too many steps (nearly a million), but I’ve seen in your documentation that you never reach this step count. (Is something wrong with my batch size? Is the default batch size 1?)

2) When you wrote 120 epochs for the first phase of the 0.7.4 version, did you really cover all the training data? I mean, you didn’t use any iteration limit so that only some of it is used?

3) Each epoch takes 12 h. Is that okay?

@lissyx

How much data do you have? You mention the “DeepSpeech English dataset”, but your command line points to some Common Voice English data, with no mention of the release.
I guess 12 h on your GPU might be expected.

Yes, please read the docs and the output of --helpfull; it’s all documented.

Just to add to lissyx:

A higher batch size will speed up your training significantly, and you should set it for train/dev as high as possible without producing out-of-memory errors. With your GPU you should get to 4 or 8.
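For example, a sketch of your command with explicit batch sizes (the --train_batch_size/--dev_batch_size/--test_batch_size flags are listed in --helpfull; the values 8 and 4 here are guesses for an 11 GB card, so lower them if you hit out-of-memory errors):

python3 DeepSpeech.py --train_files data/CV/en/clips/dev.csv --dev_files data/CV/en/clips/dev.csv --test_files data/CV/en/clips/test.csv --checkpoint_dir data/tmpTestFolder --export_dir data/tmpTestExport --n_hidden 2048 --epochs 100 --dropout_rate 0.40 --lm_alpha 0.75 --lm_beta 1.85 --learning_rate 0.00001 --automatic_mixed_precision --train_cudnn --train_batch_size 8 --dev_batch_size 8 --test_batch_size 4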

Training is really resource-intensive, so yes, this looks realistic.

Absolutely. If you are using batch size 1 and around 500–700 hours of input, this could be an OK number.
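As a rough sanity check (the ~5 s average clip length is an assumption, not a measured figure): steps per epoch ≈ number of clips / batch size. 600 hours is about 2,160,000 seconds of audio, so roughly 430,000 clips, and at batch size 1 that means roughly 430,000 steps per epoch, the same order of magnitude as your “near a million”.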

How much data do you have? You mention the “DeepSpeech English dataset”, but your command line points to some Common Voice English data, with no mention of the release.
I guess 12 h on your GPU might be expected.

Yes, you are right, that is Common Voice English.

Yes, please read the docs and the output of --helpfull; it’s all documented.

If I increase the batch size, does it reduce training time?

As you mentioned in the release doc on GitHub, for 0.7.4 you had around 300 epochs. (That’s practically impossible on my hardware.)

With 500 h of data (my own data), how many epochs do I need for acceptable accuracy and WER?

Thanks. And another question, as I asked lissyx: how many epochs do I need for 500 h of data to get acceptable WER and accuracy?

lissyx would argue that “acceptable” is relative :slight_smile: I would say that with 15 epochs you should see OK results. Check the losses.

And yes, higher batch size = much lower training time.
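Roughly, epoch time ≈ steps per epoch × time per step. Going from batch size 1 to 8 divides the steps per epoch by 8, and on a GPU the time per step usually grows much more slowly than the batch size, so your 12 h epochs could plausibly drop to a few hours (an estimate, not a measurement).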

Thanks, othiele and lissyx!

I’ll update this topic with my results. :+1:

Does --automatic_mixed_precision do anything on GeForce 1080s? I thought it only worked on the latest generation.


It needs Turing or Volta; Pascal does not have Tensor Cores.
