Overfitting on Common Voice

The command I am running:

./DeepSpeech.py --train_files …
--dev_files …/dev.csv --test_files …/test.csv
--checkpoint_dir … --export_dir … --epochs 20 --train_batch_size 64 --dev_batch_size 64 --test_batch_size 64 --early_stop False --summary_dir … --es_steps 50 --learning_rate 0.000075 --dropout_rate 0.20

The model is overfitting: the validation error drops to about 70 at best, while the training error continues to decrease. I am thinking of reducing the hidden layer size from 2048 to 1800. I have experimented with the learning rate and dropout, and it does not make a difference.

I am training on a p2.xlarge on the Common Voice Dataset.

That does not tell us which language?

It is the English dataset.

That’s not going to be enough data to get you anything really useful. I’d say stop the training at the epoch where the training and validation losses start to drift apart? I don’t really understand what you are asking here.
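To make the "stop where the losses drift apart" suggestion concrete, here is a minimal sketch in plain Python. It assumes you have already collected one validation loss per epoch yourself (for example by reading them off the training log). The function name `best_epoch`, the `patience` and `min_delta` parameters, and the loss values are all hypothetical and made up for illustration. It is not DeepSpeech's own early-stopping logic, which the command above switches off with `--early_stop False`.

```python
def best_epoch(dev_losses, patience=3, min_delta=0.0):
    """Return (epoch_index, loss) for the lowest validation loss seen
    before `patience` consecutive epochs pass without improvement.

    dev_losses: list of per-epoch validation losses (hypothetical input).
    min_delta:  minimum decrease that counts as an improvement.
    """
    best_idx, best_loss = 0, float("inf")
    for i, loss in enumerate(dev_losses):
        if loss < best_loss - min_delta:
            # Validation loss improved: remember this epoch.
            best_idx, best_loss = i, loss
        elif i - best_idx >= patience:
            # No improvement for `patience` epochs: stop scanning.
            break
    return best_idx, best_loss


# Made-up losses: validation bottoms out at 70, then drifts upward.
dev = [120.0, 95.0, 80.0, 72.0, 70.0, 71.0, 73.0, 76.0]
print(best_epoch(dev, patience=3))  # (4, 70.0)
```

The checkpoint from the returned epoch (index 4 in this made-up series) is the one worth keeping. Training past it only lowers the training loss.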

I am trying to build a Speech to Text model using DeepSpeech. I tried training on the Common Voice dataset from scratch but the accuracy is not good. Are you saying I need more data? Should I train further on the pre-trained model?

Common Voice is the only dataset I have explored so far.

That’s not really something actionable.


The v0.6 model is getting really close now, and it should include the currently released Common Voice English data. So maybe you would like to wait?

Can you point to existing datasets?
And when will version 0.6 be released?

Have you had a look at the Common Voice dataset page? https://voice.mozilla.org/datasets

I don’t know. Soon, for sure.
