How to train a model on the Common Voice dataset using DeepSpeech v0.6.1?

OK, thanks, now I'm getting somewhere. I have one more query, regarding overfitting on the Common Voice dataset with the command below:

nohup ./DeepSpeech.py --n_hidden 2048 --learning_rate 0.0001 --checkpoint_dir /home/user/deepspeech-0.6.1-checkpoint/ --train_files /home/user/en/clips/train.csv --dev_files /home/user/en/clips/dev.csv --test_files /home/user/en/clips/test.csv --export_dir /home/user/modelexport  --use_cudnn_rnn --train_batch_size 100 --test_batch_size 100 --dev_batch_size 100 &

And the log is:
epoch starting time 2020-02-03 15:28:04.162653
Epoch 0 |   Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Epoch 0 |   Training | Elapsed Time: 0:00:03 | Steps: 1 | Loss: 8.183294
Epoch 0 |   Training | Elapsed Time: 0:00:05 | Steps: 2 | Loss: 8.716593
Epoch 0 |   Training | Elapsed Time: 0:00:08 | Steps: 3 | Loss: 8.281446
Epoch 0 |   Training | Elapsed Time: 0:00:10 | Steps: 4 | Loss: 8.427862
Epoch 0 |   Training | Elapsed Time: 0:00:12 | Steps: 5 | Loss: 8.701849
Epoch 0 |   Training | Elapsed Time: 0:00:15 | Steps: 6 | Loss: 8.790705
Epoch 0 |   Training | Elapsed Time: 0:00:17 | Steps: 7 | Loss: 8.693750
.
.
.
Epoch 0 |   Training | Elapsed Time: 0:32:18 | Steps: 308 | Loss: 19.248939
Epoch 0 | Validation | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000 | Dataset: /home/user/en/clips/dev.csv
Epoch 0 | Validation | Elapsed Time: 0:00:03 | Steps: 1 | Loss: 24.301804 | Dataset: /home/user/en/clips/dev.csv
Epoch 0 | Validation | Elapsed Time: 0:00:06 | Steps: 2 | Loss: 27.545616 | Dataset: /user/en/clips/dev.csv
Epoch 0 | Validation | Elapsed Time: 0:00:08 | Steps: 3 | Loss: 26.931159 | Dataset: /home/en/clips/dev.csv
I Saved new best validating model with loss 38.864146 to: /home/user/deepspeech-0.6.1-checkpoint/best_dev-234092
epoch ending time 2020-02-03 16:05:47.882186
Total epoch time 0:37:43.719533
epoch starting time 2020-02-03 16:05:47.882206
Epoch 1 |   Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Epoch 1 |   Training | Elapsed Time: 0:00:04 | Steps: 1 | Loss: 9.052685
.
.
.
Epoch 1 | Validation | Elapsed Time: 0:05:20 | Steps: 63 | Loss: 39.579556 | Dataset: /home/user/en/clips/dev.csv
Epoch 1 | Validation | Elapsed Time: 0:05:20 | Steps: 63 | Loss: 39.579556 | Dataset: /home/user/en/clips/dev.csv

Like this, all the way to epoch 33: the training loss keeps decreasing, but the validation loss keeps increasing.
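One way to see that trend at a glance is to pull the per-epoch validation loss out of the nohup log. A minimal sketch, with a small inline sample standing in for the real nohup.out, and the awk field positions assumed from the log format above:

```shell
# Create a tiny sample log in the same format as the training output above.
cat > sample.log <<'EOF'
Epoch 0 | Validation | Elapsed Time: 0:05:20 | Steps: 63 | Loss: 38.864146 | Dataset: dev.csv
Epoch 1 | Validation | Elapsed Time: 0:05:20 | Steps: 63 | Loss: 39.579556 | Dataset: dev.csv
Epoch 2 | Validation | Elapsed Time: 0:05:21 | Steps: 63 | Loss: 41.102394 | Dataset: dev.csv
EOF

# Keep only the last validation line per epoch ($2 = epoch, $14 = loss),
# then print "epoch loss" pairs in epoch order.
awk '/Validation/ { loss[$2] = $14 } END { for (e in loss) print e, loss[e] }' sample.log | sort -n
```

On the sample this prints one line per epoch; on a real run, point it at nohup.out instead and a steadily rising second column is the overfitting signal.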

That's textbook overfitting. You need to do your own homework and adjust training hyper-parameters.

I'm joining this nice challenge of training something on Common Voice.
Here are some tips for now:

  1. Don't run the example command DeepSpeech.py --train_files ./train.csv --dev_files ./dev.csv --test_files ./test.csv --automatic_mixed_precision=True without further parameters; at least set the batch sizes. With the defaults it will be super slow and the loss will climb, because each step sees only one short example and the network can't learn anything good from that.
  2. Once you get the right params, you'll likely see the validation loss drop with each epoch, which is good. The system checkpoints the best validating model and uses it for the test. To get better results, play with the learning rate and dropout.
  3. You may need about 2 days to train on 2x GTX 1080 Ti with batch_size=16.
  4. The test report shows the best, median and worst WER examples. The best WER should be 0.
  5. A WER of around 48% is what is typically reported.
  6. I believe that's due to bad examples and the language model being built from Wikipedia, so retraining the LM should help.
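Putting tips 1–2 together against the original command, a hedged sketch of a re-run with overfitting countermeasures applied. The --dropout_rate and --epochs values are illustrative guesses, not tuned numbers, and the paths are the ones from the command above:

```shell
# Illustrative only: dropout and epoch count are untuned guesses to combat
# overfitting; batch size 16 follows tip 3. Flags as in DeepSpeech v0.6.1.
nohup ./DeepSpeech.py \
  --n_hidden 2048 \
  --learning_rate 0.0001 \
  --dropout_rate 0.25 \
  --epochs 30 \
  --train_batch_size 16 --dev_batch_size 16 --test_batch_size 16 \
  --use_cudnn_rnn \
  --checkpoint_dir /home/user/deepspeech-0.6.1-checkpoint/ \
  --train_files /home/user/en/clips/train.csv \
  --dev_files /home/user/en/clips/dev.csv \
  --test_files /home/user/en/clips/test.csv \
  --export_dir /home/user/modelexport &
```

Since the best validating checkpoint is what gets exported, capping --epochs is a crude stand-in for early stopping; tune dropout and the learning rate from there.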

Now I'm on this step.


Thanks @fleandr :slight_smile: Your suggestions are helpful. I will definitely try them.