How to train a model on the Common Voice dataset with DeepSpeech v0.6.1?

OK, thanks, now I'm getting somewhere. I have one more query, regarding overfitting on the Common Voice dataset using the command below:

nohup ./DeepSpeech.py --n_hidden 2048 --learning_rate 0.0001 --checkpoint_dir /home/user/deepspeech-0.6.1-checkpoint/ --train_files /home/user/en/clips/train.csv --dev_files /home/user/en/clips/dev.csv --test_files /home/user/en/clips/test.csv --export_dir /home/user/modelexport --use_cudnn_rnn --train_batch_size 100 --test_batch_size 100 --dev_batch_size 100 &

And the log is,
epoch starting time 2020-02-03 15:28:04.162653
Epoch 0 |   Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Epoch 0 |   Training | Elapsed Time: 0:00:03 | Steps: 1 | Loss: 8.183294
Epoch 0 |   Training | Elapsed Time: 0:00:05 | Steps: 2 | Loss: 8.716593
Epoch 0 |   Training | Elapsed Time: 0:00:08 | Steps: 3 | Loss: 8.281446
Epoch 0 |   Training | Elapsed Time: 0:00:10 | Steps: 4 | Loss: 8.427862
Epoch 0 |   Training | Elapsed Time: 0:00:12 | Steps: 5 | Loss: 8.701849
Epoch 0 |   Training | Elapsed Time: 0:00:15 | Steps: 6 | Loss: 8.790705
Epoch 0 |   Training | Elapsed Time: 0:00:17 | Steps: 7 | Loss: 8.693750
.
.
.
Epoch 0 |   Training | Elapsed Time: 0:32:18 | Steps: 308 | Loss: 19.248939
Epoch 0 | Validation | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000 | Dataset: /home/user/en/clips/dev.csv
Epoch 0 | Validation | Elapsed Time: 0:00:03 | Steps: 1 | Loss: 24.301804 | Dataset: /home/user/en/clips/dev.csv
Epoch 0 | Validation | Elapsed Time: 0:00:06 | Steps: 2 | Loss: 27.545616 | Dataset: /home/user/en/clips/dev.csv
Epoch 0 | Validation | Elapsed Time: 0:00:08 | Steps: 3 | Loss: 26.931159 | Dataset: /home/user/en/clips/dev.csv
I Saved new best validating model with loss 38.864146 to: /home/user/deepspeech-0.6.1-checkpoint/best_dev-234092
epoch ending time 2020-02-03 16:05:47.882186
Total epoch time 0:37:43.719533
epoch starting time 2020-02-03 16:05:47.882206
Epoch 1 |   Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Epoch 1 |   Training | Elapsed Time: 0:00:04 | Steps: 1 | Loss: 9.052685
.
.
.
Epoch 1 | Validation | Elapsed Time: 0:05:20 | Steps: 63 | Loss: 39.579556 | Dataset: /home/user/en/clips/dev.csv

It goes on like this until epoch 33: the training loss keeps decreasing, but the validation loss keeps increasing.

That’s textbook overfitting. You need to do your own homework and adjust training hyper-parameters.
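
As a concrete starting point, something like the command below. This is a sketch only: --dropout_rate and --epochs are existing DeepSpeech.py flags, but the values here are placeholders you will have to tune yourself.

# Sketch: lower the learning rate and add dropout to fight overfitting; tune these values.
./DeepSpeech.py --n_hidden 2048 \
  --learning_rate 0.00005 \
  --dropout_rate 0.15 \
  --epochs 20 \
  --checkpoint_dir /home/user/deepspeech-0.6.1-checkpoint/ \
  --train_files /home/user/en/clips/train.csv \
  --dev_files /home/user/en/clips/dev.csv \
  --test_files /home/user/en/clips/test.csv \
  --export_dir /home/user/modelexport \
  --use_cudnn_rnn \
  --train_batch_size 100 --dev_batch_size 100 --test_batch_size 100

Since your best validating checkpoint was saved back in epoch 0, fewer epochs with more regularization is the direction to push.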

I'm joining this nice challenge of training something on Common Voice.
Here are some tips for now:

  1. Don’t run the example command, DeepSpeech.py --train_files ./train.csv --dev_files ./dev.csv --test_files ./test.csv --automatic_mixed_precision=True, without further parameters; at the very least set the batch sizes. With the default batch size of 1, each step sees only one short example, so training is super slow, the loss goes up, and the network can’t learn anything good.
  2. Once the parameters are right, you should see the validation loss drop from epoch to epoch, which is what you want. The system checkpoints the best validating model and uses it for the test phase. To get better results, play with the learning rate and dropout.
  3. You may need about 2 days to train on two 1080 Ti cards with batch_size=16.
  4. The test phase reports the best, median, and worst WER examples; the best WER should be 0.
  5. A WER of around 48% is what is typically reported.
  6. I believe that’s due to bad examples and the Wikipedia-based language model, so retrain the LM (see the sketch after this list).
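
If you go the retrain-the-LM route, the 0.6.x recipe is KenLM plus a decoder trie. A minimal sketch, assuming you have KenLM built and your own in-domain text corpus; vocabulary.txt and the output file names are placeholders:

# vocabulary.txt is a placeholder: your own text corpus, one sentence per line.
lmplz --order 5 --text vocabulary.txt --arpa lm.arpa
# Quantize the ARPA model and pack it into the binary trie format.
build_binary -a 255 -q 8 trie lm.arpa lm.binary
# generate_trie ships with the native client; alphabet.txt is DeepSpeech's data/alphabet.txt.
./generate_trie alphabet.txt lm.binary trie

Then point the decoder at the new files with --lm_binary_path lm.binary --lm_trie_path trie when testing.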

Now I’m at this step.


Thanks @fleandr :slight_smile: Your suggestions are helpful. I will definitely try them.