I have tried training on CommonVoice English, as you advised me, @lissyx, using the same parameters as in .compute, except for a larger batch size of 65 to speed the experiment up. It ended up like my original dataset: early overfitting and a high test WER of 0.588034. Here is the loss evolution (the * marks the lowest dev loss):
train dev
107.415366 90.988997
77.633969 79.132081
66.763453 73.079587
59.810548 68.768595
54.691266 65.851911
50.673932 64.093251
47.387396 62.118382
44.590231 62.041358
42.132208 61.115278
39.956110 60.476461
37.953228 60.269572
36.213714 59.242173
34.556712 59.173234 *
33.081850 59.734880
31.704943 59.559893
29.725658 60.446697
28.603038 59.974832
27.602044 60.964456
26.651280 61.209080
25.728388 62.034175
24.916874 61.714609
24.130870 62.894777
23.336425 62.838955
22.688621 63.969326
22.020799 64.639862
21.306159 64.927457
20.746867 65.340298
20.205466 65.611585
19.624681 67.308286
19.077871 67.558523
18.554258 69.291947
18.068047 69.521966
17.536386 71.407673
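For what it is worth, the dev loss bottoms out at row 13 and then climbs while the train loss keeps falling, which is what I mean by early overfitting. A quick way to pick that row out of the table above (just a sketch, assuming the numeric rows are saved without the header line to a file I am calling losses.txt here):

# report the row with the lowest dev loss (second column)
awk 'NR == 1 || $2 < best { best = $2; row = NR } END { printf "lowest dev loss %.6f at row %d of %d\n", best, row, NR }' losses.txt
# prints: lowest dev loss 59.173234 at row 13 of 33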
I used these parameters:
DeepSpeech.py \
--alphabet_config_path "data/alphabet.txt" \
--checkpoint_dir "checkpoints" \
--dev_batch_size 65 \
--dev_files "dev.csv" \
--export_dir "model" \
--lm_binary_path "data/lm/lm.binary" \
--lm_trie_path "data/lm/trie" \
--summary_dir "summaries" \
--test_batch_size 65 \
--test_files "test.csv" \
--train_batch_size 65 \
--learning_rate 0.0001 \
--dropout_rate 0.2 \
--n_hidden 2048 \
--use_cudnn_rnn \
--noearly_stop \
--train_files "train.csv"
Something must be wrong, and I still have no clue what. I believe I followed the instructions quite precisely, and it also happens on different hardware. It worked maybe a year ago, when I had a master checkout (I don't know exactly which commit). Then I updated to v0.6.1, and since then this happens. It may not have anything to do with the update, I don't know; re-cloning did not help either.
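In case the exact versions matter, this is how I plan to pin down what I am running (a sketch, assuming a plain git clone of the repo in ./DeepSpeech; the paths and package names are my guesses):

# tag and exact commit of the checkout used for training
git -C DeepSpeech describe --tags
git -C DeepSpeech rev-parse HEAD
# versions of the Python packages the training code depends on
pip list | grep -i -e tensorflow -e ctcdecoder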
How can I go about identifying the cause of this behavior, or what solution would you suggest trying next? Thank you very much.