I am following this doc to train my own English model using Common Voice data:
https://deepspeech.readthedocs.io/en/r0.9/TRAINING.html
After running this command:
bin/import_cv2.py --filter_alphabet path/to/some/alphabet.txt /path/to/extracted/language/archive
the following files are generated:
clips/dev.csv
clips/test.csv
clips/train.csv
clips/train-all.csv
Then the next step is to train the model using clips/dev.csv, clips/test.csv, and clips/train.csv.
Why don't we use clips/train-all.csv as training data? This file has a lot more data than clips/train.csv, and it also comes from the validated dataset, so I think it should produce a better model. But I do not see any mention of this file in the doc.
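To put a number on the size difference, here is a minimal sketch that counts the data rows in each generated file. It assumes the files sit under a clips/ directory relative to the working directory and that each CSV starts with a header row. The count_rows helper is my own and is not part of DeepSpeech.

```python
import csv
import os


def count_rows(path):
    """Return the number of data rows in a CSV file, excluding the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row (assumed to be present)
        return sum(1 for _ in reader)


if __name__ == "__main__":
    # File names come from the importer output listed above;
    # the clips/ location is an assumption about the working directory.
    for name in ("train.csv", "train-all.csv", "dev.csv", "test.csv"):
        path = os.path.join("clips", name)
        if os.path.exists(path):
            print(f"{name}: {count_rows(path)} clips")
        else:
            print(f"{name}: not found at {path}")
```

Running this from the directory that contains clips/ prints one count per file, which makes it easy to see how much larger train-all.csv is than train.csv.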
Also, was the DeepSpeech pre-trained model trained on clips/train.csv or clips/train-all.csv?