Then the next step is to train the model using clips/dev.csv, clips/test.csv and clips/train.csv.
Why don’t we use clips/train-all.csv as training data? This file has a lot more data than clips/train.csv and also comes from the validated dataset, so I think it should produce a better model. But I do not see any mention of this file in the docs.
Also, was the DeepSpeech pre-trained model trained on clips/train.csv or clips/train-all.csv?
lissyx
No, if you train with the validation dataset, you just overfit and learn nothing.
Hi, I am not training with the validation dataset.
What I mean by “validated dataset” is the file en/validated.tsv, whose clips have already been quality-checked through up votes and down votes. It is a different thing and is not used for validation during training.
Anyway, I just want to know whether I should use en/clips/train-all.csv instead of en/clips/train.csv for training. I am sure that neither file includes the dev or test data.
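The claim that the training CSVs share no clips with dev and test can be checked directly by comparing file names. Below is a minimal sketch, assuming the CSVs have the usual DeepSpeech importer columns (`wav_filename`, `wav_filesize`, `transcript`). The helper names `load_filenames` and `find_overlap` are hypothetical and are not part of DeepSpeech.

```python
import csv
from pathlib import Path


def load_filenames(csv_path):
    """Return the set of clip file names listed in a DeepSpeech-style CSV."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Compare by base name so absolute vs. relative paths do not matter.
        return {Path(row["wav_filename"]).name for row in reader}


def find_overlap(train_csv, held_out_csvs):
    """Map each held-out CSV path to the set of clips it shares with train_csv."""
    train_clips = load_filenames(train_csv)
    return {
        str(path): train_clips & load_filenames(path)
        for path in held_out_csvs
    }


# Example usage (the paths are assumptions, adjust them to your setup):
# overlap = find_overlap(
#     "en/clips/train-all.csv",
#     ["en/clips/dev.csv", "en/clips/test.csv"],
# )
# for name, shared in overlap.items():
#     print(f"{name}: {len(shared)} clips shared with the training set")
```

An empty set for both dev.csv and test.csv would mean that training on train-all.csv does not leak held-out clips. It says nothing about whether the same speakers or sentences appear on both sides, which would need a separate check.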