@reuben Should I always extend the validation set with my own data instead of just using my own validation set? Because otherwise it would overfit, wouldn’t it?
Your validation set should always be representative of the types of audio you want your model to be good at. Otherwise, yes, you risk overfitting.
Use fewer epochs and a smaller learning rate, and make sure your dev set is good.
@rajpuneet.sandhu Can you post a complete guide for training DeepSpeech 0.4.1?
Also a guide on which files are required for it?
And how can the trie be generated?
Check out ‘How I trained a French robot’. It has all the steps @noor_e_emaan11
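On the trie question: a minimal sketch of the usual workflow for DeepSpeech 0.4.x, assuming you have KenLM's `lmplz` and `build_binary` tools and have compiled the `generate_trie` binary from DeepSpeech's `native_client`; `vocabulary.txt` and the output paths here are placeholders, not files from this thread:

```shell
# Build an ARPA language model from a text corpus with KenLM,
# then convert it to the binary format DeepSpeech's decoder loads.
lmplz -o 5 < vocabulary.txt > words.arpa
build_binary words.arpa lm.binary

# Generate the trie from the alphabet and the binary LM
# (generate_trie is built as part of DeepSpeech's native_client).
./generate_trie data/alphabet.txt lm.binary trie
```

The exact `generate_trie` arguments have changed between DeepSpeech releases, so check `data/lm/README.md` in the git tag you are actually training against.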
I trained with the TED-LIUM (from the Mozilla Common Voice website) and VoxForge (using the import script in the DeepSpeech repo) datasets with the following:
```shell
python3 DeepSpeech.py \
  --n_hidden 2048 \
  --checkpoint_dir ~/deepspeech-0.4.1-checkpoint \
  --epoch -1 \
  --train_files /home/rsandhu/ted-train.csv,/home/rsandhu/voxforge-train.csv \
  --dev_files /home/rsandhu/ted-dev.csv,/home/rsandhu/voxforge-dev.csv,/home/rsandhu/common_voice_training_data/cv-valid-dev.csv,/mnt/librivox_data/librivox-dev-clean.csv,/mnt/librivox_data/librivox-dev-other.csv \
  --test_files /home/rsandhu/ted-test.csv,/home/rsandhu/voxforge-test.csv \
  --learning_rate 0.0001 \
  --train_batch_size 24 \
  --dev_batch_size 48 \
  --test_batch_size 48 \
  --display_step 0 \
  --validation_step 1 \
  --dropout_rate 0.2 \
  --checkpoint_step 1 \
  --lm_alpha 0.75 \
  --lm_beta 1.85 \
  --export_dir ~/new_model
```
I tested this generated model and the release 0.4.1 model; the results are in the attached file deepspeech_test_comparison_ted_voxforge.zip (8.2 KB)
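For anyone wanting to reproduce that comparison, a sketch of running inference with an exported 0.4.x model via the `deepspeech` command-line client; the file paths are placeholders for wherever your exported graph, alphabet, LM, trie, and test audio live:

```shell
# Transcribe a single WAV file with an exported DeepSpeech 0.4.x model.
# --lm and --trie are optional but usually improve accuracy.
deepspeech \
  --model ~/new_model/output_graph.pb \
  --alphabet data/alphabet.txt \
  --lm lm.binary \
  --trie trie \
  --audio test.wav
```

Running the same files through both the self-trained model and the released 0.4.1 model lets you compare transcripts side by side, as in the attached zip.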
This is slightly out of context, but I’m trying to train a model on the same dataset. Could you share your findings and progress?