Training results with 0.4.1 far worse than 0.3.0

@kdavis the audio file I used for testing has an American female speaker speaking general English. I tested it manually; this is not the result of the test epoch that I am talking about.

@reuben @lissyx any thoughts on this?

I haven’t seen enough evidence to convince me that 0.4.1 is performing far worse than 0.3.0 as you suggest, so I don’t know how I can help. Like @kdavis already said, start with the basics: don’t fine-tune the 0.4.1 model.

I think there’s some confusion here. Let’s set aside the fact that Indian place names are not getting recognized, so changing the language model is out of the question here. I test an audio recording with an American accent that has no region-specific names, just general English, and the result is ‘x’. I do incremental training, test using the same audio file, and the result is ‘y’. What I observe is that ‘y’ is worse than ‘x’. This shouldn’t have happened. Have I made the situation clearer, @reuben @kdavis @lissyx?

How is the data you train on validated? If you train on bad data, you will get bad results.

@kdavis one of the datasets I can’t vouch for. But I also tried training on a dataset that I and 8 other people generated by recording our own voice samples, which is 100% accurate. I observe the same thing there.

Something else I wanted to add: I picked a few audio files that were used to train the model and tested using those files. For those, the accuracy is 100%.

Sounds like you might be overfitting?

Exactly. But early stopping is triggered, and then I retrain only for the number of epochs up to which the loss decreases, after which it starts increasing.

Any idea how I could avoid this, if this is not the correct approach?

@josh_meyer could you give any suggestions on how I could prevent overfitting, since you are the expert on transfer learning? Fine-tuning is resulting in overfitting and making the model perform worse than before.

Train for fewer epochs, and try extending the validation set with your data rather than replacing it.


@reuben here is the process I follow:

  1. I start training for 10 epochs from the DeepSpeech 0.4.1 checkpoint.
  2. I observe that the validation loss decreases until the 6th epoch and starts increasing after that, and early stopping is triggered in the 8th epoch.
  3. I pick up the DeepSpeech 0.4.1 checkpoint again and this time train only until the 6th epoch.

Are you saying that I should stop training before the 6th epoch even though the validation loss is decreasing?

My validation set is still the same. The last test result I posted is the inference result from running DeepSpeech with the release model and with the model that I exported.

Right, and I’m suggesting this could be (part of) the problem. Instead of using only your data for the validation set, mix it with the data we used when training v0.4.1, such as LibriSpeech.
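For anyone else following along: DeepSpeech takes a comma-separated list of CSVs for `--dev_files`, so the easiest way to mix validation data is to list your CSVs alongside the LibriSpeech ones. If you prefer a single merged file, here is a minimal sketch using only the standard library; the filenames in the usage comment are placeholders, not files from this thread:

```python
import csv

def merge_dev_csvs(paths, out_path):
    """Concatenate several DeepSpeech-style dev CSVs into one file,
    keeping a single header row from the first CSV."""
    header = None
    rows = []
    for path in paths:
        with open(path, newline="") as f:
            reader = csv.reader(f)
            file_header = next(reader)  # skip per-file header
            if header is None:
                header = file_header
            rows.extend(reader)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)

# Placeholder filenames -- substitute your own dev CSVs:
# merge_dev_csvs(["my-dev.csv", "librivox-dev-clean.csv"], "mixed-dev.csv")
```

This assumes all CSVs share the same columns (`wav_filename,wav_filesize,transcript`), which is the layout DeepSpeech's importers produce.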

Oh, I see. Thanks, I’ll try doing that. But I had just one more question. Right now I have split my data into train, dev and test at 70%–20%–10% respectively, just like in @elpimous_robot’s tutorial. If I include your validation data, it would no longer be in that proportion. Would that be a problem?
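For reference, the 70/20/10 split mentioned above only concerns how you partition your *own* recordings; adding external dev data on top of it doesn't change that partition. A minimal sketch of such a split (the function name and seed are my own choices, not from the tutorial):

```python
import random

def split_70_20_10(rows, seed=42):
    """Shuffle rows and split them 70% train / 20% dev / 10% test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed so the split is reproducible
    n_train = int(0.7 * len(rows))
    n_dev = int(0.2 * len(rows))
    train = rows[:n_train]
    dev = rows[n_train:n_train + n_dev]
    test = rows[n_train + n_dev:]
    return train, dev, test
```

The test slice simply takes whatever remains after train and dev, so no sample is dropped or duplicated.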

@reuben I included the LibriSpeech and Common Voice validation data sets, using the following command:

python3 DeepSpeech.py \
  --n_hidden 2048 \
  --export_dir ~/new_model \
  --checkpoint_dir ~/deepspeech-0.4.1-checkpoint \
  --epoch -20 \
  --train_files /home/rsandhu/iitm_data/assamese-male-train.csv,/home/rsandhu/iitm_data/assamese-female-train.csv,/home/rsandhu/iitm_data/bengali-male-train.csv,/home/rsandhu/iitm_data/gujarati-male-train.csv,/home/rsandhu/iitm_data/gujarati-female-train.csv,/home/rsandhu/iitm_data/hindi-male-train.csv,/home/rsandhu/iitm_data/hindi-female-train.csv,/home/rsandhu/iitm_data/kannada-male-train.csv,/home/rsandhu/iitm_data/kannada-female-train.csv,/home/rsandhu/iitm_data/malayalam-male-train.csv,/home/rsandhu/iitm_data/malayalam-female-train.csv,/home/rsandhu/iitm_data/manipuri-male-train.csv,/home/rsandhu/iitm_data/manipuri-female-train.csv,/home/rsandhu/iitm_data/rajasthani-male-train.csv,/home/rsandhu/iitm_data/rajasthani-female-train.csv,/home/rsandhu/iitm_data/tamil-male-train.csv,/home/rsandhu/iitm_data/tamil-female-train.csv \
  --dev_files /home/rsandhu/iitm_data/assamese-male-dev.csv,/home/rsandhu/iitm_data/assamese-female-dev.csv,/home/rsandhu/iitm_data/bengali-male-dev.csv,/home/rsandhu/iitm_data/gujarati-male-dev.csv,/home/rsandhu/iitm_data/gujarati-female-dev.csv,/home/rsandhu/iitm_data/hindi-male-dev.csv,/home/rsandhu/iitm_data/hindi-female-dev.csv,/home/rsandhu/iitm_data/kannada-male-dev.csv,/home/rsandhu/iitm_data/kannada-female-dev.csv,/home/rsandhu/iitm_data/malayalam-male-dev.csv,/home/rsandhu/iitm_data/malayalam-female-dev.csv,/home/rsandhu/iitm_data/manipuri-male-dev.csv,/home/rsandhu/iitm_data/manipuri-female-dev.csv,/home/rsandhu/iitm_data/rajasthani-male-dev.csv,/home/rsandhu/iitm_data/rajasthani-female-dev.csv,/home/rsandhu/iitm_data/tamil-male-dev.csv,/home/rsandhu/iitm_data/tamil-female-dev.csv,/home/rsandhu/common_voice_training_data/cv-valid-dev.csv,/mnt/librivox_data/librivox-dev-clean.csv,/mnt/librivox_data/librivox-dev-other.csv \
  --test_files /home/rsandhu/iitm_data/assamese-male-test.csv,/home/rsandhu/iitm_data/assamese-female-test.csv,/home/rsandhu/iitm_data/bengali-male-test.csv,/home/rsandhu/iitm_data/gujarati-male-test.csv,/home/rsandhu/iitm_data/gujarati-female-test.csv,/home/rsandhu/iitm_data/hindi-male-test.csv,/home/rsandhu/iitm_data/hindi-female-test.csv,/home/rsandhu/iitm_data/kannada-male-test.csv,/home/rsandhu/iitm_data/kannada-female-test.csv,/home/rsandhu/iitm_data/malayalam-male-test.csv,/home/rsandhu/iitm_data/malayalam-female-test.csv,/home/rsandhu/iitm_data/manipuri-male-test.csv,/home/rsandhu/iitm_data/manipuri-female-test.csv,/home/rsandhu/iitm_data/rajasthani-male-test.csv,/home/rsandhu/iitm_data/rajasthani-female-test.csv,/home/rsandhu/iitm_data/tamil-female-test.csv,/home/rsandhu/iitm_data/tamil-male-test.csv \
  --learning_rate 0.000095 \
  --train_batch_size 12 \
  --dev_batch_size 24 \
  --test_batch_size 24 \
  --display_step 0 \
  --validation_step 1 \
  --dropout_rate 0.2 \
  --checkpoint_step 1 \
  --lm_alpha 0.75 \
  --lm_beta 1.85

I Training of Epoch 36 - loss: 17.247186
I Validation of Epoch 36 - loss: 39.121008
I Training of Epoch 37 - loss: 12.315964
I Validation of Epoch 37 - loss: 41.861026
I Training of Epoch 38 - loss: 10.179187
I Validation of Epoch 38 - loss: 47.811639
I Training of Epoch 39 - loss: 9.144761
I Validation of Epoch 39 - loss: 51.391582
I Early stop triggered as (for last 4 steps) validation loss: 51.391582 with standard deviation: 3.627741 and mean: 42.931224
I FINISHED Optimization - training time: 5:27:44
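As a sanity check on the log above, the numbers in the early-stop message can be reproduced from the logged validation losses. One plausible reading (I have not verified this against the v0.4.1 source) is that the mean and population standard deviation are taken over the window of losses *preceding* the newest one, and the newest loss is compared against them:

```python
from statistics import mean, pstdev

# Validation losses for epochs 36-39, copied from the log above.
dev_losses = [39.121008, 41.861026, 47.811639, 51.391582]

# The logged mean/std match the three losses before the newest one,
# using the population standard deviation.
window = dev_losses[-4:-1]
m = mean(window)         # ~42.931224, as logged
s = pstdev(window)       # ~3.627741, as logged
latest = dev_losses[-1]  # 51.391582, as logged

# Illustrative only: the newest loss sits far above the recent mean,
# which is why early stopping fired.
gap = latest - m
```

Whatever the exact threshold DeepSpeech applies, the trend is unambiguous: training loss falls from 17.2 to 9.1 while validation loss climbs from 39.1 to 51.4, which is overfitting rather than a failure to learn.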

It seems like the model is not learning at all. Any thoughts on this?

It clearly is learning; look at the training loss. It’s overfitting your training set. The problem is that you’re training for too long on a narrow dataset. Try one or two epochs rather than 20, and stop training as soon as the validation loss starts to go up.

I tried training it for one epoch. With the generated pbmm file and my language model, I ran inference and found that accuracy is reduced for my model compared to the 0.4.1 release model for an American-accent English speaker, but it remains the same for an Indian-accent speaker. I was also wondering about the language model: I know it is used for running inference in the test epoch, but is it used in the validation of an epoch? I am asking because I was thinking of replacing the language model in the DeepSpeech ‘data’ directory to see if it would make a difference.

The language model is only used for inference and test epochs.

I did the following: I reduced the learning rate and made the batch sizes smaller, since my dataset is small. Do you have any other suggestions? I don’t know what else to try.
python3 DeepSpeech.py \
  --n_hidden 2048 \
  --checkpoint_dir ~/deepspeech-0.4.1-checkpoint \
  --epoch -1 \
  --train_files /home/rsandhu/iitm_data/assamese-male-train.csv,/home/rsandhu/iitm_data/assamese-female-train.csv,/home/rsandhu/iitm_data/bengali-male-train.csv,/home/rsandhu/iitm_data/gujarati-male-train.csv,/home/rsandhu/iitm_data/gujarati-female-train.csv,/home/rsandhu/iitm_data/hindi-male-train.csv,/home/rsandhu/iitm_data/hindi-female-train.csv,/home/rsandhu/iitm_data/kannada-male-train.csv,/home/rsandhu/iitm_data/kannada-female-train.csv,/home/rsandhu/iitm_data/malayalam-male-train.csv,/home/rsandhu/iitm_data/malayalam-female-train.csv,/home/rsandhu/iitm_data/manipuri-male-train.csv,/home/rsandhu/iitm_data/manipuri-female-train.csv,/home/rsandhu/iitm_data/rajasthani-male-train.csv,/home/rsandhu/iitm_data/rajasthani-female-train.csv,/home/rsandhu/iitm_data/tamil-male-train.csv,/home/rsandhu/iitm_data/tamil-female-train.csv,/home/rsandhu/jan_first_week_training_visteon_internal_data/jan-first-week-train.csv,/home/rsandhu/sva_voicedata_set_partitioned_used_october_training/sva-train.csv,/home/rsandhu/sva_voicedata_set_partitioned_used_november_training/sva-train.csv \
  --dev_files /home/rsandhu/iitm_data/assamese-male-dev.csv,/home/rsandhu/iitm_data/assamese-female-dev.csv,/home/rsandhu/iitm_data/bengali-male-dev.csv,/home/rsandhu/iitm_data/gujarati-male-dev.csv,/home/rsandhu/iitm_data/gujarati-female-dev.csv,/home/rsandhu/iitm_data/hindi-male-dev.csv,/home/rsandhu/iitm_data/hindi-female-dev.csv,/home/rsandhu/iitm_data/kannada-male-dev.csv,/home/rsandhu/iitm_data/kannada-female-dev.csv,/home/rsandhu/iitm_data/malayalam-male-dev.csv,/home/rsandhu/iitm_data/malayalam-female-dev.csv,/home/rsandhu/iitm_data/manipuri-male-dev.csv,/home/rsandhu/iitm_data/manipuri-female-dev.csv,/home/rsandhu/iitm_data/rajasthani-male-dev.csv,/home/rsandhu/iitm_data/rajasthani-female-dev.csv,/home/rsandhu/iitm_data/tamil-male-dev.csv,/home/rsandhu/iitm_data/tamil-female-dev.csv,/home/rsandhu/jan_first_week_training_visteon_internal_data/jan-first-week-dev.csv,/home/rsandhu/sva_voicedata_set_partitioned_used_october_training/sva-dev.csv,/home/rsandhu/sva_voicedata_set_partitioned_used_november_training/sva-dev.csv,/home/rsandhu/common_voice_training_data/cv-valid-dev.csv,/mnt/librivox_data/librivox-dev-clean.csv,/mnt/librivox_data/librivox-dev-other.csv \
  --test_files /home/rsandhu/iitm_data/assamese-male-test.csv,/home/rsandhu/iitm_data/assamese-female-test.csv,/home/rsandhu/iitm_data/bengali-male-test.csv,/home/rsandhu/iitm_data/gujarati-male-test.csv,/home/rsandhu/iitm_data/gujarati-female-test.csv,/home/rsandhu/iitm_data/hindi-male-test.csv,/home/rsandhu/iitm_data/hindi-female-test.csv,/home/rsandhu/iitm_data/kannada-male-test.csv,/home/rsandhu/iitm_data/kannada-female-test.csv,/home/rsandhu/iitm_data/malayalam-male-test.csv,/home/rsandhu/iitm_data/malayalam-female-test.csv,/home/rsandhu/iitm_data/manipuri-male-test.csv,/home/rsandhu/iitm_data/manipuri-female-test.csv,/home/rsandhu/iitm_data/rajasthani-male-test.csv,/home/rsandhu/iitm_data/rajasthani-female-test.csv,/home/rsandhu/iitm_data/tamil-female-test.csv,/home/rsandhu/iitm_data/tamil-male-test.csv,/home/rsandhu/sva_voicedata_set_partitioned_used_october_training/sva-test.csv,/home/rsandhu/sva_voicedata_set_partitioned_used_november_training/sva-test.csv \
  --learning_rate 0.00005 \
  --train_batch_size 5 \
  --dev_batch_size 5 \
  --test_batch_size 48 \
  --display_step 0 \
  --validation_step 1 \
  --dropout_rate 0.2 \
  --checkpoint_step 1 \
  --lm_alpha 0.75 \
  --lm_beta 1.85 \
  --export_dir ~/new_model