Training results with 0.4.1 far worse than 0.3.0

@kdavis the audio file I used for testing has an American female speaker speaking general English. I tested it manually; this is not the result of the test epoch that I am talking about.

@reuben @lissyx any thoughts on this?

I haven’t seen enough evidence to convince me that 0.4.1 is performing far worse than 0.3.0 as you suggest, so I don’t know how I can help. Like @kdavis already said, start with the basics: don’t fine-tune the 0.4.1 model.

I think there’s some confusion here. Let’s set aside the fact that Indian place names are not getting recognized, so changing the language model is out of the question here. I test an audio recording with an American accent that has no region-specific names, just general English, and the result is ‘x’. I do incremental training, test using the same audio file, and the result is ‘y’. What I observe is that ‘y’ is worse than ‘x’. This shouldn’t have happened. Have I made the situation clearer, @reuben @kdavis @lissyx?

How is the data you train on validated? If you train on bad data, you will get bad results.

@kdavis one of the datasets I can’t vouch for. But I also tried training on a dataset that I and 8 other people generated by recording our own voice samples, which is 100% accurate. I observe the same thing there.

Something else I wanted to add: I picked a few audio files that were used to train the model and tested using those files. For those, the accuracy is 100%.

Sounds like you might be overfitting?

Exactly. But early stopping is triggered, and then I retrain only for the number of epochs up to which the loss decreases, after which it starts increasing.

Any idea how I could avoid this, if this is not the correct approach?

@josh_meyer could you give any suggestions on how I could prevent overfitting, since you are the expert on transfer learning? Fine-tuning is resulting in overfitting and making the model perform worse than before.

Train for fewer epochs, and try extending the validation set with your data rather than replacing it.


@reuben here is the process I follow:

  1. I start training for 10 epochs from the DeepSpeech 0.4.1 checkpoint.
  2. I observe that the validation loss decreases until the 6th epoch and starts increasing after that, and early stopping is triggered in the 8th epoch.
  3. I pick up the DeepSpeech 0.4.1 checkpoint again and this time train only until the 6th epoch.

Are you saying that I should stop training before the 6th epoch even though the validation loss is decreasing?

My validation set is still the same. The last test result I posted is the inference result from running DeepSpeech with the release model and with the model that I exported.

Right, and I’m suggesting this could be (part of) the problem. Instead of using only your data for the validation set, mix it with the data we used when training v0.4.1, such as LibriSpeech.
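For anyone else following along: DeepSpeech takes a comma-separated list of CSVs for `--dev_files`, so the easiest way to mix validation data is to list your CSVs alongside the LibriSpeech ones. If you prefer a single merged file, here is a minimal sketch using only the standard library; the filenames in the usage comment are placeholders, not files from this thread:

```python
import csv

def merge_dev_csvs(paths, out_path):
    """Concatenate several DeepSpeech-style dev CSVs into one file,
    keeping a single header row from the first CSV."""
    header = None
    rows = []
    for path in paths:
        with open(path, newline="") as f:
            reader = csv.reader(f)
            file_header = next(reader)  # skip per-file header
            if header is None:
                header = file_header
            rows.extend(reader)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)

# Placeholder filenames -- substitute your own dev CSVs:
# merge_dev_csvs(["my-dev.csv", "librivox-dev-clean.csv"], "mixed-dev.csv")
```

This assumes all CSVs share the same columns (`wav_filename,wav_filesize,transcript`), which is the layout DeepSpeech's importers produce.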

Oh, I see. Thanks, I’ll try doing that. But I had just one more question. Right now I have split my data into train, dev and test at 70%–20%–10% respectively, just like in @elpimous_robot’s tutorial. If I include your validation data, it would no longer be in that proportion. Would that be a problem?
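For reference, the 70/20/10 split mentioned above only concerns how you partition your *own* recordings; adding external dev data on top of it doesn't change that partition. A minimal sketch of such a split (the function name and seed are my own choices, not from the tutorial):

```python
import random

def split_70_20_10(rows, seed=42):
    """Shuffle rows and split them 70% train / 20% dev / 10% test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed so the split is reproducible
    n_train = int(0.7 * len(rows))
    n_dev = int(0.2 * len(rows))
    train = rows[:n_train]
    dev = rows[n_train:n_train + n_dev]
    test = rows[n_train + n_dev:]
    return train, dev, test
```

The test slice simply takes whatever remains after train and dev, so no sample is dropped or duplicated.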

@reuben I included the LibriSpeech and Common Voice validation data sets, using the following command:

python3 DeepSpeech.py \
  --n_hidden 2048 \
  --export_dir ~/new_model \
  --checkpoint_dir ~/deepspeech-0.4.1-checkpoint \
  --epoch -20 \
  --train_files /home/rsandhu/iitm_data/assamese-male-train.csv,/home/rsandhu/iitm_data/assamese-female-train.csv,/home/rsandhu/iitm_data/bengali-male-train.csv,/home/rsandhu/iitm_data/gujarati-male-train.csv,/home/rsandhu/iitm_data/gujarati-female-train.csv,/home/rsandhu/iitm_data/hindi-male-train.csv,/home/rsandhu/iitm_data/hindi-female-train.csv,/home/rsandhu/iitm_data/kannada-male-train.csv,/home/rsandhu/iitm_data/kannada-female-train.csv,/home/rsandhu/iitm_data/malayalam-male-train.csv,/home/rsandhu/iitm_data/malayalam-female-train.csv,/home/rsandhu/iitm_data/manipuri-male-train.csv,/home/rsandhu/iitm_data/manipuri-female-train.csv,/home/rsandhu/iitm_data/rajasthani-male-train.csv,/home/rsandhu/iitm_data/rajasthani-female-train.csv,/home/rsandhu/iitm_data/tamil-male-train.csv,/home/rsandhu/iitm_data/tamil-female-train.csv \
  --dev_files /home/rsandhu/iitm_data/assamese-male-dev.csv,/home/rsandhu/iitm_data/assamese-female-dev.csv,/home/rsandhu/iitm_data/bengali-male-dev.csv,/home/rsandhu/iitm_data/gujarati-male-dev.csv,/home/rsandhu/iitm_data/gujarati-female-dev.csv,/home/rsandhu/iitm_data/hindi-male-dev.csv,/home/rsandhu/iitm_data/hindi-female-dev.csv,/home/rsandhu/iitm_data/kannada-male-dev.csv,/home/rsandhu/iitm_data/kannada-female-dev.csv,/home/rsandhu/iitm_data/malayalam-male-dev.csv,/home/rsandhu/iitm_data/malayalam-female-dev.csv,/home/rsandhu/iitm_data/manipuri-male-dev.csv,/home/rsandhu/iitm_data/manipuri-female-dev.csv,/home/rsandhu/iitm_data/rajasthani-male-dev.csv,/home/rsandhu/iitm_data/rajasthani-female-dev.csv,/home/rsandhu/iitm_data/tamil-male-dev.csv,/home/rsandhu/iitm_data/tamil-female-dev.csv,/home/rsandhu/common_voice_training_data/cv-valid-dev.csv,/mnt/librivox_data/librivox-dev-clean.csv,/mnt/librivox_data/librivox-dev-other.csv \
  --test_files /home/rsandhu/iitm_data/assamese-male-test.csv,/home/rsandhu/iitm_data/assamese-female-test.csv,/home/rsandhu/iitm_data/bengali-male-test.csv,/home/rsandhu/iitm_data/gujarati-male-test.csv,/home/rsandhu/iitm_data/gujarati-female-test.csv,/home/rsandhu/iitm_data/hindi-male-test.csv,/home/rsandhu/iitm_data/hindi-female-test.csv,/home/rsandhu/iitm_data/kannada-male-test.csv,/home/rsandhu/iitm_data/kannada-female-test.csv,/home/rsandhu/iitm_data/malayalam-male-test.csv,/home/rsandhu/iitm_data/malayalam-female-test.csv,/home/rsandhu/iitm_data/manipuri-male-test.csv,/home/rsandhu/iitm_data/manipuri-female-test.csv,/home/rsandhu/iitm_data/rajasthani-male-test.csv,/home/rsandhu/iitm_data/rajasthani-female-test.csv,/home/rsandhu/iitm_data/tamil-female-test.csv,/home/rsandhu/iitm_data/tamil-male-test.csv \
  --learning_rate 0.000095 \
  --train_batch_size 12 \
  --dev_batch_size 24 \
  --test_batch_size 24 \
  --display_step 0 \
  --validation_step 1 \
  --dropout_rate 0.2 \
  --checkpoint_step 1 \
  --lm_alpha 0.75 \
  --lm_beta 1.85

I Training of Epoch 36 - loss: 17.247186
I Validation of Epoch 36 - loss: 39.121008
I Training of Epoch 37 - loss: 12.315964
I Validation of Epoch 37 - loss: 41.861026
I Training of Epoch 38 - loss: 10.179187
I Validation of Epoch 38 - loss: 47.811639
I Training of Epoch 39 - loss: 9.144761
I Validation of Epoch 39 - loss: 51.391582
I Early stop triggered as (for last 4 steps) validation loss: 51.391582 with standard deviation: 3.627741 and mean: 42.931224
I FINISHED Optimization - training time: 5:27:44
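As a sanity check on the log above, the numbers in the early-stop message can be reproduced from the logged validation losses. One plausible reading (I have not verified this against the v0.4.1 source) is that the mean and population standard deviation are taken over the window of losses *preceding* the newest one, and the newest loss is compared against them:

```python
from statistics import mean, pstdev

# Validation losses for epochs 36-39, copied from the log above.
dev_losses = [39.121008, 41.861026, 47.811639, 51.391582]

# The logged mean/std match the three losses before the newest one,
# using the population standard deviation.
window = dev_losses[-4:-1]
m = mean(window)         # ~42.931224, as logged
s = pstdev(window)       # ~3.627741, as logged
latest = dev_losses[-1]  # 51.391582, as logged

# Illustrative only: the newest loss sits far above the recent mean,
# which is why early stopping fired.
gap = latest - m
```

Whatever the exact threshold DeepSpeech applies, the trend is unambiguous: training loss falls from 17.2 to 9.1 while validation loss climbs from 39.1 to 51.4, which is overfitting rather than a failure to learn.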

It seems like the model is not learning at all. Any thoughts on this?

It clearly is learning; look at the training loss. It’s overfitting your training set. The problem is that you’re training for too long on a narrow dataset. Try one or two epochs rather than 20, and stop training as soon as the validation loss starts to go up.

I tried training it for one epoch. With the generated pbmm file and my language model, I ran inference and found that accuracy is reduced for my model compared to the 0.4.1 release model for an American-accent English speaker, but it remains the same for an Indian-accent speaker. I was also wondering about the language model: I know it is used for running inference in the test epoch, but is it used in the validation of an epoch? I am asking because I was thinking of replacing the language model in the DeepSpeech ‘data’ directory to see if it would make a difference.

The language model is only used for inference and test epochs.

I did the following: I reduced the learning rate and made the batch sizes smaller, since my dataset is small. Do you have any other suggestions? I don’t know what else to try.
python3 DeepSpeech.py \
  --n_hidden 2048 \
  --checkpoint_dir ~/deepspeech-0.4.1-checkpoint \
  --epoch -1 \
  --train_files /home/rsandhu/iitm_data/assamese-male-train.csv,/home/rsandhu/iitm_data/assamese-female-train.csv,/home/rsandhu/iitm_data/bengali-male-train.csv,/home/rsandhu/iitm_data/gujarati-male-train.csv,/home/rsandhu/iitm_data/gujarati-female-train.csv,/home/rsandhu/iitm_data/hindi-male-train.csv,/home/rsandhu/iitm_data/hindi-female-train.csv,/home/rsandhu/iitm_data/kannada-male-train.csv,/home/rsandhu/iitm_data/kannada-female-train.csv,/home/rsandhu/iitm_data/malayalam-male-train.csv,/home/rsandhu/iitm_data/malayalam-female-train.csv,/home/rsandhu/iitm_data/manipuri-male-train.csv,/home/rsandhu/iitm_data/manipuri-female-train.csv,/home/rsandhu/iitm_data/rajasthani-male-train.csv,/home/rsandhu/iitm_data/rajasthani-female-train.csv,/home/rsandhu/iitm_data/tamil-male-train.csv,/home/rsandhu/iitm_data/tamil-female-train.csv,/home/rsandhu/jan_first_week_training_visteon_internal_data/jan-first-week-train.csv,/home/rsandhu/sva_voicedata_set_partitioned_used_october_training/sva-train.csv,/home/rsandhu/sva_voicedata_set_partitioned_used_november_training/sva-train.csv \
  --dev_files /home/rsandhu/iitm_data/assamese-male-dev.csv,/home/rsandhu/iitm_data/assamese-female-dev.csv,/home/rsandhu/iitm_data/bengali-male-dev.csv,/home/rsandhu/iitm_data/gujarati-male-dev.csv,/home/rsandhu/iitm_data/gujarati-female-dev.csv,/home/rsandhu/iitm_data/hindi-male-dev.csv,/home/rsandhu/iitm_data/hindi-female-dev.csv,/home/rsandhu/iitm_data/kannada-male-dev.csv,/home/rsandhu/iitm_data/kannada-female-dev.csv,/home/rsandhu/iitm_data/malayalam-male-dev.csv,/home/rsandhu/iitm_data/malayalam-female-dev.csv,/home/rsandhu/iitm_data/manipuri-male-dev.csv,/home/rsandhu/iitm_data/manipuri-female-dev.csv,/home/rsandhu/iitm_data/rajasthani-male-dev.csv,/home/rsandhu/iitm_data/rajasthani-female-dev.csv,/home/rsandhu/iitm_data/tamil-male-dev.csv,/home/rsandhu/iitm_data/tamil-female-dev.csv,/home/rsandhu/jan_first_week_training_visteon_internal_data/jan-first-week-dev.csv,/home/rsandhu/sva_voicedata_set_partitioned_used_october_training/sva-dev.csv,/home/rsandhu/sva_voicedata_set_partitioned_used_november_training/sva-dev.csv,/home/rsandhu/common_voice_training_data/cv-valid-dev.csv,/mnt/librivox_data/librivox-dev-clean.csv,/mnt/librivox_data/librivox-dev-other.csv \
  --test_files /home/rsandhu/iitm_data/assamese-male-test.csv,/home/rsandhu/iitm_data/assamese-female-test.csv,/home/rsandhu/iitm_data/bengali-male-test.csv,/home/rsandhu/iitm_data/gujarati-male-test.csv,/home/rsandhu/iitm_data/gujarati-female-test.csv,/home/rsandhu/iitm_data/hindi-male-test.csv,/home/rsandhu/iitm_data/hindi-female-test.csv,/home/rsandhu/iitm_data/kannada-male-test.csv,/home/rsandhu/iitm_data/kannada-female-test.csv,/home/rsandhu/iitm_data/malayalam-male-test.csv,/home/rsandhu/iitm_data/malayalam-female-test.csv,/home/rsandhu/iitm_data/manipuri-male-test.csv,/home/rsandhu/iitm_data/manipuri-female-test.csv,/home/rsandhu/iitm_data/rajasthani-male-test.csv,/home/rsandhu/iitm_data/rajasthani-female-test.csv,/home/rsandhu/iitm_data/tamil-female-test.csv,/home/rsandhu/iitm_data/tamil-male-test.csv,/home/rsandhu/sva_voicedata_set_partitioned_used_october_training/sva-test.csv,/home/rsandhu/sva_voicedata_set_partitioned_used_november_training/sva-test.csv \
  --learning_rate 0.00005 \
  --train_batch_size 5 \
  --dev_batch_size 5 \
  --test_batch_size 48 \
  --display_step 0 \
  --validation_step 1 \
  --dropout_rate 0.2 \
  --checkpoint_step 1 \
  --lm_alpha 0.75 \
  --lm_beta 1.85 \
  --export_dir ~/new_model