I have been training a DeepSpeech model for quite a few epochs now, and my validation loss seems to have plateaued. After reading several other Discourse posts, the general solution seemed to be that I should reduce the learning rate.
I have done this twice (at the points marked on the TensorBoard graph), and it did make a slight difference initially, but the validation loss then returned to its previous plateau.
I also increased the dropout rate in the hope that this would produce a more generalised model and improve the validation loss, but it really only increased the training loss and didn’t change the validation loss.
My next thought is to increase the size of the dataset (currently a combination of Common Voice, LibriSpeech and TED-LIUM at around 1700 hours). Are there any other changes that could be made, other than collecting more data?
lissyx
Hard to tell without a better overview of your current training parameters.
From your plot (which is missing legends, so I don’t know what is on the x axis; I’ll assume epochs), it seems you are not really learning after epoch 30k. The (hard to read) lines seem to have the same delta until 40k, where you obviously overfit.
Then I changed both learning rate and dropout rate:
learning_rate 0.000001
dropout_rate 0.25
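For context, here is a minimal sketch of how those values might be passed to the training script. It assumes the standard DeepSpeech.py entry point and that flag names such as --learning_rate, --dropout_rate and --summary_dir match your version (check the training help output first); train.csv and dev.csv are placeholders for your own CSVs:

# sketch of a training invocation with the reduced learning rate and increased dropout
python3 DeepSpeech.py \
  --train_files train.csv \
  --dev_files dev.csv \
  --train_batch_size 32 \
  --learning_rate 0.000001 \
  --dropout_rate 0.25 \
  --checkpoint_dir /path/to/checkpoints \
  --summary_dir /path/to/summaries/dir/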
The graph’s axes are:
Y - Loss
X - Steps (so with my 4 GPUs and a batch size of 32 this is 128 files per step, and with the data I have it is 1432 steps per epoch; see the quick arithmetic below)
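For reference, that works out to roughly 1432 steps/epoch × (4 GPUs × 32 files) = 1432 × 128 ≈ 183,000 training utterances per epoch, assuming (as described above) that the batch size of 32 is per GPU.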
I realise that there is a lack of learning after about 30k steps and that the model starts heading towards overfitting after this point. I am just asking whether there are any parameter changes that could aid learning (i.e. reduce loss on the validation set) other than adding more data?
lissyx
As far as I recall, the loss value is relative to your dataset. What’s your WER when you reach a proper level of learning with no overfitting?
This is a graph plotted using TensorBoard. During training, DeepSpeech produces TensorBoard logs; you can then view some information about the training process, and this is one of those graphs.
Is this as simple as running tensorboard --logdir /path/to/summaries/dir/ in a terminal while training? I haven’t used TensorBoard during training before but would like to.
Yeah, that’s exactly how you do it. If you run it during training you might find a few little bugs when trying to refresh to get new data, but just restarting TensorBoard with the command you listed will properly refresh it. To pick the output directory for the TensorBoard logs, just type deepspeech -h and there should be an option for picking a TensorBoard directory.
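For example, assuming the summaries were written to /path/to/summaries/dir/ (via the assumed --summary_dir flag sketched earlier), you can point TensorBoard at that directory from another terminal and open it in a browser (6006 is TensorBoard’s default port):

# launch TensorBoard against the training summaries, then browse to http://localhost:6006
tensorboard --logdir /path/to/summaries/dir/ --port 6006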