I have been training a DeepSpeech model for quite a few epochs now, and my validation loss seems to have plateaued. After reading several other Discourse posts, the general solution seemed to be that I should reduce the learning rate.
I have done this twice (at the points marked on the TensorBoard graph), and it did make a slight difference initially, but the validation loss then returned to its previous plateau.
I also increased the dropout rate in the hope that this would produce a more generalised model and improve the validation loss, but it really only increased the training loss and didn’t change the validation loss.
My next thought is to increase the size of the dataset (currently a combination of Common Voice, LibriSpeech and TED-LIUM at around 1700 hours). Are there any other changes that could be made, other than collecting more data?
lissyx
Hard to tell without a better overview of your current training parameters.
From your plot (which is missing legends, so I don’t know what is on the x axis; I’ll assume epochs), it seems you are not really learning after epoch 30k. The (hard to read) lines seem to have the same delta until 40k, where you obviously overfit.
Then I changed both learning rate and dropout rate:
learning_rate 0.000001
dropout_rate 0.25
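For context, here is a minimal sketch of how those values might be passed to the training script. It assumes the standard DeepSpeech.py entry point and that flag names such as --learning_rate, --dropout_rate and --summary_dir match your version (check the training help output first); train.csv and dev.csv are placeholders for your own CSVs:

# sketch of a training invocation with the reduced learning rate and increased dropout
python3 DeepSpeech.py \
  --train_files train.csv \
  --dev_files dev.csv \
  --train_batch_size 32 \
  --learning_rate 0.000001 \
  --dropout_rate 0.25 \
  --checkpoint_dir /path/to/checkpoints \
  --summary_dir /path/to/summaries/dir/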
The graph’s axes are:
Y - Loss
X - Steps (so with my 4 GPUs and a batch size of 32 this is 128 files per step, and with the data I have it is 1432 steps per epoch; see the quick arithmetic below)
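For reference, that works out to roughly 1432 steps/epoch × (4 GPUs × 32 files) = 1432 × 128 ≈ 183,000 training utterances per epoch, assuming (as described above) that the batch size of 32 is per GPU.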
I realise that there is a lack of learning after about 30k steps and that the model starts heading towards overfitting after this point. I am just asking whether there are any parameter changes that could aid learning (i.e. reduce loss on the validation set) other than adding more data?
lissyx
As far as I recall, the loss value is relative to your dataset. What’s your WER when you reach a proper level of learning with no overfitting?
This is a graph plotted using TensorBoard. During training, DeepSpeech produces TensorBoard logs; you can then view some information about the training process, and this is one of those graphs.
Is this as simple as running tensorboard --logdir /path/to/summaries/dir/ in a terminal while training? I haven’t used TensorBoard during training before but would like to.
Yeah, that’s exactly how you do it. If you run it during training you might find a few little bugs when trying to refresh to get new data, but just restarting TensorBoard with the command you listed will properly refresh it. To pick the output directory for the TensorBoard logs, just type deepspeech -h and there should be an option for picking a TensorBoard directory.
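For example, assuming the summaries were written to /path/to/summaries/dir/ (via the assumed --summary_dir flag sketched earlier), you can point TensorBoard at that directory from another terminal and open it in a browser (6006 is TensorBoard’s default port):

# launch TensorBoard against the training summaries, then browse to http://localhost:6006
tensorboard --logdir /path/to/summaries/dir/ --port 6006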