Validation loss plateaus after some epochs

costas · April 8, 2019, 12:28pm

Dear all,

I am training on a 70-hour dataset. The data are shuffled before being fed to the network and split 70/30/10 (train/val/test). While the training loss keeps decreasing, the validation loss plateaus after some epochs and stays at 67, which results in a high WER (27%).
I experimented with model complexity by varying the number of hidden nodes (128, 256, 512, 1024, 2056), and tried several dropout rates (0.05, 0.2, 0.35, 0.4) to avoid overfitting, but the issue remains and I get the same validation loss. Do you have any suggestions on how to decrease the validation loss further and improve the model?

thanks

reuben · April 8, 2019, 12:45pm

A few things that come to mind:

Check statistics on the train/validation/test splits to make sure there’s no big discrepancy in the data distributions
Increase the size of the dataset
Try some normalization technique, like layer norm
Try an alternative learning rate schedule
Try transfer-learning from our pre-trained English model
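
A minimal sketch of the first suggestion (comparing statistics across the splits), assuming each split is available as a plain list of transcript strings. The helper names `split_stats` and `oov_rate` are hypothetical and not part of any toolkit; they only compare utterance counts, mean transcript length, vocabulary size, and the share of words unseen in training.

```python
from statistics import mean


def split_stats(transcripts):
    """Summarize one split: utterance count, mean words per utterance, vocabulary size."""
    word_lists = [t.split() for t in transcripts]
    vocab = {w for words in word_lists for w in words}
    return {
        "utterances": len(word_lists),
        "mean_words": mean(len(words) for words in word_lists) if word_lists else 0.0,
        "vocab_size": len(vocab),
    }


def oov_rate(train_transcripts, other_transcripts):
    """Fraction of word tokens in the other split that never occur in the training split."""
    train_vocab = {w for t in train_transcripts for w in t.split()}
    tokens = [w for t in other_transcripts for w in t.split()]
    if not tokens:
        return 0.0
    return sum(w not in train_vocab for w in tokens) / len(tokens)


if __name__ == "__main__":
    # Toy data standing in for the real train/validation transcripts.
    train = ["turn the light on", "turn the radio off"]
    val = ["turn the heating on"]
    print("train:", split_stats(train))
    print("val:", split_stats(val))
    print("val OOV rate:", oov_rate(train, val))
```

A large gap in mean transcript length or a high out-of-vocabulary rate between training and validation would be one sign that the splits do not follow the same distribution. Audio-side statistics (duration, speakers, recording conditions) would need a similar comparison.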

tuttlebr · April 8, 2019, 7:01pm

One thing I did early on was run the pre-trained model on my training data. Samples with a WER > 0.15 were held out for review and treated as unvalidated. I trained on the validated data and, as I corrected issues in the transcription/audio pairs, gradually added them back into the training pipeline. I had a lot of dirty data, however, so this may not be the same thing you’re running into.
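
The hold-out step described above could be sketched roughly as follows. This is an illustration under assumptions, not the exact procedure used: `transcribe` is a hypothetical callable that wraps whatever pre-trained model is available and returns its text output for an audio path, and WER is computed here as word-level edit distance divided by the reference length.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        return 0.0 if not hyp else 1.0
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def partition_by_wer(samples, transcribe, threshold=0.15):
    """Split (audio_path, transcript) pairs into validated and held-out lists.

    `transcribe` is a hypothetical stand-in for the pre-trained model.
    """
    validated, held_out = [], []
    for audio_path, transcript in samples:
        wer = word_error_rate(transcript, transcribe(audio_path))
        target = held_out if wer > threshold else validated
        target.append((audio_path, transcript))
    return validated, held_out
```

Pairs that land in `held_out` are the ones to review by hand; once a transcript or audio file has been corrected, the pair can be moved back into the training set.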

The changes with the most dramatic impact on my WER were:

Make sure I have valid audio/transcription pairs
Scrub transcriptions for spelling errors and labeling inconsistencies, as these will hurt the performance of a custom language model (I don’t think this is used in training, however…)
Make sure audio is mono, 16-bit PCM @ 16kHz
Gradually increase the learning rate when restarting from an existing checkpoint
Depending on your use case, you could restrict the alphabet to [a-z '] if you’re currently using something more complex like [a-zA-Z '.?,]
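
Two of the checks in this list are easy to automate. The sketch below uses only the Python standard library; the function names are hypothetical. It assumes the audio files are uncompressed WAV and that dropping every character outside [a-z '] is acceptable for your use case (accented letters, digits, and punctuation are simply replaced by spaces).

```python
import re
import wave

# Anything outside the restricted alphabet [a-z '] is treated as a separator.
NOT_ALLOWED = re.compile(r"[^a-z ']")


def is_mono_16bit_16khz(path):
    """Return True if the WAV file is mono, 16-bit PCM, sampled at 16 kHz."""
    with wave.open(path, "rb") as wav:
        return (wav.getnchannels() == 1
                and wav.getsampwidth() == 2      # 2 bytes per sample = 16 bit
                and wav.getframerate() == 16000)


def normalize_transcript(text):
    """Lowercase the text, drop characters outside [a-z '], collapse whitespace."""
    cleaned = NOT_ALLOWED.sub(" ", text.lower())
    return " ".join(cleaned.split())
```

Files that fail the format check would need to be resampled or converted with an audio tool before training. Numbers and abbreviations are better spelled out before normalization, since this sketch would otherwise silently delete them.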

caucheteux · August 21, 2019, 2:37pm

Hi all,

I’m currently optimizing my hyperparameters and would like to start with the complexity of my model (n_hidden). My question is: how should I go about that, given that the number of epochs has a big impact on my WER?
Do I have to fix the number of epochs? If so, how do I choose the best number of epochs? Or does it not matter, so that I can pick an arbitrary number and vary only n_hidden?

Thanks!