Validation loss plateaus after some epochs

Dear all,

I am training on a dataset of 70 hours. The data are shuffled before being fed to the network and split 70/30/10 (train/val/test). While the training loss keeps decreasing, the validation loss plateaus after some epochs and stays at 67, which results in a high WER (27%).
I experimented with model complexity by varying the number of hidden nodes (128, 256, 512, 1024, 2056), from low to high complexity, and with dropout rates (0.05, 0.2, 0.35, 0.4) to avoid overfitting, but the issue remains and I get the same validation loss. Do you have any suggestions on how to decrease the validation loss further and improve the model?


A few things that come to mind:

  • Check statistics on the train/validation/test splits to make sure there’s no big discrepancy in the data distributions
  • Increase the size of the dataset
  • Try some normalization technique, like layer norm
  • Try an alternative learning rate schedule
  • Try transfer-learning from our pre-trained English model
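For the first point, a rough sketch of what "check statistics on the splits" could look like. It assumes each split is available as a list of (duration in seconds, transcript) pairs; that layout and the helper names `split_stats` and `oov_rate` are made up for illustration and do not come from any toolkit.

```python
from statistics import mean


def split_stats(samples):
    """Summarize one split.

    `samples` is assumed to be a non-empty list of
    (duration_seconds, transcript) pairs.
    """
    durations = [duration for duration, _ in samples]
    word_lists = [transcript.split() for _, transcript in samples]
    vocab = {word for words in word_lists for word in words}
    return {
        "n_samples": len(samples),
        "total_hours": sum(durations) / 3600.0,
        "mean_duration_s": mean(durations),
        "mean_words": mean(len(words) for words in word_lists),
        "vocab_size": len(vocab),
    }


def oov_rate(train_samples, other_samples):
    """Fraction of word tokens in `other_samples` never seen in training."""
    train_vocab = {w for _, t in train_samples for w in t.split()}
    other_words = [w for _, t in other_samples for w in t.split()]
    if not other_words:
        return 0.0
    unseen = sum(1 for w in other_words if w not in train_vocab)
    return unseen / len(other_words)
```

If the mean duration, mean transcript length, or out-of-vocabulary rate differs a lot between train and validation, that would be one possible explanation for a validation loss that stops improving while the training loss keeps going down.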

One thing I did early on was run the pre-trained model on my training data. Samples with a WER > 0.15 were held out for review and treated as unvalidated. I trained on the validated data and then, as I corrected issues in the transcription/audio pairs, gradually added the corrected samples back into the training pipeline. I had a lot of dirty data, however, so this may not be the same thing you’re running into.
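The filtering step described above could be sketched roughly like this. `transcribe` is a hypothetical callable that wraps whatever pre-trained model you use and returns its text output for an audio path; it is not part of any library API. The WER here is a plain word-level edit distance divided by the reference length, and 0.15 is the threshold mentioned above.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        return 0.0 if not hyp else 1.0
    # prev[j] holds the distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def partition_by_wer(samples, transcribe, threshold=0.15):
    """Split (audio_path, transcript) pairs into validated and held-out.

    `transcribe` is a hypothetical function: audio_path -> predicted text.
    """
    validated, held_out = [], []
    for audio_path, transcript in samples:
        wer = word_error_rate(transcript, transcribe(audio_path))
        target = held_out if wer > threshold else validated
        target.append((audio_path, transcript))
    return validated, held_out
```

The held-out list is what gets reviewed by hand; samples go back into the training set once their transcript or audio has been fixed.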

The changes with the most dramatic impact on my WER were:

  • Making sure I have valid audio/transcription pairs
  • Scrubbing transcriptions for spelling errors and labeling inconsistencies, as these will hurt the performance of a custom language model (I don’t think this is used in training, however…)
  • Making sure audio is mono, 16-bit PCM @ 16kHz
  • Gradually increasing the learning rate when restarting from an existing checkpoint
  • Depending on your use case, restricting the alphabet to [a-z '] if you’re using something more complex like [a-zA-Z '.?,]
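Two of the checks in the list above are easy to automate. Below is a minimal sketch using only the Python standard library: one helper verifies that a WAV file is mono, 16-bit, 16 kHz, and the other reduces a transcript to the [a-z '] alphabet. The function names are made up for illustration. Note that the `wave` module only reads PCM WAV files and raises `wave.Error` on anything else, and that the normalizer replaces every character outside the alphabet (including accented letters and typographic apostrophes) with a space, which may or may not be what you want.

```python
import re
import wave


def check_wav_format(path, channels=1, sample_width_bytes=2, sample_rate=16000):
    """Return a list of format problems; an empty list means the file is OK."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != channels:
            problems.append(f"channels={wav.getnchannels()}, expected {channels}")
        if wav.getsampwidth() != sample_width_bytes:
            problems.append(
                f"sample width={wav.getsampwidth()} bytes, "
                f"expected {sample_width_bytes}"
            )
        if wav.getframerate() != sample_rate:
            problems.append(f"rate={wav.getframerate()} Hz, expected {sample_rate}")
    return problems


# Anything that is not a lowercase ASCII letter, apostrophe, or space.
_DISALLOWED = re.compile(r"[^a-z' ]")


def normalize_transcript(text):
    """Lowercase and restrict a transcript to the alphabet [a-z ']."""
    text = _DISALLOWED.sub(" ", text.lower())
    # Collapse the runs of spaces left behind by removed characters.
    return " ".join(text.split())
```

Running both over the whole dataset before training gives a quick list of files to resample or transcripts to review.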

Hi all,

I’m currently optimizing my hyperparameters and would like to start with the complexity of my model (n_hidden). My question is: how should I do that, given that the number of epochs has a big impact on my WER?
Do I have to fix the number of epochs? If so, how do I choose the best number of epochs? Or does it not matter, so that I can pick an arbitrary number and vary only n_hidden?

Thanks!