I am training on a dataset of 70 hours. The data are shuffled before being fed to the network and split 70/30/10 (train/val/test). While the training loss decreases, the validation loss plateaus after some epochs and stays at 67, which results in a high WER (27%).
I have experimented with model complexity, trying hidden-node counts of 128, 256, 512, 1024, and 2056 (from low to high complexity) and dropout rates of 0.05, 0.2, 0.35, and 0.4 to avoid overfitting, but the issue remains and I get the same validation loss. Do you have any suggestions on how to decrease the validation loss further and improve the model?
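For concreteness, the sweep I am running looks roughly like this. It is a minimal sketch, not my exact script: it assumes a DeepSpeech-style DeepSpeech.py entry point and flag names such as --n_hidden and --dropout_rate, which may be different in your version or training script.

```python
import itertools
import subprocess

# Each run keeps the same train/dev/test CSVs and only varies model width
# and dropout. The flag names below are assumptions from my setup.
train_csv, dev_csv, test_csv = "train.csv", "dev.csv", "test.csv"

for n_hidden, dropout in itertools.product([128, 256, 512, 1024, 2056],
                                            [0.05, 0.2, 0.35, 0.4]):
    ckpt_dir = f"checkpoints/n{n_hidden}_d{dropout}"
    subprocess.run([
        "python", "DeepSpeech.py",
        "--train_files", train_csv,
        "--dev_files", dev_csv,
        "--test_files", test_csv,
        "--n_hidden", str(n_hidden),
        "--dropout_rate", str(dropout),
        "--epochs", "30",
        "--checkpoint_dir", ckpt_dir,
    ], check=True)
```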
One thing I did early on was run the pre-trained model on my training data. Samples with a WER > 0.15 were held out for review and treated as unvalidated. I trained on the validated data, and then, as I corrected issues in the transcription/audio pairs, I gradually added them back into the training pipeline. I had a lot of dirty data, though, so it may not be the same thing you're running into.
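If it helps, the filtering step was basically the following. This is a rough sketch rather than my exact script: wer() is a plain word-level Levenshtein distance, and the file names and transcripts are made-up examples.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# (wav_path, reference_transcript, transcript_from_pretrained_model)
samples = [
    ("clips/0001.wav", "turn the lights off in the kitchen", "turn the light off in the kitchen"),
    ("clips/0002.wav", "what is the weather", "what it the weathers like"),
]

validated = [s for s in samples if wer(s[1], s[2]) <= 0.15]
held_out_for_review = [s for s in samples if wer(s[1], s[2]) > 0.15]
```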
The changes with the most dramatic impact on my WER were:
Making sure I had valid audio/transcription pairs
Scrubbing transcriptions for spelling errors and inconsistencies in labeling, as these will hurt the performance of a custom language model (I don't think the language model is used during training, however…)
Making sure the audio is mono, 16-bit PCM @ 16 kHz (a quick check for this, and for the restricted alphabet mentioned below, is sketched after this list)
Gradually increasing the learning rate when restarting from an existing checkpoint
Depending on your use case, restricting the alphabet to [a-z '] rather than something more complex like [a-zA-Z '.?,]
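Here is a minimal sketch of the kind of check I mean for the audio format and the alphabet. It uses only the standard library, assumes WAV input, and the path and transcript in the usage comment are made-up examples.

```python
import wave

ALPHABET = set("abcdefghijklmnopqrstuvwxyz '")  # the restricted [a-z '] case

def check_sample(wav_path: str, transcript: str) -> list:
    """Return a list of problems with one audio/transcript pair (empty means it looks fine)."""
    problems = []
    with wave.open(wav_path, "rb") as w:
        if w.getnchannels() != 1:
            problems.append(f"not mono ({w.getnchannels()} channels)")
        if w.getsampwidth() != 2:   # 2 bytes per sample = 16-bit PCM
            problems.append(f"not 16-bit ({8 * w.getsampwidth()}-bit)")
        if w.getframerate() != 16000:
            problems.append(f"wrong sample rate ({w.getframerate()} Hz)")
    bad_chars = set(transcript.lower()) - ALPHABET
    if bad_chars:
        problems.append(f"characters outside the alphabet: {sorted(bad_chars)}")
    return problems

# Example (hypothetical pair):
# print(check_sample("clips/0001.wav", "Turn the lights off, please."))
# -> ["characters outside the alphabet: [',', '.']"]  (assuming the WAV itself is fine)
```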
I'm currently optimizing my hyperparameters and I would like to start with the complexity of my model (n_hidden). My question is: how can I do that? The number of epochs has a big impact on my WER…
Do I have to fix the number of epochs? If so, how do I choose the best number of epochs? Or does it not matter, so that I can pick an arbitrary number and change only n_hidden?