I am training a model on a 70-hour dataset. The data are shuffled before being fed to the network and split 70/30/10 (train/val/test). While the training loss decreases, the validation loss plateaus after some epochs and stays at 67, which yields a high WER (27%).
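For reference, a minimal sketch of the shuffle-and-split step (note that 70/30/10 sums to 110%, so this sketch assumes a 70/20/10 split was intended; the arrays here are placeholders for the actual features and labels):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for the 70 hours of data.
X = np.arange(1000).reshape(-1, 1)
y = np.arange(1000)

# Shuffle and hold out 30% of the data (train_test_split shuffles by default).
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, shuffle=True, random_state=0
)
# Split the held-out 30% into val/test at a 2:1 ratio -> 20% val, 10% test.
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=1 / 3, random_state=0
)

print(len(X_train), len(X_val), len(X_test))  # 700 200 100
```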
I have experimented with model complexity: hidden-layer sizes (128, 256, 512, 1024, 2056), from low to high capacity, and dropout rates (0.05, 0.2, 0.35, 0.4) to avoid overfitting, but the issue persists and I get the same validation loss. Do you have any suggestions on how to decrease the validation loss further and improve the model?
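The sweep described above amounts to a grid search over the two hyperparameters; a minimal sketch, where `train_and_eval` is a hypothetical stand-in for the actual training run:

```python
import itertools

hidden_sizes = [128, 256, 512, 1024, 2056]
dropout_rates = [0.05, 0.2, 0.35, 0.4]

def train_and_eval(hidden, dropout):
    """Hypothetical stand-in: train a model with this config and
    return its best validation loss. Here it just returns the
    plateau value observed in practice."""
    return 67.0

# Evaluate every (hidden size, dropout) combination.
results = {
    (h, d): train_and_eval(h, d)
    for h, d in itertools.product(hidden_sizes, dropout_rates)
}
best_config = min(results, key=results.get)
print(best_config, results[best_config])
```

If every cell of such a grid lands on the same validation loss, the bottleneck is usually not model capacity or regularization strength, which is consistent with what the sweep shows here.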