I trained a DeepSpeech model on 300 hours of data (TED-LIUM + VoxForge) as a proof of concept, to establish that we can run large-scale training and get something useful (say, 85% accurate).
Hardware : GTX 1080Ti
- I reduced n_hidden to 1024 because I had only 300 hours of data.
- Dropout : 0.30
- Train Batch Size : 64
The language model was the pre-trained one released by DeepSpeech.
All other parameters were unchanged, i.e. left at their defaults.
Here is how the model learned:
- It learned for the first 10 epochs: both train and validation error were going down.
- It did not learn anything in the next 10 epochs. It was not even overfitting.
It looks like a case of underfitting, where both train and validation loss are very high and not decreasing.
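To make the "learned for 10 epochs, then stalled" observation checkable, here is a minimal sketch of a plateau test on per-epoch losses. The function names, the 10-epoch window, the 1% threshold, and the loss values in the usage note are all my own assumptions for illustration; none of this comes from DeepSpeech itself.

```python
def has_plateaued(losses, window=10, min_rel_improvement=0.01):
    """True if the best loss in the last `window` epochs improved on the
    best earlier loss by less than `min_rel_improvement` (relative).
    Assumes positive loss values. Threshold and window are illustrative."""
    if len(losses) <= window:
        return False  # not enough history to compare against
    best_before = min(losses[:-window])
    best_recent = min(losses[-window:])
    return (best_before - best_recent) / best_before < min_rel_improvement


def diagnose(train_losses, val_losses, window=10):
    """Rough label for a run based on which loss curves have stalled."""
    train_flat = has_plateaued(train_losses, window)
    val_flat = has_plateaued(val_losses, window)
    if train_flat and val_flat:
        # Neither curve moves: underfitting if the loss level is still high.
        return "stalled"
    if val_flat and not train_flat:
        # Training keeps improving while validation does not.
        return "possible overfitting"
    return "still improving"
```

With made-up curves that drop for 10 epochs and then stay flat, e.g. `[200 - 10 * i for i in range(10)] + [110] * 10` for both train and validation, `diagnose` returns `"stalled"`, which matches the pattern I saw.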
Overall test accuracy was 20% (WER 80%), which is very poor.
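For anyone comparing numbers, this is the textbook word error rate I have in mind: word-level edit distance divided by the number of reference words, with "accuracy" taken as 1 - WER. It is a sketch of the standard definition, with a function name I chose myself; I am not claiming it matches exactly how the DeepSpeech test report aggregates its figure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d e", "a x c d")` counts one substitution and one deletion over five reference words. Note that WER can exceed 1.0 when the hypothesis has many extra words, so 1 - WER is only a loose notion of accuracy.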
It took 9-10 hours straight to train this model on the 300 hours of data for 20 epochs on a 1080Ti GPU system.
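As a sanity check on the timing, here is the back-of-the-envelope arithmetic, assuming each of the 20 passes covers the full 300 hours of audio. The helper name is just for illustration.

```python
def throughput(audio_hours, epochs, wall_hours):
    """Return (minutes per epoch, hours of audio processed per wall-clock hour)."""
    minutes_per_epoch = wall_hours * 60 / epochs
    audio_per_wall_hour = audio_hours * epochs / wall_hours
    return minutes_per_epoch, audio_per_wall_hour


for wall_hours in (9, 10):
    minutes, speed = throughput(audio_hours=300, epochs=20, wall_hours=wall_hours)
    print(f"{wall_hours} h total: {minutes:.0f} min/epoch, "
          f"{speed:.0f} h of audio per wall-clock hour")
```

So each epoch took roughly 27-30 minutes.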
Can anyone give me some insight into what I should have done to get better accuracy? I have seen people getting WER ~39% on the TED corpus alone.
I just wanted to know whether there are tried-and-tested parameters that have yielded the best accuracy on the TED + VoxForge corpus.