I’m trying to train a DeepSpeech model on a Persian dataset. This summer I trained one with 4 hidden layers of 2024 neurons each (the default values). After 25 epochs, when I tested the model, it returned an empty string, so it seems the model didn’t learn well. I then read the Baidu DeepSpeech paper and now suspect that my model was too big for my data (I think :) ). I’d like to hear your experience with training DeepSpeech models: how many layers should I use, and how many neurons per layer? The Persian dataset is about 70 hours of speech.
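To give a rough sense of how the size changes with the layer width, here is a minimal sketch that counts the weights and biases in a plain stack of fully connected layers. This is a simplification and not the actual DeepSpeech architecture (it ignores the recurrent layer), and the input size of 494 and output size of 33 are placeholder values I picked for illustration, not numbers from my setup.

```python
def dense_param_count(n_input, hidden_sizes, n_output):
    """Count weights and biases in a stack of fully connected layers."""
    sizes = [n_input] + list(hidden_sizes) + [n_output]
    total = 0
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix plus bias vector
    return total


# Placeholder dimensions (assumptions, for illustration only).
N_INPUT = 494
N_OUTPUT = 33

for width in (512, 1024, 2024):
    count = dense_param_count(N_INPUT, [width] * 4, N_OUTPUT)
    print(f"4 hidden layers x {width} neurons: {count:,} parameters")
```

The hidden-to-hidden layers dominate the count, so the number of parameters grows roughly with the square of the width. Halving the width therefore cuts the model to about a quarter of its size.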
It would also be great if someone could tell me the configuration of Mozilla’s English model.
Thank you for helping.