Help finding a good model configuration for a Persian dataset

Hi, everybody

I’m trying to train a model on a Persian dataset. I trained it this summer with 4 hidden layers of 2048 neurons each (the default values). After 25 epochs, when I tested the model, I got an empty string! So it seems my model didn’t learn well. I then read the Baidu DeepSpeech paper and realized that my model was probably too big (I think :) ). Now I need your experience with training DeepSpeech models: how many layers should I use, and how many neurons per layer? The Persian dataset contains about 70 hours of audio.
It would be great if someone could tell me the configuration of Mozilla’s English model.

Thank you for helping.

You should not change the number of layers. Adapting the width of the network might be an option, but I would recommend sticking to 2048 for now.
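To get a feel for how much the width setting changes the model size, here is a rough sketch that counts trainable parameters for a few widths. Everything in it is an assumption for illustration, not something read out of the DeepSpeech code: the layer layout (three dense layers, one LSTM layer, one more dense layer, then the output layer), the input of 26 features with 9 context frames on each side, and the hypothetical `n_labels=40` standing in for the size of a Persian alphabet plus the CTC blank.

```python
def dense_params(n_in, n_out):
    # Weight matrix plus one bias per output unit.
    return n_in * n_out + n_out


def lstm_params(n_in, n_cell):
    # Four gates, each with input weights, recurrent weights and a bias.
    return 4 * (n_cell * (n_in + n_cell) + n_cell)


def estimate_params(n_hidden, n_input=26, n_context=9, n_labels=40):
    # All defaults here are assumptions; n_labels is a hypothetical alphabet size.
    window = n_input * (2 * n_context + 1)      # features in one context window
    total = dense_params(window, n_hidden)      # dense layer 1
    total += dense_params(n_hidden, n_hidden)   # dense layer 2
    total += dense_params(n_hidden, n_hidden)   # dense layer 3
    total += lstm_params(n_hidden, n_hidden)    # recurrent layer
    total += dense_params(n_hidden, n_hidden)   # dense layer after the LSTM
    total += dense_params(n_hidden, n_labels)   # output layer
    return total


if __name__ == "__main__":
    for width in (512, 1024, 2048):
        print(width, f"{estimate_params(width) / 1e6:.1f}M parameters")
```

Most terms grow with the square of the width, so halving the width cuts the parameter count to roughly a quarter. Treat the numbers as an order-of-magnitude guide only.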

With 70 hours of audio, you will not get anything really meaningful, but you should at least get something other than an empty string.

Have you read the documentation and the release notes? The configuration is documented there.

One thing that might help you is the transfer-learning work by @josh_meyer. It’s not yet merged into master, but you can likely work from his PR: