Training DeepSpeech in a reinforcement learning environment

I want to train a DeepSpeech model for the Arabic language. First, I am not sure whether a pre-trained model exists and, if it does, whether it is available for use. It would be great if you could help me with that.

Second, I don’t have enough data for model training; I believe it might require thousands of hours of audio. However, my application architecture is similar to a reinforcement learning environment. First, we will transcribe our audio to text using some pre-trained model, maybe the Google SpeechRecognition API (I haven’t decided yet which model to use). It will generate some Arabic text, and the user can modify or correct the output based on what the input audio actually said. Then I want to train the DeepSpeech model on that audio, with the corrected text as the label.

The question is: is this possible? I know it might take a lot of time for the model to predict correctly, but the goal is for it to learn eventually.

What I am asking is whether these configurations are possible in DeepSpeech: no batching (a single training example at a time), no validation or test set, and one epoch of training.

PS: Based on these questions you might guess that I haven’t explored the code yet. The goal of this post is to get experts’ feedback on whether this is possible with DeepSpeech, instead of spending time exploring the code only to find out that it isn’t.

Thanks,

DeepSpeech is an ecosystem around one specific neural net (basically an LSTM) and works well in that setup. You can, of course, change anything, as the code is open, but if you have no experience with it I would suggest you start by using it as is.

Search the forums; there was something about correcting spoken Quran verses, so they should have a working Arabic model.

As for the data, you should be able to get somewhat good results with 200-300 hours of input. Common Voice has some hours too; maybe contribute to that?

https://voice.mozilla.org/ar/datasets
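
For the correction loop described in the question, here is a minimal sketch of how the corrected pairs could be collected, assuming the usual DeepSpeech training CSV layout (columns wav_filename, wav_size, transcript). The helper names append_example and training_command are made up for illustration, and the flags in the command (--train_files, --train_batch_size, --epochs, --checkpoint_dir, --alphabet_config_path) are the ones I would expect from the stock DeepSpeech.py training script, so please check them against the version you install. It only shows the plumbing; whether training on one example at a time converges is a separate question.

```python
import csv
import os
from pathlib import Path

# Column layout assumed for DeepSpeech training CSVs.
CSV_FIELDS = ["wav_filename", "wav_size", "transcript"]


def append_example(csv_path, wav_path, corrected_text):
    """Append one (audio, user-corrected transcript) pair to the training CSV.

    Hypothetical helper: writes the header only when the file is new or empty.
    """
    csv_path = Path(csv_path)
    is_new = not csv_path.exists() or csv_path.stat().st_size == 0
    with csv_path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=CSV_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "wav_filename": str(Path(wav_path).resolve()),
            "wav_size": os.path.getsize(wav_path),
            "transcript": corrected_text.strip(),
        })


def training_command(csv_path, checkpoint_dir, alphabet_path):
    """Build an argv list for one epoch over the CSV with batch size 1.

    Hypothetical helper: flag names are assumptions about DeepSpeech.py;
    no dev or test files are passed.
    """
    return [
        "python", "DeepSpeech.py",
        "--train_files", str(csv_path),
        "--train_batch_size", "1",
        "--epochs", "1",
        "--checkpoint_dir", str(checkpoint_dir),
        "--alphabet_config_path", str(alphabet_path),
    ]
```

Each time a user corrects a transcript you would call append_example, and then run the command from training_command (for example with subprocess.run) whenever you want another pass. Reusing the same checkpoint directory should let each run continue from the previous one instead of starting over.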

Yeah, thank you, that’s a great help. I have played around with LSTMs and other models, so I believe I can do that. Thanks for your help.