I want to train a DeepSpeech model for Arabic. First, I am not sure whether a pre-trained Arabic model exists and, if it does, whether it is available for use. It would be great if you could help me with that.
Second, I don’t have enough data for training; I believe it might require thousands of hours of audio. However, my application architecture is similar to a reinforcement learning environment. First, we will transcribe the audio to text using some pre-trained model, maybe the Google Speech Recognition API (I haven’t decided yet which one to use). It will generate some Arabic text, which the user can then correct to match the input audio. After that, I want to train a DeepSpeech model on that audio, with the corrected text as the label.
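To make the loop concrete, here is a minimal sketch of the data-collection step I have in mind. It is only an illustration: `collect_example`, `transcribe`, `correct`, the `corrections.csv` file, and its column names are placeholders I made up, not part of DeepSpeech or any speech API.

```python
import csv
from pathlib import Path
from typing import Callable


def collect_example(
    wav_path: str,
    transcribe: Callable[[str], str],  # hypothetical: wraps some pre-trained recognizer
    correct: Callable[[str], str],     # hypothetical: shows the draft to the user, returns the fix
    csv_path: str = "corrections.csv",  # assumed file name
) -> str:
    """Transcribe one clip, let the user fix the text, and store the (audio, label) pair."""
    draft = transcribe(wav_path)    # machine-generated Arabic text
    label = correct(draft).strip()  # user-corrected transcript, used as the training label

    # Write the header only when the file is being created.
    new_file = not Path(csv_path).exists()
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["wav_filename", "transcript"])  # assumed column names
        writer.writerow([wav_path, label])
    return label
```

Each call would add one labelled example, and that single example is what I would like to feed into a training step.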
My question is: is this possible? I know it might take a long time before the model predicts correctly, but the goal is for it to learn eventually.
Specifically, what I am asking is whether DeepSpeech can be trained with no batching (one training example at a time), no validation or test set, and a single epoch per update. Are these configurations possible in DeepSpeech?
PS. As you might guess from these questions, I haven’t explored the code yet. The goal of this post is to get expert feedback on whether this is possible with DeepSpeech, rather than spending time exploring the code only to find out that it isn’t.
Thanks,