I’m preparing a Spanish model with Common Voice (40 h), the Caito dataset and the OpenSLR dataset. Together that is about 200 hours, which is not much. The positive side is that I’ll be receiving 1 h of audio plus transcription every day. Would it be better to wait until I have, say, 1000 h, or to train the model on the audio I currently have and fine-tune it every day with that new hour of audio? Thanks!
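For scale, here is a rough back-of-the-envelope sketch of how long the "wait" option would take, using only the numbers in the question (200 h now, 1 h per day, a 1000 h target). The function name `days_until_target` is just an illustrative helper, not part of any library.

```python
import math


def days_until_target(current_hours, target_hours, hours_per_day):
    """Days of collection needed to grow from current_hours to target_hours."""
    remaining = max(0.0, target_hours - current_hours)
    return math.ceil(remaining / hours_per_day)


# Numbers taken from the question: 200 h today, 1 h arriving per day.
days = days_until_target(current_hours=200, target_hours=1000, hours_per_day=1)
print(f"{days} days (~{days / 365:.1f} years) to reach 1000 h")
# prints "800 days (~2.2 years) to reach 1000 h"
```

At that rate, waiting for 1000 h means more than two years without any model, which is worth weighing against training now and fine-tuning as data arrives.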
That’s a good question. I guess my question to you is:
Does a model trained on 200 hours provide an “MVP” for the use case you’re working on?
I don’t think so, but I won’t use the model until it works well. The idea of continuous training is to save time: training on just 1 h of new audio each day should use fewer resources. I guess…
EDIT: There is also the option of data augmentation. With 200 h I could get an additional 500 h with this technique, and the same applies to the daily hour of audio.
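To make the idea concrete, here is a minimal sketch of waveform-level augmentation in plain Python: each utterance yields a few perturbed copies (slower, faster, noisier) that reuse the original transcript. This is not the method from any particular implementation. The function names, the 0.9/1.1 speed factors and the noise level are all assumptions chosen for illustration, and the speed change is naive resampling, so it shifts pitch as well as tempo.

```python
import math
import random


def add_noise(samples, noise_level=0.005, rng=None):
    """Add Gaussian noise to each sample (noise_level is an assumed value)."""
    rng = rng or random.Random()
    return [s + rng.gauss(0.0, noise_level) for s in samples]


def change_speed(samples, factor):
    """Resample by linear interpolation; factor > 1 gives faster, shorter audio."""
    if len(samples) < 2:
        return list(samples)
    out_len = max(1, int(len(samples) / factor))
    out = []
    for i in range(out_len):
        pos = i * factor
        lo = int(pos)
        if lo >= len(samples) - 1:
            # Past the last interpolation interval: hold the final sample.
            out.append(samples[-1])
            continue
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[lo + 1] * frac)
    return out


def augment(samples, rng):
    """Return perturbed copies of one utterance; the transcript is unchanged."""
    return [
        change_speed(samples, 0.9),   # slower copy
        change_speed(samples, 1.1),   # faster copy
        add_noise(samples, rng=rng),  # noisy copy
    ]


# Stand-in for one second of 16 kHz audio: a 440 Hz sine wave.
rate = 16000
clip = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
copies = augment(clip, random.Random(0))
print([len(c) for c in copies])
```

Augmented copies add variety but no new speakers or vocabulary, so they are probably best thought of as a supplement to the daily new hour rather than a substitute for it.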
Do you have a link for this method?
You can check this link. It is a user’s implementation, but the Mozilla developers also explain their techniques there. Hope you find it useful.
Very useful. It seems they’ll implement it in Mozilla DeepSpeech.
Have you used it already?
Not yet. At the moment I’m training the model with Common Voice data on Google Colab while waiting for a GPU.