How hard is it to train a new model?

Hi guys, just a very newbie question.

I am a mainstream developer (full stack) and have some ideas for an application. I have no previous knowledge of voice-recognition algorithms and only know the basics of AI/ML.

My native language is Portuguese, and I want to build the application for my language. The thing is that I’m from Portugal, and the Portuguese available on Mozilla DeepSpeech is Brazilian. European and Brazilian Portuguese are extremely different phonetically.

Because of this, I would have to build a language model from scratch.

My question is, realistically, how hard would it be for me to develop such a model using DeepSpeech? I’ve searched everywhere for a question like this but couldn’t find anything about it.

Thank you in advance guys.

In terms of actual difficulty, not much. But there are scale barriers: you would need at least a few thousand hours of transcribed Portuguese speech, plus multiple graphics cards to train in a reasonable amount of time (or you could rent a multi-GPU server by the hour on AWS).
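
To get a rough idea of how far a dataset is from that "few thousand hours" figure, here is a minimal Python sketch that adds up audio duration from a training CSV. It assumes the CSV has a `wav_filesize` column (in bytes) alongside `wav_filename` and `transcript`, and that every clip is 16 kHz, 16-bit, mono PCM WAV with a 44-byte header. Those audio parameters are assumptions, so adjust the constants if your data differs. The two-clip manifest is made up purely for illustration.

```python
import csv
import io

# Assumptions (change these if your audio is in a different format):
BYTES_PER_SECOND = 16000 * 2 * 1  # 16 kHz sample rate, 2 bytes/sample, 1 channel
WAV_HEADER_BYTES = 44             # size of a canonical PCM WAV header


def estimate_hours(csv_file):
    """Estimate total hours of audio from the wav_filesize column of a CSV."""
    total_seconds = 0.0
    for row in csv.DictReader(csv_file):
        # Subtract the header so only the raw sample data is counted.
        payload = max(int(row["wav_filesize"]) - WAV_HEADER_BYTES, 0)
        total_seconds += payload / BYTES_PER_SECOND
    return total_seconds / 3600


# Hypothetical manifest with two clips (10 s and 20 s under the assumptions above).
sample = io.StringIO(
    "wav_filename,wav_filesize,transcript\n"
    "clip1.wav,320044,bom dia\n"
    "clip2.wav,640044,obrigado\n"
)
hours = estimate_hours(sample)
print(f"{hours:.6f} hours of audio")
```

For a real dataset you would pass an opened file instead of the in-memory sample, e.g. `estimate_hours(open("train.csv", newline=""))`, where `train.csv` is a placeholder name.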


I see, thank you for the info!

@reuben might be able to help there :)
