Hello, I have some questions about how to build an ideal database for a given transcription context. Could anyone with experience in this help me?
Suppose I use DeepSpeech to transcribe phone calls for a supermarket. I know I need an audio base with hundreds of different speakers, but should every speaker say the same set of phrases, or should I assemble hundreds of phrases and distribute them among the speakers so that no phrase is repeated? It may seem like a weak question, but I would like to know.
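If it helps to make the second option concrete, here is a minimal sketch (my own illustration, not an official DeepSpeech tool) of distributing a pool of phrases across speakers so that each phrase is recorded exactly once. The speaker IDs and phrase list are hypothetical placeholders:

```python
import csv
import random

def assign_phrases(phrases, speakers, seed=0):
    """Shuffle the phrase pool, then deal phrases round-robin so each
    phrase is assigned to exactly one speaker (no repeats)."""
    rng = random.Random(seed)
    shuffled = phrases[:]
    rng.shuffle(shuffled)
    assignments = {s: [] for s in speakers}
    for i, phrase in enumerate(shuffled):
        assignments[speakers[i % len(speakers)]].append(phrase)
    return assignments

# Hypothetical example data
phrases = [f"frase {n}" for n in range(10)]
speakers = ["spk01", "spk02", "spk03"]
plan = assign_phrases(phrases, speakers)

# Write a recording plan as CSV; once the audio is recorded, the
# DeepSpeech training CSVs would then pair each wav file with its transcript.
with open("recording_plan.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["speaker", "transcript"])
    for spk, lines in plan.items():
        for text in lines:
            writer.writerow([spk, text])
```

This is just one way to balance coverage; the broader question of whether repeated phrases across speakers help or hurt is exactly what I am asking about.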
Also, which large systems use DeepSpeech as their transcription engine? How did they build the databases they trained on?
I feel my real problem is knowing how to build the ideal base; I already have experience using and training DeepSpeech, but I haven't gotten good results.
I am looking to create a database in Brazilian Portuguese.