I’m a bit confused about which English characters I can use in my custom dataset to train an STT model. What I would precisely like to understand is whether I can include special characters such as “?”, “,”, “-” etc. in the text of my dataset. Right now, my understanding is that we can only use the 26 English letters plus the apostrophe and the blank space in the text part of the dataset. Am I right or wrong here?
I am planning to use transfer learning to re-train the Mozilla DeepSpeech checkpoint on my data.
There is an alphabet file that you can adapt to your needs, but if you are training English I would start by stripping the extra characters from your transcripts. DeepSpeech tries to guess letters, and special characters are hard to pronounce, maybe with the exception of the question mark. Once you get more confident with DeepSpeech you can try extending the alphabet.
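If it helps, here is a rough sketch of the kind of normalization I mean, keeping only lowercase letters, apostrophe and space. The file names and the "transcript" column name are just placeholders for whatever your CSVs actually use:

```python
import csv
import re

# Keep only the characters in the default English alphabet:
# lowercase a-z, apostrophe and space. Everything else is dropped.
ALLOWED = re.compile(r"[^a-z' ]+")

def normalize(transcript: str) -> str:
    text = transcript.lower()
    text = ALLOWED.sub(" ", text)              # strip punctuation, digits, etc.
    return re.sub(r"\s+", " ", text).strip()   # collapse repeated spaces

# Hypothetical file and column names - adjust to your dataset layout.
# Note: numbers should ideally be written out as words before this step,
# since this sketch simply drops digits.
with open("train.csv", newline="", encoding="utf-8") as src, \
     open("train_clean.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row["transcript"] = normalize(row["transcript"])
        writer.writerow(row)
```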
Hi, thanks for your response. I checked the alphabet file; it contains the 26 English letters plus the apostrophe. So does this mean the acoustic model gives out 27 outputs (26 letters + apostrophe)?
Okay, thanks, that was helpful. I’ll try that as well. I just have one more query for now: if I add “?” to alphabet.txt and also include “?” in the sentences used for training, will the model learn to predict “?”?
Also, will this affect my accuracy? The DeepSpeech model pre-trained by Mozilla was trained to give out only 28 outputs, and with the addition of “?” the model now has 29 outputs to predict.
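For reference, this is the small sketch I’m using to count the labels listed in alphabet.txt (assuming one character per line, with lines starting with # treated as comments; whether the CTC blank gets counted on top of this is part of what I’m asking):

```python
def count_alphabet_chars(alphabet_path: str) -> int:
    """Count the characters listed in a DeepSpeech-style alphabet file."""
    chars = []
    with open(alphabet_path, encoding="utf-8") as f:
        for line in f:
            if line.startswith("#"):   # comment lines are ignored
                continue
            char = line.rstrip("\n")   # keep the single-space entry intact
            if char:
                chars.append(char)
    return len(chars)

# The stock English file lists space, a-z and the apostrophe;
# adding "?" as its own line increases this count by one.
print(count_alphabet_chars("alphabet.txt"))
```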