What are all the English language characters that I can use in my custom dataset to further train the STT model?

Hi All,

I’m a bit confused about which English characters I can use in my custom dataset to train the STT model. Specifically, can I use special characters like “?”, “,”, “-” etc. in the text of my dataset? My current understanding is that only the 26 English letters, the apostrophe, and the blank space are allowed in the text part of the dataset. Is that right?

I am planning to use transfer learning to retrain the Mozilla DeepSpeech checkpoint on my data.

Thanks a lot!

There is an alphabet file that you can adapt to your needs, but if you are training English I would start by stripping the extra characters from your text. DeepSpeech tries to guess letters from the audio, and special characters are hard to pronounce, maybe with the exception of the question mark. Once you get more confident with DeepSpeech, you can try to extend the alphabet.
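A minimal sketch of that stripping step, assuming the target alphabet is the lowercase letters a-z, the apostrophe, and the space. The function name `normalize_transcript` is hypothetical and not part of DeepSpeech:

```python
import re

# Assumed target alphabet: a-z, apostrophe, space.
_DISALLOWED = re.compile(r"[^a-z' ]+")
_SPACES = re.compile(r" +")


def normalize_transcript(text: str) -> str:
    """Lowercase the text and drop everything outside a-z, apostrophe, space."""
    text = text.lower()
    # Map the typographic apostrophe to the ASCII one so contractions survive.
    text = text.replace("\u2019", "'")
    # Replace disallowed runs with a space so "well-known" becomes
    # "well known" rather than "wellknown".
    text = _DISALLOWED.sub(" ", text)
    # Collapse repeated spaces and trim the ends.
    return _SPACES.sub(" ", text).strip()


print(normalize_transcript("Hello, world - how's it going?"))
# prints: hello world how's it going
```

Note that this sketch also drops digits, so numbers would need to be spelled out before normalizing if they are spoken in the audio.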

Hi, thanks for your response. I checked the alphabet file; it contains the 26 English letters plus the apostrophe. Does this mean that the acoustic model gives 27 outputs (26 letters + apostrophe)?

Yes, and a white space. Run the deepspeech command line without a scorer and you can see the raw output of the acoustic model.
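To check the label count yourself, here is a rough sketch that counts the labels in an alphabet file and lists transcript characters that fall outside it. It assumes one label per line, comment lines starting with `#`, and `\#` as the escape for a literal hash. The helper names and the in-memory example lines are made up for illustration; in practice you would read the lines from your own alphabet.txt:

```python
import string


def load_alphabet(lines):
    """Return the labels found in alphabet-file lines (assumed format)."""
    labels = []
    for line in lines:
        line = line.rstrip("\n")       # keep a lone space: it is a label
        if line.startswith("\\#"):
            labels.append("#")         # escaped hash is a real label
        elif line.startswith("#"):
            continue                   # comment line
        elif line != "":
            labels.append(line)
    return labels


def unknown_chars(transcript, labels):
    """Characters in the transcript that have no label in the alphabet."""
    return sorted(set(transcript) - set(labels))


# Stand-in for an English alphabet file: space, a-z, apostrophe.
example_lines = ["# comment\n", " \n"]
example_lines += [c + "\n" for c in string.ascii_lowercase]
example_lines += ["'\n"]

labels = load_alphabet(example_lines)
print(len(labels))                          # prints: 28
print(unknown_chars("is it ok?", labels))   # prints: ['?']
```

Running `unknown_chars` over every transcript before training is a cheap way to find characters that still need to be stripped or added to the alphabet.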

Okay thanks, that was helpful. I’ll try that as well. I have one more question for now: if I add “?” to alphabet.txt and also include “?” in the sentences used for training, will the model learn to predict “?”?

Also, will this affect my accuracy? The DeepSpeech model pre-trained by Mozilla was trained to give only 28 outputs, and with the addition of “?” the model would have 29 outputs to predict.

Yes, it will try to predict it, but because a question mark sounds different each time (sentence endings vary), it might be hard to learn.

Maybe search Google for “asr punctuation”. It looks like it works for some people.

Thanks a lot. Your inputs are really helpful! Have a great day!