I’m a bit confused about which English characters I can use in my custom dataset to train an STT model. Specifically, can I use special characters like “?”, “,”, “-”, etc. in the text of my dataset? My current understanding is that only the 26 letters of the English alphabet, the apostrophe, and the blank space are allowed in the text part of the dataset. Is that right?
I am planning to use transfer learning to re-train the Mozilla DeepSpeech checkpoint on my data.
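To make the question concrete, here is a rough sketch of how I would check and clean my transcripts if that understanding is correct. The allowed set (a-z, apostrophe, space) is only my assumption and is not confirmed against the checkpoint's actual alphabet; `find_disallowed` and `normalize` are hypothetical helper names of my own, not part of DeepSpeech.

```python
# Assumed alphabet: lowercase a-z, apostrophe, and space.
# This reflects my current understanding only, not a confirmed specification.
ALLOWED = set("abcdefghijklmnopqrstuvwxyz' ")


def find_disallowed(text):
    """Return a sorted list of the characters in `text` that are outside ALLOWED."""
    return sorted(set(text) - ALLOWED)


def normalize(text):
    """Lowercase the text and replace every disallowed character with a space."""
    text = text.lower()
    # Map the typographic apostrophe to the plain ASCII one before filtering.
    text = text.replace("\u2019", "'")
    # Replace anything outside the assumed alphabet with a space.
    text = "".join(ch if ch in ALLOWED else " " for ch in text)
    # Collapse runs of spaces and trim the ends.
    return " ".join(text.split())


if __name__ == "__main__":
    sample = "Well-known, isn't it?"
    print(find_disallowed(sample))
    print(normalize(sample))
```

Replacing punctuation with a space (rather than deleting it) keeps hyphenated words like "well-known" as two separate words, which may or may not be what is wanted.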
Thanks a lot!