I’m trying to use DeepSpeech to build a “voice keyboard” that lets you type using only your voice. The speech recognition output is converted into keys that get pressed on your keyboard (using robotjs).
Here’s a screenshot of some of my test code to give you an idea:
The numbers and symbols all work fairly well, but some letters are effectively impossible to say. The Common Voice DeepSpeech model does not have words that are even close to some letters, so I’ve been trying to come up with a set of substitutions: for example, if I say “tea” it converts to “t”, and so on.
But this doesn’t work for many of the letters: D, E, F, G, H, Q, M, S, X, etc. These letters don’t have English words that sound much like how they are pronounced, so at best you’ll get something like “the” when you say “D”, but never “dee” or “de”. For some letters, it sometimes gets it right if I say “letter S” or “letter O”, but for a lot of them it seems damned near impossible.
I’m contemplating building a DeepSpeech model that handles just letters. But I’d need tons of audio of lots of people saying these letters in different orders, speeds, accents, etc.
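To make the substitution idea concrete, here is a minimal sketch of the word-to-key step. The table entries, the `WORD_TO_KEY` and `transcriptToKeys` names, and the assumption that “letter S” comes back from the recognizer as the two tokens `letter s` are all illustrative guesses, not something tested against the model:

```javascript
// Hypothetical substitution table: recognized word -> key to press.
// The word choices are illustrative, not verified against DeepSpeech output.
const WORD_TO_KEY = new Map([
  ['tea', 't'],
  ['tee', 't'],
  ['see', 'c'],
  ['sea', 'c'],
  ['bee', 'b'],
  ['why', 'y'],
  ['you', 'u'],
  ['are', 'r'],
  ['oh', 'o'],
  ['eye', 'i'],
]);

// Turn one transcript string into a list of keys.
// - "letter" is treated as a filler word and dropped, so "letter s" yields "s"
// - a single letter or digit is passed through unchanged
// - known substitution words are mapped through WORD_TO_KEY
// - anything else is ignored
function transcriptToKeys(transcript) {
  const words = transcript.toLowerCase().trim().split(/\s+/).filter(Boolean);
  const keys = [];
  for (const word of words) {
    if (word === 'letter') continue;
    if (/^[a-z0-9]$/.test(word)) {
      keys.push(word);
    } else if (WORD_TO_KEY.has(word)) {
      keys.push(WORD_TO_KEY.get(word));
    }
  }
  return keys;
}
```

With robotjs loaded as `robot`, each returned key could then be passed to `robot.keyTap(key)`. The sketch keeps the mapping separate from the key press so it can be tested without a keyboard.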
I was wondering if it would be feasible to get letters included in the main DeepSpeech Common Voice model, maybe by adding some “fake” words like:
X -> ecks
F -> eff
S -> ess
Q -> queue / cue (these words never seem to get recognized when I say them, and I speak English fluently)
As long as every letter has a real word or “fake” word that can get recognized reasonably well then this will work.
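A quick way to track that requirement is to keep the aliases per letter and list the letters that still have none. This is only a sketch: the `ALIASES` table holds the example words from this post, and `lettersWithoutAlias` is a hypothetical helper name:

```javascript
// Aliases per letter: real or "fake" words the recognizer might output.
// Only the examples mentioned in this post are filled in.
const ALIASES = {
  t: ['tea'],
  x: ['ecks'],
  f: ['eff'],
  s: ['ess'],
  q: ['queue', 'cue'],
};

// Return the letters a-z that have no alias yet.
function lettersWithoutAlias(aliases) {
  const alphabet = 'abcdefghijklmnopqrstuvwxyz'.split('');
  return alphabet.filter(
    (letter) => !(aliases[letter] && aliases[letter].length > 0)
  );
}

console.log(lettersWithoutAlias(ALIASES).join(' '));
```

Having an alias listed only means a candidate word exists. Whether the model actually recognizes it reliably still has to be checked by ear.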