Hey everyone,
I am currently preparing my Master's thesis on the importance of privacy when using chatbots. Basically, I am going to build two chatbots: one based on Google's Dialogflow and one using an offline tech stack.
At the beginning of my research I was sure I would use DeepSpeech together with the pretrained model for speech-to-text, but I am getting a word error rate of about 50%.
For testing, I recorded a WAV file in Audacity (mono channel, 16 kHz sampling) and sent it through DeepSpeech following the tutorial on GitHub.
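In case it matters: as far as I know, the pretrained DeepSpeech model expects mono, 16 kHz, 16-bit PCM WAV input, so I also sanity-checked the file format with a small script like this (a sketch of my own, the helper name `check_wav_format` is mine, not from the DeepSpeech tutorial):

```python
import wave

def check_wav_format(path):
    """Return True if the WAV matches what the pretrained DeepSpeech
    model expects: mono, 16 kHz sample rate, 16-bit (2-byte) samples."""
    with wave.open(path, "rb") as w:
        return (w.getnchannels() == 1
                and w.getframerate() == 16000
                and w.getsampwidth() == 2)
```

So at least the input format should not be the cause of the bad transcriptions.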
I then used Picovoice Cheetah to transcribe the same WAV and got an error rate of about 20%. Google's Cloud Speech-to-Text got every single word correct.
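For what it's worth, the error rates above are word error rates: word-level edit distance between reference and transcript, divided by the number of reference words. A minimal sketch of how that can be computed (the helper `wer` is my own, not part of any of the SDKs):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance divided by
    the number of words in the reference transcript."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)
```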
My question is: is it possible I did something wrong, or does DeepSpeech have a problem with English spoken with a German accent? I am really wondering, since Picovoice states on their webpage that their model has a higher error rate than DeepSpeech.
Greetings
Max