Mozilla Voice STT in the Wild!

Hello! Back in 2018 I was just a .NET dev deploying cloud-based ASR. As a side project to my main job, I wanted to do offline Spanish speech recognition for my personal “chatbot”, and that’s how I discovered DS (DeepSpeech) on GitHub. Long story short, I ended up replacing cloud ASR with DS in almost all of my apps :blush:

Projects I’m working on with this new superpower, thanks to the STT team and Mozilla:

A .NET app for dictating names, IDs, long numbers, dates, details, notes, etc. from images in PDFs for form filling (offline OCR fails here):
This app also works together with a web site similar to Common Voice and a private storage that collects all the audio and text from the form inputs. That gives us the ability to periodically fine-tune on the corrections, which is how it ends up outperforming cloud ASR for this use case. Optional transcriptions were also key to success: there are often names that end in different characters but sound the same. @juan_pablo_Garzon_duenas is also working with me on this and he actually requested the app, thanks! (We met on the DS GitHub) :laughing:
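To make the fine-tuning loop a bit more concrete, here is a minimal sketch of how the collected audio and corrected text could be turned into a training CSV. It is not the code from the app: the function name `write_training_csv` and the lowercase/whitespace normalization are assumptions for illustration (what you normalize depends on your alphabet). The only thing taken from DS itself is the CSV layout it expects for training: `wav_filename`, `wav_filesize`, `transcript`.

```python
import csv
import os


def write_training_csv(samples, csv_path):
    """Write (wav_path, corrected_text) pairs as a DeepSpeech-style CSV.

    Hypothetical helper: `samples` is any iterable of
    (path to a WAV file, corrected transcript from the form input).
    Returns the number of rows written.
    """
    rows = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Column names expected by the DeepSpeech training importer.
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for wav_path, text in samples:
            # Assumed normalization: lowercase and collapse whitespace.
            transcript = " ".join(text.lower().split())
            # Skip empty corrections and audio files that are missing.
            if not transcript or not os.path.isfile(wav_path):
                continue
            writer.writerow([wav_path, os.path.getsize(wav_path), transcript])
            rows += 1
    return rows
```

The resulting file can then be used as the training set for a periodic fine-tuning run, next to whatever dev/test CSVs you already keep.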

And a couple of other projects I can’t share details about, but they are all similar.

An app used by a microphone firm for general dictation. We are achieving really good results (I can’t share too much detail on this one), but I can share a video of it vs. YT transcriptions in the chat. We are also using Mozilla RNNoise as VAD, thanks again Mozilla!

All of my solutions are based on the open-source WPF example :blush:. Fun thing: they found me through Reuben’s post on Mozilla Hacks, thanks again Mozilla.

Key features of DS vs. others for this type of project:

  1. Privacy
  2. Accuracy
  3. Continuous fine-tuning
  4. Puts value on the time and data collected. You can’t do this with cloud ASR (for most of them it is illegal to store the transcriptions). This is key: they don’t want to just make Google better by giving away all their data while being unable to keep valuable results from their own data.
  5. Not limited to a single programming language, which makes it possible to work in almost any existing environment. Usually, open-source ASR only uses Python or C++ (see Nvidia NeMo or Kaldi), which makes it hard to keep a team together if they don’t know both. With DS it is easy for the team to adapt using the language they love!
  6. The data for Spanish is really limited. There is no dataset like LibriSpeech for Spanish, which makes any small dataset you build very valuable.

Now the sad part:
They love Mozilla STT and choose it because of privacy: they don’t want to share any data, which makes the best thing about the engine the worst thing for growing Mozilla’s data :frowning:

I’m also looking forward to context-aware hot words, and I’m keeping an eye on @josh_meyer’s code. This is requested frequently, to be able to dictate corporation names taken from contracts on the fly.

Thanks to everybody, I love to see how my .NET example is used to build amazing things!

Thanks, thanks thanks thanks!!! And finally: thanks :stuck_out_tongue:

Extra thanks: I know this is for STT, but I definitely want to thank @erogol. I’m currently not using TTS, but there are a lot of requests for on-the-fly TTS to give information inside elevators (mostly for hotels in this COVID era), and also for kids learning English, adapting the listening practice to their weaknesses. Hopefully I will grow using Mozilla TTS.

Thanks,
