Hey @lissyx
Thanks for the answers.
I think it is not practical to read through the codebase to understand this; that's why I asked you. There is no clear architectural overview of the algorithms and systems that power DeepSpeech, or of the design decisions behind how they were implemented. From a pure software engineering perspective, I would expect the STT algorithm and its dependencies to be separate from the infrastructure. Basically, I am looking for the high-level design decisions taken by the team. I looked for this in the repo and did not find it. If it is not documented, I am very interested in helping on that front.
Fair enough. This is somewhat related to the point above: I am interested in knowing how this abstraction works, and I am happy to help document it if it is not already.
Currently Firefox does not support the Web Speech API out of the box; you have to enable it in Nightly, where it uses Google's STT. So I was asking whether DeepSpeech is the project that is poised to take over that job, or whether it is a separate project with a different purpose. Basically, what is the end goal for DeepSpeech? The README says:
DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu’s Deep Speech research paper. Project DeepSpeech uses Google’s TensorFlow to make the implementation easier.
That's it. This is a legitimate question: as I think you already know, Apache distinguishes between incubator projects and top-level projects, and Kubernetes, OWASP, and many other major organizations have a way of stating how important a project is to the organization. Maybe this could also be added to the README. Also, DeepSpeech is quite new compared to Kaldi or CMUSphinx. From the releases I can tell it is under rapid development, but not how important it is to Mozilla.
This is kind of important to me, as I wanted to know whether you (the team) consider Kaldi or wav2letter to be technically or practically better than, or on par with, DeepSpeech. WER, RTF, etc. are not important to me right now. The reason I am asking is that research moves quickly in machine learning, so what happens to DeepSpeech if a better method comes along? In well-thought-out projects the systems are abstracted well from the beginning, so adopting the new thing is just a matter of implementing it plus some glue logic (a rough sketch of what I mean is below). I was wondering whether these projects (Kaldi etc.) were considered when DeepSpeech was first implemented. What is the pitch for DeepSpeech: why is it better, comparable, similar, etc.?
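To make that concrete, here is a minimal, purely hypothetical sketch of the kind of abstraction I mean; none of these class or method names come from DeepSpeech, they are made up for illustration. If applications depend only on a small engine interface, a better method published tomorrow mostly needs a new adapter plus glue code.

```python
# Purely illustrative sketch, not the actual DeepSpeech API: all names here
# are invented to show what "abstraction + glue logic" could look like.
from abc import ABC, abstractmethod


class SpeechToTextEngine(ABC):
    """Hypothetical interface that application code would depend on."""

    @abstractmethod
    def transcribe(self, audio: bytes, sample_rate: int = 16000) -> str:
        """Turn raw PCM audio into a transcript."""


class DeepSpeechEngine(SpeechToTextEngine):
    """Adapter wrapping today's DeepSpeech model (details omitted)."""

    def transcribe(self, audio: bytes, sample_rate: int = 16000) -> str:
        raise NotImplementedError("would call into the DeepSpeech model here")


class FutureBetterEngine(SpeechToTextEngine):
    """If a better method appears, only this adapter has to be written."""

    def transcribe(self, audio: bytes, sample_rate: int = 16000) -> str:
        raise NotImplementedError("glue code around the new method goes here")
```

The application only ever sees `SpeechToTextEngine`, which is why I was asking whether this kind of separation exists in the current codebase.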
I hope this clarifies the questions. These questions will always come up as a comparative analysis between DeepSpeech, Kaldi, and wav2letter, because organizations want to know which way to go. As a matter of fact, the blog post "Announcing the Initial Release of Mozilla's Open Source Speech Recognition Model and Voice Dataset" says this:
There are only a few commercial quality speech recognition services available, dominated by a small number of large companies. This reduces user choice and available features for startups, researchers or even larger companies that want to speech-enable their products and services.
Thanks.
Again, I am very interested in DeepSpeech, so I am happy to help with anything that is not documented.