Some Beginner Questions

Hello guys,

I'm currently trying to evaluate some of the “most accurate” STT models, and I have some questions about DeepSpeech / Common Voice.

  1. I guess DeepSpeech is trained on the full Common Voice (English) dataset. Would there be any reason for me to download the full dataset, augment the data by adding noise, and then fine-tune the model with that data?
    Or should I only try to fine-tune the model if I have new data?

  2. Which datasets is DeepSpeech trained on?

  3. If I were to train a new model in another language, would I still use transfer learning, or would I have to train the model from scratch?

  4. Are there any resources you would recommend to a beginner who wants to start with DeepSpeech?

  5. Is there a list of pretrained models, including some scores, so that I know which model I should build on?

Thanks, guys

This is documented in the release description: https://github.com/mozilla/DeepSpeech/releases/tag/v0.9.3

Same link.

That depends on what you want to achieve and what data you have.

Have you looked at the documentation and the playbook? https://deepspeech.readthedocs.io/ https://mozilla.github.io/deepspeech-playbook/

It’s already listed in a thread on Discourse.
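
On the noise-augmentation idea from question 1: here is a rough sketch of what "adding noise" usually means in practice, namely mixing a noise clip into a speech clip at a chosen signal-to-noise ratio. This is a generic illustration in plain Python, not DeepSpeech's own augmentation code, and the function names (`rms`, `mix_at_snr`) and the toy signals are made up for the example. The DeepSpeech training documentation describes its own augmentation options, so check there before writing your own.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the target SNR in dB.

    Hypothetical helper for illustration; not part of DeepSpeech.
    """
    if len(noise) < len(speech):
        # Loop the noise clip so it covers the whole utterance.
        repeats = len(speech) // len(noise) + 1
        noise = list(noise) * repeats
    noise = noise[:len(speech)]

    speech_rms = rms(speech)
    noise_rms = rms(noise)
    if speech_rms == 0 or noise_rms == 0:
        # Nothing sensible to scale, so return the clip unchanged.
        return list(speech)

    # SNR_dB = 20 * log10(speech_rms / scaled_noise_rms)
    target_noise_rms = speech_rms / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, noise)]


# Toy signals standing in for real audio: a 440 Hz tone and Gaussian noise.
rng = random.Random(0)
sample_rate = 16000
speech = [0.5 * math.sin(2 * math.pi * 440 * t / sample_rate)
          for t in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate // 2)]

noisy = mix_at_snr(speech, noise, snr_db=10)
```

With real data you would load the samples from WAV files instead of generating them, and pick the SNR at random per clip from a range that matches your target environment.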