Links to pretrained models

dan.bmh · June 26, 2020, 7:34pm

In this topic we want to collect links to pretrained models and checkpoints in various languages.

dan.bmh · June 26, 2020, 7:35pm

You can find a German model with checkpoints here:
(Moved to DeepSpeech-Polyglot, see below)

and here:

dan.bmh · June 26, 2020, 8:06pm

A Welsh model is here:

dan.bmh · July 3, 2020, 10:16am

You can now find my German model, as well as models for French and Spanish, here:

dan.bmh · July 3, 2020, 10:17am

Just a reminder for @lissyx that you wanted to pin this thread :)

lissyx · July 3, 2020, 10:23am

Thanks for the reminder. For more context, we want to find a better solution to that problem, so hopefully we will have something online sooner rather than later (with proper metadata, licensing, version information, etc.). In the meantime, let’s hope this thread helps people find resources.

lissyx · July 3, 2020, 10:25am

Nice, it shows my project is not known enough: commonvoice-fr/DeepSpeech/Dockerfile.train at master · common-voice/commonvoice-fr · GitHub
Releases · common-voice/commonvoice-fr · GitHub

dan.bmh · July 3, 2020, 12:24pm

Actually, I did know about the project, but I didn’t know you had already trained and published a model, because I couldn’t read the French READMEs …
And for some reason I also didn’t check the releases page.

lissyx · July 3, 2020, 12:28pm

There is a CONTRIBUTING.md as well, and it is in English :[

MestafaKamal · July 10, 2020, 4:42pm

You can find a Kabyle model here

dan.bmh · July 18, 2020, 9:09am

You can now also find models for Italian and Polish at DeepSpeech-Polyglot, and I retrained the French and Spanish models with the new CommonVoice release.

dan.bmh · September 16, 2020, 6:06pm

DeepSpeech-Polyglot got some updates in recent weeks:

Improved models for German, French and Spanish
Experimental support for training with wav2letter (but I didn’t achieve good results in my first tests)
You can now extract manual subtitles from YouTube playlists to generate more training data (check a few videos beforehand to make sure the text alignments are good)

double · September 17, 2020, 7:52pm

@dan.bmh
Thank you for sharing.
Could you provide the Python command line for using Polyglot with, e.g., the Spanish model?

dan.bmh · September 17, 2020, 8:46pm

Not sure I’m understanding this right, but you can’t “use” the models with that code. It’s only for training new models. Check out the examples from DeepSpeech’s repository and use the “.pbmm” and “.scorer” files from Polyglot if you want to use Spanish instead of English.
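To make that concrete, here is a minimal sketch of running inference from Python. It assumes the `deepspeech` pip package (0.7 or newer, the releases that use “.scorer” files) and `numpy` are installed, and that the audio is a 16-bit mono WAV at the sample rate the model was trained on. The function names and any file names are placeholders of mine, not files shipped by Polyglot.

```python
import wave


def read_pcm16_mono(wav_path):
    """Return (sample_rate, raw_bytes) for a 16-bit mono WAV file."""
    with wave.open(wav_path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM audio")
        return wav.getframerate(), wav.readframes(wav.getnframes())


def transcribe(model_path, scorer_path, wav_path):
    """Transcribe one WAV file with a .pbmm model and a .scorer file."""
    # Imported here so the WAV helper above works without these packages.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)                # the ".pbmm" acoustic model
    model.enableExternalScorer(scorer_path)  # the ".scorer" language model

    sample_rate, raw = read_pcm16_mono(wav_path)
    if sample_rate != model.sampleRate():
        raise ValueError(
            "audio is %d Hz, model expects %d Hz"
            % (sample_rate, model.sampleRate())
        )

    # DeepSpeech expects the samples as a 1-D int16 array.
    audio = np.frombuffer(raw, dtype=np.int16)
    return model.stt(audio)
```

Calling `transcribe("model.pbmm", "model.scorer", "audio.wav")` with your own paths should return the recognized text as a string. The `deepspeech` command-line client that comes with the pip package takes the same three inputs via `--model`, `--scorer` and `--audio`.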

double · September 18, 2020, 7:21am

@dan.bmh
I see. Thank you for your great work.

fhalamos · October 22, 2020, 9:00pm

@dan.bmh I could not find the “.pbmm” and “.scorer” files in the Polyglot repo. From the README it seems that we have to do the training ourselves in order to create the models, right? Is there anywhere we can directly download the .pbmm and .scorer files? I particularly need Spanish.

sanjaesc · October 23, 2020, 7:54am

The links to the corresponding models are at the bottom.

Mte90 · December 28, 2020, 4:08pm

The Mozilla Italia community has released a model for Italian: https://github.com/MozillaItalia/DeepSpeech-Italian-Model

It comes together with a new text corpus, and we are now working on another audio+text dataset that aggregates many small datasets from around the web.

lissyx · February 2, 2021, 1:13pm

Please avoid hijacking unrelated threads, and look at the documentation covering rebuilding a scorer.
