Links to pretrained models

In this topic we want to collect links to pretrained models and checkpoints in various languages.


You can find a German model with checkpoints here:
(Moved to DeepSpeech-Polyglot, see below)


and here:

A Welsh model is here:


You can now find my German model, as well as models for French and Spanish, here:


Just a reminder for @lissyx that you wanted to pin this thread :)


Thanks for the reminder. For more context, we want to find a better solution to that problem, so hopefully we will have something online sooner rather than later (with proper metadata, licensing, version information, etc.). In the meantime, let’s hope this helps people find resources :slight_smile:

Nice, it shows my project is not known enough: https://github.com/Common-Voice/commonvoice-fr/blob/master/DeepSpeech/Dockerfile.train
https://github.com/Common-Voice/commonvoice-fr/releases

Actually I did know about the project, but I didn’t know you had already trained and published a model, because I couldn’t read the French READMEs …
And for some reason I also didn’t check the releases page :see_no_evil:


There is a CONTRIBUTING.md as well, and it is in English :[

You can find a Kabyle model here


You can now also find models for Italian and Polish at DeepSpeech-Polyglot. I have also retrained the French and Spanish models with the new CommonVoice release.


DeepSpeech-Polyglot got some updates in the last few weeks:

  • Improved models for German, French and Spanish
  • Experimental support for training with wav2letter (but I didn’t achieve good results in my first tests)
  • You can now extract manual subtitles from YouTube playlists to generate more training data (check a few videos first to make sure the text alignments are good)

@dan.bmh
Thank you for sharing.
Could you provide the Python command line to use polyglot with, e.g., the Spanish model?

Not sure I’m understanding this right, but you can’t “use” the models with that code. It’s only for training new models. Check out the examples in DeepSpeech’s repository and use the “.pbmm” and “.scorer” files from polyglot if you want to use Spanish instead of English.
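For illustration, here is a minimal sketch of what using the “.pbmm” and “.scorer” files could look like with the `deepspeech` Python package (0.7 or later). The file names are hypothetical placeholders, not the actual names of the polyglot downloads, and the model version has to match the installed package. It assumes a 16-bit mono WAV file that already has the sample rate the model expects.

```python
import wave


def build_cli_command(model_path, scorer_path, audio_path):
    """Return the argument list for the `deepspeech` command-line client."""
    return [
        "deepspeech",
        "--model", model_path,
        "--scorer", scorer_path,
        "--audio", audio_path,
    ]


def transcribe(model_path, scorer_path, audio_path):
    """Transcribe a 16-bit mono WAV file with the DeepSpeech Python API."""
    # Imported lazily so the helper above works without these packages installed.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)                # acoustic model (.pbmm)
    model.enableExternalScorer(scorer_path)  # language model (.scorer)

    with wave.open(audio_path, "rb") as wav:
        # This sketch does not resample; it refuses audio with the wrong rate.
        if wav.getframerate() != model.sampleRate():
            raise ValueError(
                "audio must be resampled to %d Hz first" % model.sampleRate()
            )
        frames = wav.readframes(wav.getnframes())

    audio = np.frombuffer(frames, dtype=np.int16)
    return model.stt(audio)


if __name__ == "__main__":
    # Hypothetical file names, replace them with the files you downloaded.
    cmd = build_cli_command("model_es.pbmm", "model_es.scorer", "audio.wav")
    print(" ".join(cmd))
```

Calling `transcribe("model_es.pbmm", "model_es.scorer", "audio.wav")` would return the recognized text as a string; the printed command line is the equivalent call for the `deepspeech` CLI client.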

@dan.bmh
I see. Thank you for your great work.

@dan.bmh I could not find the “.pbmm” and “.scorer” files in the polyglot repo. From the README it seems that we have to do the training ourselves in order to create the models, right? Is there anywhere we can directly download the .pbmm and .scorer files? I particularly need Spanish.

The links to the corresponding models are at the bottom.


The Mozilla Italia community has released a model for Italian: https://github.com/MozillaItalia/DeepSpeech-Italian-Model

It comes together with a new text corpus, and we are now working on another audio+text dataset that aggregates a lot of mini datasets from around the web.


Please avoid hijacking unrelated threads, and look at the documentation covering rebuilding a scorer.