Links to pretrained models

Nice, it shows my project is not known enough: https://github.com/Common-Voice/commonvoice-fr/blob/master/DeepSpeech/Dockerfile.train
https://github.com/Common-Voice/commonvoice-fr/releases

Actually I did know about the project, but I didn’t know you had already trained and published a model, because I couldn’t read the French READMEs …
And for some reason I also didn’t check the releases page :see_no_evil:

There is a CONTRIBUTING.md as well, and it is in English :[

You can find a Kabyle model here

You can now also find models for Italian and Polish at DeepSpeech-Polyglot, and I retrained the French and Spanish models with the new CommonVoice release.

DeepSpeech-Polyglot got some updates in recent weeks:

  • Improved models for German, French and Spanish
  • Experimental support for training with wav2letter (but I didn’t achieve good results in my first tests)
  • You can now extract manual subtitles from YouTube playlists to generate more training data (check a few videos first to make sure the text alignment is good)

@dan.bmh
Thank you for sharing.
Could you provide the Python command line to use polyglot with, e.g., the Spanish model?

Not sure I’m understanding this right, but you can’t “use” the models with that code. It’s only for training new models. Check out the examples from DeepSpeech’s repository and use the “.pbmm” and “.scorer” files from polyglot if you want to use Spanish instead of English.
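For readers looking for a starting point, here is a minimal sketch of what loading a “.pbmm” and a “.scorer” file looks like with the DeepSpeech Python bindings. It assumes the `deepspeech` pip package (0.7 or later) and `numpy` are installed, and that the audio is 16-bit mono PCM at the model's sample rate. The helper names and any file names you pass in are hypothetical and not taken from the polyglot repository.

```python
import wave


def read_pcm16_mono(path, expected_rate=16000):
    """Return the raw PCM bytes of a 16-bit mono WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM audio")
        if wav.getframerate() != expected_rate:
            raise ValueError(
                f"expected {expected_rate} Hz, got {wav.getframerate()} Hz"
            )
        return wav.readframes(wav.getnframes())


def transcribe(model_path, scorer_path, wav_path):
    """Run speech-to-text on one WAV file with a DeepSpeech model."""
    # Imported here so the WAV helper above works without these packages.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)                # the ".pbmm" file
    model.enableExternalScorer(scorer_path)  # the ".scorer" file
    pcm = read_pcm16_mono(wav_path, expected_rate=model.sampleRate())
    return model.stt(np.frombuffer(pcm, dtype=np.int16))
```

Calling `transcribe(...)` with the paths to a downloaded Spanish model, its scorer and a recording should return the recognized text as a string. This only applies to models that are still compatible with the DeepSpeech bindings.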

@dan.bmh
I see. Thank you for your great work.

@dan.bmh I could not find the “.pbmm” and “.scorer” files in the polyglot repo. From the readme it seems that we have to do the training ourselves in order to create the models, right? Is there anywhere we can directly download the .pbmm and .scorer files? I particularly need Spanish.

The links to the corresponding models are at the bottom.

The Mozilla Italia community has released a model for Italian: https://github.com/MozillaItalia/DeepSpeech-Italian-Model

It comes together with a new text corpus, and we are now working on another audio+text dataset that aggregates a lot of small datasets from around the web.

Please avoid hijacking unrelated threads, and look at the documentation covering rebuilding a scorer.

The DeepSpeech-Polyglot project received a large update over the last few weeks. It was reimplemented in TensorFlow 2 and new networks have been added. The recognition performance was greatly improved. It also got a new name, Scribosermo, and can now be found here:

The new models can be trained very fast (~3 days on 2x 1080 Ti to reach SOTA in German) and with comparatively small datasets (~280h for competitive results in Spanish). Using a little more time and data, the following word error rates were achieved on the CommonVoice test set:

German    English    Spanish    French
7.2 %     3.7 %      10.0 %     11.7 %
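For anyone unfamiliar with the metric, the word error rate is commonly defined as the word-level edit distance (substitutions, insertions and deletions) between the recognized text and the reference transcript, divided by the number of words in the reference. The sketch below follows that standard definition for a single sentence pair. It is not the project's own evaluation code, and the text normalization used for the numbers above is not stated in this thread.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("el gato come pescado", "el pato come pescado"))
```

On a whole test set, the edit distances and reference lengths are usually summed over all sentences before dividing, rather than averaging per-sentence rates.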

Training custom models with Scribosermo is very simple; step-by-step instructions can be found in the readmes. Adding new languages is very easy, too. After training, the models can be exported into tflite format for easier inference. They are able to run faster than real-time on a Raspberry Pi 4.

The most important features are already implemented, but there is still some room left for optimizations. Feel free to improve it and send a merge request. It would also be great if you could publish your own models.

Note: Currently only inference with Python is supported; the new models are no longer compatible with the DeepSpeech bindings (the old models are still available). Technically it should be possible to integrate them again. If someone is interested in doing this, some notes can be found in this thread: Integration of DeepSpeech-Polyglot's new networks

I have trained models for most of the Common Voice languages. They are available here: https://tepozcatl.omnilingo.cc/v0.1.0/manifest.html

Update on the Basque model, I managed to train it a bit longer:

Pretrained models for Swahili (sw), Wolof (wo), Yoruba (yo) and Amharic (am):

Hello.

I’m having trouble getting the models to run. I’m on Windows, using Visual Studio with C#. I got the .NET Framework example running in English, but I can’t figure out how to plug in the Spanish model. I’ve spent hours reading up and down. The “getting-the-pre-trained-model” section on readthedocs explains perfectly how to use a .pbmm model, but it doesn’t mention .pb, which is the kind I find on the MediaFire downloads site.
So clearly I’m missing something important. Please point me in the right direction.

Hi @enavarro, this isn’t the right topic for that. You can convert the .pb model to .pbmm using convert_graphdef_memmapped_format. If you’d like more support, please open another topic or join us on Mozilla’s Matrix.
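As a rough sketch of that conversion step, the snippet below builds and runs the command from Python. It assumes the `convert_graphdef_memmapped_format` binary is already on your PATH (on Windows it may be an .exe) and that it accepts the `--in_graph` and `--out_graph` flags as described in the DeepSpeech documentation. The helper names and the file name `output_graph.pb` are hypothetical.

```python
import subprocess
from pathlib import Path


def build_convert_command(pb_path, tool="convert_graphdef_memmapped_format"):
    """Build the argument list that converts a .pb graph to .pbmm."""
    pb = Path(pb_path)
    if pb.suffix != ".pb":
        raise ValueError("expected a path ending in .pb")
    out = pb.with_suffix(".pbmm")  # written next to the input file
    return [tool, f"--in_graph={pb}", f"--out_graph={out}"]


def convert(pb_path, tool="convert_graphdef_memmapped_format"):
    """Run the conversion; raises CalledProcessError if the tool fails."""
    subprocess.run(build_convert_command(pb_path, tool), check=True)


print(build_convert_command("output_graph.pb"))
```

The conversion only changes the file format. It does not make a model compatible with a DeepSpeech version it was not trained for.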

Hi, I think you are using the wrong models. The new ones aren’t compatible with DeepSpeech anymore; the older models for DeepSpeech are linked further down in the readme.
