Links to pretrained models

dan.bmh · April 6, 2021, 8:31am

The DeepSpeech-Polyglot project has received a large update over the last few weeks. It was reimplemented in TensorFlow 2, and new networks have been added. Recognition performance has improved greatly. It also got a new name, Scribosermo, and can now be found here:

The new models can be trained very quickly (~3 days on 2x1080Ti to reach SOTA in German) and with comparatively small datasets (~280h for competitive results in Spanish). With a little more time and data, the following word error rates (WER) were achieved on the CommonVoice test set:

| German | English | Spanish | French |
|--------|---------|---------|--------|
| 7.2 %  | 3.7 %   | 10.0 %  | 11.7 % |
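
For readers unfamiliar with the metric: word error rate is conventionally the word-level edit distance between the reference transcript and the recognized text, divided by the number of reference words. The following is a minimal sketch under that definition, assuming plain whitespace tokenization. The function name `word_error_rate` is hypothetical, and this is not necessarily how Scribosermo's own evaluation computes its numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a WER of 7.2 % corresponds to roughly 7 word errors per 100 reference words.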

Training custom models with Scribosermo is very simple; step-by-step instructions can be found in the readmes. Adding new languages is very easy, too. After training, the models can be exported to TFLite format for easier inference. They can run faster than real time on a Raspberry Pi 4.

The most important features are already implemented, but there is still some room for optimization. Feel free to improve it and send a merge request. It would also be great if you could publish your own models.

Note: Currently, only inference with Python is supported; the new models are no longer compatible with the DeepSpeech bindings (the old models are still available). Technically, it should be possible to integrate them again. If someone is interested in doing this, some notes can be found in this thread: Integration of DeepSpeech-Polyglot's new networks

ftyers · April 24, 2021, 6:33pm

I have trained models for most of the Common Voice languages. They are available here: https://tepozcatl.omnilingo.cc/v0.1.0/manifest.html

ftyers · April 27, 2021, 2:38pm

Update on the Basque model, I managed to train it a bit longer:

ftyers · April 29, 2021, 10:52pm

Pretrained models for Swahili (sw), Wolof (wo), Yoruba (yo) and Amharic (am):

enavarro · May 14, 2021, 11:17am

Hello.

I’m having trouble getting the models to run. I’m using Windows with Visual Studio and C#. I got the .NET Framework example running in English, but I can’t figure out how to plug in the Spanish model. I’ve spent hours reading up and down; the “getting-the-pre-trained-model” section on readthedocs explains perfectly how to use a .pbmm model, but it doesn’t mention .pb, which is the kind I find on the MediaFire download site.
So clearly I’m missing some important point. Please point me in the right direction.

ftyers · May 14, 2021, 2:08pm

Hi @enavarro, this isn’t the right topic for that. You can convert the .pb model to .pbmm using convert_graphdef_memmapped_format. But if you’d like more support, please open another topic or join us on Mozilla’s Matrix.

dan.bmh · May 15, 2021, 7:12am

Hi, I think you are using the wrong models. The new ones are no longer compatible with DeepSpeech; the older models for DeepSpeech are linked later in the readme.

enavarro · May 17, 2021, 6:06am

Hello Daniel.
Could you please point me to where this change of model type is explained, and how I could use the new models? I guess there is some DeepSpeech 2 that can use them, in which case I don’t want to use an older version.
Thank you!

dan.bmh · May 17, 2021, 7:10am

As written in a post above, Scribosermo’s new models only support usage with Python; you will need to add an additional interface to use them in C# or .NET.

And, citing from the first paragraph of the Readme: "You can find a short and experimental inference example here"

Steven_Mendez · May 17, 2021, 5:56pm

Hi, what version of DeepSpeech is required to use the Spanish model?

dan.bmh · May 18, 2021, 7:06am

They were trained with 0.9.X, but should work with any version between 0.7.X and the current 0.10.X, like the official English model.

Steven_Mendez · May 26, 2021, 2:31am

I’ve been trying to use your pre-trained Spanish model, but when I run the DeepSpeech microphone example with your model, an error appears: “rebuild TensorFlow with the appropriate compiler flags”. Do I need to change how the microphone example loads the model? I have used DeepSpeech’s pre-trained .pbmm model and it works.
(Image of my error -> https://www.dropbox.com/s/tkuog2gu1hq1lc4/Capture.PNG?dl=0)

dan.bmh · May 26, 2021, 7:18am

You are using the wrong model. As written above and in the readme, the new model (ending in .pb) is no longer compatible with DeepSpeech. You have to use the old model ending in .pbmm.

Steven_Mendez · May 28, 2021, 12:05am

Oh, thank you so much. However, if I want to use the latest Spanish model, Quartznet15x5, D8CV (WER: 10.0%), do I need to install Quartznet? I’ve been reading, but when I search the Internet for how to install Quartznet, I only find how to install NeMo. NeMo is the library for using Quartznet, right?

dan.bmh · May 28, 2021, 7:08am

No, you just need to install tflite+dsctcdecoder. See the inference example which is linked in the first paragraph of the usage chapter:

You can find a short and experimental inference example here

lissyx · May 28, 2021, 7:38am

It is not an error. Please read the message carefully: it just says your CPU supports more than what we have built the library with. It’s harmless.

Steven_Mendez · May 30, 2021, 5:39am

Oh, thank you so much. I installed your examples, and they worked with the English model, but when I tried the Spanish model with a slightly longer audio file, it showed me an error. What can I do to load a longer file?
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)

mitumitu · October 30, 2021, 10:32am

A very good tutorial
https://medium.com/@klintcho/creating-an-open-speech-recognition-dataset-for-almost-any-language-c532fb2bc0cf

dan.bmh · June 2, 2021, 10:15am

A longer audio file should only result in more memory usage. From your error message, it seems that the audio file might be broken or in the wrong format.
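
One way to check the format hypothesis is to inspect the file's header before running inference. Below is a minimal sketch using Python's standard `wave` module. The target values (mono, 16-bit, 16 kHz) are assumptions based on what DeepSpeech-style models commonly expect and should be verified against the readme; the helper names `describe_wav` and `format_problems` are hypothetical.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic format information from a PCM WAV file header."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def format_problems(info: dict, channels: int = 1,
                    sample_width_bytes: int = 2,
                    sample_rate_hz: int = 16000) -> list:
    """List mismatches between a file's format and the assumed target format."""
    # Assumed target: mono, 16-bit samples, 16 kHz (not confirmed by the readme).
    expected = {
        "channels": channels,
        "sample_width_bytes": sample_width_bytes,
        "sample_rate_hz": sample_rate_hz,
    }
    return [
        f"{key}: found {info[key]}, expected {value}"
        for key, value in expected.items()
        if info[key] != value
    ]
```

Calling `format_problems(describe_wav("audio.wav"))` (with a hypothetical file name) returns an empty list when the file matches the assumed format. A stereo file, for example, is often loaded as a two-dimensional array, which might explain a dimension mismatch like the one reported above, though that is only a guess.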

LucieDevGirl · June 3, 2021, 9:44am

Hi,
Is this project still alive? I can’t find any releases since December 2020, in either French or English, on these pages.
I’m asking because I was not totally convinced by the quality of the model (especially in French), so I took a 4-month break from this project, and I’m surprised to see there are no new releases.