Links to pretrained models

The DeepSpeech-Polyglot project received a large update over the last weeks. It was reimplemented in TensorFlow 2, new networks were added, and the recognition performance was greatly improved. It also got a new name, Scribosermo, and can now be found here:

The new models can be trained very fast (~3 days on 2x1080Ti to reach SOTA in German) and with comparatively small datasets (~280 h for competitive results in Spanish). Using a bit more time and data, the following word error rates were achieved on the CommonVoice test set:

German: 7.2 % | English: 3.7 % | Spanish: 10.0 % | French: 11.7 %

Training custom models with Scribosermo is very simple; step-by-step instructions can be found in the readmes. Adding new languages is easy, too. After training, the models can be exported to tflite format for easier inference, and they run faster than real-time on a Raspberry Pi 4.
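A minimal sketch of what loading such an exported tflite model can look like with tflite_runtime (the file name is a placeholder, and the actual pre- and post-processing is in the project's inference example):

```python
# Minimal sketch: load an exported Scribosermo tflite model and inspect its
# input/output tensors. "model.tflite" is a placeholder file name; the real
# pre-/post-processing is in the project's inference example.
import tflite_runtime.interpreter as tflite

interpreter = tflite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
print("Input tensor: ", inp["shape"], inp["dtype"])
print("Output tensor:", out["shape"], out["dtype"])

# For real inference: resize the input tensor to the length of the
# preprocessed audio, call set_tensor() and invoke(), then decode the
# returned logits (greedy or with a CTC beam-search decoder).
```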

The most important features are already implemented, but there is still some room for optimization. Feel free to improve the project and send a merge request. It would also be great if you could publish your own models.

Note: Currently only inference with Python is supported; the new models are no longer compatible with the DeepSpeech bindings (the old models are still available). Technically it should be possible to integrate them again. If someone is interested in doing this, some notes can be found in this thread: Integration of DeepSpeech-Polyglot's new networks

I have trained models for most of the Common Voice languages. They are available here: https://tepozcatl.omnilingo.cc/v0.1.0/manifest.html

Update on the Basque model: I managed to train it a bit longer:

Pretrained models for Swahili (sw), Wolof (wo), Yoruba (yo) and Amharic (am):

Hello.

I'm having trouble getting the models to run. I'm using Windows and VS with C#; I got the .NET Framework example running in English, but I can't figure out how to plug in the Spanish model. I've spent hours reading up and down; "getting-the-pre-trained-model" on readthedocs explains perfectly how to use a .pbmm model, but it doesn't mention .pb, which is the kind I find on the mediafire download site.
So clearly I'm missing some important point, please point me in the right direction.

Hi @enavarro, this isn't the right topic for that. You can convert the .pb model to .pbmm using convert_graphdef_memmapped_format. But if you'd like more support, please open another topic or join us on Mozilla's Matrix.

Hi, I think you are using the wrong models. The new ones aren't compatible with DeepSpeech anymore; the older models for DeepSpeech are linked further down in the readme.

Hello Daniel.
Could you please point me to where this change of model type is explained, and how I could use the new models? I guess there is some DeepSpeech 2 to use them with, in which case I don't want to use an older version.
Thank you!

As written in a post above, Scribosermo's new models only support usage with Python; you will need to add an additional interface to use them from C# or .NET.

And, quoting from the first paragraph of the readme: "You can find a short and experimental inference example here"
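Just to illustrate what such an additional interface could look like (nothing here is part of Scribosermo, and transcribe_wav() is a placeholder for the project's inference code): a tiny HTTP wrapper around the Python inference that a C#/.NET client could then call.

```python
# Hypothetical sketch: expose the Python-only inference over HTTP so that a
# C#/.NET client can call it. transcribe_wav() is a placeholder for whatever
# the Scribosermo inference example does; nothing here is part of the project.
from http.server import BaseHTTPRequestHandler, HTTPServer

def transcribe_wav(wav_bytes: bytes) -> str:
    # Placeholder: run the tflite model + decoder on the received audio here.
    return "transcript goes here"

class SttHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        audio = self.rfile.read(length)
        text = transcribe_wav(audio)
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(text.encode("utf-8"))

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), SttHandler).serve_forever()
```

A C# client would then simply POST the WAV bytes to that local endpoint and read the transcript from the response body.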

Hi, what version of DeepSpeech is required to use the Spanish model?

They were trained with 0.9.X, but should work with any version between 0.7.X and the current 0.10.X, like the official English model.

I've been trying to use your pre-trained Spanish model, but when I run the DeepSpeech microphone example and load your model, an error appears: "rebuild TensorFlow with the appropriate compiler flags". Do I need to change the way the microphone example loads the model? I ask because I have used the pre-trained DeepSpeech model (.pbmm) and it works.
(Image of my error -> https://www.dropbox.com/s/tkuog2gu1hq1lc4/Capture.PNG?dl=0)

You are using the wrong model. As written above and in the readme, the new model (ending in .pb) is not compatible with DeepSpeech anymore. You have to use the old model ending in .pbmm.
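For reference, a minimal sketch of running the old .pbmm model with the deepspeech Python package (the file names below are placeholders; it should behave like the official English model in the microphone example):

```python
# Minimal sketch: run the old Spanish .pbmm model with the DeepSpeech package.
# File names are placeholders; the scorer is optional but improves accuracy.
import wave
import numpy as np
from deepspeech import Model

model = Model("output_graph_es.pbmm")          # placeholder file name
model.enableExternalScorer("kenlm_es.scorer")  # placeholder file name

with wave.open("test_es.wav", "rb") as wav:    # expects 16 kHz, mono, 16-bit
    audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

print(model.stt(audio))
```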

Oh, thank you so much. However, if I want to use the latest Spanish model, Quartznet15x5 D8CV (WER: 10.0 %), do I need to install QuartzNet? I've been reading, but when I search the internet for how to install QuartzNet, I only find instructions for installing NeMo. NeMo is the library for using QuartzNet, right?

No, you just need to install tflite + ds_ctcdecoder. See the inference example, which is linked in the first paragraph of the usage chapter:

You can find a short and experimental inference example here

It is not an error, please read the message carefully: it just says your CPU supports more instructions than what we built the library with. It's harmless.

Oh, thank you so much. I installed your examples and the English model worked, but when I tried the Spanish model with a somewhat longer audio file, it shows me an error. What can I do to load bigger files?
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)

A very good tutorial:
https://medium.com/@klintcho/creating-an-open-speech-recognition-dataset-for-almost-any-language-c532fb2bc0cf

A longer audio file should only result in higher memory usage. From your error message it seems that the audio file might be broken or in the wrong format.
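A quick way to check whether the file matches what the model expects (16 kHz, mono, 16-bit WAV is assumed here; check the readme for the exact format):

```python
# Quick sanity check of a WAV file before feeding it to the model.
# 16 kHz / mono / 16-bit is assumed here; check the readme for the exact format.
import wave

def check_wav(path, expected_rate=16000):
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()  # bytes per sample, 2 == 16-bit
    print(f"{path}: {rate} Hz, {channels} channel(s), {8 * width}-bit")
    if rate != expected_rate or channels != 1 or width != 2:
        print("-> format differs, convert it first (e.g. with sox or ffmpeg)")

check_wav("my_long_recording_es.wav")  # placeholder file name
```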

Hi,
Is this project still alive? I can't find any releases since December 2020 on these pages, neither in French nor in English.
I'm asking because I was not totally convinced by the quality of the model (especially in French), so I took a four-month break from this project, and I'm surprised to see there are no new releases.