Links to pretrained models

Hi, what version of DeepSpeech is required to use the Spanish model?

They were trained with 0.9.X, but should work with any version between 0.7.X and the current 0.10.X, just like the official English model.
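The compatibility window described above could be sketched as a tiny version guard. This is a hypothetical helper, not part of any DeepSpeech release; it only encodes the 0.7.X–0.10.X range mentioned in this thread:

```python
def is_supported_deepspeech(version: str) -> bool:
    """Illustrative check: models trained with 0.9.x should load in
    any DeepSpeech release from 0.7.x through 0.10.x."""
    major, minor = (int(p) for p in version.split(".")[:2])
    return (0, 7) <= (major, minor) <= (0, 10)
```

For example, `is_supported_deepspeech("0.9.3")` returns `True`, while `"0.6.1"` falls outside the window.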

I’ve been trying to use your pre-trained Spanish model, but when I run the DeepSpeech microphone example and load your model, an error appears: “rebuild TensorFlow with the appropriate compiler flags”. Do I need to change how the microphone example loads the model? I ask because the official pre-trained DeepSpeech .pbmm model works fine.
(Image of my error -> https://www.dropbox.com/s/tkuog2gu1hq1lc4/Capture.PNG?dl=0)

You are using the wrong model. As written above and in the readme, the new model (ending in .pb) is no longer compatible with DeepSpeech. You have to use the old model ending in .pbmm.
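A minimal guard for this mistake, assuming your script receives the model as a plain file path (the helper name is invented for illustration):

```python
import os

def check_model_path(path: str) -> str:
    """Illustrative guard: the DeepSpeech native client expects the
    memory-mapped .pbmm export, not the newer .pb graph from this repo."""
    ext = os.path.splitext(path)[1]
    if ext != ".pbmm":
        raise ValueError(f"expected a .pbmm model for DeepSpeech, got '{ext}'")
    return path
```

Failing fast on the file extension gives a clearer message than the TensorFlow error the wrong model produces.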

Oh, thank you so much. However, if I want to use the latest Spanish model, Quartznet15x5 D8CV (WER: 10.0%), do I need to install QuartzNet? I’ve been searching online for how to install QuartzNet, but I only find instructions for installing NeMo. NeMo is the library used to run QuartzNet, right?

No, you just need to install tflite+dsctcdecoder. See the inference example linked in the first paragraph of the usage chapter:

You can find a short and experimental inference example here

It is not an error. Please read the message carefully: it just says your CPU supports more instructions than the ones the library was built with. It’s harmless.
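If that informational message clutters the logs, TensorFlow’s own `TF_CPP_MIN_LOG_LEVEL` environment variable can suppress it. It must be set before TensorFlow is imported; this only hides the message, it does not change behavior:

```python
import os

# TensorFlow log verbosity: 0 = all, 1 = hide INFO, 2 = hide WARNING,
# 3 = errors only. Set this BEFORE importing tensorflow.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
```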

Oh, thank you so much. I installed your examples and they worked with the English model, but with the Spanish model, when the audio is a little longer, it shows me an error. What can I do to load bigger files?
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)

A very good tutorial
https://medium.com/@klintcho/creating-an-open-speech-recognition-dataset-for-almost-any-language-c532fb2bc0cf

A longer audio file should only result in more memory usage. Judging from your error message, the audio file might be broken or in the wrong format.
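One common cause of that exact dimension mismatch is a stereo file (a 2-D sample array) fed into a pipeline that expects mono (1-D). This stdlib-only sketch, assuming 16-bit PCM WAV input, downmixes stereo to mono by averaging the two channels:

```python
import array
import wave

def stereo_to_mono(in_path: str, out_path: str) -> None:
    """Average the two channels of a 16-bit stereo WAV into mono.
    DeepSpeech-style pipelines expect mono 16-bit PCM; a stereo file
    can surface later as a NumPy dimension-mismatch error."""
    with wave.open(in_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo input")
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())
    samples = array.array("h", frames)  # interleaved L/R int16 samples
    mono = array.array("h", ((samples[i] + samples[i + 1]) // 2
                             for i in range(0, len(samples), 2)))
    with wave.open(out_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(mono.tobytes())
```

Also double-check the sample rate: the released models expect 16 kHz input, and `sox` or `ffmpeg` can resample if needed.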

Hi,
Is this project still alive? I can’t find any releases since December 2020, in French or in English, on these pages.
I’m asking because I was not totally convinced by the quality of the model (especially in French), so I took a four-month break from this project, and I’m surprised to see there are no new releases.


Contributions are welcome; I have asked for help on the French model many times. I am no longer working on this at all, so I can only work on the French model in my spare time, and my spare time has been negative for months.

Dear @LucieDevGirl, thanks for your interest in DeepSpeech! First of all, the project certainly isn’t dead; in fact, a grants programme for DeepSpeech will be announced very soon. As for the models, there has been no release since December, but a lot of people have been working on models for different languages. If you’re interested, we’d be happy to discuss how you can participate on Mozilla’s Matrix. I haven’t been working on French because it is already a very well-resourced language, but I’d be happy to help out with support, etc.


How can I contribute beyond giving data on Common Voice? I would be happy to help build a more efficient French model!


Let’s talk on Matrix! :slight_smile:

I think the problem is memory, because when the audio is short I can load the file.

If I want to use your Spanish model, I only need to put the path of the model in checkpoint_file, right? But I have a question: in the testing_tflite.py code, the acoustic model is in English. Do I need an acoustic model for Spanish, or only the Spanish language model? As I understand it, the acoustic model handles pronunciation and the language model handles grammar, and Spanish pronunciation is different from English.

Yes, you need both models.
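A sketch of how both paths might be wired up. The file names and the config layout here are illustrative assumptions, not the repository’s actual structure:

```python
# Hypothetical layout: both the Spanish acoustic model and the Spanish
# scorer must be swapped in; English defaults for either will not work.
spanish_config = {
    "checkpoint_file": "models/es/model.tflite",  # acoustic model (audio -> characters)
    "scorer_file": "models/es/es.scorer",         # language model (spelling/grammar)
}

def validate_config(cfg: dict) -> dict:
    """Fail early if either model path is missing."""
    missing = [k for k in ("checkpoint_file", "scorer_file") if not cfg.get(k)]
    if missing:
        raise ValueError(f"missing model paths: {missing}")
    return cfg
```

Keeping both paths together in one place makes it harder to accidentally mix an English scorer with a Spanish acoustic model.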

Has anybody found a Japanese language model?

Maybe it can be trained on this 2000 hour corpus: https://github.com/laboroai/LaboroTVSpeech

The Mozilla Common Voice corpus for Japanese is very small (26 hours, 639 MB).

Hello @ftyers, is it possible to update your link? I’m interested in testing your Wolof model, if possible :wink: