DeepSpeech model training

You need to use things in sync, either all v0.5 or all v0.6
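
For what it's worth, here is a minimal sketch of how the installed Python packages could be checked for being "in sync". It assumes Python 3.8+ (for importlib.metadata) and uses the pip package names mentioned in this thread, deepspeech and ds-ctcdecoder. The helper names are made up for illustration, and the version of the training code checkout (the git tag) is not covered and still has to be compared by hand.

```python
from importlib import metadata  # standard library, Python 3.8+


def minor_series(version):
    """Return the 'major.minor' prefix of a version string, e.g. '0.5.1' -> '0.5'."""
    return ".".join(version.split(".")[:2])


def check_in_sync(versions):
    """Return True when every package in the mapping shares one major.minor series."""
    return len({minor_series(v) for v in versions.values()}) <= 1


def installed_versions(packages=("deepspeech", "ds-ctcdecoder")):
    """Look up installed versions; packages that are not installed are skipped."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return found


if __name__ == "__main__":
    versions = installed_versions()
    print(versions, "in sync" if check_in_sync(versions) else "MISMATCH")
```

Run inside the virtualenv used for training or inference, this prints the versions it found and whether they belong to the same series.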

Hi @lissyx,

I am following the steps below, using DeepSpeech v0.5.1.

1. Set up git lfs.
2. Clone the DeepSpeech library: git clone --branch v0.5.1 DeepSpeech-lib
3. Install the dependencies:
pip3 install -r requirements.txt
4. Install ds_ctcdecoder:
pip3 install $(python3 util/ --decoder) (this installed ds-ctcdecoder==0.5.1)
5. Download the data sets from the official site.
6. Convert the data to a format that the DeepSpeech engine can understand:
bin/ …/data-sets/german/clips

7. Train using the command below:
python3 --epochs 10 --checkpoint_dir /root/.local/share/deepspeech/checkpoints --nouse_seq_length --export_dir ./test/export/destination --train_files ./test/train.csv --dev_files ./test/dev.csv --test_files ./test/test.csv

The above command writes output_graph.pb to the export directory, i.e. ./test/export/destination

8. Test with the newly exported model:
python3 ./native_client/python/ --model ./test/export/destination/output_graph.pb --alphabet ./data/alphabet.txt --lm ./data/lm/lm.binary --trie ./data/lm/trie --audio …/Data-sets/german/clips/common_voice_de_17300571.wav

I am getting the error below after step 8, i.e. when trying to use the newly trained model.

I tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA

and then –

I tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA

Is this related to my system configuration? Please share your thoughts.


This is not an error and is not related to your models. Those are warnings, you can ignore them.

Hi @lissyx,

I made a copy-paste error, sorry for that. Below is the actual error message I am getting -

Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph

While searching online, I found the following explanation on one site -

Although the model has a Session and Graph, in some tensorflow methods, the default Session and Graph are used. To fix this I had to explicitly say that I wanted to use both my Session and my Graph as the default:

but I do not fully understand it. Please let me know your thoughts.


That feels strange, but you are running directly and you don’t share the start of the input, so we cannot check what is actually running.

Please test properly, as documented: set up a virtualenv, install with pip install deepspeech==0.5.1, and run inference with deepspeech rather than calling directly.

Hi @lissyx,

I tried running it through the deepspeech interface as well and got the same error.

It works fine with the pre-trained model. I will try again after creating a new virtual environment.


Hi @lissyx,

I tried with a new virtual environment and am still facing the same error.

Could it be because I trained the model on a very small data set (2-3 files of 10 seconds each)? I am currently trying to build a complete POC, which is why I have not trained on a large data set. Please let me know your thoughts.


No, that’s something else.

Like …

And yes, @laxmikant04.yadav, you shared that earlier, but since you kept sharing without proper code formatting, your Python command line was unreadable to me and thus I missed that information.

Thanks @lissyx ,

I went through your reply on the post -
[FIXED] Error with master/alpha8 (unknown op: UnwrapDatasetVariant & WrapDatasetVariant)

So I am now training without the --nouse_seq_length flag.

Those were simple steps I was noting down for myself in a text file. I will keep proper formatting in mind for my next comments.


You don’t need to retrain, just re-export without that flag.

Thanks @lissyx .

It worked fine after exporting without the --nouse_seq_length flag.
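
For anyone hitting the same "Tensor input_lengths:0 ... was not found in the Graph" error, a rough way to check an exported graph before running inference is sketched below. It only assumes that the client feeds a node named input_lengths (taken from the error message above) and that output_graph.pb is a frozen TensorFlow GraphDef. The helper names are hypothetical, and load_node_names needs TensorFlow installed.

```python
def missing_nodes(node_names, required=("input_lengths",)):
    """Return the required node names that are absent from the graph."""
    present = set(node_names)
    return [name for name in required if name not in present]


def load_node_names(pb_path):
    """Read node names from a frozen GraphDef file (needs TensorFlow installed)."""
    import tensorflow as tf  # imported lazily so missing_nodes works without it

    # tf.GraphDef in TensorFlow 1.x, tf.compat.v1.GraphDef in 2.x
    graph_def_cls = getattr(tf, "GraphDef", None) or tf.compat.v1.GraphDef
    graph_def = graph_def_cls()
    with open(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    return [node.name for node in graph_def.node]
```

Calling missing_nodes(load_node_names("./test/export/destination/output_graph.pb")) should return an empty list for a usable export. A non-empty result would suggest the graph was exported in a way the client does not expect, as apparently happened here with --nouse_seq_length.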


Hi @lissyx,

I am working on speech recognition with a microphone, and I started with the example below from the DeepSpeech GitHub repo -

I can see it is trying to recognise the speech, but the accuracy is not good for me.
I am working on Ubuntu 16.04 on a desktop.

Currently it only recognises a single word, and only when it is spoken very loudly and very clearly; it fails otherwise.

Could you please suggest what else I should try, or where I can look, to increase its accuracy?

Our expectation is that it should be able to recognise simple sentences like “Welcome to speech recognition”. This works perfectly when I try it with clean audio files.


Looks like you’ve got some hint yourself. Though, you don’t document if those clean audio files are produced by you or if they are from other origin.


It looks like we have not updated that to 0.5.1, maybe it is worth testing if it improves, since this model was trained to be more robust to some noise.

Make sure your system is actually able to capture at mono 16kHz; resampling might get in the way.
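
One way to verify this is to dump a short capture to a WAV file and inspect its header. A minimal sketch using only the standard library wave module follows. The 16-bit sample width is my assumption about what the model expects, and the function names are just for illustration.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, bits_per_sample) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getframerate(), wav.getsampwidth() * 8


def is_mono_16k(path):
    """True when the file is mono, 16 kHz, 16-bit PCM."""
    return describe_wav(path) == (1, 16000, 16)
```

If describe_wav reports something like two channels or 44100 Hz, the capture is being resampled or down-mixed somewhere before it reaches the model, and that is worth ruling out first.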

It could also just be a side-effect of your mic, that captures poor quality sound. Besides improving the model, there’s hardly anything we can easily improve.

I’m wrong, 0.5.1 model was not released with noise robustness improvements. But the rest of my comment is valid.

Thanks @lissyx for your response.

The audio files I mentioned were created by me, using an online text-to-speech tool.

Audio files work fine for me, but I am currently working on live streaming via microphone, and that is not giving proper accuracy, as mentioned.

I will re-check the mic audio quality.

And yes, I noticed the deepspeech version in the requirements file and updated it to 0.5.1 for my local run. It was giving an error when I tried to execute with version 0.4.1 (I think because I have the deepspeech interface at version 0.5.1).


So it’s not you speaking ?

Then maybe it is also a problem of accent.

Yes, correct.

How could I mitigate it? Do I need to train the English model with my accent, for example, and with background noise as well?

I also wanted to check: are you and your team going to release a noise-robust model for English in the near future?

Mostly, yes

This is something we are working on, but near future I can’t tell.

Maybe try some denoising library in front ?
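
To give an idea of where such a step would sit, here is a very crude stand-in: a noise gate that zeroes frames below an energy threshold before the audio is passed on. This is not a real denoising library and will not remove noise that overlaps speech. The threshold and frame length are illustrative values I picked, not tuned ones, and the input is assumed to be 16-bit mono PCM in native byte order with a whole number of samples.

```python
import array
import math


def frame_rms(samples):
    """Root-mean-square level of a sequence of 16-bit PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def noise_gate(pcm_bytes, threshold=500, frame_len=320):
    """Silence every frame whose RMS is below `threshold`.

    pcm_bytes: 16-bit mono PCM, native byte order (assumption).
    frame_len: samples per frame (320 = 20 ms at 16 kHz).
    """
    samples = array.array("h")
    samples.frombytes(pcm_bytes)
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if frame_rms(frame) < threshold:
            # Replace the quiet frame with silence of the same length
            samples[start:start + frame_len] = array.array("h", [0] * len(frame))
    return samples.tobytes()
```

The gated bytes can then be fed to the recogniser in place of the raw capture. A dedicated denoising library would do considerably better than this on real background noise.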

Can anyone enlighten me? I am stuck on a "Fatal Python error: Segmentation fault".
I am using a virtual environment and have run DeepSpeech using the .sh file below.

This is my error log,

This is my .sh file