Error when training model


(kdavis) #63

Can you give us access to chunk15.wav to test too?


(karthikeyan k) #64

sure… here it is… I have added two more audio files too…

chunks.zip (332.0 KB)


(kdavis) #65

Looking at chunk15.wav two things jump out at me:

  1. The accent is Indian; however, our model was trained on an American accent. Performance will be worse on an Indian accent, and some fine-tuning of the model should be done.
  2. I am getting different results than you are with 0.4.0 (“i can dig and people of minutes for a god in perfecting latin poem notepaper”) which suggests your 0.4.0 setup is amiss.

(karthikeyan k) #66

I already tried fine-tuning the DeepSpeech 0.3.0 model with this data, and it exported a model after three epochs. The exported model’s inference result was null.
Then I just stopped fine-tuning, assuming that fine-tuning a model means training a new model on our data only, and not adding new knowledge to the existing model (no knowledge is carried over to the new model).

I downloaded the pretrained DeepSpeech 0.4.0 model from the GitHub releases page and used it with the below command as usual.

~$ deepspeech --model models/output_graph.pbmm --audio /mnt/c/users/karthikeyan/downloads/chunk15.wav --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie

Is there any other way to run DeepSpeech inference to get output like yours? Do I need to add any extra parameters or anything else?


(kdavis) #67

Let’s start at the beginning.

How did you install deepspeech 0.4.0? Did you install it in a fresh virtual environment as suggested in the documentation[1]?


(karthikeyan k) #68

I downloaded the pretrained DeepSpeech 0.4.0 models from the releases page using:

wget https://github.com/mozilla/DeepSpeech/releases/download/v0.4.0/deepspeech-0.4.0-models.tar.gz
tar xvfz deepspeech-0.4.0-models.tar.gz

and extracted them into a directory, created a virtual environment, installed requirements.txt… started inference…


(Lissyx) #69

@karthikeyank I’m sorry to insist again, but please stop removing the versions from the output; they’re really important when we do this kind of debugging.


(karthikeyan k) #70

I’m sorry @lissyx, I don’t know what you are talking about… I think I didn’t mention the version of the models in some places (I have corrected them).


(Lissyx) #71

When you run inference, the TensorFlow and DeepSpeech build versions are written in the output. We need those, as well as the exact model files. There might be small divergences with high impact.


(kdavis) #72

@karthikeyank Could you use the instructions I referenced[1] to install the Python package? Just so we are starting from the same point. (In particular, they mention nothing about installing from requirements.txt.)


(karthikeyan k) #73

Yes… I followed the same method, and the version of the DeepSpeech Python package is DeepSpeech: v0.4.0-0-g48ad711

I installed all the Python packages mentioned in requirements.txt in the virtual environment…

I tried again and it produces the following output now,

(deep4.0) userk@PSSHSRDT034:~/DeepSpeechPro/native_client4.0$ deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio /mnt/c/users/karthikeyan/Downloads/chunk15.wav
Loading model from file models/output_graph.pbmm
TensorFlow: v1.12.0-10-ge232881
DeepSpeech: v0.4.0-0-g48ad711
2019-01-08 16:21:49.623456: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.042s.
Loading language model from files models/lm.binary models/trie
Loaded language model in 0.269s.
Running inference.
i can do and picture of minutes for eating for the saintly in bloom no deterrent
Inference took 21.940s for 6.060s audio file.


(karthikeyan k) #74

Sure @lissyx… I will follow that from here on…


(kdavis) #75

The instructions I referenced do not mention requirements.txt. Could you please follow them? Otherwise it’s hard or impossible for us to debug.


(karthikeyan k) #76

Okay, now I have created a new virtual environment and installed deepspeech…
done…

and this is the output…

(newenv) userk@PSSHSRDT034:~/pycodes$ deepspeech --model /home/userk/DeepSpeechPro/native_client4.0/models/output_graph.pbmm --alphabet /home/userk/DeepSpeechPro/native_client4.0/models/alphabet.txt --lm /home/userk/DeepSpeechPro/native_client4.0/models/lm.binary --trie /home/userk/DeepSpeechPro/native_client4.0/models/trie --audio /mnt/c/users/karthikeyan/Downloads/chunk15.wav
Loading model from file
/home/userk/DeepSpeechPro/native_client4.0/models/output_graph.pbmm
TensorFlow: v1.12.0-10-ge232881
DeepSpeech: v0.4.0-0-g48ad711
2019-01-08 16:45:21.728660: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.0149s.
Loading language model from files /home/userk/DeepSpeechPro/native_client4.0/models/lm.binary /home/userk/DeepSpeechPro/native_client4.0/models/trie
Loaded language model in 0.294s.
Running inference.
i can do and picture of minutes for eating for the saintly in bloom no deterrent
Inference took 3.389s for 6.060s audio file.


(kdavis) #77

This is different from my result, which seems odd. The entire process should be deterministic.

Could you also indicate how you created your virtual environment?

Also, as you have at least two models you’ve been using, our release model and your fine-tuned model, could you check to make sure that the model you are using is indeed the release model and not your fine-tuned one?

For example, you could download the release model again or compute the MD5 hash of the model. On my machine I have:

kdavis-19htdh:models kdavis$ md5 output_graph.pbmm
MD5 (output_graph.pbmm) = a3cafcb87fcf09d38ce58cbc41e5c681


(kdavis) #78

Similarly for the language model and trie

kdavis-19htdh:models kdavis$ md5 lm.binary
MD5 (lm.binary) = 5f762eecdc4c4cc2068dc1a84ec57873
kdavis-19htdh:models kdavis$ md5 trie
MD5 (trie) = 182f72835a19800a3564b5da75ffc526


(karthikeyan k) #79

I created the virtual environment with the following commands,

sudo apt-get install python3-venv

python -m venv newenv
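A quick way to confirm that inference later runs inside the new environment is to ask the interpreter itself (a minimal standard-library sketch, not specific to DeepSpeech):

```python
import sys

# In an environment created with `python -m venv`, sys.prefix points at
# the environment while sys.base_prefix points at the system Python; if
# the two differ, the interpreter was launched from a venv.
in_venv = sys.prefix != sys.base_prefix
print("running inside a virtual environment:", in_venv)
```

Note this applies to `venv` on Python 3.3+; older `virtualenv` tools expose `sys.real_prefix` instead.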

Nope. Actually, the fine-tuned model was producing null values, so I am not using it… I’m using DeepSpeech 0.3.0 and DeepSpeech 0.4.0.

The MD5 hash of the DeepSpeech 0.4.0 model is

a3cafcb87fcf09d38ce58cbc41e5c681  output_graph.pbmm

The MD5 hash of the language model is

5f762eecdc4c4cc2068dc1a84ec57873  lm.binary

The MD5 hash of the trie file is

182f72835a19800a3564b5da75ffc526  trie
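These hashes match the release files. For reference, the same check can be done portably with Python’s standard library instead of the platform `md5`/`md5sum` tool (a sketch; the path below is just an example):

```python
import hashlib

def md5_of(path, chunk_size=1 << 20):
    # Read the file in chunks so large model files do not need to fit
    # in memory, feeding each chunk into an incremental MD5 digest.
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# e.g. md5_of("models/output_graph.pbmm")
# → a3cafcb87fcf09d38ce58cbc41e5c681 for the 0.4.0 release model
```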

(kdavis) #80

Are you sure you are using Python 3? What does

python --version

give?


(karthikeyan k) #81

python --version

Python 3.5.2
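The same information is available from inside Python, which avoids any ambiguity about which interpreter the `python` command resolves to (a small sketch; the 3.4 floor is an assumption about the era’s wheels, not something stated in this thread):

```python
import sys

# sys.version_info is a comparable tuple, e.g. (3, 5, 2, 'final', 0)
# for the Python 3.5.2 reported above.
print("Python", ".".join(map(str, sys.version_info[:3])))
assert sys.version_info >= (3, 4), "interpreter older than Python 3.4"
```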


(kdavis) #82

@lissyx What’s the text output you get for chunk15.wav?