Error when training model


(kdavis) #63

Can you give us access to chunk15.wav to test too?


(karthikeyan k) #64

sure… here it is… I have added two more audio files too…

chunks.zip (332.0 KB)


(kdavis) #65

Looking at chunk15.wav two things jump out at me:

  1. The accent is Indian; however, our model was trained on an American accent. Performance will be worse on an Indian accent, and some fine-tuning of the model should be done.
  2. I am getting different results than you are with 0.4.0 (“i can dig and people of minutes for a god in perfecting latin poem notepaper”) which suggests your 0.4.0 setup is amiss.

(karthikeyan k) #66

I already tried fine-tuning the DeepSpeech 0.3.0 model with this data, and it exported a model after three epochs. The exported model’s inference result was null.
Then I just stopped fine-tuning, assuming that fine-tuning a model means training a new model on our data only, and not adding new knowledge to the existing model (no knowledge is carried over to the new model).

I downloaded the pretrained DeepSpeech 0.4.0 model from the GitHub releases page and used it with the below command as usual.

~$ deepspeech --model models/output_graph.pbmm --audio /mnt/c/users/karthikeyan/downloads/chunk15.wav --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie

Is there any other way to run DeepSpeech inference to get output like yours? Do I need to add any extra parameters or anything else?


(kdavis) #67

Let’s start at the beginning.

How did you install deepspeech 0.4.0? Did you install it in a fresh virtual environment as suggested in the documentation[1]?


(karthikeyan k) #68

I downloaded the pretrained DeepSpeech 0.4.0 models from the releases page using:

wget https://github.com/mozilla/DeepSpeech/releases/download/v0.4.0/deepspeech-0.4.0-models.tar.gz
tar xvfz deepspeech-0.4.0-models.tar.gz

and extracted them into a directory, created a virtual environment, installed requirements.txt… started inference…


(Lissyx) #69

@karthikeyank I’m sorry to insist again, but please stop removing the versions from the output; they’re really important when we do this kind of debugging.


(karthikeyan k) #70

I’m sorry @lissyx, I don’t know what you are talking about… I think I didn’t mention the version of the models in some places (I have corrected them).


(Lissyx) #71

When you run inference, the TensorFlow and DeepSpeech build versions are written in the output. We need those, as well as the exact model files. There might be small divergences with high impact.


(kdavis) #72

@karthikeyank Could you use the instructions I referenced[1] to install the Python package? Just so we are starting from the same point. (In particular, they mention nothing about installing from requirements.txt.)


(karthikeyan k) #73

Yes… I followed the same method, and the version of the DeepSpeech Python package is DeepSpeech: v0.4.0-0-g48ad711

I installed all the Python packages mentioned in requirements.txt in the virtual environment…

I tried again and it produces the following output now,

(deep4.0) userk@PSSHSRDT034:~/DeepSpeechPro/native_client4.0$ deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio /mnt/c/users/karthikeyan/Downloads/chunk15.wav
Loading model from file models/output_graph.pbmm
TensorFlow: v1.12.0-10-ge232881
DeepSpeech: v0.4.0-0-g48ad711
2019-01-08 16:21:49.623456: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.042s.
Loading language model from files models/lm.binary models/trie
Loaded language model in 0.269s.
Running inference.
i can do and picture of minutes for eating for the saintly in bloom no deterrent
Inference took 21.940s for 6.060s audio file.


(karthikeyan k) #74

Sure @lissyx… I will follow that from here on…


(kdavis) #75

The instructions I referenced do not mention requirements.txt. Could you please follow them? Otherwise it’s hard or impossible for us to debug.


(karthikeyan k) #76

Okay, now I have created a new virtual environment and installed deepspeech…
done…

and this is the output…

(newenv) userk@PSSHSRDT034:~/pycodes$ deepspeech --model /home/userk/DeepSpeechPro/native_client4.0/models/output_graph.pbmm --alphabet /home/userk/DeepSpeechPro/native_client4.0/models/alphabet.txt --lm /home/userk/DeepSpeechPro/native_client4.0/models/lm.binary --trie /home/userk/DeepSpeechPro/native_client4.0/models/trie --audio /mnt/c/users/karthikeyan/Downloads/chunk15.wav
Loading model from file
/home/userk/DeepSpeechPro/native_client4.0/models/output_graph.pbmm
TensorFlow: v1.12.0-10-ge232881
DeepSpeech: v0.4.0-0-g48ad711
2019-01-08 16:45:21.728660: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.0149s.
Loading language model from files /home/userk/DeepSpeechPro/native_client4.0/models/lm.binary /home/userk/DeepSpeechPro/native_client4.0/models/trie
Loaded language model in 0.294s.
Running inference.
i can do and picture of minutes for eating for the saintly in bloom no deterrent
Inference took 3.389s for 6.060s audio file.


(kdavis) #77

This is different from my result, which seems odd. The entire process should be deterministic.

Could you also indicate how you created your virtual environment?

Also, as you have at least two models you’ve been using, our release model and your fine-tuned model, could you check to make sure that the model you are using is indeed the release model and not your fine-tuned one?

For example, you could download the release model again or compute the MD5 hash of the model. On my machine I have:

kdavis-19htdh:models kdavis$ md5 output_graph.pbmm
MD5 (output_graph.pbmm) = a3cafcb87fcf09d38ce58cbc41e5c681


(kdavis) #78

Similarly for the language model and trie

kdavis-19htdh:models kdavis$ md5 lm.binary
MD5 (lm.binary) = 5f762eecdc4c4cc2068dc1a84ec57873
kdavis-19htdh:models kdavis$ md5 trie
MD5 (trie) = 182f72835a19800a3564b5da75ffc526


(karthikeyan k) #79

I created the virtual environment with the following commands,

sudo apt-get install python3-venv

python -m venv newenv
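A quick way to confirm that inference later runs inside the new environment is to ask the interpreter itself (a minimal standard-library sketch, not specific to DeepSpeech):

```python
import sys

# In an environment created with `python -m venv`, sys.prefix points at
# the environment while sys.base_prefix points at the system Python; if
# the two differ, the interpreter was launched from a venv.
in_venv = sys.prefix != sys.base_prefix
print("running inside a virtual environment:", in_venv)
```

Note this applies to `venv` on Python 3.3+; older `virtualenv` tools expose `sys.real_prefix` instead.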

Nope. Actually, the fine-tuned model was producing null values, so I am not using it… I’m using DeepSpeech 0.3.0 and DeepSpeech 0.4.0.

The MD5 hash of the DeepSpeech 0.4.0 model is

a3cafcb87fcf09d38ce58cbc41e5c681  output_graph.pbmm

The MD5 hash of the language model is

5f762eecdc4c4cc2068dc1a84ec57873  lm.binary

The MD5 hash of the trie file is

182f72835a19800a3564b5da75ffc526  trie
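These hashes match the release files. For reference, the same check can be done portably with Python’s standard library instead of the platform `md5`/`md5sum` tool (a sketch; the path below is just an example):

```python
import hashlib

def md5_of(path, chunk_size=1 << 20):
    # Read the file in chunks so large model files do not need to fit
    # in memory, feeding each chunk into an incremental MD5 digest.
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# e.g. md5_of("models/output_graph.pbmm")
# → a3cafcb87fcf09d38ce58cbc41e5c681 for the 0.4.0 release model
```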

(kdavis) #80

Are you sure you are using Python 3? What does

python --version

give?


(karthikeyan k) #81

python --version

Python 3.5.2
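The same information is available from inside Python, which avoids any ambiguity about which interpreter the `python` command resolves to (a small sketch; the 3.4 floor is an assumption about the era’s wheels, not something stated in this thread):

```python
import sys

# sys.version_info is a comparable tuple, e.g. (3, 5, 2, 'final', 0)
# for the Python 3.5.2 reported above.
print("Python", ".".join(map(str, sys.version_info[:3])))
assert sys.version_info >= (3, 4), "interpreter older than Python 3.4"
```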


(kdavis) #82

@lissyx What’s the text output you get for chunk15.wav?