Problem with deepspeech-0.3.0-checkpoint.tar.gz


(Sanjana Sinha) #1

Inference with the installed deepspeech package and the pre-trained model deepspeech-0.3.0-models.tar.gz gives a different result from running DeepSpeech.py in inference-only mode with the deepspeech-0.3.0-checkpoint.tar.gz checkpoint files.
This might be because the deepspeech-0.3.0-checkpoint.tar.gz provided on the Releases page contains files named model.v0.2.0. Could you please clarify?


(Lissyx) #2

Can you share more information on that? v0.3.0 did not touch the model itself. It should have been a 0.2.1, but we made backward-incompatible changes to the inference code and preferred to bump the version.


(Reuben Morais) #3

DeepSpeech.py uses a default beam width of 1024, while the clients use 512. It could be just that.
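
To illustrate why the beam width alone can change a transcript, here is a toy CTC prefix beam search in plain Python. This is only a sketch and not DeepSpeech's actual decoder: the function name, the two-step probability table, and the one-letter alphabet are all made up for the example, and no language model is involved.

```python
from collections import defaultdict

BLANK = 0  # assumption for this toy: index 0 is the CTC blank symbol


def ctc_prefix_beam_search(probs, alphabet, beam_width):
    """Toy CTC prefix beam search (illustrative only, no language model).

    probs[t][0] is the blank probability at step t;
    probs[t][i] is the probability of alphabet[i - 1].
    """
    # Each prefix tracks (prob. of ending in blank, prob. of ending in non-blank).
    beams = {"": (1.0, 0.0)}
    for step in probs:
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_blank, p_non_blank) in beams.items():
            total = p_blank + p_non_blank
            # Emitting blank keeps the prefix unchanged.
            next_beams[prefix][0] += total * step[BLANK]
            for i, ch in enumerate(alphabet, start=1):
                if prefix and prefix[-1] == ch:
                    # A repeated character collapses into the prefix...
                    next_beams[prefix][1] += p_non_blank * step[i]
                    # ...unless a blank separated the two occurrences.
                    next_beams[prefix + ch][1] += p_blank * step[i]
                else:
                    next_beams[prefix + ch][1] += total * step[i]
        # Prune: keep only the `beam_width` most probable prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {prefix: tuple(p) for prefix, p in ranked[:beam_width]}
    return max(beams.items(), key=lambda kv: sum(kv[1]))[0]


# Two time steps; each row is [P(blank), P("a")].
toy_probs = [[0.6, 0.4], [0.6, 0.4]]
print(repr(ctc_prefix_beam_search(toy_probs, "a", beam_width=1)))  # ''
print(repr(ctc_prefix_beam_search(toy_probs, "a", beam_width=2)))  # 'a'
```

With a width of 1 the prefix "a" is pruned after the first step, so the search ends on the empty string (probability 0.36). With a width of 2 the three paths that collapse to "a" are merged (probability 0.64) and "a" wins. The same kind of pruning effect is one possible reason for 1024 and 512 to disagree on a real utterance.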


(Sanjana Sinha) #4

Are all the other model parameters the same? Do the exported model and the checkpoint files in release 0.3.0 represent the same model configuration, trained on the same datasets?


(Sanjana Sinha) #5

Running the code from the checkpoint:

python3 -u DeepSpeech.py --checkpoint_dir 'deepspeech-0.3.0-checkpoint' --one_shot_infer 'data/ldc93s1/LDC93S1.wav' --train 0 --test 0

Inference : she had acuteness water

Running the installed package without language models:

deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --audio 'data/ldc93s1/LDC93S1.wav'

Inference : she hadered uc sut and greasy washwar or year

Running with the language model:

deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio 'data/ldc93s1/LDC93S1.wav'

Inference: she had ereducsutandgreasywashwaroryear


(Sanjana Sinha) #6

Please let me know about the above.


(Lissyx) #7

I’ve already told you there was no change in training between v0.2.0 and v0.3.0.


(Sanjana Sinha) #8

Thanks @lissyx for the information.


(Sanjana Sinha) #9

@reuben I ran the following command with a beam width of 512 using the checkpoint files:

python3 -u DeepSpeech.py --checkpoint_dir 'deepspeech-0.3.0-checkpoint' --train 0 --test 0 --beam_width 512 --one_shot_infer 'data/ldc93s1/LDC93S1.wav'
Inference: she had acuteness of ear

When I run the following on the same audio file, I still get a different output:

deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --audio 'data/ldc93s1/LDC93S1.wav'

Inference: she hadered uc sut and greasy washwar or year

Am I missing something in the arguments for running the model from the checkpoint?


(Reuben Morais) #10

Looks like you’re using an incompatible model (output_graph.pbmm). If you’re using the v0.3 checkpoint/model, you should also use the v0.3 code/client.


(Sanjana Sinha) #11

As described in the README, I used the following command to get the model files, which seem to be the same version as the checkpoint:
wget -O - https://github.com/mozilla/DeepSpeech/releases/download/v0.3.0/deepspeech-0.3.0-models.tar.gz | tar xvfz -


(Reuben Morais) #12

And you’re also using the native_client package from that same page?


(Sanjana Sinha) #13

I installed the DeepSpeech wheel with the following command:

pip3 install deepspeech


(Reuben Morais) #14

And DeepSpeech.py is also at v0.3.0?


(Sanjana Sinha) #15

I did a
git clone https://github.com/mozilla/DeepSpeech

Should I use the v0.3.0 one from the Releases page?


(Reuben Morais) #16

Just do git checkout v0.3.0
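
For anyone hitting the same thing, a quick way to spot this kind of mismatch is to compare the release number of every piece involved. The sketch below is only an illustration: the helper names and the labels in the dictionary are hypothetical, and in practice the strings would come from something like `git describe --tags` in the clone, `pip3 show deepspeech`, and the names of the downloaded tarballs.

```python
import re


def release_of(label):
    """Return (major, minor, patch) found in a label such as 'v0.3.0', or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", label)
    return tuple(int(part) for part in match.groups()) if match else None


def find_mismatches(components, expected="v0.3.0"):
    """Return the names of components whose release differs from `expected`."""
    target = release_of(expected)
    return sorted(
        name for name, label in components.items() if release_of(label) != target
    )


# Hypothetical labels describing the setup in this thread before the fix.
components = {
    "DeepSpeech.py checkout": "master",  # fresh clone, not on a release tag
    "pip package": "0.3.0",
    "models": "deepspeech-0.3.0-models.tar.gz",
    "checkpoint": "deepspeech-0.3.0-checkpoint.tar.gz",
}
print(find_mismatches(components))  # ['DeepSpeech.py checkout']
```

After checking out the v0.3.0 tag, the first label would read "v0.3.0" and the list would come back empty.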


(Sanjana Sinha) #17

Thanks a lot @reuben. It solved my problem!