Problem with deepspeech-0.3.0-checkpoint.tar.gz

(Sanjana Sinha) #1

Inference with the installed deepspeech package and the pre-trained model deepspeech-0.3.0-models.tar.gz gives a different result from inference run directly from the deepspeech-0.3.0-checkpoint.tar.gz checkpoint files.
This might be because the deepspeech-0.3.0-checkpoint.tar.gz provided on the Releases page contains files named model.v0.2.0. Could you please clarify?

(Lissyx) #2

Can you share more information on that? v0.3.0 was not a version touching the model itself; it should have been a 0.2.1, but we made backward-incompatible changes to the inference code and preferred to bump the version.

(Reuben Morais) #3

by default uses a default beam width of 1024, while the clients use 512. It could be just that.
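
As a rough illustration of why the beam width alone can change a transcript, here is a toy beam search in Python. It is only a sketch: the probability table `TOY_MODEL` and the function `beam_search` are hypothetical, and this is plain prefix beam search over made-up numbers, not DeepSpeech's CTC decoder. It shows that a narrow beam can discard a prefix that would have led to the best overall sequence.

```python
import math

# Hypothetical toy model: next-token probabilities depend on the prefix so far.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"x": 0.9, "y": 0.1},
}


def beam_search(model, steps, beam_width):
    """Return the highest-scoring sequence found with the given beam width."""
    beams = [((), 0.0)]  # (prefix, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for prefix, score in beams:
            for token, prob in model[prefix].items():
                candidates.append((prefix + (token,), score + math.log(prob)))
        # Keep only the beam_width best candidates; the rest are lost for good.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0]


narrow = beam_search(TOY_MODEL, steps=2, beam_width=1)
wide = beam_search(TOY_MODEL, steps=2, beam_width=2)
print("beam_width=1:", narrow)
print("beam_width=2:", wide)
```

With a width of 1 the prefix starting with "b" is pruned after the first step, so the two runs end on different sequences even though the underlying model is identical.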

(Sanjana Sinha) #4

Are all the other model parameters the same? Do the exported model and the checkpoint files in release 0.3.0 represent the same model configuration, trained on the same datasets?

(Sanjana Sinha) #5

Running the code from the checkpoint:

python3 -u --checkpoint_dir 'deepspeech-0.3.0-checkpoint' --one_shot_infer 'data/ldc93s1/LDC93S1.wav' --train 0 --test 0

Inference : she had acuteness water

Running the installed package without language models:

deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --audio 'data/ldc93s1/LDC93S1.wav'

Inference : she hadered uc sut and greasy washwar or year

Running with the language model:

deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio 'data/ldc93s1/LDC93S1.wav'

Inference: she had ereducsutandgreasywashwaroryear

(Sanjana Sinha) #6

Please let me know about the above

(Lissyx) #7

I’ve already told you there was no change in training between v0.2.0 and v0.3.0

(Sanjana Sinha) #8

Thanks @lissyx for the information.

(Sanjana Sinha) #9

@reuben I ran the following command with a beam width of 512, using the checkpoint files:

python3 -u --checkpoint_dir 'deepspeech-0.3.0-checkpoint' --train 0 --test 0 --beam_width 512 --one_shot_infer 'data/ldc93s1/LDC93S1.wav'
Inference: she had acuteness of ear

When I run the following on the same audio file, I still get a different output:

deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --audio 'data/ldc93s1/LDC93S1.wav'

Inference: she hadered uc sut and greasy washwar or year

Am I missing something in the arguments for running the model from the checkpoint?

(Reuben Morais) #10

Looks like you’re using an incompatible model (output_graph.pbmm). If you’re using the v0.3 checkpoint/model, you should also use the v0.3 code/client.

(Sanjana Sinha) #11

As mentioned in the README, I used the following command to get the model files; it seems to be the same version as the checkpoint.
wget -O - | tar xvfz -

(Reuben Morais) #12

And you’re also using the native_client package from that same page?

(Sanjana Sinha) #13

I installed the DeepSpeech wheel with the following command:

pip3 install deepspeech

(Reuben Morais) #14

And is also at v0.3.0?

(Sanjana Sinha) #15

I did a
git clone

Should I use the v0.3.0 one from the Releases page?

(Reuben Morais) #16

Just do git checkout v0.3.0
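
For anyone hitting the same mismatch, a minimal sketch of checking that the installed package and the checked-out code are at the same release. It assumes the pip package is named `deepspeech` (as in the install command earlier in the thread), that the script is run from inside the git clone, and Python 3.8 or newer; the helper names are hypothetical and not part of DeepSpeech.

```python
import subprocess
from importlib import metadata


def normalize(version):
    """Strip whitespace and a leading 'v' so 'v0.3.0' and '0.3.0' compare equal."""
    version = version.strip()
    return version[1:] if version.startswith("v") else version


def versions_match(package_version, git_tag):
    """True when the pip package version and the git tag name the same release."""
    return normalize(package_version) == normalize(git_tag)


def installed_package_version(name="deepspeech"):
    """Version of the installed pip package, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def checked_out_tag(repo_dir="."):
    """Tag pointing exactly at the current commit, or None if there is none."""
    try:
        result = subprocess.run(
            ["git", "describe", "--tags", "--exact-match"],
            cwd=repo_dir, capture_output=True, text=True,
        )
    except FileNotFoundError:  # git missing, or repo_dir does not exist
        return None
    return result.stdout.strip() if result.returncode == 0 else None


if __name__ == "__main__":
    pkg = installed_package_version()
    tag = checked_out_tag()
    print("installed package:", pkg)
    print("checked-out tag:", tag)
    if pkg is None or tag is None:
        print("could not determine one of the versions")
    elif versions_match(pkg, tag):
        print("versions match")
    else:
        print("version mismatch; consider: git checkout v" + normalize(pkg))
```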

(Sanjana Sinha) #17

Thanks a lot @reuben. It solved my problem!