The model inference carried out with the installed deepspeech package and the pre-trained model deepspeech-0.3.0-models.tar.gz gives a different result from running DeepSpeech.py with the deepspeech-0.3.0-checkpoint.tar.gz checkpoint files when only model inference is done.
It might be because the deepspeech-0.3.0-checkpoint.tar.gz provided on the Releases page contains files named model.v0.2.0. Could you please clarify the above?
Can you share more information on that? v0.3.0 was not a version touching the model itself; it should have been a 0.2.1, but we made backward-incompatible changes to the inference code and preferred to bump the version.
DeepSpeech.py uses a default beam width of 1024, while the clients use 512. It could be just that.
Are all the other model parameters the same? Do the exported model and the checkpoint files in release 0.3.0 denote the same model configuration, and were they trained on the same datasets?
Running the code from the checkpoint:
python3 -u DeepSpeech.py --checkpoint_dir 'deepspeech-0.3.0-checkpoint' --one_shot_infer 'data/ldc93s1/LDC93S1.wav' --train 0 --test 0
Inference: she had acuteness water
Running the installed package without language models:
deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --audio 'data/ldc93s1/LDC93S1.wav'
Inference: she hadered uc sut and greasy washwar or year
Running with the language model:
deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio 'data/ldc93s1/LDC93S1.wav'
Inference: she had ereducsutandgreasywashwaroryear
Please let me know about the above.
I’ve already told you there was no change in training between v0.2.0 and v0.3.0.
Thanks @lissyx for the information.
@reuben I executed the following command with a beam width of 512 using the checkpoint files.
python3 -u DeepSpeech.py --checkpoint_dir 'deepspeech-0.3.0-checkpoint' --train 0 --test 0 --beam_width 512 --one_shot_infer 'data/ldc93s1/LDC93S1.wav'
Inference: she had acuteness of ear
When I run the following on the same audio file, I am still getting a different output:
deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --audio 'data/ldc93s1/LDC93S1.wav'
Inference: she hadered uc sut and greasy washwar or year
Am I missing something in the arguments for running the model from the checkpoint?
Looks like you’re using an incompatible model (output_graph.pbmm). If you’re using the v0.3 checkpoint/model, you should also use the v0.3 code/client.
As mentioned in the README, I used the following command to get the model files, which seem to be the same version as the checkpoint.
wget -O - https://github.com/mozilla/DeepSpeech/releases/download/v0.3.0/deepspeech-0.3.0-models.tar.gz | tar xvfz -
And you’re also using the native_client package from that same page?
I installed the DeepSpeech wheel via the following
pip3 install deepspeech
And DeepSpeech.py is also at v0.3.0?
I did a
git clone https://github.com/mozilla/DeepSpeech
Should I use the v0.3.0 one from the Releases page?
Just do git checkout v0.3.0
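To make the effect of that concrete, here is the checkout-a-release-tag pattern demonstrated on a throwaway local repo (a sketch; for the actual fix, only `git checkout v0.3.0` needs to be run inside the tree cloned from https://github.com/mozilla/DeepSpeech):

```shell
# Demonstration: pin a working tree to a release tag on a throwaway repo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.email=you@example.com -c user.name=you commit -q --allow-empty -m "release commit"
git tag v0.3.0
git -c user.email=you@example.com -c user.name=you commit -q --allow-empty -m "later work on master"
git checkout -q v0.3.0   # detached HEAD at the release tag
git describe --tags      # prints: v0.3.0
```

After the checkout, DeepSpeech.py in the working tree matches the released model, checkpoint, and client.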
Thanks a lot @reuben. It solved my problem!
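For anyone hitting the same mismatch: a quick way to sanity-check that the installed wheel matches the tag you checked out is to compare versions programmatically. This is only a sketch, assuming Python 3.9+ and that `deepspeech` is the distribution name installed by `pip3 install deepspeech`:

```python
# Sketch: check that an installed package's version matches a git release
# tag, so the Python training code and the inference client stay in sync.
from importlib.metadata import version, PackageNotFoundError


def matches_release(package: str, tag: str) -> bool:
    """Return True if the installed package version equals the tag.

    A leading 'v', as in git tags like v0.3.0, is ignored.
    """
    try:
        installed = version(package)
    except PackageNotFoundError:
        return False  # package not installed at all
    return installed == tag.removeprefix("v")


# Usage: compare the wheel against the repo tag from `git describe --tags`.
print(matches_release("deepspeech", "v0.3.0"))
```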