Output of DeepSpeech test epoch and DeepSpeech native client application are not the same

I am using DeepSpeech version 0.6.1 and a model trained with it. The first part below shows the result of the test epoch that runs at the end of training, and the second shows the result of the native client application. The language model and other parameters are the same in both cases.

File 1:

WER: 1.000000, CER: 0.481481, loss: 27.261322

  • wav: file:///home/sranjeet/Documents/Speech_DataSet/internal_test_data/en-US/pcm_general_ckohls_american_accent/noagc/1583527502552_share_gas_stations_nearby.wav
  • src: “show me gas stations nearby”
  • res: “they gas stations and your by”

deepspeech --model /home/rsandhu/sns/sns-app-android/app/src/main/assets/asr/en-US/output_graph.pbmm --lm /home/rsandhu/sns/sns-app-android/app/src/main/assets/asr/en-US/lm.binary --trie /home/rsandhu/sns/sns-app-android/app/src/main/assets/asr/en-US/trie --audio /home/rsandhu/Downloads/en-US_testing/pcm_general_ckohls_american_accent/noagc/1583527502552_share_gas_stations_nearby.wav

inference: surely guessed stations in your by
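The WER/CER figures above come from edit distance against the reference transcript. A minimal CER sketch (plain character-level Levenshtein; not necessarily the exact implementation DeepSpeech uses, so the numbers may differ slightly):

```python
def edit_distance(ref, hyp):
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1]

def cer(ref, hyp):
    # Character error rate: edits normalized by reference length.
    return edit_distance(ref, hyp) / len(ref)
```

For example, `cer("turn off the ac", "those")` gives a high error rate because almost every character must be edited.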

File 2:

WER: 1.000000, CER: 0.800000, loss: 27.743788

  • wav: file:///home/sranjeet/Documents/Speech_DataSet/internal_test_data/en-US/pcm_general_ckohls_american_accent/noagc/1583527528416_tenacity_see.wav
  • src: “turn off the ac”
  • res: “those”

deepspeech --model /home/rsandhu/sns/sns-app-android/app/src/main/assets/asr/en-US/output_graph.pbmm --lm /home/rsandhu/sns/sns-app-android/app/src/main/assets/asr/en-US/lm.binary --trie /home/rsandhu/sns/sns-app-android/app/src/main/assets/asr/en-US/trie --audio /home/rsandhu/Downloads/en-US_testing/pcm_general_ckohls_american_accent/noagc/1583527528416_tenacity_see.wav

inference: i also

Are you sure about that? lm_alpha, lm_beta and beam width were handled separately in that version.

@lissyx I re-ran the test by specifying beam_width 1024, lm_alpha 0.75, lm_beta 1.85 and the results are still the same.
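For what it's worth, on the 0.6.x client those decoder parameters can be passed as explicit flags. A sketch that builds such an invocation, so both runs are guaranteed to use identical settings (flag names assumed from the 0.6.1 command-line client; double-check against `deepspeech --help`):

```python
# Build a deepspeech 0.6.1 client command with explicit decoder parameters,
# so the test epoch and the native client are compared on equal footing.
def build_cmd(model, lm, trie, audio,
              beam_width=1024, lm_alpha=0.75, lm_beta=1.85):
    return [
        "deepspeech",
        "--model", model,
        "--lm", lm,
        "--trie", trie,
        "--beam_width", str(beam_width),
        "--lm_alpha", str(lm_alpha),
        "--lm_beta", str(lm_beta),
        "--audio", audio,
    ]
```

The returned list can be handed to `subprocess.run` directly, which avoids shell-quoting mistakes with long asset paths.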

Well, sorry, but there’s nothing I can think of …


I agree with lissyx: if the parameters were the same, you should get the same result. Double-check that everything really is the same. To me it looks like you changed either the language model or the pbmm file.
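One quick way to rule that out is to compare checksums of the pbmm, lm.binary and trie files used in each run; a minimal sketch:

```python
import hashlib

def sha256sum(path, bufsize=1 << 20):
    # Hash a file in chunks so large model files don't need to fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()
```

If `sha256sum` differs for any of the three files between the training box and the device assets, the two runs were not using the same model.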

You could check whether the WAV files are in a format other than 16 kHz PCM and whether the downsampling differs, but usually the same conversion is applied.
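A minimal sketch of such a format check using Python's standard `wave` module:

```python
import wave

# Report the basic PCM parameters of a WAV file so the training-set audio
# and the file fed to the native client can be compared directly.
def wav_format(path):
    with wave.open(path, "rb") as w:
        return {
            "sample_rate_hz": w.getframerate(),
            "channels": w.getnchannels(),
            "sample_width_bytes": w.getsampwidth(),
        }
```

For DeepSpeech 0.6.x you would expect 16000 Hz, mono, 2-byte (16-bit) samples; anything else means a resampling step is happening somewhere.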

Also, you removed the training command line, the training log, and the stdout/stderr output of inference, so we can't cross-check what you are doing …