Inference time for 0.9.3 is much higher than for 0.6.1

For the comparison I used the release acoustic model for both versions, together with the release LM/trie (0.6.1) and scorer (0.9.3).

Following is the command that I used for 0.9.3:

./deepspeech --model /storage/emulated/10/Android/data/com.visteon.sns.ww.app/files/sns/ww/output_graph.tflite --scorer /data/local/tmp/native_client_093.arm64.cpu.android/kenlm.scorer --beam_width 32 --lm_alpha 1.0545920026574804 --lm_beta 3.2744955478757265 -t --audio /data/local/tmp/sns_ww_cli_test_data/alexa/en-JP_155427693_alexa_2019-09-18T12:57:03.017Z.wav

Following is the command that I used for 0.6.1:
./deepspeech --model /data/local/tmp/native_client_061.arm64.cpu.android/output_graph.tflite --lm /data/local/tmp/native_client_061.arm64.cpu.android/lm.binary --trie /data/local/tmp/native_client_061.arm64.cpu.android/trie --lm_alpha 1.0545920026574804 --lm_beta 3.2744955478757265 --beam_width 32 -t --audio /data/local/tmp/sns_ww_cli_test_data/alexa/en-JP_155427693_alexa_2019-09-18T12:57:03.017Z.wav

Then I repeated the test without an LM/scorer for both versions. The inference times for the same files, in seconds, are recorded in the table below.
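
The no-LM/no-scorer runs are presumably the same commands with the language-model options dropped; a rough sketch, with paths shortened:

    # 0.9.3 without the scorer: same invocation, minus --scorer
    ./deepspeech --model output_graph.tflite --beam_width 32 -t --audio en-JP_155427693_alexa_2019-09-18T12:57:03.017Z.wav
    # 0.6.1 analogously: same invocation, minus --lm and --trie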

| audio file | 0.6.1 native client, release model + lm (s) | 0.6.1 native client, release model, no lm (s) | 0.9.3 native client, release model + scorer (s) | 0.9.3 native client, release model, no scorer (s) |
|---|---|---|---|---|
| en-JP_155427693_alexa_2019-09-18T12:57:03.017Z.wav | 1.50574 | 2.61644 | 6.24147 | 9.66468 |
| global-GLOBAL_149392486_hey_siri_2020-04-22T043640.343Z.wav | 1.60139 | 2.76046 | 6.3522 | 9.94327 |
| global-GLOBAL_28819796_ok_google_2020-04-23T054509.144Z.wav | 2.51439 | 1.70396 | 4.03806 | 6.30115 |
| global-GLOBAL_221614382_visteon_2020-04-23T064454.172Z.wav | 2.77821 | 1.19478 | 2.84475 | 3.59143 |

So you complain but you don’t provide:

  • the audio length of each file
  • the hardware you are using
  • the repro steps

There’s nothing we can do in this context.
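
A quick sketch of how those details can be collected on the device, assuming sox is available for the duration check and the standard Linux/Android sysfs layout:

    # duration of each audio file in seconds (soxi ships with sox)
    soxi -D /data/local/tmp/sns_ww_cli_test_data/alexa/*.wav
    # CPU topology and per-core maximum frequency (values are in kHz)
    grep -E 'processor|CPU part' /proc/cpuinfo
    cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_max_freq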

I see you are using the same values for LM alpha/beta on both, yet we document different values in the release notes, e.g., for v0.9.3:

    lm_alpha 0.931289039105002
    lm_beta 1.1834137581510284

The hardware I am using has two Cortex-A35 cores and one Cortex-A72 core with 2 GB of DDR4 RAM; the maximum frequency of each core is 1.2 GHz.
I have now used the release default values of alpha and beta:

./deepspeech --model /data/local/tmp/native_client_093.arm64.cpu.android/deepspeech-0.9.3-models.tflite --scorer /data/local/tmp/native_client_093.arm64.cpu.android/kenlm.scorer --beam_width 32 --lm_alpha 0.931289039105002 --lm_beta 1.1834137581510284 -t --audio /data/local/tmp/sns_ww_cli_test_data/visteon/global-GLOBAL_221614382_visteon_2020-04-23T064454.172Z.wav


./deepspeech --model /data/local/tmp/native_client_061.arm64.cpu.android/output_graph.tflite --lm /data/local/tmp/native_client_061.arm64.cpu.android/lm.binary --trie /data/local/tmp/native_client_061.arm64.cpu.android/trie --lm_alpha 0.75 --lm_beta 1.85 --beam_width 32 -t --audio /data/local/tmp/sns_ww_cli_test_data/visteon/global-GLOBAL_221614382_visteon_2020-04-23T064454.172Z.wav

Here are the updated results:

| audio file | duration (s) | 0.6.1, release model + lm, release alpha/beta (s) | 0.6.1, no lm (s) | 0.9.3, release model + scorer, release alpha/beta (s) | 0.9.3, no scorer (s) |
|---|---|---|---|---|---|
| en-JP_155427693_alexa_2019-09-18T12:57:03.017Z.wav | 3 | 2.94852 | 2.61644 | 6.13521 | 9.66468 |
| global-GLOBAL_149392486_hey_siri_2020-04-22T043640.343Z.wav | 3 | 2.61169 | 2.76046 | 6.30586 | 9.94327 |
| global-GLOBAL_28819796_ok_google_2020-04-23T054509.144Z.wav | 2 | 1.66903 | 1.70396 | 3.92679 | 6.30115 |
| global-GLOBAL_221614382_visteon_2020-04-23T064454.172Z.wav | 1 | 1.24143 | 1.19478 | 2.99867 | 3.59143 |

The audio files are attached: upload.zip (173.4 KB)

Sorry, but please reply with values; I don’t have time to play with your files.

big.LITTLE, but you have only three CPUs?

I’m sorry, but the values are totally inconsistent: in the second set of results, your first file is now 2.94s while it was 1.5s before.

I’m not sure how you are measuring things, and I don’t have time to do that for you. Please measure the runtime properly and accurately, profile with perf, and verify that you are not just saturating your CPUs; since we set 4 threads on the TFLite runtime, that may simply be too much for your system.

See https://github.com/mozilla/DeepSpeech/blob/7b2eeb6734a49c3d9439644e9fe0685cb46ad695/native_client/tflitemodelstate.cc#L184 and maybe hack that?
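
A rough way to check for over-subscription before patching the source is to look at context switches and CPU migrations during a run; a sketch (perf stat on desktop Linux, simpleperf being the usual equivalent on Android; the audio file name is a placeholder):

    # high context-switch / cpu-migration counts for such a short run would suggest
    # the 4 TFLite threads are fighting over the 3 available cores
    perf stat ./deepspeech --model deepspeech-0.9.3-models.tflite \
      --scorer kenlm.scorer -t --audio test.wav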

The audio duration is mentioned in the Excel sheet; I uploaded the files because you wanted a way to reproduce the results.

This difference is because the LM alpha and beta values have changed now that you asked me to use the release values of those parameters.

I checked the code, and you have set the same number of threads (4) for both 0.6.1 and 0.9.3, so that shouldn’t be the problem.

I have rerun the tests using the audio files provided with the release. First, I ran them on the arm64 Android target I used previously, with the following commands:
./deepspeech --model /data/local/tmp/native_client_093.arm64.cpu.android/deepspeech-0.9.3-models.tflite --scorer /data/local/tmp/native_client_093.arm64.cpu.android/deepspeech-0.9.3-models.scorer -t --audio /data/local/tmp/audio-0.6.1/

./deepspeech --model /data/local/tmp/native_client_061.arm64.cpu.android/output_graph.tflite --lm /data/local/tmp/native_client_061.arm64.cpu.android/lm.binary --trie /data/local/tmp/native_client_061.arm64.cpu.android/trie -t --audio /data/local/tmp/audio-0.6.1/

| audio file | duration (s) | 0.6.1 arm64 cpu android, lm (s) | 0.6.1 arm64 cpu android, no lm (s) | 0.9.3 arm64 cpu android, scorer (s) | 0.9.3 arm64 cpu android, no scorer (s) |
|---|---|---|---|---|---|
| 2830-3980-0043.wav | 2 | 2.40254 | 3.95979 | 4.71389 | 5.37027 |
| 4507-16021-0012.wav | 3 | 3.85647 | 5.39578 | 6.39314 | 7.90841 |
| 8455-210777-0068.wav | 3 | 3.96196 | 5.18108 | 6.25872 | 7.35326 |

Second, I ran them on my laptop (x86_64 Linux) with the following commands:

deepspeech --model /home/rsandhu/deepspeech_v093/native_client_093.arm64.cpu.android/deepspeech-0.9.3-models.tflite --scorer /home/rsandhu/deepspeech_v093/native_client_093.arm64.cpu.android/deepspeech-0.9.3-models.scorer --audio /home/rsandhu/Downloads/audio-0.6.1/audio/

deepspeech --model /home/rsandhu/deepspeech_v061/native_client_061.arm64.cpu.android/output_graph.tflite --lm /home/rsandhu/deepspeech_v061/native_client_061.arm64.cpu.android/lm.binary --trie /home/rsandhu/deepspeech_v061/native_client_061.arm64.cpu.android/trie --audio /home/rsandhu/Downloads/audio-0.6.1/audio/

| audio file | duration (s) | 0.6.1 amd x86_64 linux, lm (s) | 0.6.1 amd x86_64 linux, no lm (s) | 0.9.3 amd x86_64 linux, scorer (s) | 0.9.3 amd x86_64 linux, no scorer (s) |
|---|---|---|---|---|---|
| 2830-3980-0043.wav | 2 | 1.721 | 1.868 | 0.443 | 0.535 |
| 4507-16021-0012.wav | 3 | 2.279 | 2.425 | 0.528 | 0.779 |
| 8455-210777-0068.wav | 3 | 2.246 | 2.408 | 0.564 | 0.718 |

As you can see, with 0.9.3 the inference time has increased on arm64 Android but decreased on x86_64 Linux. I used the same TFLite model in both cases.

Except that it was not effective in 0.6.

You still have not explained your exact process for reproducing the values. How many runs per file? Is this wall time or user time? Is it the mean value? What’s the stddev?

These results are from running each file just once. Even if I repeat a run, the difference is negligible, around 0.02 seconds. The inference time is the value printed after inference completes when the -t option is passed.

Please use time, do multiple runs, average them, and profile with perf.
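
For example, a minimal sketch of that kind of measurement (GNU time on desktop Linux; on the Android target the toybox time builtin only prints real/user/sys, so the loop would need adjusting; model, scorer, and audio file names are placeholders):

    rm -f times.txt
    for i in $(seq 1 10); do
      # append the wall-clock time of each run to times.txt, discard program output
      /usr/bin/time -f "%e" -a -o times.txt \
        ./deepspeech --model deepspeech-0.9.3-models.tflite \
        --scorer kenlm.scorer -t --audio test.wav > /dev/null 2>&1
    done
    # mean and standard deviation over the runs
    awk '{s+=$1; ss+=$1*$1; n++} END {m=s/n; printf "mean %.3fs  stddev %.3fs\n", m, sqrt(ss/n - m*m)}' times.txt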