Inference time for 0.9.3 is much higher than for 0.6.1

For the comparison I used the release acoustic model, LM, trie, and scorer for both versions.

Following is the command that I used for 0.9.3:

./deepspeech --model /storage/emulated/10/Android/data/com.visteon.sns.ww.app/files/sns/ww/output_graph.tflite --scorer /data/local/tmp/native_client_093.arm64.cpu.android/kenlm.scorer --beam_width 32 --lm_alpha 1.0545920026574804 --lm_beta 3.2744955478757265 -t --audio /data/local/tmp/sns_ww_cli_test_data/alexa/en-JP_155427693_alexa_2019-09-18T12:57:03.017Z.wav

Following is the command that I used for 0.6.1:
./deepspeech --model /data/local/tmp/native_client_061.arm64.cpu.android/output_graph.tflite --lm /data/local/tmp/native_client_061.arm64.cpu.android/lm.binary --trie /data/local/tmp/native_client_061.arm64.cpu.android/trie --lm_alpha 1.0545920026574804 --lm_beta 3.2744955478757265 --beam_width 32 -t --audio /data/local/tmp/sns_ww_cli_test_data/alexa/en-JP_155427693_alexa_2019-09-18T12:57:03.017Z.wav

Then I repeated the test without an LM/scorer for both versions. The inference times for the same files, in seconds, are recorded in the table below.

| audio file | 0.6.1, release model + LM | 0.6.1, release model, no LM | 0.9.3, release model + scorer | 0.9.3, release model, no scorer |
|---|---|---|---|---|
| en-JP_155427693_alexa_2019-09-18T12:57:03.017Z.wav | 1.50574 | 2.61644 | 6.24147 | 9.66468 |
| global-GLOBAL_149392486_hey_siri_2020-04-22T043640.343Z.wav | 1.60139 | 2.76046 | 6.3522 | 9.94327 |
| global-GLOBAL_28819796_ok_google_2020-04-23T054509.144Z.wav | 2.51439 | 1.70396 | 4.03806 | 6.30115 |
| global-GLOBAL_221614382_visteon_2020-04-23T064454.172Z.wav | 2.77821 | 1.19478 | 2.84475 | 3.59143 |

So you complain but you don’t provide:

  • audio lengths of each file
  • hardware you are using
  • repro steps

There’s nothing we can do in this context.

I see you are using the same values for LM alpha/beta on both, yet we document different values in the release notes, e.g., for v0.9.3:

    lm_alpha 0.931289039105002
    lm_beta 1.1834137581510284

The hardware that I am using has two Cortex-A35 cores and one Cortex-A72 core, 2 GB of DDR4 RAM, and a maximum frequency of 1.2 GHz per core.
I have now rerun with the release values of alpha and beta:

./deepspeech --model /data/local/tmp/native_client_093.arm64.cpu.android/deepspeech-0.9.3-models.tflite --scorer /data/local/tmp/native_client_093.arm64.cpu.android/kenlm.scorer --beam_width 32 --lm_alpha 0.931289039105002 --lm_beta 1.1834137581510284 -t --audio /data/local/tmp/sns_ww_cli_test_data/visteon/global-GLOBAL_221614382_visteon_2020-04-23T064454.172Z.wav


./deepspeech --model /data/local/tmp/native_client_061.arm64.cpu.android/output_graph.tflite --lm /data/local/tmp/native_client_061.arm64.cpu.android/lm.binary --trie /data/local/tmp/native_client_061.arm64.cpu.android/trie --lm_alpha 0.75 --lm_beta 1.85 --beam_width 32 -t --audio /data/local/tmp/sns_ww_cli_test_data/visteon/global-GLOBAL_221614382_visteon_2020-04-23T064454.172Z.wav

Here are the updated results:

| audio file name | audio duration (sec) | 0.6.1, release model + LM (release alpha/beta) | 0.6.1, no LM | 0.9.3, release model + scorer (release alpha/beta) | 0.9.3, no scorer |
|---|---|---|---|---|---|
| en-JP_155427693_alexa_2019-09-18T12:57:03.017Z.wav | 3 | 2.94852 | 2.61644 | 6.13521 | 9.66468 |
| global-GLOBAL_149392486_hey_siri_2020-04-22T043640.343Z.wav | 3 | 2.61169 | 2.76046 | 6.30586 | 9.94327 |
| global-GLOBAL_28819796_ok_google_2020-04-23T054509.144Z.wav | 2 | 1.66903 | 1.70396 | 3.92679 | 6.30115 |
| global-GLOBAL_221614382_visteon_2020-04-23T064454.172Z.wav | 1 | 1.24143 | 1.19478 | 2.99867 | 3.59143 |
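For context, these timings can be expressed as a real-time factor (RTF = inference time / audio duration; values above 1.0 are slower than real time). A quick sketch using the LM/scorer columns from the table above, with file labels shortened for readability:

```python
# Real-time factor (RTF) for the LM/scorer runs reported above.
# RTF = inference time / audio duration; RTF > 1.0 is slower than real time.
measurements = {
    # name: (duration_sec, 0.6.1 time with LM, 0.9.3 time with scorer)
    "alexa":     (3, 2.94852, 6.13521),
    "hey_siri":  (3, 2.61169, 6.30586),
    "ok_google": (2, 1.66903, 3.92679),
    "visteon":   (1, 1.24143, 2.99867),
}

for name, (dur, t061, t093) in measurements.items():
    print(f"{name}: 0.6.1 RTF={t061 / dur:.2f}, 0.9.3 RTF={t093 / dur:.2f}")
```

With these numbers, 0.6.1 stays around real time on this board, while 0.9.3 runs at roughly 2-3x real time.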

The audio files are attached: upload.zip (173.4 KB)

Sorry but please reply with values, I don’t have time to play with your files.

big.LITTLE, but you have only three CPUs?

I’m sorry, but the values are totally inconsistent: in the second set of results, your first file is now 2.94 s while it was 1.50 s before.

I’m not sure how you are measuring things, and I don’t have time to do that for you. Please measure runtime execution properly and accurately, profile with perf, and verify that you are not just saturating your CPUs; since we set 4 threads on the TFLite runtime, maybe that’s too much for your system?

See https://github.com/mozilla/DeepSpeech/blob/7b2eeb6734a49c3d9439644e9fe0685cb46ad695/native_client/tflitemodelstate.cc#L184 and maybe hack that?

The audio duration is listed in the Excel sheet; I uploaded the files because you wanted a way to reproduce.

This difference is because the LM alpha and beta values have changed, since you asked me to use the release values of those parameters.

I checked the code, and you have set the same number of threads (4) for both 0.6.1 and 0.9.3, so that shouldn’t be a problem.

I have rerun the tests using the audio files that you have provided in the release. First I ran it on the arm64 android target that I have used previously with the following commands:
./deepspeech --model /data/local/tmp/native_client_093.arm64.cpu.android/deepspeech-0.9.3-models.tflite --scorer /data/local/tmp/native_client_093.arm64.cpu.android/deepspeech-0.9.3-models.scorer -t --audio /data/local/tmp/audio-0.6.1/

./deepspeech --model /data/local/tmp/native_client_061.arm64.cpu.android/output_graph.tflite --lm /data/local/tmp/native_client_061.arm64.cpu.android/lm.binary --trie /data/local/tmp/native_client_061.arm64.cpu.android/trie -t --audio /data/local/tmp/audio-0.6.1/

| audio file name | audio duration (sec) | 0.6.1 arm64 Android | 0.6.1 arm64 Android, no LM | 0.9.3 arm64 Android | 0.9.3 arm64 Android, no scorer |
|---|---|---|---|---|---|
| 2830-3980-0043.wav | 2 | 2.40254 | 3.95979 | 4.71389 | 5.37027 |
| 4507-16021-0012.wav | 3 | 3.85647 | 5.39578 | 6.39314 | 7.90841 |
| 8455-210777-0068.wav | 3 | 3.96196 | 5.18108 | 6.25872 | 7.35326 |

Second, I ran it on my laptop (x86_64 linux) with the following commands:

deepspeech --model /home/rsandhu/deepspeech_v093/native_client_093.arm64.cpu.android/deepspeech-0.9.3-models.tflite --scorer /home/rsandhu/deepspeech_v093/native_client_093.arm64.cpu.android/deepspeech-0.9.3-models.scorer --audio /home/rsandhu/Downloads/audio-0.6.1/audio/

deepspeech --model /home/rsandhu/deepspeech_v061/native_client_061.arm64.cpu.android/output_graph.tflite --lm /home/rsandhu/deepspeech_v061/native_client_061.arm64.cpu.android/lm.binary --trie /home/rsandhu/deepspeech_v061/native_client_061.arm64.cpu.android/trie --audio /home/rsandhu/Downloads/audio-0.6.1/audio/

| audio file name | audio duration (sec) | 0.6.1 x86_64 Linux | 0.6.1 x86_64 Linux, no LM | 0.9.3 x86_64 Linux | 0.9.3 x86_64 Linux, no scorer |
|---|---|---|---|---|---|
| 2830-3980-0043.wav | 2 | 1.721 | 1.868 | 0.443 | 0.535 |
| 4507-16021-0012.wav | 3 | 2.279 | 2.425 | 0.528 | 0.779 |
| 8455-210777-0068.wav | 3 | 2.246 | 2.408 | 0.564 | 0.718 |

As you can see, with 0.9.3 the inference time has increased on arm64 Android but decreased on x86_64. I used the same tflite model in both cases.
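The x86_64 improvement can be quantified as a speedup ratio (0.6.1 time divided by 0.9.3 time); a quick sketch using the LM/scorer columns from the table above:

```python
# Speedup of 0.9.3 over 0.6.1 on x86_64 Linux (LM/scorer runs),
# computed as old_time / new_time from the table above.
x86_times = [
    # (file, 0.6.1 time, 0.9.3 time) in seconds
    ("2830-3980-0043.wav", 1.721, 0.443),
    ("4507-16021-0012.wav", 2.279, 0.528),
    ("8455-210777-0068.wav", 2.246, 0.564),
]
speedups = [old / new for _, old, new in x86_times]
print([f"{s:.1f}x" for s in speedups])  # roughly 3.9x to 4.3x faster
```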

Except that it was not effective in 0.6.

You still have not explained exactly how you reproduce these values. How many runs per file? Is this wall time or user time? Is that the mean value? What’s the stddev?

These results are from running each file just once. Even if I repeat the runs, the difference is negligible, around 0.02 seconds. The inference time is what is printed after inference completes when the -t option is passed.

Please use time, run multiple times, average the results, and profile with perf.
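A minimal sketch of the aggregation being asked for, assuming the per-run wall-clock times (e.g. from the -t output) have been collected into a list; the values below are placeholders, not real measurements:

```python
import statistics

# Hypothetical inference times (seconds) from repeated runs of one file;
# replace with your own measured values.
runs = [6.24, 6.31, 6.19, 6.28, 6.22]

mean = statistics.mean(runs)
stdev = statistics.stdev(runs)  # sample standard deviation
print(f"mean={mean:.3f}s stdev={stdev:.3f}s over {len(runs)} runs")
```

Reporting mean and stddev over several runs makes it clear whether a difference like 1.50 s vs 2.94 s is a real effect or just run-to-run variance.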

@lissyx it must be something about the hardware capability. I tried it on Qualcomm Snapdragon 820 hardware, and there the results were different. If I use the 4 threads that you have set, the performance is pretty much the same, but after rebuilding with a single thread there is a ~60% reduction in inference time. The results below were averaged over three runs.

Qualcomm Snapdragon 820:

| audio file name (en-US) | audio duration (sec) | 0.6.1 arm64 Android | 0.9.3 arm64 Android | 0.9.3 arm64 Android, single thread |
|---|---|---|---|---|
| 2830-3980-0043.wav | 2 | 2.72773 | 2.29 | 1.16962 |
| 4507-16021-0012.wav | 3 | 3.49367 | 2.55 | 1.24849 |
| 8455-210777-0068.wav | 3 | 2.88018 | 2.6 | 1.15847 |
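As a sanity check on the ~60% figure, the relative reduction can be computed from the table above; which baseline is meant is ambiguous, so this sketch shows both:

```python
# Relative reduction in inference time from the Snapdragon 820 table above:
# reduction = (t_before - t_after) / t_before.
rows = [
    # (0.6.1, 0.9.3 4-thread, 0.9.3 single-thread) in seconds
    (2.72773, 2.29, 1.16962),
    (3.49367, 2.55, 1.24849),
    (2.88018, 2.60, 1.15847),
]
vs_061 = [(a - c) / a for a, b, c in rows]
vs_4thread = [(b - c) / b for a, b, c in rows]
print([f"{r:.0%}" for r in vs_061])      # vs 0.6.1: about 57%, 64%, 60%
print([f"{r:.0%}" for r in vs_4thread])  # vs 4-thread 0.9.3: about 49%, 51%, 55%
```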

So, my question is: what changes between 0.6.1 and 0.9.3 (other than multi-threading, since I have disabled it) have resulted in this improvement?

Please run git log v0.6.1..v0.9.3; there were many changes.

Qualcomm Snapdragon 820:

| audio file name (en-US) | audio duration (sec) | 0.6.1 arm64 Android | 0.9.3 arm64 Android, single thread |
|---|---|---|---|
| 2830-3980-0043.wav | 2 | 0.96025 | 1.16962 |
| 4507-16021-0012.wav | 3 | 1.229356667 | 1.24849 |
| 8455-210777-0068.wav | 3 | 1.18114 | 1.15847 |
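Using the corrected numbers above, the ratio of 0.9.3 single-thread time to 0.6.1 time gives a quick check of how close the two versions are (values near 1.0 mean comparable performance):

```python
# Ratio of 0.9.3 single-thread to 0.6.1 inference time on the
# Snapdragon 820, from the corrected table above.
times = [
    # (0.6.1 time, 0.9.3 single-thread time) in seconds
    (0.96025, 1.16962),
    (1.229356667, 1.24849),
    (1.18114, 1.15847),
]
ratios = [new / old for old, new in times]
print([f"{r:.2f}" for r in ratios])  # roughly 1.22, 1.02, 0.98
```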

The earlier results were wrong; somebody else ran the test and didn’t average. It turns out that the performance of 0.9.3 is similar to 0.6.1. For the release 0.9.3 library, on my previous hardware (arm64 CPU Android, two A35 cores and one A72 core, 2 GB DDR4 RAM, 1.2 GHz max frequency per core) I checked with the top command that the maximum CPU load is 150%, so the CPUs are not saturated, but the inference time increases because 4 threads have been set. I don’t know how to use perf to profile.
However, on the PC there is a large reduction in inference time with 0.9.3:

PC:

| audio file name | audio duration (sec) | 0.6.1 x86_64 Linux | 0.6.1 x86_64 Linux, no LM | 0.9.3 x86_64 Linux | 0.9.3 x86_64 Linux, no scorer |
|---|---|---|---|---|---|
| 2830-3980-0043.wav | 2 | 1.721 | 1.868 | 0.443 | 0.535 |
| 4507-16021-0012.wav | 3 | 2.279 | 2.425 | 0.528 | 0.779 |
| 8455-210777-0068.wav | 3 | 2.246 | 2.408 | 0.564 | 0.718 |

So, how can I find the reason for not seeing a performance improvement on the arm64 CPU?

I’m sorry, but I really don’t have time for that; performance depends on so many conditions, especially on smartphones.

With only three cores, that’s expected.

Sorry, but this is the only reliable way to investigate.