Following is the command that I used for 0.6.1:

```bash
./deepspeech --model /data/local/tmp/native_client_061.arm64.cpu.android/output_graph.tflite \
    --lm /data/local/tmp/native_client_061.arm64.cpu.android/lm.binary \
    --trie /data/local/tmp/native_client_061.arm64.cpu.android/trie \
    --lm_alpha 1.0545920026574804 --lm_beta 3.2744955478757265 \
    --beam_width 32 -t \
    --audio /data/local/tmp/sns_ww_cli_test_data/alexa/en-JP_155427693_alexa_2019-09-18T12:57:03.017Z.wav
```
Then I repeated the test without an LM/scorer for both versions. The inference times for the same files, in seconds, are recorded in the table below.
(Table: inference time in seconds per audio file for the 0.6.1 native client with the release model and LM, 0.6.1 with no LM, 0.9.3 with the release model and scorer, and 0.9.3 with no scorer.)
The hardware that I am using has 2 Cortex-A35 cores and 1 Cortex-A72 core, 2 GB of DDR4 RAM, and a maximum frequency of 1.2 GHz per core.
For these runs I used the default values of alpha and beta.
The audio files are attached: upload.zip (173.4 KB)
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
5
Sorry, but please reply with the values; I don't have time to play with your files.
big.LITTLE, but you have only three CPUs?
I’m sorry, but the values are totally inconsistent: in the second set of results, your first file is now 2.94 s while it was 1.5 s before.
I’m not sure how you are measuring things, and I don’t have time to do that for you. Please properly and accurately measure runtime execution, profile with perf, and verify whether you are not just busting your CPUs; chances are that, since we set 4 threads on the TFLite runtime, it may be too much for your system.
I have rerun the tests using the audio files that you have provided in the release. First I ran it on the arm64 Android target that I have used previously, with the following command:

```bash
./deepspeech --model /data/local/tmp/native_client_093.arm64.cpu.android/deepspeech-0.9.3-models.tflite \
    --scorer /data/local/tmp/native_client_093.arm64.cpu.android/deepspeech-0.9.3-models.scorer \
    -t --audio /data/local/tmp/audio-0.6.1/
```
As you can see here, with 0.9.3 the inference time has increased for arm64 Android but decreased for x86_64. I have used the same tflite model in both cases.
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
8
Except it was not effective in 0.6
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
9
You still have not explained exactly how you reproduce these values. How many runs for each file? Is this wall time or user time? Is it the mean value? What’s the stddev?
These results are from running each file just once. Even if I repeat it, the difference is negligible, around 0.02 seconds. The inference time is what is printed after inference completes when the -t option is passed.
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
11
Please use `time`, do multiple runs, average them, and profile with perf.
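A sketch of what such a measurement could look like, driven from a Linux host over adb (bash, bc and awk on the host and the device paths from the earlier commands are assumptions; the run count is arbitrary):

```bash
#!/usr/bin/env bash
# Rough sketch: wall time over several runs, then mean and stddev.
# Device paths reuse the ones from the commands above; N=5 is arbitrary.
set -e

DEVICE_DIR=/data/local/tmp/native_client_093.arm64.cpu.android
WAV=/data/local/tmp/audio-0.6.1/2830-3980-0043.wav
N=5

runs=()
for i in $(seq 1 "$N"); do
  start=$(date +%s.%N)
  adb shell "cd $DEVICE_DIR && ./deepspeech \
      --model deepspeech-0.9.3-models.tflite \
      --scorer deepspeech-0.9.3-models.scorer \
      --audio $WAV" > /dev/null
  end=$(date +%s.%N)
  run=$(echo "$end - $start" | bc -l)
  # Note: this includes a small constant adb round-trip overhead per run.
  echo "run $i: ${run}s"
  runs+=("$run")
done

printf '%s\n' "${runs[@]}" | awk '{ s += $1; ss += $1 * $1 }
  END { m = s / NR; printf "mean=%.3fs stddev=%.3fs over %d runs\n",
        m, sqrt(ss / NR - m * m), NR }'
```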
@lissyx it must be something about the hardware capability. I tried it on Qualcomm 820 hardware and the results there were different. With the 4 threads that you set, the performance is pretty much the same, but after rebuilding with a single thread there is a 60% reduction in inference time. The results below were averaged over three runs.
Qualcomm 820 (inference times in seconds):

| audio file name (en-US) | audio duration (sec) | 0.6.1 native client arm64 cpu android | 0.9.3 native client arm64 cpu android | 0.9.3 native client arm64 cpu android, single thread |
|---|---|---|---|---|
| 2830-3980-0043.wav | 2 | 2.72773 | 2.29 | 1.16962 |
| 4507-16021-0012.wav | 3 | 3.49367 | 2.55 | 1.24849 |
| 8455-210777-0068.wav | 3 | 2.88018 | 2.6 | 1.15847 |
So my question is: what changes between 0.6.1 and 0.9.3 (other than multi-threading, since I have disabled it) have resulted in this improvement?
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
13
Please run `git log v0.6.1..v0.9.3`; there were many changes.
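A minimal sketch of browsing that history, optionally narrowed to the inference code (the repository URL and the native_client/ pathspec are assumptions about the upstream layout):

```bash
# Clone the upstream repository and list everything that changed between the
# two releases; limiting the log to native_client/ keeps it to the runtime.
git clone https://github.com/mozilla/DeepSpeech.git
cd DeepSpeech
git log --oneline v0.6.1..v0.9.3                    # all changes
git log --oneline v0.6.1..v0.9.3 -- native_client/  # native client only
```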
| audio file name (en-US) | audio duration (sec) | 0.6.1 native client arm64 cpu android | 0.9.3 native client arm64 cpu android, single thread |
|---|---|---|---|
| 2830-3980-0043.wav | 2 | 0.96025 | 1.16962 |
| 4507-16021-0012.wav | 3 | 1.229356667 | 1.24849 |
| 8455-210777-0068.wav | 3 | 1.18114 | 1.15847 |
The earlier results were wrong; somebody else ran that test and didn’t average. It turns out that the performance of 0.9.3 is similar to 0.6.1. For the release 0.9.3 library, on my previous hardware (arm64 CPU Android, 2 Cortex-A35 and 1 Cortex-A72 cores, 2 GB DDR4 RAM, 1.2 GHz max frequency per core) I checked with the top command that the maximum CPU load is 150%, so the CPUs are not saturated, but the inference time increases because 4 threads have been set. I don’t know how to use perf to profile.
However, on PC there is a large reduction in inference time with 0.9.3:
PC (inference times in seconds):

| release audio file name | audio duration (sec) | deepspeech 0.6.1 amd x86_64 linux | deepspeech 0.6.1 amd x86_64 linux, no LM | deepspeech 0.9.3 amd x86_64 linux | deepspeech 0.9.3 amd x86_64 linux, no scorer |
|---|---|---|---|---|---|
| 2830-3980-0043.wav | 2 | 1.721 | 1.868 | 0.443 | 0.535 |
| 4507-16021-0012.wav | 3 | 2.279 | 2.425 | 0.528 | 0.779 |
| 8455-210777-0068.wav | 3 | 2.246 | 2.408 | 0.564 | 0.718 |
So, how can I find out why I am not seeing a performance improvement on arm64 CPU?
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
15
I’m sorry, but I really don’t have time for that; performance depends on so many conditions, especially on smartphones.
With only three cores, that’s expected.
Sorry, but this is the only reliable way to investigate.
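A sketch of what that investigation could look like, assuming simpleperf is available on the device (it ships with the Android NDK and is preinstalled on many recent devices) and stock perf on the Linux box; the paths reuse the ones from earlier in the thread:

```bash
# On the Android device (adb shell): simpleperf plays the role of Linux perf.
cd /data/local/tmp/native_client_093.arm64.cpu.android

# Hardware/software counters for one inference:
simpleperf stat ./deepspeech \
    --model deepspeech-0.9.3-models.tflite \
    --scorer deepspeech-0.9.3-models.scorer \
    --audio /data/local/tmp/audio-0.6.1/2830-3980-0043.wav

# Record a profile and see where the time is spent:
simpleperf record -g -o /data/local/tmp/perf.data ./deepspeech \
    --model deepspeech-0.9.3-models.tflite \
    --scorer deepspeech-0.9.3-models.scorer \
    --audio /data/local/tmp/audio-0.6.1/2830-3980-0043.wav
simpleperf report -i /data/local/tmp/perf.data

# On the x86_64 Linux machine the stock perf equivalents are:
#   perf stat ./deepspeech --model ... --scorer ... --audio ...
#   perf record -g ./deepspeech --model ... --scorer ... --audio ...
#   perf report
```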