I stumbled upon the DeepSpeech project a few weeks ago while searching for a suitable ASR engine for my video and article about speech recognition on embedded devices.
I was really impressed with the performance and speed of the 0.6 version! So I decided to write and publish an article about benchmarking the 0.6 DeepSpeech engine and creating a transcription demo for the Raspberry Pi 4/Jetson Nano.
Here are the links to the article and the video. I hope it brings more publicity to such an outstanding project!
(I already corrected the typo in the description.)
While running tests on the Nvidia Jetson Nano I used a pre-release wheel for the arm64 architecture with TFLite model inference enabled by default. The performance was slightly worse than on the Raspberry Pi 4. I am interested in trying to run it on the Jetson Nano with GPU acceleration: are there plans to release an arm64 build with GPU support? If not, I will try cross-compiling it myself; are there any caveats I should know about beforehand?
Thank you for such an amazing project!
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
It would be great to have more figures than just “slightly worse”; maybe there’s some trivial / actionable item here. But if I read correctly, it’s a Cortex-A57, so close to the RPi3. It’d be interesting to know more. Re-training a simpler model (with n_hidden lower than 2048) might help a lot here, if you manage to keep good accuracy.
Please also make sure you tested with the language model, this can make a difference: not having it will slow things down quite a bit.
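For reference, enabling the language model from the Python bindings looks roughly like this. This is a minimal sketch assuming the 0.6.x API (Model(model_path, beam_width) plus enableDecoderWithLM) and the file names from the released model package; the paths, beam width and alpha/beta values are just the example-client defaults, adjust them to whatever you actually use:

```python
import wave
import numpy as np
from deepspeech import Model

# Placeholder paths, assuming the 0.6.x model package layout.
MODEL = "deepspeech-0.6.0-models/output_graph.tflite"
LM = "deepspeech-0.6.0-models/lm.binary"
TRIE = "deepspeech-0.6.0-models/trie"

ds = Model(MODEL, 500)                        # 500 = example client's default beam width
ds.enableDecoderWithLM(LM, TRIE, 0.75, 1.85)  # example client's default lm_alpha / lm_beta

# Any 16 kHz mono 16-bit WAV file.
w = wave.open("audio/2830-3980-0043.wav", "rb")
audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
w.close()

print(ds.stt(audio))
```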
No, because this is non-trivial work, especially to get CI coverage, and we are trying to move away from GPUs.
It should work properly, I know @elpimous_robot successfully did it. Basically you should just follow the cross-compilation docs we have, but you might need some specific tuning to ensure your sysroot directory includes the CUDA bits. And obviously you need to adapt the build to use CUDA, so --config=cuda.
Given we don’t have the hardware, it’ll be hard to help more.
There is a comparison table in the article. I used the language model for the tests on all platforms. The 10-run average (first run discarded) is 1.6 seconds for the Raspberry Pi 4 (1 GB) and 2.3 seconds for the Jetson Nano.
Yes, alas, I also don’t have the Jetson Nano at hand right now. I had to leave China for the time being because of the coronavirus outbreak.
I’m curious, why are you moving away from GPUs now?
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
For local inference, GPUs are far from being the most flexible solution: we are limited to CUDA only, it’s not an efficient use of the power we have, and it adds complex dependencies. Basically we are considering dropping the plain TensorFlow runtime for any local inference and moving everything to TFLite there. Full-blown GPU / TensorFlow would still make sense in some other use cases, but the benefits of switching all local inference to TFLite bring me a lot of joy.
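At the API level this switch is transparent, by the way: with the Python bindings the calling code stays the same and only the exported model file changes, the runtime choice lives inside libdeepspeech.so. A rough sketch, assuming the 0.6.x Model constructor:

```python
from deepspeech import Model

# Full TensorFlow runtime build: load the protobuf / memory-mapped export.
# ds = Model("output_graph.pbmm", 500)

# TFLite runtime build (what the ARM64 wheels ship): load the TFLite export.
ds = Model("output_graph.tflite", 500)

# Everything downstream (LM decoding, stt, streaming) is unchanged.
```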
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
Well, sorry, but you never mentioned that, and it’s shared as an image, so it’s not easily accessible / searchable.
How much audio is that? The difference is not that huge; it seems the Jetson Nano is more efficient than the RPi3.
I prepared two different environments (Docker) to compare the performance of DeepSpeech:
with TensorFlow (Python 3.6, someone’s release: Releases · domcross/DeepSpeech-for-Jetson-Nano · GitHub)
and
with TensorFlow Lite (Python 3.7, the one you provided).
For Python 3.6:
Since someone released DeepSpeech 0.6.0 for ARM64 at Releases · domcross/DeepSpeech-for-Jetson-Nano · GitHub,
1. I downloaded the DeepSpeech 0.6.0 wheel from this release, then ran pip3.6 install deepspeech-0.6.0-cp36-cp36m-linux_aarch64.whl.
2. I downloaded the libdeepspeech.so file as well and put it in my search path (see the quick check below).
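To double-check what actually ended up installed, something like this should show where the binding lives and what ships with it (just a generic inspection of the installed package; nothing DeepSpeech-specific beyond the import):

```python
import os
import deepspeech

# Where did pip put the binding, and which native pieces ship inside the package?
pkg_dir = os.path.dirname(deepspeech.__file__)
print("package dir:", pkg_dir)
for name in sorted(os.listdir(pkg_dir)):
    print("  ", name)
```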
Did I miss any step?
And could you please kindly tell me the function of libdeepspeech.so? If I only use Python, do I need libdeepspeech.so as well?
On your 7.0 release page there is no ARM64-specific version of libdeepspeech.so.
Does that mean there is no need to update libdeepspeech.so?
Sorry for the long post
And thank you in advance for your patience and help.
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
So you are comparing using someone else’s release?
You can’t seriously rely on that for measuring memory usage.
What libdeepspeech.so are you referring to?
This is useless, we have ARM64 TensorFlow builds for 0.6.1 as well. Again, you can’t seriously compare using random sources like that.
Those figures hold. They were measured on desktop AMD64 builds, by analyzing memory allocations with valgrind massif. Please replicate that setup to verify correctly.
There is no “7.0”, there are only some 0.7.0 alpha builds, and they do have ARM64 builds …
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
In my understanding, the Python binding (the .whl file?) is like a wrapper that calls the compiled C library, libdeepspeech.so. Is that library unnecessary for Python-only usage?
I can’t find libdeepspeech.so on your release page for ARM64.
Sorry for the typo, I did use the 0.7.0 alpha1 version.
Thank you for your time.
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
Please don’t use 0.6.0 on TFLite, there was a bug.
I insist: we have 0.6.1 and a 0.7.0 alpha that is compatible with the 0.6.1 model, and 0.6.1 on ARM64 uses the TensorFlow runtime while the 0.7.0 alpha uses the TFLite runtime. There is no need to use any third-party build where we don’t know what they did.
There’s no magic, it’s all visible in the git tree: it’s using the libdeepspeech.so. I don’t get the second part of your question.
What I see, though, is that you are mixing things up and installing magic libraries in several places. In this context, it’s really messy to know for sure what you are running.
Make an effort? I linked it to you. The Python wheel packages the library, there’s no need for magic stuff.
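If you want to convince yourself which native library your process actually loaded (and spot any stray copy picked up from LD_LIBRARY_PATH), you can look at /proc/self/maps after importing the binding. A quick Linux-only sketch, assuming the official wheel where importing the package pulls in its bundled native module:

```python
# Importing the package loads its bundled native code; any libdeepspeech.so
# resolved from elsewhere (e.g. LD_LIBRARY_PATH) would also show up here.
import deepspeech

with open("/proc/self/maps") as maps:
    paths = sorted({line.split()[-1] for line in maps if "deepspeech" in line.lower()})

for path in paths:
    print(path)
```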
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
Oh I didn’t know that, I thought .so and whl files are totally independent.
As in the build instruction the .so file and whl files are built and installed separately.
And I thought by merely doing pip3 install .whl would not be enough to run the vad sample.(I thought I also need to find/build a .so file for 0.7.0 and include it into the run path.
Sorry I don’t understand this part, I get the python runtime in the same__? Does this mean because both the 0.6.1 and 0.7.0 DeepSpeech are in python runtime thus the comparison result will be biased, even if they are to be run in two different docker container?
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
This is FLOSS, you can just verify instead of guessing. I know it might be non-trivial sometimes, but I’m always sad to see that people don’t check when they have everything they need, and keep a wrong idea in mind.
Yes, because they involve different tooling and constraints.
And yet we document everything. What is unclear, please, so we can improve?
If you run valgrind --tool=massif on the Python inference code, you will also measure the Python interpreter’s memory footprint, which might give you different usage values compared to libdeepspeech.so itself.
It should be the same bias between both Python runtimes, but since it’s an interpreted language it’s much harder to be certain. And more critically, it will be biased compared to what you highlighted from the hacks.mozilla.org blog post.
I assume you want to see / measure the improvements we made, so it’s only fair and meaningful if you compare the same way we did.
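If you just want a rough feel for how much of the Python process footprint is the interpreter itself versus the model load, a crude resident-set-size check like the one below can make that bias visible. To be clear, this is not the massif methodology behind the published figures, just a sanity check; it is Linux-only and the model path and beam width are placeholders:

```python
def rss_mb():
    # Resident set size of the current process in MB, read from /proc (Linux-only).
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024.0
    return 0.0

baseline = rss_mb()                      # interpreter + stdlib, before anything DeepSpeech
from deepspeech import Model
after_import = rss_mb()                  # + the bindings and the native library they load
ds = Model("output_graph.tflite", 500)   # placeholder model path and beam width (0.6.x API)
after_load = rss_mb()                    # + the model itself

print("interpreter baseline : %.1f MB" % baseline)
print("after import         : %.1f MB" % after_import)
print("after model load     : %.1f MB" % after_load)
```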