GPU Inference on Jetson Nano

I’m currently working on a bachelor’s thesis in which we want to deploy DeepSpeech on an NVIDIA Jetson Nano. We followed this guide to build DeepSpeech 0.6.0: https://devtalk.nvidia.com/default/topic/1062327/jetson-nano/deepspeech-for-jetson-nano/. The author says he successfully built DeepSpeech with CUDA support for the Jetson Nano.

On the other hand, in this article from 23 January 2020: https://www.hackster.io/dmitrywat/offline-speech-recognition-on-raspberry-pi-4-with-respeaker-c537e7, the author writes the following when comparing inference times on the Jetson Nano vs. the Raspberry Pi 4 (the latter has a faster CPU):
“There are no pre-built binaries for arm64 architecture with GPU support as of this moment, so we cannot take advantage of Nvidia Jetson Nano’s GPU for inference acceleration. I don’t think this task is on DeepSpeech team roadmap, so in the near future I’ll do some research here myself and will try to compile that binary to see what speed gains can be achieved from using GPU.”

So I’m a bit confused about whether or not DeepSpeech is able to use the GPU for inference on the Jetson Nano. I also seem to recall forum posts on here suggesting that the goal of DeepSpeech is to optimise for inference on CPUs anyway.

Jetson Nano supports CUDA, and we support CUDA. It’s just that we:

  • are a small team
  • have limited CI capacities
  • have limited use cases for GPU on those boards
  • don’t have those boards

So it should be possible to build on ARM64 with CUDA enabled, we just don’t provide prebuilt binaries.

I’ve repeated it myself several times: people who would like to contribute support for that are welcome. Cross-compiling is a bit painful with Bazel, but we document it and have it working on ARMv7 as well as ARM64, so it’s definitely possible.

Ok
Appreciate the answer
I must admit, I don’t have any experience with cross-compiling, but would probably like to give it a go at some point

It should not be super-complicated. We already have a GCC ARM64 toolchain, so you can re-use that. Then I think it’s just a matter of properly setting up a sysroot tree (as we document using multistrap) that includes CUDA, and getting inspiration from the current build:rpi3-armv8 in the .bazelrc in TensorFlow: https://github.com/mozilla/tensorflow/blob/r1.15/.bazelrc#L102-L108
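For illustration only, a hypothetical .bazelrc entry modelled on that build:rpi3-armv8 block could look roughly like the sketch below; the config name, toolchain label and CUDA version are placeholders, not the actual mozilla/tensorflow configuration:

    # Hypothetical "jetson-arm64" config: cross-compile for ARM64 with CUDA enabled.
    # The crosstool label must point at whatever aarch64 cross-toolchain the
    # sysroot setup provides; TF_CUDA_VERSION must match the JetPack CUDA.
    build:jetson-arm64 --crosstool_top=//my_toolchain/aarch64:toolchain
    build:jetson-arm64 --cpu=aarch64
    build:jetson-arm64 --config=cuda
    build:jetson-arm64 --copt=-march=armv8-a --copt=-mtune=cortex-a57
    build:jetson-arm64 --action_env=TF_CUDA_VERSION=10.0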

It might not be 100% straightforward, because I think most people are afraid of doing that, but if you read the docs carefully and ask precise questions when needed, it’s 100% doable.


Ok, I will let you know if I ever get anything working


I am the “author” of the Jetson Nano build mentioned by @chrillemanden in the initial post.

I found the performance of the Jetson Nano with GPU a bit underwhelming for DeepSpeech inference. A lot of the computation is still done on the CPU, and copying data between CPU and GPU memory probably adds too much overhead.
Maybe I also missed some optimization flags for the compiler.


Cool! Any reason not to rely on cross-compilation?

Our graph has some ops that don’t have a GPU implementation anyway, so it’s not surprising. Have you had a chance to compare with a desktop GPU to see whether there’s a real difference in which ops get executed on the GPU?
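For anyone wanting to check this on the Nano, one rough way (a sketch, not an official procedure) is to profile an inference run and look at which CUDA kernels actually execute; nvprof ships with the JetPack CUDA toolkit:

    # The kernel summary nvprof prints shows what actually ran on the GPU
    # (profiling may require root on the Jetson; model and audio file names
    # here are just examples).
    nvprof deepspeech --model output_graph.pbmm --audio test.wav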

Well, maybe it’d be worth adapting the --copt= flags to match your precise ARM core, but I have never seen this have a real influence in our context.
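For the Jetson Nano’s quad Cortex-A57, that would look roughly like the flags below (illustrative only, and as said above the effect is usually negligible):

    # Tune code generation for the Nano's Cortex-A57 cores
    --copt=-march=armv8-a --copt=-mtune=cortex-a57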

Yes, my laziness :slight_smile: I used to develop Java and Python web applications in my day job about 10 years ago, so I rarely had to deal with all that C/C++ build-chain stuff - and Bazel makes my head explode…

What I understand from the build process for TensorFlow and DeepSpeech is that support for the Aarch64/ARM64 architecture is often tied to the Android OS; maybe it would help to untangle this?

My desktop at home is an (older) Mac, so unfortunately no CUDA/GPU.

Understandable

Well, we cross-build for Linux/ARMv7 and Linux/Aarch64, with a few changes to TensorFlow to add the cross-compiler, plus a multistrap-generated sysroot.
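As a rough illustration of the sysroot side (a sketch only; the suite and package list are assumptions, not the exact configuration the DeepSpeech docs use), a multistrap configuration for an ARM64 Debian sysroot looks something like this:

    # multistrap.conf -- hypothetical ARM64 sysroot; run with: multistrap -f multistrap.conf
    [General]
    arch=arm64
    directory=sysroot-arm64
    cleanup=true
    noauth=true
    unpack=true
    aptsources=Debian
    bootstrap=Debian

    [Debian]
    packages=libc6-dev libstdc++6 linux-libc-dev libpython3.7-dev
    source=http://deb.debian.org/debian
    suite=buster

CUDA itself is not in Debian, so for a GPU-enabled build the Jetson’s CUDA libraries and headers would still have to be copied into that sysroot from JetPack.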

I have built DeepSpeech 0.8.2 with CUDA support for Nvidia Jetson/Xavier - any feedback welcome…


Do you have patches to share for that?

No patches. Built with:
    bazel build \
      --workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh" \
      --config=monolithic \
      --copt=-march=armv8-a --copt=-mtune=cortex-a57 --copt=-O3 \
      --copt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-fvisibility=hidden --copt=-fPIC \
      --config=cuda --config=nonccl --config=noaws --config=nogcp --config=nohdfs --config=v2 \
      --verbose_failures \
      //native_client:libdeepspeech.so //native_client:generate_scorer_package

The Python bindings were a bit hacky: you have to manually build SWIG 3.0.2 and create/replace the symlinks in DeepSpeech/native_client/ds-swig. I didn’t manage to build the C++ bindings, but I don’t need them anyway…
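For reference, the kind of steps meant here, as a sketch (the SourceForge URL and the exact ds-swig layout are assumptions; adjust to what the DeepSpeech build actually expects):

    # SWIG needs PCRE to configure
    sudo apt-get install libpcre3-dev
    # Build SWIG 3.0.2 from source into a local prefix
    wget https://downloads.sourceforge.net/project/swig/swig/swig-3.0.2/swig-3.0.2.tar.gz
    tar xzf swig-3.0.2.tar.gz
    cd swig-3.0.2
    ./configure --prefix=$HOME/ds-swig
    make -j$(nproc) && make install
    # Point the DeepSpeech tree at the locally built SWIG instead of the
    # prebuilt x86_64 one
    cd /path/to/DeepSpeech/native_client
    rm -rf ds-swig
    ln -s $HOME/ds-swig ds-swig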

Right, though you should use the same SWIG version “just in case”. But it’s true we don’t have prebuilt versions for ARM64, so you need to build your own.

It’s good to know it can work in-place. Would you like to help get that working through cross-compilation, if doable?

Editing Makefiles and hacking build processes comes a close second to visiting the dentist for me :crazy_face:
I will think about it and come back to you with a PR in case I find a painless solution…


Without further ado: DeepSpeech v0.9.0-alpha10 for Nvidia Jetson / Xavier


Release 0.9.0 is here: DeepSpeech v0.9.0 for Jetson/Xavier


And here is DeepSpeech v0.9.2 for Jetson/Xavier


Here is the (belated) release of DeepSpeech v0.9.3 for Jetson/Xavier (Python wheel only).
