(Help) Building from source (for Jetson TX2) with CUDA support

Hello everyone,

First of all, I am a layperson on this subject, so apologies in advance for any obvious mistakes.

I’m trying to cross-compile the DeepSpeech binaries for the Jetson TX2 with CUDA support.
DeepSpeech version used is 0.8.0 following this documentation.

My setup is:
OS: Ubuntu 18.04 bionic
Python version: 3.7
GCC version: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Bazel version: 2.0.0
CUDA: 10.1
cuDNN: 7.6


Building libdeepspeech.so was successful using:
bazel build --workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh" --config=monolithic --verbose_failures --config=cuda --config=rpi3-armv8 --config=rpi3-armv8_opt -c opt --copt=-O3 --copt=-fvisibility=hidden //native_client:libdeepspeech.so


Using multistrap to create a system tree:
multistrap -d multistrap-raspbian64-buster -f native_client/multistrap_armbian64_buster.conf

I’ve had to change noauth=false to noauth=true in the .conf, or I would get NO_PUBKEY error messages.
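For reference, the change amounts to flipping the option in the [General] section of the conf file (noauth=true tells multistrap to skip apt signature verification; the section name is assumed from the standard multistrap conf layout):

```
[General]
noauth=true
```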


Building deepspeech binary:
make TARGET=rpi3-armv8 deepspeech
Seemed to be successful.


Now to the problem: when trying to build the Python bindings I get the following error message.

make TARGET=rpi3-armv8 bindings

SWIG_LIB="/media/alexander/LinuxFS/Projects/DSSource/STT/native_client/ds-swig/share/swig/4.0.2/" PATH="/media/alexander/LinuxFS/Projects/DSSource/STT/native_client/ds-swig/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin" swig -version

SWIG Version 4.0.2

Compiled with g++ [x86_64-unknown-linux-gnu]

Configured options: +pcre

Please see http://www.swig.org for reporting bugs and further information
SWIG_LIB="/media/alexander/LinuxFS/Projects/DSSource/STT/native_client/ds-swig/share/swig/4.0.2/" PATH="/media/alexander/LinuxFS/Projects/DSSource/STT/native_client/ds-swig/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin" swig -swiglib
/media/alexander/LinuxFS/Projects/DSSource/STT/native_client/ds-swig/share/swig/4.0.2/
pip install --quiet wheel==0.33.6 setuptools==39.1.0
DISTUTILS_USE_SDK=1 PATH=/media/alexander/LinuxFS/Projects/DSSource/STT/tensorflow/bazel-tensorflow/external/LinaroAarch64Gcc72/bin/aarch64-linux-gnu-:/media/alexander/LinuxFS/Projects/DSSource/STT/native_client/ds-swig/bin:$PATH SWIG_LIB="/media/alexander/LinuxFS/Projects/DSSource/STT/native_client/ds-swig/share/swig/4.0.2/" AS=/media/alexander/LinuxFS/Projects/DSSource/STT/tensorflow/bazel-tensorflow/external/LinaroAarch64Gcc72/bin/aarch64-linux-gnu-as CC=/media/alexander/LinuxFS/Projects/DSSource/STT/tensorflow/bazel-tensorflow/external/LinaroAarch64Gcc72/bin/aarch64-linux-gnu-gcc CXX=/media/alexander/LinuxFS/Projects/DSSource/STT/tensorflow/bazel-tensorflow/external/LinaroAarch64Gcc72/bin/aarch64-linux-gnu-c++ LD=/media/alexander/LinuxFS/Projects/DSSource/STT/tensorflow/bazel-tensorflow/external/LinaroAarch64Gcc72/bin/aarch64-linux-gnu-ld CFLAGS="-march=armv8-a -mtune=cortex-a53 -D_GLIBCXX_USE_CXX11_ABI=0 --sysroot /media/alexander/LinuxFS/Projects/DSSource/STT/multistrap-raspbian64-buster " LDFLAGS="-Wl,--no-as-needed '-Wl,-rpath,$ORIGIN/lib/' -Wl,-rpath,$ORIGIN" MODEL_LDFLAGS="-L/media/alexander/LinuxFS/Projects/DSSource/STT/tensorflow/bazel-bin/native_client " MODEL_LIBS="-ldeepspeech " PYTHONPATH=/media/alexander/LinuxFS/Projects/DSSource/STT/multistrap-raspbian64-buster/usr/lib/python3.6/:/media/alexander/LinuxFS/Projects/DSSource/STT/multistrap-raspbian64-buster/usr/lib/python3/dist-packages/ _PYTHON_SYSCONFIGDATA_NAME=_sysconfigdata_m_linux_aarch64-linux-gnu NUMPY_INCLUDE=/media/alexander/LinuxFS/Projects/DSSource/STT/multistrap-raspbian64-buster/usr/include/python3.7/ python ./setup.py build_ext --plat-name linux_aarch64
Failed to import the site module
Traceback (most recent call last):
File "/usr/lib/python3.6/site.py", line 570, in <module>
main()
File "/usr/lib/python3.6/site.py", line 556, in main
known_paths = addusersitepackages(known_paths)
File "/usr/lib/python3.6/site.py", line 288, in addusersitepackages
user_site = getusersitepackages()
File "/usr/lib/python3.6/site.py", line 264, in getusersitepackages
user_base = getuserbase() # this will also set USER_BASE
File "/usr/lib/python3.6/site.py", line 254, in getuserbase
USER_BASE = get_config_var('userbase')
File "/usr/lib/python3.6/sysconfig.py", line 607, in get_config_var
return get_config_vars().get(name)
File "/usr/lib/python3.6/sysconfig.py", line 550, in get_config_vars
_init_posix(_CONFIG_VARS)
File "/usr/lib/python3.6/sysconfig.py", line 421, in _init_posix
_temp = __import__(name, globals(), locals(), ['build_time_vars'], 0)
ModuleNotFoundError: No module named '_sysconfigdata_m_linux_aarch64-linux-gnu'
Makefile:12: recipe for target 'bindings-build' failed
make: *** [bindings-build] Error 1

ModuleNotFoundError: No module named '_sysconfigdata_m_linux_aarch64-linux-gnu'

Any help would be appreciated!

Best regards.
Alex

It means you lack the Python arch-specific files, and this is weird because it should be okay with the multistrap you set up.

Do you still have the issue if you force RASPBIAN= with your path to your multistrap?
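A minimal sketch of what that invocation might look like, assuming RASPBIAN is the make variable the native_client Makefile uses for the sysroot path (the path itself is taken from the logs above):

```shell
# Hypothetical invocation: point the bindings build explicitly at the
# multistrap tree created earlier.
make TARGET=rpi3-armv8 RASPBIAN=$(pwd)/multistrap-raspbian64-buster bindings
```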

You could just get the key and set up apt, but this is version-dependent and I don’t remember the exact steps.

Can you make sure you have the correct Python dev package installed?


Specifically, it should come from https://packages.debian.org/buster/arm64/libpython3.7-minimal/filelist
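One way to check, assuming the multistrap tree was created as shown earlier (the glob is a guess at Debian’s multiarch naming for the module):

```shell
# Search the multistrap sysroot for the arch-specific sysconfigdata module
# that the traceback above says is missing.
find multistrap-raspbian64-buster/usr/lib -name '_sysconfigdata*'
```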

Also, some of the magic resides here:


You might have to verify and check everything around there to ensure it tries to load the proper modules that provide what you need.

Hey lissyx, thanks for the quick reply and your help! Really appreciate it.

Forcing RASPBIAN= with the multistrap path yielded the same results.

I think this was the problem. Checking this issue, it describes the same problem, which is supposedly fixed in Python 3.6.10; mine linked against 3.6.9… So I’m a bit confused here. Should it use the Python in the multistrap or the local one for building?


I’ve made changes to definitions.mk, manually setting the version to 3.7:

PYTHONPATH=$(RASPBIAN)/usr/lib/python3.7/:$(RASPBIAN)/usr/lib/python3/dist-packages/

and using python3.7 in Makefile

... $(PYTHON_PATH) $(PYTHON_SYSCONFIGDATA) $(NUMPY_INCLUDE) python3.7 ./setup.py build_ext $(PYTHON_PLATFORM_NAME)

After this the build process seemed to work, but I noticed that libdeepspeech.so was missing.
Checking the output, I noticed the following error message:
/bin/sh: 1: /media/alexander/LinuxFS/Projects/DSSource/STT/tensorflow/bazel-tensorflow/external/LinaroAarch64Gcc72/bin/aarch64-linux-gnu-ldd: not found
Following your steps from here solved the problem, and the whl compiled with libdeepspeech.so included.


So far so good.
Copied the whl onto the Jetson. Installation was successful using
python3.7 -m pip install deepspeech-0.8.0-cp37-cp37m-linux_aarch64.whl inside a virtualenv.

Running deepspeech resulted in the error ModuleNotFoundError: No module named 'apt_pkg'
Creating the symlink sudo ln -s apt_pkg.cpython-36m-aarch64-linux-gnu.so apt_pkg.so worked.
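For anyone hitting the same thing: the symlink has to be created in the directory where the module lives, which on a stock JetPack image I assume to be /usr/lib/python3/dist-packages:

```shell
# Assumed location of python3-apt's extension module on the Jetson.
cd /usr/lib/python3/dist-packages
sudo ln -s apt_pkg.cpython-36m-aarch64-linux-gnu.so apt_pkg.so
```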

Soooo… now I’ve tested it with a tflite model, which worked.
But trying a .pbmm model will throw RuntimeError: CreateModel failed with 'Failed to initialize memory mapped model.' (0x3000)

Full error message

Loading model from file Data/german/output_graph_de.pbmm
TensorFlow: v2.2.0-17-g0854bb5188
DeepSpeech: v0.8.0-0-gf56b07da
Model provided has model identifier '�<', should be 'TFL3'

Error at reading model file Data/german/output_graph_de.pbmm
Traceback (most recent call last):
File "/home/jetson/Projects/DS-German/ds_env/bin/deepspeech", line 8, in <module>
sys.exit(main())
File "/home/jetson/Projects/DS-German/ds_env/lib/python3.7/site-packages/deepspeech/client.py", line 117, in main
ds = Model(args.model)
File "/home/jetson/Projects/DS-German/ds_env/lib/python3.7/site-packages/deepspeech/__init__.py", line 38, in __init__
raise RuntimeError("CreateModel failed with '{}' (0x{:X})".format(deepspeech.impl.ErrorCodeToErrorMessage(status),status))
RuntimeError: CreateModel failed with 'Failed to initialize memory mapped model.' (0x3000)

Is there a way to verify whether I actually built it with GPU support?
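One rough, unofficial check (an assumption on my part, not a documented method): a CUDA-enabled libdeepspeech.so should be dynamically linked against the CUDA libraries, so listing its dependencies on the Jetson should show them.

```shell
# Run on the target device; a CUDA build should list libcudart/libcudnn
# among the dynamic dependencies, while a CPU-only build should not.
ldd libdeepspeech.so | grep -iE 'cudart|cudnn'
```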


I’ve also done a local build for my system, just to get a better understanding of the procedures. It all worked flawlessly. The only thing I noticed is the difference in size of libdeepspeech.so.

Jetson: ~16 MB
Ubuntu: ~90 MB

Is such a difference normal?

Thanks for the help once again!
Best regards.
Alex

Sounds like TensorFlow / TFLite.

I missed it, but those configs will force TFLite: https://github.com/mozilla/tensorflow/blob/23ad988fcde60fb01f9533e95004bbc4877a9143/.bazelrc#L220

It’s likely because of TFLite vs TensorFlow as well.

Remove the --define=runtime=tflite from the .bazelrc file and it should be good.
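For reference, the line in question in mozilla/tensorflow’s .bazelrc looks roughly like this (the exact spelling is per the link above; shown here only to make the edit concrete):

```
build:rpi3-armv8 --define=runtime=tflite
```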


Hey lissyx,

Thanks, this seemed to work until I got the following error:

/media/alexander/LinuxFS/Projects/DSSource/STT/tensorflow/tensorflow/core/kernels/BUILD:8394:1: C++ compilation of rule '//tensorflow/core/kernels:deepspeech_cwise_ops_gpu' failed (Exit 1)
    aarch64-linux-gnu-gcc: error: unrecognized command line option '-nvcc_options=relaxed-constexpr'
    aarch64-linux-gnu-gcc: error: unrecognized command line option '-nvcc_options=ftz=true'

The only thing I found was https://github.com/tensorflow/tensorflow/issues/3501, but I’m not sure what changes have to be made…

Seeing https://github.com/tensorflow/tensorflow/issues/3501#issuecomment-238000605, I agree, and I don’t really know either, but maybe you need to hack into our cross-compilation toolchain.

I have no idea if TensorFlow itself supports ARM64 cross-compilation with CUDA enabled, you might be in unknown territory here :confused:

You might need to explore https://github.com/mozilla/tensorflow/tree/r2.3/third_party/toolchains/embedded/linaro-gcc72-aarch64 and also this gcc wrapper for nvcc.

Oh boy, what have I got myself into. :stuck_out_tongue:

I will try to find a solution. Might take a while, since this is, as you said, unknown territory to me. :sweat_smile:

Thanks again!

Thanks, I’m still interested in seeing this progress, but I guess you understand why we did not support this setup :).

Also, depending on your use case, ARM64 TFLite pure CPU might be enough.

I don’t know if you can leverage CUDA from TFLite runtime, but there is GPU delegation there that can use OpenCL. We have some bits in the library for allowing GPU delegation, but it’s likely incomplete and might require model hacking.

If you want to explore that as well, it’d be interesting to know what you get.

Now I do :stuck_out_tongue:

Actually I want to compare the performance of CPU vs GPU on, say, longer sequences and such.

Guess I can add it to my pipeline of learning ^^ but for now I’ll need to get a better overview of the whole architecture.

Also this https://github.com/tensorflow/tensorflow/issues/16779 might be interesting.

@sanjaesc as discussed in the private channel: I have successfully built DeepSpeech 0.8.2 for Jetson/Xavier

@dkreutz Here @sanjaesc was trying to do it using cross-compilation, which could enable us to have it on Taskcluster.