ARM native_client with GPU support


I am attempting to build a version of deepspeech-gpu bindings and the native_client for ARMv8 with GPU support. The target platform is NVIDIA’s Jetson-class embedded systems – the TX-1/2 in particular, but I have access to a PX2 as well.

These systems run Ubuntu 16.04 LTS for aarch64, with CUDA 8.0 and cuDNN 6; the compute capability is 5.2.

I have the DeepSpeech repo as of commit e5757d21a38d40923c1de9b86597685f365150ee, the Mozilla fork of TensorFlow as of commit 08894f64fc67b7a8031fc68cb838a27009c3e6e6, and bazel 0.5.4. My Python version is 3.5.2.

I have added the --config=cuda option to the suggested build command. Here’s the session output:

ubuntu@nvidia:~/Source/deepspeech/tensorflow$ bazel build -c opt --config=cuda --copt=-O3 //native_client:deepspeech //native_client:deepspeech_utils //native_client:generate_trie
[547 / 671] Compiling native_client/kenlm/util/double-conversion/bignum-dtoa.cc
ERROR: /home/ubuntu/Source/deepspeech/tensorflow/native_client/BUILD:48:1: C++ compilation of rule '//native_client:deepspeech' failed (Exit 1).
In file included from native_client/kenlm/util/double-conversion/bignum-dtoa.h:31:0,
                 from native_client/kenlm/util/double-conversion/
native_client/kenlm/util/double-conversion/utils.h:71:2: error: #error Target architecture was not detected as supported by Double-Conversion.
 #error Target architecture was not detected as supported by Double-Conversion.

What is a more appropriate list of build targets to give bazel? I’m willing to go without the language model for now if I have to; the raw output from the NN is good enough for my purposes right now.

(Lissyx) #2

Thanks for testing this! I know that @elpimous_robot succeeded with this setup, and he had to add a small patch on top of KenLM. As far as I can tell, he was in the process of submitting this patch upstream.

(Vincent Foucault) #3

Open this file (native_client/kenlm/util/double-conversion/utils.h, adjusting the path for your checkout):

Add this: defined(__aarch64__) ||

// On Linux,x86 89255e-22 != Div_double(89255.0/1e22)
#if defined(_M_X64) || defined(__x86_64__) || \
    defined(__ARMEL__) || defined(__avr32__) || \
    defined(__hppa__) || defined(__ia64__) || \
    defined(__mips__) || defined(__powerpc__) || \
    defined(__sparc__) || defined(__sparc) || defined(__s390__) || \
    defined(__SH4__) || defined(__alpha__) || defined(__aarch64__)
#define DOUBLE_CONVERSION_CORRECT_DOUBLE_OPERATIONS 1
#elif defined(_M_IX86) || defined(__i386__) || defined(__i386)
#if defined(_WIN32)

That’s all.


(gvoysey) #4

Okay, that really helped a lot.

I can make a wheel. @lissyx, are all wheels named deepspeech-0.1.0-..., or must I do something else to get deepspeech-gpu?

(Lissyx) #5

If you want to make wheels available, you should take a look there; it does document it :slight_smile:
But at some point, if you can, it would be better to just work on adding ARMv8 cross-compilation; that would benefit everybody.

(Lissyx) #6

More precisely @gvoysey, it is there:


(gvoysey) #7

@lissyx I’ll update in ~18 hours with my progress.

I had thought that the tf build_pip_package tool was just for building wheels of TensorFlow itself, so I’ll investigate further.

(Lissyx) #8

Oh, right. Sorry, I misread. It’s handled there, and for GPU builds you need to pass --project_name deepspeech-gpu :slight_smile:
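One detail worth knowing about the resulting filename (this note is mine, not from the posts above): wheel filenames normalize runs of `-`, `_`, and `.` in the distribution name to a single `_` (PEP 427), so passing --project_name deepspeech-gpu produces a file named deepspeech_gpu-&lt;version&gt;-....whl, even though you still `pip install deepspeech-gpu`. A one-liner showing that normalization:

```shell
# PEP 427 escaping: runs of [-_.] in the distribution name
# become "_" in the wheel filename.
python3 -c "import re; print(re.sub(r'[-_.]+', '_', 'deepspeech-gpu'))"
```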