ARM native_client with GPU support

You are deeply confused. You don’t need to build the tensorflow pip package to build the deepspeech Python wheel :). Make sure you are using the proper build flags when you build: if you are targeting CUDA, any bazel build statement should have --config=cuda.
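
For example, a minimal sketch based on the build commands later in this thread (pick whichever native_client target you actually need):

bazel build -c opt --copt=-O3 --config=cuda //native_client:libdeepspeech.so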

Ah ha! I misparsed what you were saying, yeah.

I know I don’t need the TF wheels to build the deepspeech-gpu wheel. I know I do need the deepspeech-gpu wheel.

Perfect, your wording was kind of unclear to me; I don’t want unclear instructions :)

@gvoysey I am a bit confused at this moment.
If I understand you correctly, you mean that if I install tensorflow-gpu from pip and also install deepspeech-gpu, I will have some trouble with it.

But as per @lissyx, I won’t have issues using both of them together.

@saikishor I’ll defer to @lissyx here, but my point is simply that I do not believe that pip install deepspeech-gpu will work on an NVIDIA TX-class machine. (I’m pretty sure that pip install tensorflow-gpu won’t work on a TX either, but that’s moot.)

If I’m wrong, I’ll be very happy, but I don’t think I am.

It would work if there were ARMv8 binaries for the system. We don’t provide any for DeepSpeech, because this would require setting up ARMv8 cross-compilation, and while it’s doable, we have more important things to focus on. You can do pip install <deepspeech.whl> once you have built the Python package on your ARMv8 system, though :).

@gvoysey Yes, I also believe that pip install tensorflow-gpu doesn’t work, so I am going to follow the approach explained on JetsonHacks. The Jetson TX2 is an ARMv8 system, and as per what @lissyx mentioned now:

I guess the pip install <deepspeech.whl> should work, as per what lissyx is referring to. I guess the TX1 is equipped with ARMv6, so it requires that setup, I suppose. What do you think?

Why do you keep wanting to install the tensorflow-gpu package? If you are only running inference, you don’t need that. You might want deepspeech-gpu, but again, read what I said above on that: we only provide ARMv6 binaries, so no GPU. Follow native_client/README.md to build.

@saikishor the TX1 has a quad-core ARM® Cortex-A57 with 2 MB L2, which is ARMv8. https://developer.arm.com/products/processors/cortex-a/cortex-a57
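
If you want to double-check on the board itself, a quick sanity check is to ask the kernel; a 64-bit ARMv8 (L4T) install reports aarch64:

uname -m
# expected output on a TX1/TX2: aarch64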

@lissyx do you recommend starting from the latest DeepSpeech commit, or the v0.1.1 tag?

You should use master, that’s where all the fun is :). It’s bringing a lot of improvements as well …

Got it. But pegged to https://github.com/mozilla/tensorflow@v1.5, right?

@lissyx Thank you, that was helpful. So now I should build for ARMv8 as per the guidelines in native_client/README.md, with the GPU hacks presented by @gvoysey and @elpimous_robot, get the .whl built, and install it using pip.

Yes, TensorFlow master has some slight differences that will make the bazel build choke on some definitions we have in native_client/BUILD. Don’t forget --config=cuda if you need CUDA. We are trying to improve that; the README.md should now point to the proper matching branch, with DeepSpeech/master currently tied to tensorflow/r1.5 and DeepSpeech/tf-master tied to tensorflow/master, in case you are curious or want to hack on a more recent codebase.
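
In command form, the pairing above is just this (assuming the side-by-side checkouts described later in this thread):

# DeepSpeech/master pairs with mozilla/tensorflow r1.5
cd tensorflow && git checkout r1.5
# DeepSpeech/tf-master pairs with mozilla/tensorflow master
# cd tensorflow && git checkout master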

I guess there is a lot of work to do :) @gvoysey

@gvoysey are you going to start from scratch with TensorFlow v1.5? And I would like to know one more thing: are you developing in Docker or directly on your TX machine?

Hi @saikishor,

use mozilla/tensorflow r1.5 with mozilla/DeepSpeech master,
install requirements.txt, follow native_client/README.md,
remember to add --config=cuda (to use the TX1/TX2 GPU),

and you should have a nice DeepSpeech working on our fabulous NVIDIA boards!

Notes and steps for compiling natively on a Jetson machine. Goal: get
CUDA-enabled native_client==v0.1.1.

Repo setup

We need to compile Mozilla’s TensorFlow 1.5 fork as well as the native_client
package provided as part of Mozilla DeepSpeech.

mkdir -p $HOME/deepspeech && cd $HOME/deepspeech  # project root
git clone https://github.com/mozilla/DeepSpeech   # default branch is master
git clone https://github.com/mozilla/tensorflow
# tensorflow master breaks bazel; use the r1.5 branch.
cd tensorflow && git checkout r1.5
# put a symlink to native_client inside the tensorflow tree
cd ../DeepSpeech
ln -s ../DeepSpeech/native_client ../tensorflow/
cd $HOME
ln -s deepspeech/DeepSpeech ./
ln -s deepspeech/tensorflow ./

ARMv8 patches and local changes

First, we have to patch native_client/kenlm/util/double-conversion/utils.h to allow aarch64 to round
properly. Failure to do this means that kenlm won’t build.

diff --git a/native_client/kenlm/util/double-conversion/utils.h b/native_client/kenlm/util/double-conversion/utils.h
index 9ccb3b6..492b8bd 100644
--- a/native_client/kenlm/util/double-conversion/utils.h
+++ b/native_client/kenlm/util/double-conversion/utils.h
@@ -52,7 +52,7 @@
 // the output of the division with the expected result. (Inlining must be
 // disabled.)
 // On Linux,x86 89255e-22 != Div_double(89255.0/1e22)
-#if defined(_M_X64) || defined(__x86_64__) || \
+#if defined(__aarch64__) || defined(_M_X64) || defined(__x86_64__) ||  \
     defined(__ARMEL__) || defined(__avr32__) || \
     defined(__hppa__) || defined(__ia64__) || \
     defined(__mips__) || defined(__powerpc__) || \
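
To apply it, save the hunk to a file and run git apply from the DeepSpeech checkout (the patch filename here is just an illustration):

cd $HOME/deepspeech/DeepSpeech
git apply aarch64-kenlm-rounding.patch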

Then, we can carefully construct a few shell scripts to build
tensorflow, then finally build native_client and wheels.

tensorflow

Using tensorflow/tc-build.sh as inspiration, we just want to pass the
right environment variables to bazel so we can run the whole thing as a
one-shot.

#!/bin/bash

set -ex
PROJECT_ROOT=$HOME/deepspeech
export LD_LIBRARY_PATH=/usr/local/cuda/targets/aarch64-linux/lib/:/usr/local/cuda/targets/aarch64-linux/lib/stubs:$LD_LIBRARY_PATH


export TF_ENABLE_XLA=0
export TF_NEED_JEMALLOC=1
export TF_NEED_GCP=0
export TF_NEED_HDFS=0
export TF_NEED_OPENCL_SYCL=0
export TF_NEED_MKL=0
export TF_NEED_VERBS=0
export TF_NEED_MPI=0
export TF_NEED_S3=0
export TF_NEED_GDR=0
export TF_SET_ANDROID_WORKSPACE=0
export GCC_HOST_COMPILER_PATH=/usr/bin/gcc
export TF_NEED_CUDA=1
export TX_CUDA_PATH='/usr/local/cuda'
export TX_CUDNN_PATH='/usr/lib/aarch64-linux-gnu/'
export TF_CUDA_FLAGS="TF_CUDA_CLANG=0 TF_CUDA_VERSION=8.0 TF_CUDNN_VERSION=6 CUDA_TOOLKIT_PATH=${TX_CUDA_PATH} CUDNN_INSTALL_PATH=${TX_CUDNN_PATH} TF_CUDA_COMPUTE_CAPABILITIES=\"3.0,3.5,3.7,5.2,5.3,6.0,6.1\""

cd ${PROJECT_ROOT}/tensorflow && \
eval "export ${TF_CUDA_FLAGS}" && (echo "" | ./configure) && \
bazel build -s --explain bazel_kenlm_tf.log \
      --verbose_explanations \
      -c opt \
      --copt=-O3 \
      --config=cuda \
      //native_client:libctc_decoder_with_kenlm.so && \
bazel build -s --explain bazel_monolithic_tf.log \
      --verbose_explanations \
      --config=monolithic \
      -c opt \
      --copt=-O3 \
      --config=cuda \
      --copt=-fvisibility=hidden \
      //native_client:libdeepspeech.so \
      //native_client:deepspeech_utils \
      //native_client:generate_trie

Completion

This builds cleanly, so we can inspect the contents of the libraries
we’ve made.

  1. libdeepspeech.so

    Are the symbols there? Looks like!

    ubuntu@nvidia:~/deepspeech/tensorflow/bazel-bin/native_client$ nm -gC libdeepspeech.so | grep Model::Model
    00000000008390a0 T DeepSpeech::Model::Model(char const*, int, int, char const*, int)
    00000000008390a0 T DeepSpeech::Model::Model(char const*, int, int, char const*, int)
    

    Does it look like it linked sanely? Yes.

    ubuntu@nvidia:~/deepspeech/tensorflow$ ldd bazel-bin/native_client/libdeepspeech.so
            linux-vdso.so.1 =>  (0x0000007f871b8000)
            libcusolver.so.8.0 => /home/ubuntu/deepspeech/tensorflow/bazel-bin/native_client/../_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccusolver___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib/libcusolver.so.8.0 (0x0000007f7e63a000)
            libcublas.so.8.0 => /home/ubuntu/deepspeech/tensorflow/bazel-bin/native_client/../_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccublas___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib/libcublas.so.8.0 (0x0000007f7b93c000)
            libcuda.so.1 => /usr/lib/libcuda.so.1 (0x0000007f7af61000)
            libcudnn.so.6 => /home/ubuntu/deepspeech/tensorflow/bazel-bin/native_client/../_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccudnn___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib/libcudnn.so.6 (0x0000007f7013a000)
            libcufft.so.8.0 => /home/ubuntu/deepspeech/tensorflow/bazel-bin/native_client/../_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccufft___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib/libcufft.so.8.0 (0x0000007f66a54000)
            libcurand.so.8.0 => /home/ubuntu/deepspeech/tensorflow/bazel-bin/native_client/../_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccurand___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib/libcurand.so.8.0 (0x0000007f633fb000)
            libcudart.so.8.0 => /home/ubuntu/deepspeech/tensorflow/bazel-bin/native_client/../_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccudart___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib/libcudart.so.8.0 (0x0000007f63397000)
            libgomp.so.1 => /usr/lib/aarch64-linux-gnu/libgomp.so.1 (0x0000007f63369000)
            libdl.so.2 => /lib/aarch64-linux-gnu/libdl.so.2 (0x0000007f63356000)
            libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000007f632a8000)
            libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000007f6327c000)
            libstdc++.so.6 => /usr/lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000007f630ed000)
            libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000007f630cb000)
            libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000007f62f84000)
            /lib/ld-linux-aarch64.so.1 (0x0000005557db8000)
            librt.so.1 => /lib/aarch64-linux-gnu/librt.so.1 (0x0000007f62f6d000)
            libnvrm_gpu.so => /usr/lib/libnvrm_gpu.so (0x0000007f62f36000)
            libnvrm.so => /usr/lib/libnvrm.so (0x0000007f62efb000)
            libnvidia-fatbinaryloader.so.384.00 => /usr/lib/libnvidia-fatbinaryloader.so.384.00 (0x0000007f62e92000)
            libnvos.so => /usr/lib/libnvos.so (0x0000007f62e74000)
    

native client

To build the native client, next…

#!/bin/bash

SYSTEM_TARGET=host
EXTRA_LOCAL_CFLAGS="-march=armv8-a"
EXTRA_LOCAL_LDFLAGS="-L/usr/local/cuda/targets/aarch64-linux/lib/ -L/usr/local/cuda/targets/aarch64-linux/lib/stubs -lcudart -lcuda"
EXTRA_LOCAL_LIBS=""  # nothing extra to link beyond the LDFLAGS above
SETUP_FLAGS="--project_name deepspeech-gpu"
DS_TFDIR="${HOME}/deepspeech/tensorflow"
cd ./DeepSpeech
mkdir -p wheels
make clean 
EXTRA_CFLAGS="${EXTRA_LOCAL_CFLAGS}" \
EXTRA_LDFLAGS="${EXTRA_LOCAL_LDFLAGS}" \
EXTRA_LIBS="${EXTRA_LOCAL_LIBS}" \
make -C native_client/ TARGET=${SYSTEM_TARGET} \
      TFDIR=${DS_TFDIR} \
      SETUP_FLAGS="${SETUP_FLAGS}" \
      bindings-clean bindings

cp native_client/dist/*.whl wheels

make -C native_client/ bindings-clean

and lo, we now have a wheel.
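
Installing it is then a plain local pip install of whatever landed in wheels/ (the exact filename varies with the Python version and platform tag):

cd $HOME/deepspeech/DeepSpeech
pip install wheels/*.whl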

test

Does it work?

In [6]: ds = model.Model('output_graph.pb',26,9,'alphabet.txt',500)
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-02-23 14:06:05.179493: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:881] could not open file to read NUMA node: /sys/bus/pci/devices/0000:04:00.0/numa_node
Your kernel may have been built without NUMA support.
2018-02-23 14:06:05.180794: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties: 
name: GP106 major: 6 minor: 1 memoryClockRate(GHz): 1.29
pciBusID: 0000:04:00.0
totalMemory: 3.75GiB freeMemory: 3.67GiB
2018-02-23 14:06:05.270756: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:881] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2018-02-23 14:06:05.270920: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 1 with properties: 
name: GP10B major: 6 minor: 2 memoryClockRate(GHz): 1.275
pciBusID: 0000:00:00.0
totalMemory: 6.50GiB freeMemory: 3.94GiB
2018-02-23 14:06:05.271027: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Device peer to peer matrix
2018-02-23 14:06:05.271090: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1126] DMA: 0 1 
2018-02-23 14:06:05.271117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 0:   Y N 
2018-02-23 14:06:05.271157: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 1:   N Y 
2018-02-23 14:06:05.271247: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GP106, pci bus id: 0000:04:00.0, compute capability: 6.1)
2018-02-23 14:06:05.271304: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1182] Ignoring gpu device (device: 1, name: GP10B, pci bus id: 0000:00:00.0, compute capability: 6.2) with Cuda multiprocessor count: 2. The minimum required count is 8. You can adjust this requirement with the env var TF_MIN_GPU_MULTIPROCESSOR_COUNT.
2018-02-23 14:08:22.530561: I tensorflow/core/common_runtime/gpu/gpu_device.cc:859] Could not identify NUMA node of /job:localhost/replica:0/task:0/device:GPU:0, defaulting to 0.  Your kernel may not have been built with NUMA support.

Looks hopeful…

In [1]: from scipy.io import wavfile
ds = Model('output_graph.pb',26,9,'alphabet.txt',500)
fs,wav = wavfile.read('test.wav')
ds.stt(wav,fs)
Out [1]: 'test'

and running tegrastats at the same time:

RAM 4642/6660MB (lfb 5x2MB) SWAP 811/8192MB (cached 67MB) cpu [10%@1991,0%@2034,0%@2035,7%@1992,5%@1995,8%@1993] EMC 0%@1600 GR3D 0%@1275 GR3D_PCI 98%@2607

So the GPU is pegged and the CPU is nicely quiet. Finally!
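
One caveat visible in the log above: TensorFlow ignored the integrated GP10B GPU because it only has 2 multiprocessors, below the default minimum of 8. If the Tegra’s integrated GPU is the only one on your board, you can lower that threshold before starting Python, as the log message itself suggests:

export TF_MIN_GPU_MULTIPROCESSOR_COUNT=2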

@gvoysey If I use --config=cuda as a parameter, then I get the error below; without it, the build is fine. Can you help me at this point?

nvidia@tegra-ubuntu:~/deepspeech/tensorflow$ bazel build -c opt --copt=-O3 //native_client:libctc_decoder_with_kenlm.so --config=cuda
........................
ERROR: /home/nvidia/.cache/bazel/_bazel_nvidia/6b9138338a6a5d153417b602388184c1/external/local_config_cuda/crosstool/BUILD:4:1: Traceback (most recent call last):
	File "/home/nvidia/.cache/bazel/_bazel_nvidia/6b9138338a6a5d153417b602388184c1/external/local_config_cuda/crosstool/BUILD", line 4
		error_gpu_disabled()
	File "/home/nvidia/.cache/bazel/_bazel_nvidia/6b9138338a6a5d153417b602388184c1/external/local_config_cuda/crosstool/error_gpu_disabled.bzl", line 3, in error_gpu_disabled
		fail("ERROR: Building with --config=c...")
ERROR: Building with --config=cuda but TensorFlow is not configured to build with GPU support. Please re-run ./configure and enter 'Y' at the prompt to build with GPU support.
ERROR: no such target '@local_config_cuda//crosstool:toolchain': target 'toolchain' not declared in package 'crosstool' defined by /home/nvidia/.cache/bazel/_bazel_nvidia/6b9138338a6a5d153417b602388184c1/external/local_config_cuda/crosstool/BUILD.
INFO: Elapsed time: 3.482s

At first, I had the above error, and I solved it by running ./configure and setting up CUDA and cuDNN, but then I faced the following issue.

The configuration I set:
nvidia@tegra-ubuntu:~/deepspeech/tensorflow$ ./configure
You have bazel 0.5.4- (@non-git) installed.
Please specify the location of python. [Default is /usr/bin/python]:

Found possible Python library paths:
  /usr/local/lib/python2.7/dist-packages
  /usr/lib/python2.7/dist-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib/python2.7/dist-packages]

Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: y
jemalloc as malloc support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: y
Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: y
Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: y
Amazon S3 File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]: y
XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with GDR support? [y/N]: y
GDR support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]: y
VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 8.0


Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 


Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 6.0.21


Please specify the location where cuDNN 6.0.21 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:


Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2]3.5,5.2,6.2


Do you want to use clang as CUDA compiler? [y/N]: y
Clang will be used as CUDA compiler.

Please specify which clang should be used as device and host compiler. [Default is ]: /usr/bin/g++


Do you wish to build TensorFlow with MPI support? [y/N]: y
MPI support will be enabled for TensorFlow.

Please specify the MPI toolkit folder. [Default is /usr]: 


Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: 


Add "--config=mkl" to your bazel command to build with MKL support.
Please note that MKL on MacOS or windows is still not supported.
If you would like to use a local MKL instead of downloading, please set the environment variable "TF_MKL_ROOT" every time before build.

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.

Configuration finished

This is the output I got after setting the configuration:

nvidia@tegra-ubuntu:~/deepspeech/tensorflow$ bazel build -c opt --copt=-O3 //native_client:libctc_decoder_with_kenlm.so --config=cuda
ERROR: /home/nvidia/.cache/bazel/_bazel_nvidia/6b9138338a6a5d153417b602388184c1/external/org_tensorflow/tensorflow/BUILD:703:1: Illegal ambiguous match on configurable attribute "deps" in @org_tensorflow//tensorflow:libtensorflow_framework.so:
@local_config_cuda//cuda:using_clang
@local_config_cuda//cuda:using_nvcc
Multiple matches are not allowed unless one is unambiguously more specialized.
ERROR: Analysis of target '//native_client:libctc_decoder_with_kenlm.so' failed; build aborted.
INFO: Elapsed time: 0.220s

Please retry without enabling everything: you don’t need S3, Google Cloud, Hadoop, GDR, VERBS, or MPI, and you don’t want to use Clang for CUDA.
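
Concretely, answering ‘n’ at those prompts, or pre-seeding the answers with environment variables the way the tensorflow build script earlier in this thread does, looks roughly like this sketch (only CUDA enabled, no Clang):

export TF_NEED_S3=0
export TF_NEED_GCP=0
export TF_NEED_HDFS=0
export TF_NEED_GDR=0
export TF_NEED_VERBS=0
export TF_NEED_MPI=0
export TF_CUDA_CLANG=0   # answer "No" to Clang as the CUDA compiler
export TF_NEED_CUDA=1
cd $HOME/deepspeech/tensorflow && ./configure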
