ARM native_client with GPU support

@gvoysey are you going to start from scratch for TensorFlow v1.5? And I would like to know one more thing: are you developing this in Docker or directly on your TX machine?

hi @saikishor,

use the mozilla/tensorflow r1.5 branch and mozilla/DeepSpeech master,
install requirements.txt, follow native_client/README.md,
and remember to add --config=cuda (to use the TX1/TX2 GPU),

and you should have a nice DeepSpeech working on our fabulous NVIDIA boards!
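
For concreteness, the flag just goes on the bazel command line once ./configure has been run with CUDA enabled; a minimal sketch (the full targets and flags are in the build notes below):

bazel build -c opt --copt=-O3 --config=cuda //native_client:libdeepspeech.so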


Notes and steps for compiling natively on a Jetson machine. Goal: get
CUDA-enabled native_client==v0.1.1.

Repo setup

We need to compile Mozilla's TensorFlow 1.5 fork as well as the native_client
package provided as part of Mozilla's DeepSpeech.

mkdir -p $HOME/deepspeech && cd $HOME/deepspeech  #project root
git clone https://github.com/mozilla/DeepSpeech   #master is the default branch
git clone https://github.com/mozilla/tensorflow
#tensorflow master breaks bazel, so check out r1.5.
cd tensorflow && git checkout r1.5
#symlink native_client into the tensorflow tree so bazel can find it
cd ../DeepSpeech
ln -s ../DeepSpeech/native_client ../tensorflow/
cd $HOME
ln -s deepspeech/DeepSpeech ./
ln -s deepspeech/tensorflow ./

ARMv8 patches and local changes

First, we have to patch native_client/kenlm/util/double-conversion/utils.h to allow aarch64 to round
properly. Failure to do this means that kenlm won’t build.

diff --git a/native_client/kenlm/util/double-conversion/utils.h b/native_client/kenlm/util/double-conversion/utils.h
index 9ccb3b6..492b8bd 100644
--- a/native_client/kenlm/util/double-conversion/utils.h
+++ b/native_client/kenlm/util/double-conversion/utils.h
@@ -52,7 +52,7 @@
 // the output of the division with the expected result. (Inlining must be
 // disabled.)
 // On Linux,x86 89255e-22 != Div_double(89255.0/1e22)
-#if defined(_M_X64) || defined(__x86_64__) || \
+#if defined(__aarch64__) || defined(_M_X64) || defined(__x86_64__) ||  \
     defined(__ARMEL__) || defined(__avr32__) || \
     defined(__hppa__) || defined(__ia64__) || \
     defined(__mips__) || defined(__powerpc__) || \
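
Assuming the diff above is saved to a file (aarch64-utils.patch is just an illustrative name here), it can be applied from the DeepSpeech checkout before building, for example:

cd $HOME/deepspeech/DeepSpeech
git apply aarch64-utils.patch          # or: patch -p1 < aarch64-utils.patch
git diff --stat native_client/kenlm/util/double-conversion/utils.h   # sanity-check that the hunk landed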

Then, we can carefully construct a few shell scripts to build
tensorflow, then finally build native_client and wheels.

tensorflow

Using tensorflow/tc-build.sh as inspiration, we just want to pass the
right environment variables to bazel so we can run the whole thing as a
one-shot.

#!/bin/bash

set -ex
PROJECT_ROOT=$HOME/deepspeech
export LD_LIBRARY_PATH=/usr/local/cuda/targets/aarch64-linux/lib/:/usr/local/cuda/targets/aarch64-linux/lib/stubs:$LD_LIBRARY_PATH


export TF_ENABLE_XLA=0
export TF_NEED_JEMALLOC=1
export TF_NEED_GCP=0
export TF_NEED_HDFS=0
export TF_NEED_OPENCL_SYCL=0
export TF_NEED_MKL=0
export TF_NEED_VERBS=0
export TF_NEED_MPI=0
export TF_NEED_S3=0
export TF_NEED_GDR=0
export TF_SET_ANDROID_WORKSPACE=0
export GCC_HOST_COMPILER_PATH=/usr/bin/gcc
export TF_NEED_CUDA=1
export TX_CUDA_PATH='/usr/local/cuda'
export TX_CUDNN_PATH='/usr/lib/aarch64-linux-gnu/'
export TF_CUDA_FLAGS="TF_CUDA_CLANG=0 TF_CUDA_VERSION=8.0 TF_CUDNN_VERSION=6 CUDA_TOOLKIT_PATH=${TX_CUDA_PATH} CUDNN_INSTALL_PATH=${TX_CUDNN_PATH} TF_CUDA_COMPUTE_CAPABILITIES=\"3.0,3.5,3.7,5.2,5.3,6.0,6.1\""

cd ${PROJECT_ROOT}/tensorflow && \
eval "export ${TF_CUDA_FLAGS}" && (echo "" | ./configure) && \
bazel build -s --explain bazel_kenlm_tf.log \
      --verbose_explanations \
      -c opt \
      --copt=-O3 \
      --config=cuda \
      //native_client:libctc_decoder_with_kenlm.so && \
bazel build -s --explain bazel_monolithic_tf.log \
      --verbose_explanations \
      --config=monolithic \
      -c opt \
      --copt=-O3 \
      --config=cuda \
      --copt=-fvisibility=hidden \
      //native_client:libdeepspeech.so \
      //native_client:deepspeech_utils \
      //native_client:generate_trie

Completion

This builds cleanly, so we can inspect the contents of the libraries
we’ve made.

  1. libdeepspeech.so

    Are the symbols there? Looks like!

    ubuntu@nvidia:~/deepspeech/tensorflow/bazel-bin/native_client$ nm -gC libdeepspeech.so | grep Model::Model
    00000000008390a0 T DeepSpeech::Model::Model(char const*, int, int, char const*, int)
    00000000008390a0 T DeepSpeech::Model::Model(char const*, int, int, char const*, int)
    

    Does it look like it linked sanely? Yes.

    ubuntu@nvidia:~/deepspeech/tensorflow$ ldd bazel-bin/native_client/libdeepspeech.so
            linux-vdso.so.1 =>  (0x0000007f871b8000)
            libcusolver.so.8.0 => /home/ubuntu/deepspeech/tensorflow/bazel-bin/native_client/../_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccusolver___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib/libcusolver.so.8.0 (0x0000007f7e63a000)
            libcublas.so.8.0 => /home/ubuntu/deepspeech/tensorflow/bazel-bin/native_client/../_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccublas___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib/libcublas.so.8.0 (0x0000007f7b93c000)
            libcuda.so.1 => /usr/lib/libcuda.so.1 (0x0000007f7af61000)
            libcudnn.so.6 => /home/ubuntu/deepspeech/tensorflow/bazel-bin/native_client/../_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccudnn___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib/libcudnn.so.6 (0x0000007f7013a000)
            libcufft.so.8.0 => /home/ubuntu/deepspeech/tensorflow/bazel-bin/native_client/../_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccufft___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib/libcufft.so.8.0 (0x0000007f66a54000)
            libcurand.so.8.0 => /home/ubuntu/deepspeech/tensorflow/bazel-bin/native_client/../_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccurand___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib/libcurand.so.8.0 (0x0000007f633fb000)
            libcudart.so.8.0 => /home/ubuntu/deepspeech/tensorflow/bazel-bin/native_client/../_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccudart___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib/libcudart.so.8.0 (0x0000007f63397000)
            libgomp.so.1 => /usr/lib/aarch64-linux-gnu/libgomp.so.1 (0x0000007f63369000)
            libdl.so.2 => /lib/aarch64-linux-gnu/libdl.so.2 (0x0000007f63356000)
            libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000007f632a8000)
            libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000007f6327c000)
            libstdc++.so.6 => /usr/lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000007f630ed000)
            libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000007f630cb000)
            libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000007f62f84000)
            /lib/ld-linux-aarch64.so.1 (0x0000005557db8000)
            librt.so.1 => /lib/aarch64-linux-gnu/librt.so.1 (0x0000007f62f6d000)
            libnvrm_gpu.so => /usr/lib/libnvrm_gpu.so (0x0000007f62f36000)
            libnvrm.so => /usr/lib/libnvrm.so (0x0000007f62efb000)
            libnvidia-fatbinaryloader.so.384.00 => /usr/lib/libnvidia-fatbinaryloader.so.384.00 (0x0000007f62e92000)
            libnvos.so => /usr/lib/libnvos.so (0x0000007f62e74000)
    

native client

To build the native client, next…

#!/bin/bash

SYSTEM_TARGET=host
EXTRA_LOCAL_CFLAGS="-march=armv8-a"
EXTRA_LOCAL_LDFLAGS="-L/usr/local/cuda/targets/aarch64-linux/lib/ -L/usr/local/cuda/targets/aarch64-linux/lib/stubs -lcudart -lcuda"
SETUP_FLAGS="--project_name deepspeech-gpu"
DS_TFDIR="${HOME}/deepspeech/tensorflow"
cd ./DeepSpeech
mkdir -p wheels
make clean 
EXTRA_CFLAGS="${EXTRA_LOCAL_CFLAGS}" \
EXTRA_LDFLAGS="${EXTRA_LOCAL_LDFLAGS}" \
EXTRA_LIBS="${EXTRA_LOCAL_LIBS}" \
make -C native_client/ TARGET=${SYSTEM_TARGET} \
      TFDIR=${DS_TFDIR} \
      SETUP_FLAGS="${SETUP_FLAGS}" \
      bindings-clean bindings

cp native_client/dist/*.whl wheels

make -C native_client/ bindings-clean

and lo, we now have a wheel.
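
Installing it is just a pip install on the board; the exact filename depends on the Python version and on the --project_name deepspeech-gpu flag above, so the wildcard here is only a sketch:

pip install --upgrade wheels/deepspeech_gpu-*.whl
pip show deepspeech-gpu   # confirm the GPU-flavoured package is the one installed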

test

Does it work?

In [6]: ds = model.Model('output_graph.pb',26,9,'alphabet.txt',500)
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-02-23 14:06:05.179493: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:881] could not open file to read NUMA node: /sys/bus/pci/devices/0000:04:00.0/numa_node
Your kernel may have been built without NUMA support.
2018-02-23 14:06:05.180794: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties: 
name: GP106 major: 6 minor: 1 memoryClockRate(GHz): 1.29
pciBusID: 0000:04:00.0
totalMemory: 3.75GiB freeMemory: 3.67GiB
2018-02-23 14:06:05.270756: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:881] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2018-02-23 14:06:05.270920: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 1 with properties: 
name: GP10B major: 6 minor: 2 memoryClockRate(GHz): 1.275
pciBusID: 0000:00:00.0
totalMemory: 6.50GiB freeMemory: 3.94GiB
2018-02-23 14:06:05.271027: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Device peer to peer matrix
2018-02-23 14:06:05.271090: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1126] DMA: 0 1 
2018-02-23 14:06:05.271117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 0:   Y N 
2018-02-23 14:06:05.271157: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1136] 1:   N Y 
2018-02-23 14:06:05.271247: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GP106, pci bus id: 0000:04:00.0, compute capability: 6.1)
2018-02-23 14:06:05.271304: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1182] Ignoring gpu device (device: 1, name: GP10B, pci bus id: 0000:00:00.0, compute capability: 6.2) with Cuda multiprocessor count: 2. The minimum required count is 8. You can adjust this requirement with the env var TF_MIN_GPU_MULTIPROCESSOR_COUNT.
2018-02-23 14:08:22.530561: I tensorflow/core/common_runtime/gpu/gpu_device.cc:859] Could not identify NUMA node of /job:localhost/replica:0/task:0/device:GPU:0, defaulting to 0.  Your kernel may not have been built with NUMA support.

Looks hopeful…

In [1]: from scipy.io import wavfile
ds = Model('output_graph.pb',26,9,'alphabet.txt',500)
fs,wav = wavfile.read('test.wav')
ds.stt(wav,fs)
Out [1]: 'test'

and running tegrastats at the same time:

RAM 4642/6660MB (lfb 5x2MB) SWAP 811/8192MB (cached 67MB) cpu [10%@1991,0%@2034,0%@2035,7%@1992,5%@1995,8%@1993] EMC 0%@1600 GR3D 0%@1275 GR3D_PCI 98%@2607

So the GPU is pegged and the CPU is nicely quiet. Finally!


@gvoysey If I use --config=cuda as a parameter, then I get the error below; without it, the build works fine. Can you help me at this point?

nvidia@tegra-ubuntu:~/deepspeech/tensorflow$ bazel build -c opt --copt=-O3 //native_client:libctc_decoder_with_kenlm.so --config=cuda
........................
ERROR: /home/nvidia/.cache/bazel/_bazel_nvidia/6b9138338a6a5d153417b602388184c1/external/local_config_cuda/crosstool/BUILD:4:1: Traceback (most recent call last):
	File "/home/nvidia/.cache/bazel/_bazel_nvidia/6b9138338a6a5d153417b602388184c1/external/local_config_cuda/crosstool/BUILD", line 4
		error_gpu_disabled()
	File "/home/nvidia/.cache/bazel/_bazel_nvidia/6b9138338a6a5d153417b602388184c1/external/local_config_cuda/crosstool/error_gpu_disabled.bzl", line 3, in error_gpu_disabled
		fail("ERROR: Building with --config=c...")
ERROR: Building with --config=cuda but TensorFlow is not configured to build with GPU support. Please re-run ./configure and enter 'Y' at the prompt to build with GPU support.
ERROR: no such target '@local_config_cuda//crosstool:toolchain': target 'toolchain' not declared in package 'crosstool' defined by /home/nvidia/.cache/bazel/_bazel_nvidia/6b9138338a6a5d153417b602388184c1/external/local_config_cuda/crosstool/BUILD.
INFO: Elapsed time: 3.482s

At first I got the above error, and I solved it by running ./configure and setting CUDA and cuDNN, but then I faced the following issue.

Configuration I have set:
nvidia@tegra-ubuntu:~/deepspeech/tensorflow$ ./configure
You have bazel 0.5.4- (@non-git) installed.
Please specify the location of python. [Default is /usr/bin/python]:

Found possible Python library paths:
  /usr/local/lib/python2.7/dist-packages
  /usr/lib/python2.7/dist-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib/python2.7/dist-packages]

Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: y
jemalloc as malloc support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: y
Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: y
Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: y
Amazon S3 File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]: y
XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with GDR support? [y/N]: y
GDR support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]: y
VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 8.0


Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 


Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 6.0.21


Please specify the location where cuDNN 6.0.21 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:


Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2]3.5,5.2,6.2


Do you want to use clang as CUDA compiler? [y/N]: y
Clang will be used as CUDA compiler.

Please specify which clang should be used as device and host compiler. [Default is ]: /usr/bin/g++


Do you wish to build TensorFlow with MPI support? [y/N]: y
MPI support will be enabled for TensorFlow.

Please specify the MPI toolkit folder. [Default is /usr]: 


Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: 


Add "--config=mkl" to your bazel command to build with MKL support.
Please note that MKL on MacOS or windows is still not supported.
If you would like to use a local MKL instead of downloading, please set the environment variable "TF_MKL_ROOT" every time before build.

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.

Configuration finished

This is the output I got after setting the configuration:

nvidia@tegra-ubuntu:~/deepspeech/tensorflow$ bazel build -c opt --copt=-O3 //native_client:libctc_decoder_with_kenlm.so --config=cuda
ERROR: /home/nvidia/.cache/bazel/_bazel_nvidia/6b9138338a6a5d153417b602388184c1/external/org_tensorflow/tensorflow/BUILD:703:1: Illegal ambiguous match on configurable attribute "deps" in @org_tensorflow//tensorflow:libtensorflow_framework.so:
@local_config_cuda//cuda:using_clang
@local_config_cuda//cuda:using_nvcc
Multiple matches are not allowed unless one is unambiguously more specialized.
ERROR: Analysis of target '//native_client:libctc_decoder_with_kenlm.so' failed; build aborted.
INFO: Elapsed time: 0.220s

Please retry without enabling everything: you don’t need S3, Google Cloud, HADOOP, GDR, VERBS, MPI, and you don’t want to use Clang for CUDA.
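
In other words, answer 'n' at those prompts, or pre-seed the answers with environment variables before running ./configure, along the lines of the build script earlier in the thread. A minimal sketch:

export TF_NEED_GCP=0 TF_NEED_HDFS=0 TF_NEED_S3=0 TF_NEED_GDR=0 \
       TF_NEED_VERBS=0 TF_NEED_MPI=0 TF_ENABLE_XLA=0 TF_CUDA_CLANG=0
export TF_NEED_CUDA=1
./configure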


@saikishor You might need to ensure you have some older GCC (4.9 is what I use) and set this env var when running configure: GCC_HOST_COMPILER_PATH=/usr/bin/gcc-4.9
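
For example (the exact path is just an illustration; point it at wherever the older gcc lives on your board):

# make configure pick the older compiler as the CUDA host compiler
GCC_HOST_COMPILER_PATH=/usr/bin/gcc-4.9 ./configure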


Thanks a lot @lissyx, your guidelines helped me to build a .whl file. I am able to successfully install the GPU version on my Nvidia TX-2.

I tried with gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 and it worked in my case.

@elpimous_robot @gvoysey @lissyx I don’t know whether you faced this issue or not. The inference time on the TX2 is very long and sometimes it even crashes. Do you have any idea how to fix this issue?

Successful Inference:

nvidia@tegra-ubuntu:~/DeepSpeech$ deepspeech models/output_graph.pb models/alphabet.txt data/Speech_test_data/can_you_find_a_seat_for_me.wav 
Loading model from file models/output_graph.pb
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-02-27 13:51:44.790188: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:881] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2018-02-27 13:51:44.790334: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties: 
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 5.14GiB
2018-02-27 13:51:44.790440: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2018-02-27 13:51:45.309046: I tensorflow/core/common_runtime/gpu/gpu_device.cc:859] Could not identify NUMA node of /job:localhost/replica:0/task:0/device:GPU:0, defaulting to 0.  Your kernel may not have been built with NUMA support.
Loaded model in 4.254s.
Running inference.
can you find his seat for me
Inference took 22.885s for 4.000s audio file.

Incomplete Inference:

nvidia@tegra-ubuntu:~/DeepSpeech$ deepspeech models/output_graph.pb models/alphabet.txt data/Speech_test_data/get_me_a_glass_of_water.wav 
Loading model from file models/output_graph.pb
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-02-27 13:52:50.278300: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:881] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2018-02-27 13:52:50.278444: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties: 
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 4.75GiB
2018-02-27 13:52:50.278518: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2018-02-27 13:52:50.782133: I tensorflow/core/common_runtime/gpu/gpu_device.cc:859] Could not identify NUMA node of /job:localhost/replica:0/task:0/device:GPU:0, defaulting to 0.  Your kernel may not have been built with NUMA support.
Loaded model in 3.266s.
Running inference.
2018-02-27 13:53:12.079776: E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_LAUNCH_FAILED
2018-02-27 13:53:12.080044: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:203] Unexpected Event status: 1
Aborted (core dumped)


nvidia@tegra-ubuntu:~/DeepSpeech$ deepspeech models/output_graph.pb models/alphabet.txt models/lm.binary models/trie data/Speech_test_data/get_me_a_glass_of_water_sampled.wav 
Loading model from file models/output_graph.pb
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-02-27 14:31:22.311143: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:881] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2018-02-27 14:31:22.311272: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties: 
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 2.30GiB
2018-02-27 14:31:22.311360: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2018-02-27 14:31:22.801195: I tensorflow/core/common_runtime/gpu/gpu_device.cc:859] Could not identify NUMA node of /job:localhost/replica:0/task:0/device:GPU:0, defaulting to 0.  Your kernel may not have been built with NUMA support.
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)

I had some issues on ARMv6 with the RPi3, but that was when experimenting with funny stuff; the default was working. I have no idea what is going on in your case, and I don’t have time to investigate.

OK thanks for the reply. Let me wait for the reply from the other two…

Worst case (it’s going to be slow), try a tensorflow debug build (-c dbg instead of -c opt on the command line) and then gdb the C++ client to see where it is breaking. There might be a legit issue?
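
A rough sketch of that workflow, reusing the target and client invocation from earlier in the thread:

# rebuild the library with debug info instead of optimizations
bazel build -s --config=monolithic --config=cuda -c dbg //native_client:libdeepspeech.so
# rebuild the client against it, then run it under gdb and grab a backtrace at the crash
gdb --args deepspeech models/output_graph.pb models/alphabet.txt data/Speech_test_data/get_me_a_glass_of_water.wav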


Hi,

I am trying to build the native_client on NVidia TX2 with the following configuration:

  • JetPack 3.2
  • CUDA 9.0
  • cuDNN 7.0
  • gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
  • tensorflow (from mozilla) at commit: ad8f785459e80823a2ff4456eeb9d7220c33b9c6
  • DeepSpeech at commit: 3c546d50059d468ea199814d77bac4ea97b5ee57

and when I am running:

bazel build -s -c opt --copt=-O3 --config=cuda //native_client:libctc_decoder_with_kenlm.so

I am getting this error:

SUBCOMMAND: # @protobuf_archive//:protobuf_lite [action 'Compiling external/protobuf_archive/src/google/protobuf/message_lite.cc [for host]']
(cd /home/nvidia/data_1/bazel_cache/_bazel_nvidia/6b9138338a6a5d153417b602388184c1/execroot/org_tensorflow && \
  exec env - \
    LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64: \
    PATH=/usr/local/cuda-9.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin \
    PWD=/proc/self/cwd \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections '-std=c++11' -MD -MF bazel-out/host/bin/external/protobuf_archive/_objs/protobuf_lite/external/protobuf_archive/src/google/protobuf/message_lite.d '-frandom-seed=bazel-out/host/bin/external/protobuf_archive/_objs/protobuf_lite/external/protobuf_archive/src/google/protobuf/message_lite.o' -iquote external/protobuf_archive -iquote bazel-out/host/genfiles/external/protobuf_archive -iquote external/bazel_tools -iquote bazel-out/host/genfiles/external/bazel_tools -isystem external/protobuf_archive/src -isystem bazel-out/host/genfiles/external/protobuf_archive/src -isystem bazel-out/host/bin/external/protobuf_archive/src -g0 -DGEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK -g0 -DHAVE_PTHREAD -Wall -Wwrite-strings -Woverloaded-virtual -Wno-sign-compare -Wno-unused-function -no-canonical-prefixes -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fno-canonical-system-headers -c external/protobuf_archive/src/google/protobuf/message_lite.cc -o bazel-out/host/bin/external/protobuf_archive/_objs/protobuf_lite/external/protobuf_archive/src/google/protobuf/message_lite.o)
ERROR: /home/nvidia/data_1/bazel_cache/_bazel_nvidia/6b9138338a6a5d153417b602388184c1/external/jpeg/BUILD:225:1: C++ compilation of rule '@jpeg//:simd_armv7a' failed (Exit 1)
gcc: error: unrecognized command line option '-mfloat-abi=softfp'
Target //native_client:libctc_decoder_with_kenlm.so failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 19.744s, Critical Path: 7.05s
INFO: 36 processes: 36 local.
FAILED: Build did NOT complete successfully

Could you please give me an idea of how to move forward? I understand that @elpimous_robot has more experience with DeepSpeech on the TX2, so @elpimous_robot, do you have some tips for me?

Many thanks in advance!

Okay, so first, why ad8f785459e80823a2ff4456eeb9d7220c33b9c6? Please use the r1.6 branch, which is currently at https://github.com/mozilla/tensorflow/commit/50214731ea43f41ee036ce9af0c0c4a10185fc8f
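
That is, from inside the mozilla/tensorflow clone:

git fetch origin
git checkout r1.6
git log -1   # should show 50214731ea43f41ee036ce9af0c0c4a10185fc8f, the tip mentioned above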

No particular reason for the ad8f785459e80823a2ff4456eeb9d7220c33b9c6 commit.

I switched to r1.6 and I get a similar error with respect to -mfloat-abi=softfp:

SUBCOMMAND: # @jpeg//:jpeg [action 'Compiling external/jpeg/jquant2.c']
(cd /home/nvidia/data_1/bazel_cache/_bazel_nvidia/6b9138338a6a5d153417b602388184c1/execroot/org_tensorflow && \
  exec env - \
    CUDA_TOOLKIT_PATH=/usr/local/cuda \
    CUDNN_INSTALL_PATH=/usr/lib/aarch64-linux-gnu \
    GCC_HOST_COMPILER_PATH=/usr/bin/gcc \
    LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64: \
    PATH=/usr/local/cuda-9.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin \
    PWD=/proc/self/cwd \
    PYTHON_BIN_PATH=/usr/bin/python \
    PYTHON_LIB_PATH=/usr/local/lib/python2.7/dist-packages \
    TF_CUDA_CLANG=0 \
    TF_CUDA_COMPUTE_CAPABILITIES=3.0,3.5,3.7,5.2,5.3,6.0,6.1,6.2 \
    TF_CUDA_VERSION=9.0 \
    TF_CUDNN_VERSION=7.0.5 \
    TF_NEED_CUDA=1 \
    TF_NEED_OPENCL_SYCL=0 \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections -MD -MF bazel-out/arm-opt/bin/external/jpeg/_objs/jpeg/external/jpeg/jquant2.pic.d -fPIC -iquote external/jpeg -iquote bazel-out/arm-opt/genfiles/external/jpeg -iquote external/bazel_tools -iquote bazel-out/arm-opt/genfiles/external/bazel_tools -DGEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK -O3 -O3 -w -D__ARM_NEON__ '-march=armv7-a' '-mfloat-abi=softfp' -fprefetch-loop-arrays -no-canonical-prefixes -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fno-canonical-system-headers -c external/jpeg/jquant2.c -o bazel-out/arm-opt/bin/external/jpeg/_objs/jpeg/external/jpeg/jquant2.pic.o)
ERROR: /home/nvidia/data_1/bazel_cache/_bazel_nvidia/6b9138338a6a5d153417b602388184c1/external/jpeg/BUILD:44:1: C++ compilation of rule '@jpeg//:jpeg' failed (Exit 1)
gcc: error: unrecognized command line option '-mfloat-abi=softfp'
Target //native_client:libctc_decoder_with_kenlm.so failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 79.330s, Critical Path: 30.16s
INFO: 113 processes: 113 local.
FAILED: Build did NOT complete successfully

Can you include your bazel command line? And the Bazel version? And ensure you don’t have a stale Bazel cache.

sure, these are the commands I am running:

bazel clean
bazel build -s -c opt --copt=-O3 --config=cuda //native_client:libctc_decoder_with_kenlm.so

and still get the error about ‘-mfloat-abi=softfp’

SUBCOMMAND: # @jpeg//:jpeg [action 'Compiling external/jpeg/jquant2.c']
(cd /home/nvidia/data_1/bazel_cache/_bazel_nvidia/6b9138338a6a5d153417b602388184c1/execroot/org_tensorflow && \
  exec env - \
    CUDA_TOOLKIT_PATH=/usr/local/cuda \
    CUDNN_INSTALL_PATH=/usr/lib/aarch64-linux-gnu \
    GCC_HOST_COMPILER_PATH=/usr/bin/gcc \
    LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64: \
    PATH=/usr/local/cuda-9.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin \
    PWD=/proc/self/cwd \
    PYTHON_BIN_PATH=/usr/bin/python \
    PYTHON_LIB_PATH=/usr/local/lib/python2.7/dist-packages \
    TF_CUDA_CLANG=0 \
    TF_CUDA_COMPUTE_CAPABILITIES=3.0,3.5,3.7,5.2,5.3,6.0,6.1,6.2 \
    TF_CUDA_VERSION=9.0 \
    TF_CUDNN_VERSION=7.0.5 \
    TF_NEED_CUDA=1 \
    TF_NEED_OPENCL_SYCL=0 \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections -MD -MF bazel-out/arm-opt/bin/external/jpeg/_objs/jpeg/external/jpeg/jquant2.pic.d -fPIC -iquote external/jpeg -iquote bazel-out/arm-opt/genfiles/external/jpeg -iquote external/bazel_tools -iquote bazel-out/arm-opt/genfiles/external/bazel_tools -DGEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK -O3 -O3 -w -D__ARM_NEON__ '-march=armv7-a' '-mfloat-abi=softfp' -fprefetch-loop-arrays -no-canonical-prefixes -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fno-canonical-system-headers -c external/jpeg/jquant2.c -o bazel-out/arm-opt/bin/external/jpeg/_objs/jpeg/external/jpeg/jquant2.pic.o)
ERROR: /home/nvidia/data_1/bazel_cache/_bazel_nvidia/6b9138338a6a5d153417b602388184c1/external/jpeg/BUILD:44:1: C++ compilation of rule '@jpeg//:jpeg' failed (Exit 1)
gcc: error: unrecognized command line option '-mfloat-abi=softfp'
Target //native_client:libctc_decoder_with_kenlm.so failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 499.665s, Critical Path: 101.23s
INFO: 518 processes: 518 local.
FAILED: Build did NOT complete successfully
bazel version
Build label: 0.15.2- (@non-git)
Build target: bazel-out/arm-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Wed Jul 18 09:16:28 2018 (1531905388)
Build timestamp: 1531905388
Build timestamp as int: 1531905388

Please use Bazel 0.10 as TensorFlow recommends for r1.6. Also make sure to nuke /home/nvidia/data_1/bazel_cache/.
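
Nuking the cache is just removing that output root and letting bazel rebuild its state, roughly:

bazel clean --expunge
bazel shutdown
rm -rf /home/nvidia/data_1/bazel_cache/
bazel version   # re-check after downgrading; it should now report 0.10.x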

Hi, I did a HOWTO to propose a solution that worked for me!

Hope it will help you
Vincent

Your solution works like a charm for me too!

I very much appreciate @lissyx’s and @elpimous_robot’s responsiveness!
