How to build the GPU version from source

I compiled from source with TensorFlow's configure script, with all the NVIDIA CUDA options turned on. How can I then get DeepSpeech to use the CUDA version?

There’s nothing specific to do, as long as you follow the TensorFlow CUDA steps and thus build with --config=cuda.
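A minimal sketch of those steps, assuming the layout described in the native_client README (a TensorFlow checkout with the DeepSpeech native_client linked in; exact paths and Bazel targets may differ between versions):

```shell
# Assumption: CUDA and cuDNN are already installed, and the DeepSpeech
# native_client directory is symlinked into the TensorFlow checkout.
cd tensorflow

# Answer "y" when configure asks about CUDA support, and point it at
# your CUDA/cuDNN install paths.
./configure

# The key part: --config=cuda goes on the bazel build command line.
bazel build --config=opt --config=cuda //native_client:libdeepspeech.so
```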

Where do I add --config=cuda? On the Bazel command line? That wasn't specified anywhere. Going to try that now; it takes an awfully long time to build, though.

That doesn't seem to make any difference either. It built a lot of things for GPU, yet when I run it, it doesn't use the GPU. Is there an option at runtime that makes native_client/deepspeech use the GPU?

No, just --config=cuda. But you need to have set up everything for using CUDA.

We refer to TensorFlow's docs for the specifics of each platform: https://github.com/mozilla/DeepSpeech/blob/master/native_client/README.md#building. CUDA setup is documented by TensorFlow.

Is there any reason you would need to rebuild from source? Prebuilt binaries not working? https://tools.taskcluster.net/index/project.deepspeech.deepspeech.native_client.master/gpu

My CPU doesn't have AVX extensions :frowning: I'm going to try with CUDA 8.0 and cuDNN 6, since 9.1 and 7 aren't working. Going to use the TF 1.5 branch instead of master; is that the best one to use?

Ok, no AVX :(. Best one for what? CUDA 8.0? I'm not so sure. But there is really no magic: bazel build --config=cuda [...] //native_client:libdeepspeech.so and you have it built with CUDA.

Does it fall back to CPU if the GPU isn't found? Is there a way I can check that isn't happening?

Building TensorFlow with CUDA links it against CUDA:

  • the runtime stdout/stderr will show it's using the GPU
  • if you are missing the CUDA libs, it will not even fall back to CPU, since the linker will complain.
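You can make that linker check explicit by inspecting the binary's dynamic dependencies. A hypothetical helper (the name check_cuda_link is illustrative; the libcudart pattern is the key part):

```shell
# Sketch: a CUDA-enabled libdeepspeech.so lists libcudart among its dynamic
# dependencies; a CPU-only build does not. Feed it the output of ldd.
check_cuda_link() {
  if grep -q 'libcudart\.so'; then
    echo "linked against CUDA"
  else
    echo "CPU-only build"
  fi
}

# Usage:
#   ldd libdeepspeech.so | check_cuda_link
```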

@Ironvil FTR, here is a sample output of running the build from the link above:

$ time ./deepspeech ../models/output_graph.pb ../models/alphabet.txt ../audio/ -t
TensorFlow: v1.6.0-9-g236f83e
DeepSpeech: v0.1.1-44-gd68fde8
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-03-17 23:58:25.631405: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-03-17 23:58:25.839521: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-03-17 23:58:25.839879: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 0 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.797
pciBusID: 0000:41:00.0
totalMemory: 7.92GiB freeMemory: 7.47GiB
2018-03-17 23:58:25.839894: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-03-17 23:58:25.965245: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7230 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:41:00.0, compute capability: 6.1)
Running on directory ../audio/
> ../audio//2830-3980-0043.wav
experience proves tis
cpu_time_overall=2.29062 cpu_time_mfcc=0.00429 cpu_time_infer=2.28632
> ../audio//4507-16021-0012.wav
why should one halt on the way
cpu_time_overall=0.57249 cpu_time_mfcc=0.00561 cpu_time_infer=0.56689
> ../audio//8455-210777-0068.wav
your powr is sufficient i said
cpu_time_overall=0.52592 cpu_time_mfcc=0.00427 cpu_time_infer=0.52165

real	0m4,223s
user	0m2,881s
sys	0m1,199s

@Ironvil And the ldd output:

$ ldd deepspeech libdeepspeech.so 
deepspeech:
	linux-vdso.so.1 (0x00007ffe8d8b3000)
	libcudart.so.9.0 => /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/CUDA/lib64/libcudart.so.9.0 (0x00007f54c70e1000)
	libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007f54c6541000)
	libdeepspeech.so => /home/alexandre/tmp/deepspeech/gpu/./libdeepspeech.so (0x00007f54afd1b000)
	libdeepspeech_utils.so => /home/alexandre/tmp/deepspeech/gpu/./libdeepspeech_utils.so (0x00007f54afb16000)
	libsox.so.2 => /usr/lib/x86_64-linux-gnu/libsox.so.2 (0x00007f54af881000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f54af4fc000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f54af169000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f54aef51000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f54aeb97000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f54ae993000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f54ae775000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f54ae56d000)
	libnvidia-fatbinaryloader.so.390.42 => /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.390.42 (0x00007f54ae321000)
	libcusolver.so.9.0 => /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/CUDA/lib64/libcusolver.so.9.0 (0x00007f54a9726000)
	libcublas.so.9.0 => /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/CUDA/lib64/libcublas.so.9.0 (0x00007f54a62f0000)
	libcudnn.so.7 => /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/CUDA/lib64/libcudnn.so.7 (0x00007f5494e59000)
	libcufft.so.9.0 => /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/CUDA/lib64/libcufft.so.9.0 (0x00007f548cdb8000)
	libcurand.so.9.0 => /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/CUDA/lib64/libcurand.so.9.0 (0x00007f5488e54000)
	libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f5488c25000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f54c734e000)
	libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007f5488a1b000)
	libpng16.so.16 => /usr/lib/x86_64-linux-gnu/libpng16.so.16 (0x00007f54887e8000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f54885ce000)
	libmagic.so.1 => /usr/lib/x86_64-linux-gnu/libmagic.so.1 (0x00007f54883ac000)
	libgsm.so.1 => /usr/lib/x86_64-linux-gnu/libgsm.so.1 (0x00007f548819f000)
libdeepspeech.so:
	linux-vdso.so.1 (0x00007ffc1a97b000)
	libcusolver.so.9.0 => /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/CUDA/lib64/libcusolver.so.9.0 (0x00007ffa86564000)
	libcublas.so.9.0 => /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/CUDA/lib64/libcublas.so.9.0 (0x00007ffa8312e000)
	libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007ffa8258e000)
	libcudnn.so.7 => /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/CUDA/lib64/libcudnn.so.7 (0x00007ffa710f7000)
	libcufft.so.9.0 => /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/CUDA/lib64/libcufft.so.9.0 (0x00007ffa69056000)
	libcurand.so.9.0 => /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/CUDA/lib64/libcurand.so.9.0 (0x00007ffa650f2000)
	libcudart.so.9.0 => /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/CUDA/lib64/libcudart.so.9.0 (0x00007ffa64e85000)
	libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007ffa64c56000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ffa64a52000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ffa646bf000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ffa644a1000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ffa6411c000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ffa63f04000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffa63b4a000)
	/lib64/ld-linux-x86-64.so.2 (0x00007ffaa1985000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ffa63942000)
	libnvidia-fatbinaryloader.so.390.42 => /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.390.42 (0x00007ffa636f6000)

HP-Z600-Workstation:~/development/gitrepos/gpuDeeps/DeepSpeech/native_client$ ldd deepspeech libdeepspeech.so
deepspeech:
linux-vdso.so.1 => (0x00007ffeff1be000)
libdeepspeech.so => /usr/local/lib/libdeepspeech.so (0x00007f1751639000)
libdeepspeech_utils.so => /usr/local/lib/libdeepspeech_utils.so (0x00007f175393a000)
libsox.so.3 => /usr/local/lib/libsox.so.3 (0x00007f17513af000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f175102d000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1750d24000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f1750b0e000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1750744000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f1750540000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f1750323000)
/lib64/ld-linux-x86-64.so.2 (0x00007f175373e000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f1750101000)
libdeepspeech.so:
ldd: ./libdeepspeech.so: No such file or directory

HP-Z600-Workstation:~/development/gitrepos/gpuDeeps/tensorflow/native_client$ time ./deepspeech ~/deepspeech/models/output_graph.pb ~/deepspeech/models/alphabet.txt ~/deepspeech/audio/ -t
TensorFlow: v1.6.0-rc1-1443-g8cbf4dd
DeepSpeech: v0.1.1-44-gd68fde8
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
Running on directory /home/jacobmh/deepspeech/audio/

/home/jacobmh/deepspeech/audio//2830-3980-0043.wav
experience proves tis
cpu_time_overall=17.84354 cpu_time_mfcc=0.00434 cpu_time_infer=17.83920
/home/jacobmh/deepspeech/audio//8455-210777-0068.wav
your powr is sufficient i said
cpu_time_overall=9.12456 cpu_time_mfcc=0.00542 cpu_time_infer=9.11913
/home/jacobmh/deepspeech/audio//4507-16021-0012.wav
why should one halt on the way
cpu_time_overall=10.29592 cpu_time_mfcc=0.00568 cpu_time_infer=10.29024

real 0m30.819s
user 0m29.590s
sys 0m9.068s

These are the results; it seems like it's trying to link against the wrong libraries.

HP-Z600-Workstation:~/development/gitrepos/gpuDeeps/tensorflow/native_client$ make deepspeech
c++ -o deepspeech `pkg-config --cflags sox` client.cc -Wl,--no-as-needed -Wl,-rpath,$ORIGIN -L/home/jacobmh/development/gitrepos/gpuDeeps/tensorflow/bazel-bin/native_client -ldeepspeech -ldeepspeech_utils `pkg-config --libs sox`

Why are they there? They should be in your Bazel output dir. Did you copy them by hand? Please run ldd /usr/local/lib/libdeepspeech.so, but given the output, I'd bet it's not linked against CUDA. This is wrong; you need to verify your configure and build steps again, something is not right.
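One way to make sure the installed copy in /usr/local/lib matches the CUDA build, sketched under the assumption that an earlier CPU-only build was installed there (paths are illustrative; adjust to your checkout):

```shell
# Rebuild with CUDA, then reinstall over the stale copy in /usr/local/lib.
bazel build --config=opt --config=cuda //native_client:libdeepspeech.so
cd native_client
make deepspeech
PREFIX=/usr/local sudo make install
sudo ldconfig                                          # refresh the linker cache
ldd /usr/local/lib/libdeepspeech.so | grep libcudart   # should now list CUDA libs
```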

Right, the GPU version works now, after using PREFIX=/usr/local sudo make install

Just need to get multithreading working now. Still seems pretty quick on this GPU, though.

Makes sense. If you did install, then obviously you need to ensure you install the new build, because the installed copy takes precedence :).