Jetson TX1 DeepSpeech inference performance is not real time

I managed to build TensorFlow and DeepSpeech for the Jetson TX1 board by following the "DeepSpeech installation on Nvidia Jetson TX2" procedure, and inference runs on the on-board GPU (256 CUDA cores).

Below are the inference results for various wav recordings.

nvidia@tegra-ubuntu:~/deepspeech$ ./deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio wav -t
TensorFlow: v1.5.0-17-gad8f785
DeepSpeech: v0.2.0-alpha.8-0-gcd47560
2018-09-11 13:45:25.414228: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:881] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2018-09-11 13:45:25.414376: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9984
pciBusID: 0000:00:00.0
totalMemory: 3.89GiB freeMemory: 2.05GiB
2018-09-11 13:45:25.414444: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2018-09-11 13:45:26.117175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:859] Could not identify NUMA node of /job:localhost/replica:0/task:0/device:GPU:0, defaulting to 0. Your kernel may not have been built with NUMA support.
Running on directory wav

wav/mycroft_wakeup.wav
hey my croft wake up
cpu_time_overall=16.18338 cpu_time_mfcc=0.04864 cpu_time_infer=16.13473
wav/how_are_you.wav
how are you
cpu_time_overall=4.97402 cpu_time_mfcc=0.00675 cpu_time_infer=4.96727
wav/can_you_tell_me_what_time_is_it.wav
can you tell me what time is it
cpu_time_overall=7.59726 cpu_time_mfcc=0.00927 cpu_time_infer=7.58799
wav/it_was_his_heart_that_would_tell_him_where_his_treasure_was_hidden.wav
it was his heart that would tell him where his treasure was hid
cpu_time_overall=13.69008 cpu_time_mfcc=0.01835 cpu_time_infer=13.67173
wav/weather_mycroft.wav
whats the weather next week a crop
cpu_time_overall=16.14883 cpu_time_mfcc=0.02286 cpu_time_infer=16.12596

From what I can see, the inference times are considerably high: roughly 3x the duration of the original audio. For example, the "how are you" recording is only 1.5 seconds long, but inference takes about 5 seconds. Does anyone get better inference performance on similar target hardware, or see ways to improve it?
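For anyone reproducing this, the "3x" figure is the real-time factor (RTF): inference time divided by audio duration. A small sketch, using the cpu_time_overall value from the log above and an assumed 1.5 s clip length for "how are you" (the wav_duration_seconds helper is just a stdlib way to measure your own clips):

```python
import wave

def wav_duration_seconds(path):
    # Assumes a standard PCM WAV readable by the stdlib wave module.
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

def real_time_factor(inference_s, audio_s):
    """RTF > 1 means slower than real time."""
    return inference_s / audio_s

# "how are you": ~1.5 s of audio, 4.97402 s cpu_time_overall from the log
rtf = real_time_factor(4.97402, 1.5)
print(f"RTF = {rtf:.2f}")  # prints "RTF = 3.32"
```

An RTF at or below 1.0 is what you would need for live transcription.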

No, that’s mostly what we expect with that model. A newer model should help, but I have not yet had time to experiment on ARM boards, and I don’t have access to a Jetson TX1.

@lissyx Thanks for the feedback. Is there an updated model beyond the 0.1.1 pretrained model that I can use to get better inference?

Nothing released yet, though you can train your own model if you want to verify the performance improvements. We also hope to get something out of TFLite and other quantization tooling for TensorFlow, but the bidirectional LSTM component has always been an issue :/.
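For context on why quantization is expected to help: it replaces float32 weights with 8-bit integers plus a scale, shrinking the model ~4x and letting the CPU/GPU use cheaper integer math. A toy numpy sketch of symmetric post-training weight quantization; this illustrates the idea only, not the actual TFLite implementation (which uses per-channel scales, zero points, and quantized kernels):

```python
import numpy as np

# Toy symmetric 8-bit post-training quantization of one weight tensor.
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)

scale = np.abs(w).max() / 127.0                               # one scale per tensor
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)   # stored 4x smaller
w_hat = q.astype(np.float32) * scale                          # dequantize to compare

# Rounding error is bounded by half a quantization step.
print("max abs error:", np.abs(w - w_hat).max(), "<= scale/2 =", scale / 2)
```

The catch the reply alludes to: the bidirectional LSTM in the acoustic model was not supported by the TFLite converter's quantized ops at the time, so the whole graph could not simply be converted this way.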

@reuben is currently focusing on training, so as soon as he has something we’ll be able to put out betas.