Jetson TX1 DeepSpeech inference performance is not real time

I managed to build TensorFlow and DeepSpeech for the Jetson TX1 board by following the "DeepSpeech installation on Nvidia Jetson TX2" procedure, and inference runs on the on-board GPU (256 CUDA cores).

Below are the inference results for various wav recordings.

nvidia@tegra-ubuntu:~/deepspeech$ ./deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio wav -t
TensorFlow: v1.5.0-17-gad8f785
DeepSpeech: v0.2.0-alpha.8-0-gcd47560
2018-09-11 13:45:25.414228: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:881] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2018-09-11 13:45:25.414376: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9984
pciBusID: 0000:00:00.0
totalMemory: 3.89GiB freeMemory: 2.05GiB
2018-09-11 13:45:25.414444: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2018-09-11 13:45:26.117175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:859] Could not identify NUMA node of /job:localhost/replica:0/task:0/device:GPU:0, defaulting to 0. Your kernel may not have been built with NUMA support.
Running on directory wav

wav/mycroft_wakeup.wav
hey my croft wake up
cpu_time_overall=16.18338 cpu_time_mfcc=0.04864 cpu_time_infer=16.13473
wav/how_are_you.wav
how are you
cpu_time_overall=4.97402 cpu_time_mfcc=0.00675 cpu_time_infer=4.96727
wav/can_you_tell_me_what_time_is_it.wav
can you tell me what time is it
cpu_time_overall=7.59726 cpu_time_mfcc=0.00927 cpu_time_infer=7.58799
wav/it_was_his_heart_that_would_tell_him_where_his_treasure_was_hidden.wav
it was his heart that would tell him where his treasure was hid
cpu_time_overall=13.69008 cpu_time_mfcc=0.01835 cpu_time_infer=13.67173
wav/weather_mycroft.wav
whats the weather next week a crop
cpu_time_overall=16.14883 cpu_time_mfcc=0.02286 cpu_time_infer=16.12596

From what I can see, the inference times are considerably high: roughly 3x the duration of the original audio. For example, the "how are you" recording is only 1.5 seconds long, but inference takes about 5 seconds. Does anyone get better inference performance on similar target hardware, or see ways to improve it?
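For anyone reproducing this, the "3x" figure is the real-time factor (RTF): inference time divided by audio duration. A small sketch, using the cpu_time_overall value from the log above and an assumed 1.5 s clip length for "how are you" (the wav_duration_seconds helper is just a stdlib way to measure your own clips):

```python
import wave

def wav_duration_seconds(path):
    # Assumes a standard PCM WAV readable by the stdlib wave module.
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

def real_time_factor(inference_s, audio_s):
    """RTF > 1 means slower than real time."""
    return inference_s / audio_s

# "how are you": ~1.5 s of audio, 4.97402 s cpu_time_overall from the log
rtf = real_time_factor(4.97402, 1.5)
print(f"RTF = {rtf:.2f}")  # prints "RTF = 3.32"
```

An RTF at or below 1.0 is what you would need for live transcription.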

No, that’s mostly what we expect with that model. A newer model should help, but I have not yet had time to experiment on ARM boards, and I don’t have access to a Jetson TX1.

@lissyx Thanks for the feedback. Is there an updated model beyond the 0.1.1 pretrained model that I can use to get better inference?

Nothing released yet, though you can train your own model if you want to verify the performance improvements. We also hope to get something out of TFLite and other quantization tooling for TensorFlow, but the bidirectional LSTM component has always been an issue :/.
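For context on why quantization is expected to help: it replaces float32 weights with 8-bit integers plus a scale, shrinking the model ~4x and letting the CPU/GPU use cheaper integer math. A toy numpy sketch of symmetric post-training weight quantization; this illustrates the idea only, not the actual TFLite implementation (which uses per-channel scales, zero points, and quantized kernels):

```python
import numpy as np

# Toy symmetric 8-bit post-training quantization of one weight tensor.
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)

scale = np.abs(w).max() / 127.0                               # one scale per tensor
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)   # stored 4x smaller
w_hat = q.astype(np.float32) * scale                          # dequantize to compare

# Rounding error is bounded by half a quantization step.
print("max abs error:", np.abs(w - w_hat).max(), "<= scale/2 =", scale / 2)
```

The catch the reply alludes to: the bidirectional LSTM in the acoustic model was not supported by the TFLite converter's quantized ops at the time, so the whole graph could not simply be converted this way.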

@reuben is currently focusing on training, so as soon as he has something we’ll be able to put out betas.