I was able to build TensorFlow and DeepSpeech for the Jetson TX1 board by following the "Deepspeech installation on Nvidia Jetson TX2" procedure, and to have DeepSpeech use the on-board GPU (256 CUDA cores) for inference.
Below are the inference results for several WAV recordings:
```
nvidia@tegra-ubuntu:~/deepspeech$ ./deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio wav -t
TensorFlow: v1.5.0-17-gad8f785
DeepSpeech: v0.2.0-alpha.8-0-gcd47560
2018-09-11 13:45:25.414228: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:881] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2018-09-11 13:45:25.414376: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9984
pciBusID: 0000:00:00.0
totalMemory: 3.89GiB freeMemory: 2.05GiB
2018-09-11 13:45:25.414444: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2018-09-11 13:45:26.117175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:859] Could not identify NUMA node of /job:localhost/replica:0/task:0/device:GPU:0, defaulting to 0. Your kernel may not have been built with NUMA support.
Running on directory wav
wav/mycroft_wakeup.wav
hey my croft wake up
cpu_time_overall=16.18338 cpu_time_mfcc=0.04864 cpu_time_infer=16.13473
wav/how_are_you.wav
how are you
cpu_time_overall=4.97402 cpu_time_mfcc=0.00675 cpu_time_infer=4.96727
wav/can_you_tell_me_what_time_is_it.wav
can you tell me what time is it
cpu_time_overall=7.59726 cpu_time_mfcc=0.00927 cpu_time_infer=7.58799
wav/it_was_his_heart_that_would_tell_him_where_his_treasure_was_hidden.wav
it was his heart that would tell him where his treasure was hid
cpu_time_overall=13.69008 cpu_time_mfcc=0.01835 cpu_time_infer=13.67173
wav/weather_mycroft.wav
whats the weather next week a crop
cpu_time_overall=16.14883 cpu_time_mfcc=0.02286 cpu_time_infer=16.12596
```
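To compare runs like these across boards or settings, a small Python parser could pull the per-file numbers out of the `-t` output. This is only a sketch: it assumes that each `cpu_time_overall=... cpu_time_mfcc=... cpu_time_infer=...` line is preceded by the WAV path and then the transcript, which is how the log above is laid out. The `Timing` and `parse_timings` names are hypothetical, not part of DeepSpeech.

```python
import re
from typing import List, NamedTuple


class Timing(NamedTuple):
    wav: str          # WAV path printed before the transcript
    transcript: str   # decoded text (may be empty)
    overall: float    # cpu_time_overall
    mfcc: float       # cpu_time_mfcc
    infer: float      # cpu_time_infer


# Matches lines such as:
# cpu_time_overall=4.97402 cpu_time_mfcc=0.00675 cpu_time_infer=4.96727
TIMING_RE = re.compile(
    r"cpu_time_overall=([\d.]+)\s+cpu_time_mfcc=([\d.]+)\s+cpu_time_infer=([\d.]+)"
)


def parse_timings(log_text: str) -> List[Timing]:
    """Collect one Timing per cpu_time_... line in the `deepspeech ... -t` output.

    Assumes the two lines right before each timing line are the WAV path
    and the transcript. Blank lines are kept so an empty transcript still
    occupies its slot.
    """
    lines = [ln.strip() for ln in log_text.splitlines()]
    results = []
    for i, line in enumerate(lines):
        match = TIMING_RE.search(line)
        if match and i >= 2:
            overall, mfcc, infer = (float(g) for g in match.groups())
            results.append(Timing(lines[i - 2], lines[i - 1], overall, mfcc, infer))
    return results
```

If the client output is saved first (for example with `> run.log 2>&1`), `parse_timings(open("run.log").read())` returns one entry per recording. In the log above, `cpu_time_mfcc` is a tiny fraction of `cpu_time_overall`, so nearly all the time is spent in `cpu_time_infer`.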
From what I can see, the inference times are considerably high: roughly 3x the length of the original audio recording. For example, the “how are you” recording is only 1.5 s long, but inference takes about 5 s. Does anyone get better inference performance on similar target hardware, or see ways to improve it?
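The "3x" above is a real-time factor (RTF): processing time divided by audio duration, where anything above 1.0 is slower than real time. Here is a minimal sketch for computing it per file with Python's standard `wave` module. It assumes plain PCM WAV files, and the helper names are hypothetical. Also, because the client labels its numbers `cpu_time_*`, they may not be identical to wall-clock latency.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file, computed from its header (frames / sample rate)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def real_time_factor(path: str, inference_seconds: float) -> float:
    """Inference time divided by audio length; values above 1.0 mean slower than real time."""
    return inference_seconds / wav_duration_seconds(path)
```

For a 1.5 s clip and the `cpu_time_infer=4.96727` reported for `how_are_you.wav`, this works out to roughly 3.3.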