GPU is not used in 'Decoding predictions...'

I’m trying to train a model on the THCHS-30 dataset.
During the "Decoding predictions..." stage, GPU usage stays at 0%.

GPU Usage:

kelvin_pfp@tensorflow-1-vm:~/DeepSpeech$ gpustat
tensorflow-1-vm Tue Jan 8 14:34:21 2019
[0] Tesla V100-SXM2-16GB | 31'C, 0 % | 15467 / 16130 MB | kelvin_pfp(15393M)

CPU Usage:

kelvin_pfp@tensorflow-1-vm:~/DeepSpeech$ mpstat
Linux 4.9.0-8-amd64 (tensorflow-1-vm) 01/08/2019 x86_64 (12 CPU)
02:36:09 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
02:36:09 PM all 68.19 0.00 2.92 0.02 0.00 0.01 0.00 0.00 0.00 28.86

Command I used to run:

python3 -u DeepSpeech.py \
  --train_files /home/kelvin_pfp/data_thchs30/gencsv/thchs30-train.csv \
  --dev_files /home/kelvin_pfp/data_thchs30/gencsv/thchs30-dev.csv \
  --test_files /home/kelvin_pfp/data_thchs30/gencsv/thchs30-test.csv \
  --train_batch_size 100 \
  --dev_batch_size 100 \
  --test_batch_size 25 \
  --n_hidden 512 \
  --epoch 30 \
  --validation_step 1 \
  --early_stop True \
  --earlystop_nsteps 6 \
  --estop_mean_thresh 0.1 \
  --estop_std_thresh 0.1 \
  --dropout_rate 0.22 \
  --learning_rate 0.0001 \
  --report_count 100 \
  --log_level 0 \
  --summary_secs 3 \
  --checkpoint_secs 900 \
  --max_to_keep 100 \
  --beam_width 128 \
  --export_dir /home/kelvin_pfp/data_thchs30/results/model_export/ \
  --checkpoint_dir /home/kelvin_pfp/data_thchs30/results/checkout/ \
  --alphabet_config_path /home/kelvin_pfp/data_thchs30/gencsv/thchs30-alphabet.txt \
  --lm_binary_path /home/kelvin_pfp/data_thchs30/gencsv/lm.binary \
  --lm_trie_path /home/kelvin_pfp/data_thchs30/gencsv/trie \
  "$@"

Log:

2019-01-08 13:46:46.046188: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-01-08 13:46:46.803005: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-01-08 13:46:46.803432: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:00:04.0
totalMemory: 15.75GiB freeMemory: 15.34GiB
2019-01-08 13:46:46.803464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-01-08 13:46:47.206055: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-08 13:46:47.206113: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-01-08 13:46:47.206121: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-01-08 13:46:47.206419: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 14847 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:04.0, compute capability: 7.0)
D Starting coordinator...
D Coordinator started. Thread id 140160971659008
Preprocessing ['/home/kelvin_pfp/data_thchs30/gencsv/thchs30-train.csv']
Preprocessing done
Preprocessing ['/home/kelvin_pfp/data_thchs30/gencsv/thchs30-dev.csv']
Preprocessing done
2019-01-08 13:48:40.214389: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-01-08 13:48:40.214449: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-08 13:48:40.214456: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-01-08 13:48:40.214462: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-01-08 13:48:40.214753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14847 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:04.0, compute capability: 7.0)
D Starting queue runners...
D Queue runners started.
D step: 6435
D epoch: 64
D target epoch: 30
D steps per epoch: 100
D number of batches in train set: 100
D batches per job: 1
D batches per step: 1
D number of jobs in train set: 100
D number of jobs already trained in first epoch: 35
D Epochs - running: 0, done: 0
D Closing queues...
D Queues closed.
D Session closed.
D Stopping coordinator...
D Coordinator stopped.
Preprocessing ['/home/kelvin_pfp/data_thchs30/gencsv/thchs30-test.csv']
Preprocessing done
2019-01-08 13:49:10.563406: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-01-08 13:49:10.563456: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-08 13:49:10.563463: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-01-08 13:49:10.563468: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-01-08 13:49:10.563573: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14847 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:04.0, compute capability: 7.0)
Computing acoustic model predictions...
100% (99 of 99) |#################################################################| Elapsed Time: 0:00:43 Time: 0:00:43
Decoding predictions...
2% (2 of 99) |# | Elapsed Time: 0:38:14 ETA: 1 day, 6:55:56

Before decoding, I confirmed that the GPU is utilized and that progress is fast (about 2 minutes per epoch). The ETA for decoding, however, is about 1 day 7 hours. Is this expected behavior? Is there any way to use the GPU for decoding predictions?
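As a rough sanity check on that ETA, the progress line can be extrapolated linearly. This is only a sketch: it assumes each progress unit is one test batch of 25 utterances (from --test_batch_size 25) and that every batch takes about the same time. The variable names are illustrative.

```python
from datetime import timedelta

# Values read off the "Decoding predictions..." progress line.
elapsed = timedelta(minutes=38, seconds=14)
done, total = 2, 99

# Assumption: one progress unit == one test batch (--test_batch_size 25).
test_batch_size = 25

per_batch = elapsed / done                  # average time per decoded batch
remaining = per_batch * (total - done)      # linear extrapolation
per_utterance = per_batch / test_batch_size

print(per_batch)                      # 0:19:07
print(remaining)                      # 1 day, 6:54:19
print(per_utterance.total_seconds())  # 45.88
```

The extrapolation lands within a couple of minutes of the reported ETA, and under the stated assumption it works out to roughly 46 seconds of decoding per utterance.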

Not for now, no. We don’t have a GPU implementation of the decoder.
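To illustrate the kind of work this stage does on the CPU, here is a minimal sketch of CTC prefix beam search in plain Python. It is not DeepSpeech's decoder, which also scores candidates with the language model passed via --lm_binary_path and --lm_trie_path. The function name ctc_beam_search and the toy input are made up for illustration.

```python
import math
from collections import defaultdict

import numpy as np

NEG_INF = -math.inf


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) over the arguments."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_beam_search(log_probs, beam_width, blank=0):
    """Hypothetical helper: log_probs is a (time, labels) array of log-probabilities."""
    # Each prefix maps to (log p of paths ending in blank, log p ending in non-blank).
    beams = {(): (0.0, NEG_INF)}
    for t in range(log_probs.shape[0]):
        nxt = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            for c in range(log_probs.shape[1]):
                p = log_probs[t, c]
                if c == blank:
                    # Blank keeps the prefix unchanged.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (logsumexp(b, p_b + p, p_nb + p), nb)
                    continue
                new_prefix = prefix + (c,)
                b, nb = nxt[new_prefix]
                if prefix and prefix[-1] == c:
                    # Repeated label: only paths ending in blank extend the prefix.
                    nxt[new_prefix] = (b, logsumexp(nb, p_b + p))
                    # Paths ending in the same label collapse into the old prefix.
                    sb, snb = nxt[prefix]
                    nxt[prefix] = (sb, logsumexp(snb, p_nb + p))
                else:
                    nxt[new_prefix] = (b, logsumexp(nb, p_b + p, p_nb + p))
        # Prune to the beam_width most probable prefixes.
        ranked = sorted(nxt.items(), key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_width])
    best = max(beams.items(), key=lambda kv: logsumexp(*kv[1]))
    return list(best[0])


# Toy input: 3 frames, labels 0 = blank, 1 and 2 = characters.
toy = np.log(np.array([[0.1, 0.8, 0.1],
                       [0.8, 0.1, 0.1],
                       [0.1, 0.8, 0.1]]))
result = ctc_beam_search(toy, beam_width=4)
print(result)  # [1, 1]
```

In this sketch the cost per frame is proportional to beam_width times the number of labels, and each frame depends on the pruned result of the previous one. If the real decoder behaves similarly, a smaller --beam_width should shorten decoding, possibly at some cost in accuracy.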