My RTX 4000 GPU is not fully used while training from the DeepSpeech 0.5.1 checkpoint

Hi Guys,

I am continuing training of the DeepSpeech 0.5.1 model from the downloaded checkpoint.

1). Cloned DeepSpeech 0.5.1 and cherry-picked 007e512 (a rough command sketch follows this list).
2). Downloaded the DeepSpeech.py from "How to find which file is making loss inf" (to find the file that makes the training loss infinite).
3). Downloaded the DeepSpeech 0.5.1 checkpoint.
4). Downloaded the Mozilla Common Voice corpus data.
5). Installed TensorFlow 1.14.0 GPU for GPU acceleration.
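For reference, steps 1) and 3) were roughly the following (the release tag and checkpoint location are assumptions, adjust to your setup):

git clone https://github.com/mozilla/DeepSpeech.git
cd DeepSpeech
git checkout v0.5.1            # assumed 0.5.1 release tag
git cherry-pick 007e512        # the fix mentioned in step 1
# download the 0.5.1 checkpoint release archive and extract it into data/checkpoint/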

The model is training too slowly.
Here is the command:

python3 -u DeepSpeech.py \
  --n_hidden 2048 \
  --epochs 3 \
  --checkpoint_dir data/checkpoint/ \
  --train_files data/corpus/clips/train.csv \
  --dev_files data/corpus/clips/dev.csv \
  --test_files data/corpus/clips/test.csv \
  --train_batch_size 8 \
  --dev_batch_size 10 \
  --test_batch_size 10 \
  --dropout_rate 0.15 \
  --lm_alpha 0.75 \
  --lm_beta 1.85 \
  --learning_rate 0.0001 \
  --lm_binary_path data/originalLmBinary/lm.binary \
  --lm_trie_path data/originalLmBinary/trie \
  --export_dir data/export/

My system configuration:
1). Quadro RTX 4000 (8 GB RAM) x 1
2). 500 GB SSD
3). Ubuntu 18.04
4). CUDA 10.0 and cuDNN v7.5
5). NVIDIA driver 435

Here is the nvidia-smi output.
Ignore the CUDA version 10.1 shown there; it is wrong.
I have two GPUs. The RTX 4000 seems to be running, but only at around 4% to 15% utilization, not fully used (checked with nvtop).

Fri Sep 6 17:44:50 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.40       Driver Version: 430.40       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro RTX 4000     Off  | 00000000:01:00.0  On |                  N/A |
| 30%   44C    P8    11W / 125W |    645MiB /  7981MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 105…    Off  | 00000000:02:00.0 Off |                  N/A |
|  0%   37C    P8    N/A /  75W |      2MiB /  4040MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1314      G   /usr/lib/xorg/Xorg                            39MiB |
|    0      1363      G   /usr/bin/gnome-shell                          55MiB |
|    0      1672      G   /usr/lib/xorg/Xorg                           222MiB |
|    0      1816      G   /usr/bin/gnome-shell                         136MiB |
|    0      2223      G   …equest-channel-token=222207177073321633    141MiB |
|    0      4780      G   …pareRendererForSitePerProcess --disable     47MiB |
+-----------------------------------------------------------------------------+

Please correct me if i am wrong.

There’s no python process using the GPU in your nvidia-smi output, so something is wrong with your tensorflow-gpu install. It’s not using the GPU at all.
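A quick sanity check is to run something like this from the same Python environment you train in:

# should print True and log the Quadro RTX 4000 being picked up;
# False means the tensorflow-gpu / CUDA / cuDNN setup is broken
python3 -c "import tensorflow as tf; print(tf.test.is_gpu_available())"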

Hi @reuben,

Sorry, I had captured the nvidia-smi output without running the DeepSpeech command.

I am using a virtualenv, and I have installed tensorflow-gpu 1.14.0 in it.
Note: I have not installed tensorflow-gpu with the system pip.
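For completeness, the environment was set up roughly like this (the venv path is just an example):

virtualenv -p python3 ~/deepspeech-venv    # example path for the virtualenv
source ~/deepspeech-venv/bin/activate
pip install tensorflow-gpu==1.14.0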
Here is the output of nvidia-smi

Fri Sep 6 18:38:26 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.40       Driver Version: 430.40       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro RTX 4000     Off  | 00000000:01:00.0  On |                  N/A |
| 30%   46C    P8    11W / 125W |    866MiB /  7981MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 105…    Off  | 00000000:02:00.0 Off |                  N/A |
|  0%   38C    P8    N/A /  75W |     55MiB /  4040MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1314      G   /usr/lib/xorg/Xorg                            39MiB |
|    0      1363      G   /usr/bin/gnome-shell                          55MiB |
|    0      1672      G   /usr/lib/xorg/Xorg                           249MiB |
|    0      1816      G   /usr/bin/gnome-shell                         187MiB |
|    0      2223      G   …equest-channel-token=222207177073321633    151MiB |
|    0      7119      G   /opt/teamviewer/tv_bin/TeamViewer             25MiB |
|    0     15865      C   python                                        93MiB |
|    0     26111      G   …quest-channel-token=9205947884022005057     51MiB |
|    1     15865      C   python                                        43MiB |
+-----------------------------------------------------------------------------+

Now they’re listed as being in a low power state and with only small amounts of memory allocated, which doesn’t make any sense either. Your CUDA setup is probably broken; I’ve never seen this before. Try reinstalling all the dependencies from scratch? I’d also specify CUDA_VISIBLE_DEVICES="0" when training so that it only uses the Quadro RTX 4000; otherwise you’ll be forced to use a lower batch size to accommodate the GTX 1050 and thus underutilize the beefier GPU.
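Concretely, that just means setting the variable before launching training, for example:

# hide the GTX 1050 so TensorFlow only sees the Quadro RTX 4000
export CUDA_VISIBLE_DEVICES="0"
# then launch DeepSpeech.py with the same flags as in the first post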


Hi @reuben,

I had wrongly installed CUDA 10.1 instead of 10.0, so I reinstalled CUDA 10.0 and cuDNN 7.5.1 and trained the model again. It looks like the GPU is now utilized, thank you.

I have followed this link --> https://gist.github.com/bogdan-kulynych/f64eb148eeef9696c70d485a76e42c3a
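In case it helps anyone else, the installed versions can be double-checked with something like this (paths assume the default /usr/local/cuda install):

# CUDA toolkit version; should report release 10.0
nvcc --version
# cuDNN version; should show major 7, minor 5
grep -A 2 "define CUDNN_MAJOR" /usr/local/cuda/include/cudnn.h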

My question is:
I trained the existing pre-trained model on the Common Voice dataset for the first 5000 steps. Training completed successfully and the model was exported.

I0907 17:02:50.652766 140018494904128 graph_util_impl.py:364] Converted 12 variables to const ops.
I Models exported at /home/karthik/speech/DeepSpeech/data/export/

But output_graph.pb remains the same size as the existing model, 188.9 MB.
Should the size of output_graph.pb increase after I train the existing model on a new dataset?

I’m sure I already replied to that question in another topic.
