The same speed with CPU and with GPU

I made a virtual environment for training without a GPU and tested the model. With my dataset I got 1 step per 40 seconds. After that I stopped training, made a separate virtual environment for GPU training, and installed there:
pip3 uninstall tensorflow
pip3 install 'tensorflow-gpu==1.15.2'
and tested training with the same parameters. In this case I get the same training speed as with the CPU. Why didn't I get faster training?

python3 DeepSpeech.py --drop_source_layers 1 --alphabet_config_path ~/ASR/data-cv/alphabet.ru --save_checkpoint_dir ~/ASR/ru-output-checkpoint --load_checkpoint_dir ~/ASR/ru-release-checkpoint --train_files ~/ASR/data-cv/clips/train.csv --dev_files ~/ASR/data-cv/clips/dev.csv --test_files ~/ASR/data-cv/clips/test.csv --scorer_path ~/ASR/ru-release-checkpoint/deepspeech-0.7.0-models.scorer --train_batch_size 64 --dropout_rate 0.4 --learning_rate 0.0001 --dev_batch_size 64
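
A quick sanity check, run in the GPU virtual environment, that the tensorflow-gpu wheel is the one actually being imported and that it was built with CUDA support (tf.test.is_built_with_cuda() is available in TensorFlow 1.15):

python3 -c "import tensorflow as tf; print(tf.__version__, tf.test.is_built_with_cuda())"

If this prints False, or if the plain tensorflow package is still installed alongside tensorflow-gpu, training will quietly run on the CPU.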

Try train_cudnn as a flag. See more info in flags.py. If that doesn't help, post the output of training. Usually it tells you whether you are using CUDA or not.
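
For example, the same command as above with the flag added (a sketch assuming --train_cudnn is the boolean flag defined in flags.py, so it can be passed bare):

python3 DeepSpeech.py --drop_source_layers 1 --alphabet_config_path ~/ASR/data-cv/alphabet.ru --save_checkpoint_dir ~/ASR/ru-output-checkpoint --load_checkpoint_dir ~/ASR/ru-release-checkpoint --train_files ~/ASR/data-cv/clips/train.csv --dev_files ~/ASR/data-cv/clips/dev.csv --test_files ~/ASR/data-cv/clips/test.csv --scorer_path ~/ASR/ru-release-checkpoint/deepspeech-0.7.0-models.scorer --train_batch_size 64 --dropout_rate 0.4 --learning_rate 0.0001 --dev_batch_size 64 --train_cudnn

Note the two plain ASCII hyphens; some forum software converts them into a dash, which the flag parser will not recognize.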

I tried the train_cudnn flag.

Here are the results.

With GPU: 1 step took 1m 02s.

python3 DeepSpeech.py --drop_source_layers 1 --alphabet_config_path ~/ASR/data-cv/alphabet.ru --save_checkpoint_dir ~/ASR/ru-output-checkpoint --load_checkpoint_dir ~/ASR/ru-release-checkpoint --train_files ~/ASR/data-cv/clips/train.csv --dev_files ~/ASR/data-cv/clips/dev.csv --test_files ~/ASR/data-cv/clips/test.csv --scorer_path ~/ASR/ru-release-checkpoint/deepspeech-0.7.0-models.scorer --train_batch_size 64 --dropout_rate 0.25 --learning_rate 0.00005 --dev_batch_size 64 --train_cudnn True
W WARNING: You specified different values for --load_checkpoint_dir and --save_checkpoint_dir, but you are running training and testing in a single invocation. The testing step will respect --load_checkpoint_dir, and thus WILL NOT TEST THE CHECKPOINT CREATED BY THE TRAINING STEP. Train and test in two separate invocations, specifying the correct --load_checkpoint_dir in both cases, or use the same location for loading and saving.
I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
I Initializing all variables.
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 0:01:02 | Steps: 1 | Loss: 293.825439 

Without GPU:
1 step took 0m 56s.

python3 DeepSpeech.py --drop_source_layers 1 --alphabet_config_path ~/ASR/data-cv/alphabet.ru --save_checkpoint_dir ~/ASR/ru-output-checkpoint --load_checkpoint_dir ~/ASR/ru-release-checkpoint --train_files ~/ASR/data-cv/clips/train.csv --dev_files ~/ASR/data-cv/clips/dev.csv --test_files ~/ASR/data-cv/clips/test.csv --scorer_path ~/ASR/ru-release-checkpoint/deepspeech-0.7.0-models.scorer --train_batch_size 64 --dropout_rate 0.5 --learning_rate 0.00005 --dev_batch_size 64
W WARNING: You specified different values for --load_checkpoint_dir and --save_checkpoint_dir, but you are running training and testing in a single invocation. The testing step will respect --load_checkpoint_dir, and thus WILL NOT TEST THE CHECKPOINT CREATED BY THE TRAINING STEP. Train and test in two separate invocations, specifying the correct --load_checkpoint_dir in both cases, or use the same location for loading and saving.
I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
I Initializing all variables.
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 0:00:56 | Steps: 1 | Loss: 297.249207 

Why is training without the GPU faster than with the GPU?
Maybe I did something wrong.
I have a GTX 1060 card with 3 GB of memory.

You are doing something wrong, and you don't provide much info to go on; check the info here:

https://deepspeech.readthedocs.io/en/v0.7.0/TRAINING.html#installing-deepspeech-training-code-and-its-dependencies

I did
pip3 uninstall tensorflow
pip3 install 'tensorflow-gpu==1.15.2'

and all was OK. The "CUDA dependency" page doesn't exist (page not found).
How can I provide the additional info you need for help?

Really, how long have you been in IT? You simply look at the link given and you'll see. Otherwise, try the search function, which lets you find things on web pages…

@lissyx Looks like the link is broken in the docs.

I have been in IT for 40+ years, but I am a novice with Ubuntu and GPUs.


I don’t know where you got that link, because it works if you pick it on rtd: https://deepspeech.readthedocs.io/en/v0.7.0/USING.html#cuda-dependency

Have you verified it is really using the GPU? Run nvidia-smi during training.

Don’t expect too much from that, though.
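
A minimal way to watch it, assuming the watch utility is installed (it refreshes the display every second):

watch -n 1 nvidia-smi

While a training step is running you should see a python process in the process list and a non-zero GPU-Util percentage.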

6 secs of delta? My money is on “there was never a GPU used” here. Please raise log level and share more tensorflow training output: if it’s loading the GPU, you will see it.

Thanks for your replies.
nvidia-smi
Sun May 3 13:50:38 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:01:00.0  On |                  N/A |
| 53%   60C    P2   104W / 120W |   2890MiB /  3018MiB |     76%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1982      G   /usr/lib/xorg/Xorg                           370MiB |
|    0      2367      G   /usr/bin/gnome-shell                         231MiB |
|    0      3095      G   gnome-control-center                           2MiB |
|    0      3372      G   /usr/lib/firefox/firefox                       2MiB |
|    0      5284      G   /usr/lib/firefox/firefox                       2MiB |
|    0     10257      C   python                                      2265MiB |
|    0     14734      G   /usr/lib/firefox/firefox                       2MiB |
+-----------------------------------------------------------------------------+
(deepspeech-train-venv) (base) v@gpu:~/DeepSpeech$

I want to run some experiments with the GTX 1060 to tune the whole training process, and later work with the Cristofary supercomputer for training.

hm, if I click the link I provided above I get 404 as well for the CUDA dependency, as it doesn't convert the .rst into .html on Firefox 🙂

On this page:
https://deepspeech.readthedocs.io/en/v0.7.0/TRAINING.html#transfer-learning-new-alphabet
the "CUDA dependency" link is broken.

On the USING page everything is OK.

Can DeepSpeech use the GPU while training with DeepSpeech.py?
Or do I need deepspeech-gpu?

DeepSpeech 100% supports GPU training. There is no such thing as a deepspeech-gpu.

What can happen, and often does happen, is that if the dependencies are not correctly set up (or for whatever other reason), the GPU is not visible to TensorFlow, and TensorFlow then ‘falls back’ to CPU-only training. So even if you ask it to do GPU training, it actually only uses the CPU.

Getting the GPU recognized by TensorFlow is the first thing to figure out. Once you have that sorted, you should be able to use DeepSpeech for training with the GPU.
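
A quick check of whether TensorFlow can see the card at all, using tf.test.is_gpu_available() from TensorFlow 1.15 (deprecated in later versions):

python3 -c "import tensorflow as tf; print(tf.test.is_gpu_available())"

If this prints False, the problem is in the driver/CUDA/cuDNN setup rather than in DeepSpeech itself.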

One way to determine if the GPU is being used is to run nvidia-smi while training is happening; it shows how much activity is going on in the GPU. If things are working correctly, DeepSpeech should be giving the GPU a good workout. If it is ‘falling back’ to the CPU, the GPU will sit at almost 0% usage while training is occurring.

So, 99% chance is that you need to work out which dependencies are not working. Often it's the NVIDIA drivers, CUDA, or cuDNN that are the problem.

If you read the links here on dependencies it might be of some use, or google things like ‘tensorflow use gpu on <insert your os/version>’. It's quite a common problem that TensorFlow can't see the GPU, so don't feel bad. But it's also out of scope for DeepSpeech. You have to solve that first, and then DeepSpeech should be able to make use of it.
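
Two commands that help pin down which layer is broken (nvcc may not be on your PATH, depending on how CUDA was installed):

nvidia-smi       # driver version and the highest CUDA version the driver supports
nvcc --version   # version of the CUDA toolkit actually installed

As far as I recall, the tensorflow-gpu 1.15 wheels are built against CUDA 10.0 and cuDNN 7.x, so a version mismatch here is a common culprit.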

Hope this is helpful. Best of luck!


it’s fixed on master

Thanks for the detailed recommendations. I will try to train on the GPU. But for now, the similar DeepSpeech package from NVIDIA's OpenSeq2Seq works on the GPU without problems.

Please, can you just share the details we ask you for? That's your fourth reply, and you still have not provided more complete training logs with a more verbose --log_level. We really cannot help you if you don't share that: GPU training works very well for us.

On top of everything else that’s been said here, so far you have only provided the timing of a single training step, the first one. This is not a useful benchmark because there is a bunch of setup work that happens on the first step, and it is independent of using the CPU or the GPU. You need to look at step timings over an entire epoch to get a good idea of the performance.