I made a virtual environment for training without a GPU and tested the model. For my dataset I got 1 step per 40 seconds. After that I stopped training, made a separate virtual environment for GPU training, and installed there:
pip3 uninstall tensorflow
pip3 install 'tensorflow-gpu==1.15.2'
and tested training with the same parameters. In this case I got the same training speed as with the CPU. Why didn't I get faster training?
python3 DeepSpeech.py --drop_source_layers 1 --alphabet_config_path ~/ASR/data-cv/alphabet.ru --save_checkpoint_dir ~/ASR/ru-output-checkpoint --load_checkpoint_dir ~/ASR/ru-release-checkpoint --train_files ~/ASR/data-cv/clips/train.csv --dev_files ~/ASR/data-cv/clips/dev.csv --test_files ~/ASR/data-cv/clips/test.csv --scorer_path ~/ASR/ru-release-checkpoint/deepspeech-0.7.0-models.scorer --train_batch_size 64 --dropout_rate 0.25 --learning_rate 0.00005 --dev_batch_size 64 --train_cudnn True
W WARNING: You specified different values for --load_checkpoint_dir and --save_checkpoint_dir, but you are running training and testing in a single invocation. The testing step will respect --load_checkpoint_dir, and thus WILL NOT TEST THE CHECKPOINT CREATED BY THE TRAINING STEP. Train and test in two separate invocations, specifying the correct --load_checkpoint_dir in both cases, or use the same location for loading and saving.
I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
I Initializing all variables.
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:01:02 | Steps: 1 | Loss: 293.825439
Without GPU:
1 step in 0m 56s
python3 DeepSpeech.py --drop_source_layers 1 --alphabet_config_path ~/ASR/data-cv/alphabet.ru --save_checkpoint_dir ~/ASR/ru-output-checkpoint --load_checkpoint_dir ~/ASR/ru-release-checkpoint --train_files ~/ASR/data-cv/clips/train.csv --dev_files ~/ASR/data-cv/clips/dev.csv --test_files ~/ASR/data-cv/clips/test.csv --scorer_path ~/ASR/ru-release-checkpoint/deepspeech-0.7.0-models.scorer --train_batch_size 64 --dropout_rate 0.5 --learning_rate 0.00005 --dev_batch_size 64
W WARNING: You specified different values for --load_checkpoint_dir and --save_checkpoint_dir, but you are running training and testing in a single invocation. The testing step will respect --load_checkpoint_dir, and thus WILL NOT TEST THE CHECKPOINT CREATED BY THE TRAINING STEP. Train and test in two separate invocations, specifying the correct --load_checkpoint_dir in both cases, or use the same location for loading and saving.
I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
I Initializing all variables.
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:56 | Steps: 1 | Loss: 297.249207
Why is training without the GPU faster than with the GPU?
Maybe I did something wrong.
I have a GTX 1060 card with 3 GB of memory.
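A quick first check (a minimal sketch, assuming the GPU virtual environment is active) is to confirm that the TensorFlow build being imported actually has CUDA support compiled in:

```python
import tensorflow as tf

# tensorflow-gpu 1.15.2 is compiled against CUDA; the plain
# "tensorflow" package is not. False here means the CPU-only
# build is still the one being imported despite the pip install.
print("version:", tf.__version__)
print("built with CUDA:", tf.test.is_built_with_cuda())
```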
Really, how long have you been in IT? Just look at the link given and you'll see. Otherwise, try the search function, which lets you find things on web pages…
@lissyx Looks like the link is broken in the docs.
Have you verified it is really using the GPU? Run nvidia-smi during training.
Don’t expect too much from that, though.
lissyx
6 seconds of delta? My money is on "there was never a GPU used" here. Please raise the log level and share more TensorFlow training output: if it's loading the GPU, you will see it.
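For reference, one way to get that kind of output outside DeepSpeech (a minimal TensorFlow 1.x sketch) is to enable device placement logging, which prints the device each op is assigned to:

```python
import tensorflow as tf

# log_device_placement=True makes TensorFlow print, at session
# startup, which device (CPU:0 or GPU:0) every op is placed on.
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    a = tf.constant([1.0, 2.0], name="a")
    b = tf.constant([3.0, 4.0], name="b")
    print(sess.run(a + b))  # placement lines appear on stderr
```

If the placement lines all say CPU:0, TensorFlow never saw the GPU.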
DeepSpeech 100% supports GPU training. There is no such thing as a deepspeech-gpu.
What can happen (and often does) is that if the dependencies are not correctly set up, or for whatever other reason the GPU is not visible to TensorFlow, TensorFlow "falls back" to CPU-only training. So even if you ask it to do GPU training, it actually only uses the CPU.
Getting the GPU recognized by TensorFlow is the first thing to figure out. Once you have that sorted, you should be able to use DeepSpeech to train on the GPU.
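A minimal sketch for checking that, under TensorFlow 1.15 (the version installed above):

```python
import tensorflow as tf
from tensorflow.python.client import device_lib

# Lists every device TensorFlow can use; a healthy GPU setup shows
# a device with device_type == "GPU" next to the CPU entry.
for device in device_lib.list_local_devices():
    print(device.device_type, device.name)

# True only if a CUDA-capable GPU is both visible and usable.
print("GPU available:", tf.test.is_gpu_available())
```

If only a CPU device appears here, fixing the CUDA/cuDNN setup comes before touching any DeepSpeech flags.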
One way to determine whether the GPU is being used is to run nvidia-smi while training is happening; it shows how much activity is going on in the GPU. If things are working correctly, DeepSpeech should be giving the GPU a good workout. If it's "falling back" to the CPU, the GPU will sit at almost 0% usage while training is occurring.
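If you'd rather poll than eyeball the full nvidia-smi table, a small sketch like this (assuming nvidia-smi is on the PATH) prints utilization once per second while training runs in another terminal:

```python
import subprocess
import time

# Query only GPU utilization and memory; both should climb well
# above zero once training steps actually run on the GPU.
# Stop with Ctrl-C.
while True:
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used",
         "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    print(result.stdout.strip())
    time.sleep(1)
```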
So, there is a 99% chance that you need to work out which dependencies are not working. Often it's the NVIDIA drivers, CUDA, or cuDNN that are not working…
If you read the links here on dependencies it might be of some use, or google things like 'tensorflow use gpu on <insert your os/version>', etc. It's quite a common problem that TensorFlow can't see the GPU, so don't feel bad. But it's also out of scope for DeepSpeech: you have to solve that first, and then DeepSpeech should be able to make use of it.
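One concrete dependency check (a sketch assuming Linux and TensorFlow 1.15's published build configuration, which is CUDA 10.0 with cuDNN 7) is to try loading the exact shared libraries TensorFlow wants; if either fails, TensorFlow falls back to CPU:

```python
import ctypes

# TensorFlow 1.15 dlopens these specific sonames at import time.
for lib in ("libcudart.so.10.0", "libcudnn.so.7"):
    try:
        ctypes.CDLL(lib)
        print(lib, "OK")
    except OSError as exc:
        print(lib, "MISSING:", exc)
```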
Hope this is helpful. Best of luck!
lissyx
Thanks for the detailed recommendations. I will try to train on the GPU. But right now a similar package, the DeepSpeech model from NVIDIA's OpenSeq2Seq, works on the GPU without problems.
lissyx
Please, can you just share the details we asked you for? That's your fourth reply, and you still have not provided more complete training logs with a higher --log_level. We really cannot help you if you don't share that: GPU training works very well for us.
On top of everything else that's been said here, so far you have only provided the timing of a single training step, the first one. This is not a useful benchmark, because a bunch of setup work happens on the first step regardless of whether the CPU or the GPU is used. You need to look at step timings over an entire epoch to get a good idea of the performance.
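To illustrate (a hypothetical sketch; `batches` and `run_training_step` stand in for whatever your training loop actually does), compare the first step against the steady-state average instead of quoting the first step alone:

```python
import time

step_times = []
for batch in batches:                # `batches`: assumed input pipeline
    start = time.monotonic()
    run_training_step(batch)         # hypothetical per-step training call
    step_times.append(time.monotonic() - start)

# The first step pays one-off costs (graph construction, cuDNN
# autotuning, input-pipeline warm-up), so exclude it from the mean.
steady = step_times[1:]
print("first step: %.1f s" % step_times[0])
print("steady-state mean: %.1f s" % (sum(steady) / len(steady)))
```

On a working GPU setup the steady-state mean, not the first step, is where the speedup over CPU shows up.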