Doesn't use GPU while training, but during recognition it uses one

makar.troyan · April 23, 2018, 4:12pm

Hi guys. I'm just getting started in ML and with DeepSpeech. Thanks to Mozilla for your work, it's a really great project, but I've run into some problems using it. Maybe someone can help me?
The main problem: when I use DeepSpeech.py to train a model on big datasets (like LibriSpeech, CV and others), it stops at the “I STARTING Optimization” message and does nothing else. Once I waited 1.5 days; it loaded the CPU to 100% but didn't print anything except "I STARTING Optimization". On a smaller dataset like ldc93s1 it finishes fine, but it doesn't use the GPU during training either.
There is also a problem with long “words” without spaces in the output when recognizing with the pre-trained model.
P.S. I installed and used everything following the step-by-step instructions from GitHub.
P.P.S. During recognition with ./deepspeech it uses the GPU correctly.

lissyx · April 23, 2018, 4:49pm

Have you installed only the tensorflow-gpu package in your virtual environment?

makar.troyan · April 23, 2018, 9:12pm

I have tensorflow==1.6.0 and tensorflow-gpu==1.6.0 installed simultaneously. I tried using only tensorflow-gpu but got an error like ModuleNotFoundError: No module named 'tensorflow.python'. Should I change tensorflow to tensorflow-gpu manually in the code or do something else?

lissyx · April 24, 2018, 6:24am

You should uninstall everything and reinstall only tensorflow-gpu. And describe all your setup steps if you still have the ModuleNotFoundError.
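A minimal sketch of how one might check for the situation described above, where both the CPU and GPU builds of TensorFlow are installed side by side. The helper name conflicting_tensorflow_packages is hypothetical, and the listing uses importlib.metadata, which assumes Python 3.8 or newer (the setup in this thread used 3.6, where pip3 list would give the same information).

```python
def conflicting_tensorflow_packages(installed_names):
    """Return both package names if the CPU and GPU builds coexist, else []."""
    # Normalize names so "TensorFlow_GPU" and "tensorflow-gpu" compare equal.
    names = {name.lower().replace("_", "-") for name in installed_names}
    clash = {"tensorflow", "tensorflow-gpu"}
    return sorted(clash) if clash <= names else []


if __name__ == "__main__":
    # Assumption: Python 3.8+, where importlib.metadata is in the stdlib.
    from importlib.metadata import distributions

    installed = [
        dist.metadata["Name"]
        for dist in distributions()
        if dist.metadata["Name"]
    ]
    clash = conflicting_tensorflow_packages(installed)
    if clash:
        print("Both builds are installed:", ", ".join(clash))
    else:
        print("No tensorflow / tensorflow-gpu conflict found.")
```

If it reports a conflict, the advice in this reply applies: uninstall both and reinstall only tensorflow-gpu.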

makar.troyan · April 24, 2018, 2:03pm

I start from a clean Ubuntu 16.04 install with Python 3.5 by default, switch it to 3.6, install the NVIDIA drivers, CUDA 9, cuDNN 7, git lfs and sox (with MP3 support), then:
git clone https://github.com/mozilla/DeepSpeech
cd DeepSpeech/
wget -O - https://github.com/mozilla/DeepSpeech/releases/download/v0.1.1/deepspeech-0.1.1-models.tar.gz | tar xvfz -
sudo pip3 install -r requirements.txt
python3 util/taskcluster.py --target . --arch gpu
sudo pip3 uninstall tensorflow
sudo pip3 install 'tensorflow-gpu==1.6.0'
After that I try to run ./bin/run-ldc93s1.sh and get this message:
+ [ ! -f DeepSpeech.py ]
+ [ ! -f data/ldc93s1/ldc93s1.csv ]
+ echo Downloading and preprocessing LDC93S1 example data, saving in ./data/ldc93s1.
Downloading and preprocessing LDC93S1 example data, saving in ./data/ldc93s1.
+ python -u bin/import_ldc93s1.py ./data/ldc93s1
Traceback (most recent call last):
  File "bin/import_ldc93s1.py", line 12, in <module>
    from tensorflow.contrib.learn.python.learn.datasets import base
ImportError: No module named tensorflow.contrib.learn.python.learn.datasets

lissyx · April 24, 2018, 2:05pm

Please follow the documentation and set up a virtualenv properly. You should never ever have to sudo pip3 install; you are going to run into issues over and over.
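As a quick sanity check before installing anything, a sketch like the following can tell whether the running interpreter belongs to a virtual environment. The function name in_virtual_environment is made up for illustration; it relies on sys.base_prefix (set by venv and recent virtualenv) and on sys.real_prefix (set by older virtualenv releases).

```python
import sys


def in_virtual_environment():
    """Return True if this interpreter appears to run inside a venv/virtualenv."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv make sys.base_prefix differ from sys.prefix instead.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return base != sys.prefix


if __name__ == "__main__":
    if in_virtual_environment():
        print("Running inside a virtual environment:", sys.prefix)
    else:
        print("Running the system interpreter; activate a virtualenv first.")
```

Inside an activated environment, a plain pip3 install (without sudo) installs into that environment rather than system-wide.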

makar.troyan · April 24, 2018, 3:07pm

Oh! That really helped. Thank you very much!
I have one question left: if I launch it on a multi-GPU machine, do I need to do anything extra to distribute the training, or will it do that automatically?

lissyx · April 24, 2018, 5:12pm

If you have several NVIDIA GPUs on one system, it should pick them up automagically. And you can play with the CUDA_VISIBLE_DEVICES environment variable to control which devices are visible to your process.
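A small sketch of using CUDA_VISIBLE_DEVICES from Python, assuming the variable is set before TensorFlow (or any other CUDA library) is imported in the same process. The helper restrict_visible_gpus is a hypothetical name, not part of DeepSpeech.

```python
import os


def restrict_visible_gpus(device_indices):
    """Expose only the given GPU indices to CUDA libraries loaded afterwards."""
    value = ",".join(str(index) for index in device_indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Example: make only the first GPU visible to this process.
restrict_visible_gpus([0])

# Import TensorFlow only after the variable is set, for example:
# import tensorflow as tf
```

The same effect can be had from the shell by prefixing the training command with CUDA_VISIBLE_DEVICES=0.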

makar.troyan · May 7, 2018, 10:25am

Hi, I have one more question about GPU usage.
I started training with the following parameters:
DeepSpeech.py \
--initialize_from_frozen_model models/output_graph.pb \
--learning_rate 0.0005 \
--dropout_rate 0.2367 \
--epoch 1 \
--display_step 1 \
--validation_step 1 \
--fulltrace \
--checkpoint_dir tests/checkpoint_voip/ \
--checkpoint_secs 60 \
--export_dir tests/export_voip/ \
--summary_dir tests/tensorboard/ \
--summary_secs 120 \
--dev_batch_size 16 \
--dev_files data/voip_en/voip_en-wav-dev.csv \
--test_batch_size 16 \
--test_files data/voip_en/voip_en-wav-test.csv \
--train_batch_size 32 \
--train_files data/voip_en/voip_en-wav-train.csv

In this dataset the audio recordings are 5-10 seconds long.
I have a 1-GPU machine.
Is it OK that the GPU is loaded at 98-100% for only 4 seconds, then sits at 0% for 26 seconds, and so on? (I use nvtop for GPU monitoring.)
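To put a number on that pattern, here is a tiny worked calculation using the 4 seconds busy and 26 seconds idle quoted above. The function name gpu_duty_cycle is only illustrative.

```python
def gpu_duty_cycle(busy_seconds, idle_seconds):
    """Fraction of wall-clock time the GPU spends doing work."""
    return busy_seconds / (busy_seconds + idle_seconds)


# 4 s at 98-100% load followed by 26 s at 0%, as observed with nvtop.
print(f"{gpu_duty_cycle(4, 26):.0%}")  # prints 13%
```

In other words, the GPU is idle for roughly 87% of each 30-second cycle, which points at work happening somewhere other than the GPU.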

lissyx · May 7, 2018, 10:30am

It suggests you have something that runs on the CPU :-). It might be your data fetching that is inappropriately dimensioned: that depends on your training set and your GPU. Without more details on both, it’s unlikely we can help more.

makar.troyan · May 7, 2018, 3:15pm

GPU (Tesla K80): just one DeepSpeech.py process.
CPU (Intel Xeon E5-2686 v4 Broadwell): 22 subprocesses of DeepSpeech.py, tensorboard and some system processes.
About the dataset: the average recording duration is 3.44 seconds (the training part has 35662 recordings), and it’s stored on an SSD.
In this video you can see two moments when it is loaded at 100%.
I have no more ideas about what details to add.

lissyx · May 7, 2018, 3:31pm

Only one K80? That’s not a lot of power, as far as I remember. Small audio files? Maybe you need more in your batch.

lissyx · May 7, 2018, 3:45pm

Wait @makar.troyan I missed that you set display and validation step to one. Then what you see is likely WER computation taking CPU.

makar.troyan · May 7, 2018, 4:00pm

On average, one audio file is 110 KB and 3.44 seconds long, with a transcription like "i would like information on a mediterranean restaurant" or "m looking for a pub and it must have an internet connection and a tv".
About the display and validation steps: do they take place during the epoch? I thought they were computed after a given epoch.

lissyx · May 7, 2018, 4:25pm

Your command line above sets those steps to “1”, so it’s happening after every epoch …

makar.troyan · May 7, 2018, 5:47pm

Yes, of course. But the situation in the video happens during an epoch. It looks like the life cycle of a single batch, but I’m not sure. Is WER computed after every batch, and if so, can it take that much time?

lissyx · May 7, 2018, 6:42pm

Please refrain from using videos or screenshots, it’s not readable, and heavy. I can only comment on the command line you documented earlier. And my comment holds

reuben · May 7, 2018, 7:11pm

Yes, it happens after every batch and can delay things enough to underutilize the GPU. You can set display_step to a higher number so that you don’t calculate WER reports on every epoch.

reuben · May 7, 2018, 7:16pm

Also, as lissyx has already mentioned, make sure you’re using batch sizes that are as high as your GPU RAM can handle. To quickly find out whether a given batch size is too high, look for the sort_values call in util/feeding.py and change the ascending parameter to False, so that longer samples are used first. You’ll then get OOMs faster and can search for the highest batch size that works for you. Make sure you flip it back again when training for real, though.
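The search for the highest working batch size can be sketched as a doubling phase followed by a bisection. This is only an illustration: fits is a hypothetical callback that you would implement yourself (for example, by launching a short training run and returning False on an OOM), and the sketch assumes that if a batch size fails, every larger one fails too.

```python
def largest_working_batch_size(fits, upper_limit=1024):
    """Return the largest n in [1, upper_limit] with fits(n) True, or 0 if none.

    `fits` is a hypothetical callback: fits(n) should return True when one
    training step with batch size n completes without running out of memory.
    """
    if not fits(1):
        return 0

    # Phase 1: double the batch size until it fails or exceeds the limit.
    low, high = 1, 2
    while high <= upper_limit and fits(high):
        low = high
        high *= 2
    high = min(high, upper_limit + 1)

    # Phase 2: bisect between the last success (low) and the first failure (high).
    while high - low > 1:
        mid = (low + high) // 2
        if fits(mid):
            low = mid
        else:
            high = mid
    return low


# Usage with a stand-in for a GPU that can hold at most 48 samples per batch.
print(largest_working_batch_size(lambda n: n <= 48))  # prints 48
```

With the longest samples fed first, as described above, each call to fits can fail quickly instead of hitting the OOM late in an epoch.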

makar.troyan · May 7, 2018, 7:55pm

I understood! Thank you very much, guys! I’ll try to play with the batch size and won’t get carried away with validation and display step ))
