Check training on GPU

Hi
How can I check whether training is using the GPU?

Run nvidia-smi and check the processes running; one of them should be taking a lot of memory.
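If it helps, here is a small Python sketch of the same check (assuming nvidia-smi is on your PATH and Python 3.7+ for subprocess.run’s capture_output):

import subprocess

def gpu_processes():
    # Ask nvidia-smi for every compute process currently using the GPU.
    out = subprocess.run(
        ["nvidia-smi",
         "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True)
    return out.stdout.strip()

if __name__ == "__main__":
    procs = gpu_processes()
    # If training really runs on the GPU, the python process executing
    # DeepSpeech.py should show up here with a large used_memory value.
    print(procs or "No compute processes are using the GPU.")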

The process is not set to the GPU?
How can I set it to run on the GPU?
Isn’t there any script that shows a training loss chart … ?

If you have the right software and hardware installed, it will automatically use the GPU.

Have you read the doc and properly set up tensorflow-gpu? Since r1.15, it will default to CPU if it fails to find CUDA …
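If you are not sure, here is a quick sanity check from Python (a sketch, assuming tensorflow-gpu 1.15 is the installed build):

import tensorflow as tf

# Both calls exist in TF 1.15; is_gpu_available() creates a small test
# session, so it also logs whether a CUDA device was found.
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("GPU available:", tf.test.is_gpu_available())

If both print True, training should land on the GPU; if the second prints False, the CUDA setup is what needs fixing.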

Please try running with the TF_CPP_MIN_VLOG_LEVEL=1 env variable to more easily investigate on the TensorFlow side …
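If it is easier to set from Python than on the shell command line, a minimal sketch (the variable has to be set before TensorFlow is imported so the native logging picks it up):

import os

# Must be set before the tensorflow import, otherwise it may be ignored.
os.environ["TF_CPP_MIN_VLOG_LEVEL"] = "1"

import tensorflow as tf  # the native-side logs are now more verbose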

Thanks.
1- I checked it: my TensorFlow version was the CPU one, so I installed the GPU version. That’s OK.

2- I get this error when running the code:
./bin/run-ldc93s1.sh
+ [ ! -f DeepSpeech.py ]
+ [ ! -f data/ldc93s1/ldc93s1.csv ]
+ [ -d ]
+ python -c from xdg import BaseDirectory as xdg; print(xdg.save_data_path("deepspeech/ldc93s1"))
+ checkpoint_dir=/home/sokhan/.local/share/deepspeech/ldc93s1
+ export CUDA_VISIBLE_DEVICES=0
+ python -u DeepSpeech.py --noshow_progressbar --train_files data/ldc93s1/ldc93s1.csv --test_files data/ldc93s1/ldc93s1.csv --train_batch_size 1 --test_batch_size 1 --n_hidden 100 --epochs 200 --checkpoint_dir /home/sokhan/.local/share/deepspeech/ldc93s1
swig/python detected a memory leak of type 'Alphabet *', no destructor found.

and at the end of the log it prints these lines:

I Test epoch...
swig/python detected a memory leak of type 'Output *', no destructor found.
swig/python detected a memory leak of type 'std::vector< Output,std::allocator< Output > > *', no destructor found.
swig/python detected a memory leak of type 'std::vector< std::vector< Output,std::allocator< Output > > > *', no destructor found.
swig/python detected a memory leak of type 'Alphabet *', no destructor found.

How can I fix it?

3- I have about 500 hours of data.
The code uses very little GPU memory: I have a GeForce GTX 1080 Ti with 11 GB of memory, but the code only uses 145 MiB.

You have not answered the rest.

this is harmless

read the doc, adjust batch size

Mmm, for this GPU and this amount of data, what batch size do you suggest?
I set the batch size to 8 but it still doesn’t use GPU memory.

It’s probably not using the GPU at all, and the 145 MiB is other stuff on your system. By default TensorFlow allocates all of the available GPU memory when training starts, regardless of batch size or actual utilization. You need to fix your TensorFlow GPU setup.
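For context, a minimal TF 1.x sketch of the behaviour described above (not something you need to add to DeepSpeech.py): by default a session reserves nearly all GPU memory up front, and allow_growth switches it to incremental allocation.

import tensorflow as tf

config = tf.ConfigProto()
# Without this line, creating the session would reserve almost the whole
# 11 GB of the 1080 Ti, regardless of batch size.
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    print(sess.run(tf.constant("session created")))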

Also, this will use one GPU only.