Check training on GPU

Hi
How can I check whether training is using the GPU?

Run nvidia-smi and check the processes running; one of them should be taking a lot of memory.
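If it helps, here is a small Python sketch of the same check (assuming nvidia-smi is on your PATH and Python 3.7+ for subprocess.run’s capture_output):

import subprocess

def gpu_processes():
    # Ask nvidia-smi for every compute process currently using the GPU.
    out = subprocess.run(
        ["nvidia-smi",
         "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True)
    return out.stdout.strip()

if __name__ == "__main__":
    procs = gpu_processes()
    # If training really runs on the GPU, the python process executing
    # DeepSpeech.py should show up here with a large used_memory value.
    print(procs or "No compute processes are using the GPU.")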

The process is not set to the GPU?
How can I set it to run on the GPU?
Isn’t there any script that shows a training loss chart … ?

If you have the right software and hardware installed, it will automatically use the GPU.

Have you read the doc and properly set up tensorflow-gpu? Since r1.15, it will default to CPU if it fails to find CUDA …
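If you are not sure, here is a quick sanity check from Python (a sketch, assuming tensorflow-gpu 1.15 is the installed build):

import tensorflow as tf

# Both calls exist in TF 1.15; is_gpu_available() creates a small test
# session, so it also logs whether a CUDA device was found.
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("GPU available:", tf.test.is_gpu_available())

If both print True, training should land on the GPU; if the second prints False, the CUDA setup is what needs fixing.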

Please try running with the TF_CPP_MIN_VLOG_LEVEL=1 env variable to more easily investigate on the TensorFlow side …
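If it is easier to set from Python than on the shell command line, a minimal sketch (the variable has to be set before TensorFlow is imported so the native logging picks it up):

import os

# Must be set before the tensorflow import, otherwise it may be ignored.
os.environ["TF_CPP_MIN_VLOG_LEVEL"] = "1"

import tensorflow as tf  # the native-side logs are now more verbose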

Thanks.
1- I checked it: my TensorFlow version was the CPU one, so I installed the GPU version. That’s OK.

2- I get this error when running the code:
./bin/run-ldc93s1.sh
+ [ ! -f DeepSpeech.py ]
+ [ ! -f data/ldc93s1/ldc93s1.csv ]
+ [ -d ]
+ python -c from xdg import BaseDirectory as xdg; print(xdg.save_data_path("deepspeech/ldc93s1"))
+ checkpoint_dir=/home/sokhan/.local/share/deepspeech/ldc93s1
+ export CUDA_VISIBLE_DEVICES=0
+ python -u DeepSpeech.py --noshow_progressbar --train_files data/ldc93s1/ldc93s1.csv --test_files data/ldc93s1/ldc93s1.csv --train_batch_size 1 --test_batch_size 1 --n_hidden 100 --epochs 200 --checkpoint_dir /home/sokhan/.local/share/deepspeech/ldc93s1
swig/python detected a memory leak of type 'Alphabet *', no destructor found.

and at the end of the log it prints these lines:

I Test epoch...
swig/python detected a memory leak of type 'Output *', no destructor found.
swig/python detected a memory leak of type 'std::vector< Output,std::allocator< Output > > *', no destructor found.
swig/python detected a memory leak of type 'std::vector< std::vector< Output,std::allocator< Output > > > *', no destructor found.
swig/python detected a memory leak of type 'Alphabet *', no destructor found.

How can I fix it?

3- I have about 500 hours of data.
The code uses very little GPU memory: I have a GeForce GTX 1080 Ti with 11 GB of memory, but the code only uses 145 MiB.

You have not answered the rest.

this is harmless

read the doc, adjust batch size

Mmm, for this GPU and this amount of data, what batch size do you suggest?
I set the batch size to 8 but it still doesn’t use GPU memory.

It’s probably not using the GPU at all, and the 145 MiB is other stuff on your system. By default TensorFlow allocates all of the available GPU memory when training starts, regardless of batch size or actual utilization. You need to fix your TensorFlow GPU setup.
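For context, a minimal TF 1.x sketch of the behaviour described above (not something you need to add to DeepSpeech.py): by default a session reserves nearly all GPU memory up front, and allow_growth switches it to incremental allocation.

import tensorflow as tf

config = tf.ConfigProto()
# Without this line, creating the session would reserve almost the whole
# 11 GB of the 1080 Ti, regardless of batch size.
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    print(sess.run(tf.constant("session created")))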

Also, this will use one GPU only.