Training runs on GPU but test runs on CPU and takes a long time

I just want to make sure that the following training/validation/test times make sense, or whether something might be wrong.
I use the following command:

python3 DeepSpeech.py --train_batch_size 40 --dev_batch_size 40 --test_batch_size 40 --epochs 1 --n_hidden 2048 --learning_rate 0.0001 --alphabet_config_path alphabet.txt --train_files /dataset/fa/clips/train.csv --dev_files /dataset/fa/clips/dev.csv --test_files /dataset/fa/clips/test.csv --export_dir ./export/ --checkpoint_dir ./checkpoints/

I am working on a toy dataset with nearly identical train/dev/test sets, but I see the following times for training, validation, and testing. The test time is very long! I also checked GPU utilization: it is almost always at 0%, while CPU usage sits around 2600%.

Epoch 0 |   Training | Elapsed Time: 0:01:20 | Steps: 109 | Loss: 86.756596                                                                                                         
Epoch 0 | Validation | Elapsed Time: 0:00:37 | Steps: 83 | Loss: 102.668332
Test epoch | Steps: 80 | Elapsed Time: 0:26:29
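
For reference, this is how I watched utilization during the run (standard nvidia-smi, refreshing every second):

watch -n 1 nvidia-smi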

I have three GeForce RTX 2060 SUPER cards, but I set CUDA_VISIBLE_DEVICES=2 so that only one GPU is used.
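
To rule out a device-visibility problem, you can check which devices TensorFlow actually sees under the same CUDA_VISIBLE_DEVICES setting. This is a minimal check assuming the TensorFlow 1.x API that DeepSpeech is built on; run it in the same environment as the training command:

CUDA_VISIBLE_DEVICES=2 python3 -c "from tensorflow.python.client import device_lib; [print(d.name, d.device_type) for d in device_lib.list_local_devices()]"

Exactly one GPU device should be listed. If it appears here, the 0% GPU utilization during the test epoch is not a visibility issue.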

Decoding is CPU-bound; this is expected behavior. The test epoch runs the CTC beam-search decoder on the CPU, which is why the GPU sits at 0% while CPU usage is high during that phase.
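
If the test epoch is too slow, one knob commonly reduced is the decoder beam width (assuming your DeepSpeech version exposes the --beam_width flag; check with python3 DeepSpeech.py --helpfull). The default is large (1024 in several releases), and decoding cost grows roughly linearly with it, so a smaller value trades some accuracy for much faster decoding, e.g.:

python3 DeepSpeech.py --beam_width 100 ...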