Training runs on GPU but test runs on CPU and takes a long time

I just want to make sure that the following training/validation/test times make sense, or whether something might be wrong.
I use the following command:

python3 DeepSpeech.py --train_batch_size 40 --dev_batch_size 40 --test_batch_size 40 --epochs 1 --n_hidden 2048 --learning_rate 0.0001 --alphabet_config_path alphabet.txt --train_files /dataset/fa/clips/train.csv --dev_files /dataset/fa/clips/dev.csv --test_files /dataset/fa/clips/test.csv --export_dir ./export/ --checkpoint_dir ./checkpoints/

I am working on a toy dataset with nearly identical train/dev/test sets, but I see the following times for training, validation, and testing. The test time is very long! I also checked GPU utilization: it is almost always at 0%, while CPU usage sits around 2600%.

Epoch 0 |   Training | Elapsed Time: 0:01:20 | Steps: 109 | Loss: 86.756596                                                                                                         
Epoch 0 | Validation | Elapsed Time: 0:00:37 | Steps: 83 | Loss: 102.668332
Test epoch | Steps: 80 | Elapsed Time: 0:26:29
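
For reference, this is how I watched utilization during the run (standard nvidia-smi, refreshing every second):

watch -n 1 nvidia-smi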

I have three GeForce RTX 2060 SUPER cards, but I set CUDA_VISIBLE_DEVICES=2 so that only one GPU is used.
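
To rule out a device-visibility problem, you can check which devices TensorFlow actually sees under the same CUDA_VISIBLE_DEVICES setting. This is a minimal check assuming the TensorFlow 1.x API that DeepSpeech is built on; run it in the same environment as the training command:

CUDA_VISIBLE_DEVICES=2 python3 -c "from tensorflow.python.client import device_lib; [print(d.name, d.device_type) for d in device_lib.list_local_devices()]"

Exactly one GPU device should be listed. If it appears here, the 0% GPU utilization during the test epoch is not a visibility issue.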

Decoding is CPU-bound; this is expected behavior. The test epoch runs the CTC beam-search decoder on the CPU, which is why the GPU sits at 0% while CPU usage is high during that phase.
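
If the test epoch is too slow, one knob commonly reduced is the decoder beam width (assuming your DeepSpeech version exposes the --beam_width flag; check with python3 DeepSpeech.py --helpfull). The default is large (1024 in several releases), and decoding cost grows roughly linearly with it, so a smaller value trades some accuracy for much faster decoding, e.g.:

python3 DeepSpeech.py --beam_width 100 ...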