Training hardware bottlenecks


I’m doing training experiments on a couple of machines. Both have a single RTX 2080 Ti but different RAM and CPU setups. What I’m seeing is that I’m very CPU limited. One setup has a 4-thread i5 and the other an 8-thread i7, and both CPUs seem to struggle to keep the RTX card occupied: the cards are never 100% loaded and sometimes just sit idle.

My question is: how much would I need to upgrade the CPU to max out a single RTX 2080 Ti? Does training benefit from many cores, or do I need high clock speeds?

Best regards

Do you use a phoneme-based model?

I’m currently running experiments with and without phonemes. It’s a lot faster without, but the CPU is still maxed out without loading the 2080 Ti beyond 30%.

That is normal if it is only the first epoch, since it is caching phonemes for the coming epochs. If it is still slow after the first epoch, then something might be wrong.

The phonemes experiment is well past the first epoch, and I did notice an increase in GPU usage after the first epoch. But even now the i7 won’t load the 2080 Ti beyond 30%. Do you have any suggestions on how to debug this?

We already print loader time and step time. You can check those values to see if there is an exceptional delay. Otherwise, it might be about your batch_size: try changing it and watch the effect on GPU utilization. Using too many loader processes can also slow things down.
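The split between loader time and step time can be measured in any training loop with a pattern like the following (a generic sketch, not the project's actual logging code; `batches` and `train_step` are placeholders for your data loader and training step):

```python
import time

def timed_training_loop(batches, train_step):
    """Return (loader_time, step_time) per iteration for any iterable of batches."""
    stats = []
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)       # loader time: waiting on the data pipeline
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)          # step time: forward/backward/optimizer
        t2 = time.perf_counter()
        stats.append((t1 - t0, t2 - t1))
    return stats
```

If loader time dominates step time, the CPU-side pipeline (phonemization, augmentation, number of worker processes) is the bottleneck rather than the GPU.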