V100 / CPU / memory ratio for a training VM

I am trying to train a larger speech dataset (~100 h) on a VM, and I am wondering how many CPUs and how much memory I should provision per V100.

Any recommendations or best practices?

Thanks

I’m afraid we don’t have that kind of feedback. What are you hesitating between? CPU and memory matter, but not nearly as much as the GPU itself.

Costs are a factor :slight_smile:

Just ran some tests: roughly 50 hours of speech, several configurations, on a Google VM with at most 16 cores.

| Configuration | Time per epoch | Cost per epoch |
|---|---|---|
| 1× V100 | 1:30 h | $3.00 |
| 2× V100 | 1:05 h | $4.50 |
| 4× V100 | 0:47 h | $6.05 |

I had expected 4 V100s to cost about the same per epoch and simply finish sooner. Instead, I pay about double per epoch … This might be due to the limit of 16 CPUs I currently have: four GPUs sharing 16 cores likely starve the input pipeline.
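To put numbers on that, here is a minimal Python sketch using only the epoch times and prices measured above; it computes speedup, scaling efficiency, and cost relative to a single V100:

```python
# Speedup, scaling efficiency, and relative cost per epoch,
# derived from the measured figures above (h:mm converted to hours).
configs = {
    # gpus: (hours_per_epoch, dollars_per_epoch)
    1: (90 / 60, 3.00),  # 1:30 h/epoch
    2: (65 / 60, 4.50),  # 1:05 h/epoch
    4: (47 / 60, 6.05),  # 0:47 h/epoch
}

base_time, base_cost = configs[1]
for gpus, (hours, cost) in configs.items():
    speedup = base_time / hours
    efficiency = speedup / gpus  # 1.0 would be perfect linear scaling
    print(f"{gpus}x V100: {speedup:.2f}x speedup, "
          f"{efficiency:.0%} efficiency, "
          f"{cost / base_cost:.2f}x cost per epoch")
```

That works out to roughly 48% scaling efficiency at 4 GPUs and about 2× the per-epoch cost, which matches the numbers above.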

But as a rough starting point, I would say at least 8 cores and 8 GB of RAM per V100.
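If you script your VM creation, that rule of thumb is easy to encode. A minimal sketch; the helper is hypothetical, but `custom-<vCPUs>-<memory in MB>` is GCP's actual naming scheme for custom machine types:

```python
def machine_type_for(num_gpus, cores_per_gpu=8, ram_gb_per_gpu=8):
    """Derive a GCP custom machine type from the per-GPU rule of thumb.

    Hypothetical helper; the defaults encode the 8 cores / 8 GB per
    V100 starting point suggested above.
    """
    cpus = num_gpus * cores_per_gpu
    ram_mb = num_gpus * ram_gb_per_gpu * 1024
    return f"custom-{cpus}-{ram_mb}"

print(machine_type_for(1))  # custom-8-8192
print(machine_type_for(4))  # custom-32-32768
```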

One last update.

Tried it with 4 V100s and 40 CPUs:

4× V100 (40 CPUs): 0:40 h/epoch, $5.40/epoch

So, at least on Google's infrastructure, it is best to take just 1 V100 and let it run a little longer :slight_smile:
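For anyone weighing the trade-off, here is a small sketch extrapolating the measured per-epoch figures from this thread to a full run; the 30-epoch count is a made-up example, not from my training:

```python
# Wall time and total cost for a full run, extrapolated from the
# measured per-epoch figures in this thread.
EPOCHS = 30  # hypothetical epoch count; plug in your own

runs = {
    "1x V100 (16 CPUs)": (90 / 60, 3.00),
    "2x V100 (16 CPUs)": (65 / 60, 4.50),
    "4x V100 (16 CPUs)": (47 / 60, 6.05),
    "4x V100 (40 CPUs)": (40 / 60, 5.40),
}

for name, (hours_per_epoch, dollars_per_epoch) in runs.items():
    print(f"{name}: {EPOCHS * hours_per_epoch:5.1f} h, "
          f"${EPOCHS * dollars_per_epoch:6.2f}")
```

At 30 epochs, the single V100 would take about 45 h for $90, while 4 V100s with 40 CPUs would finish in about 20 h for $162: slower, but considerably cheaper.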