I’m trying to train different versions of DeepSpeech (using different random seeds) for my research project. Assuming I need to rent computing instances, I’m assessing which of these two paths I should take: (1) rent multiple instances, each with one GPU, and train one model on each GPU, or (2) rent an instance with multiple GPUs and train one model at a time using all GPUs. I.e., I’m wondering, if I use 4 GPUs, how much shorter will the training be? The ideal case would be 0.25x the single-GPU time.
Thanks for your time.
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
Yes, there is a more or less linear speedup in terms of the number of GPUs, assuming you are able to feed them appropriately.
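Since the comparison between the two rental options is just arithmetic once you assume a scaling efficiency, here is a rough back-of-the-envelope sketch. The numbers (`T`, `MODELS`, `GPUS`, the 0.9 efficiency) are placeholder assumptions for illustration, not measured DeepSpeech figures; plug in your own single-GPU training time and the scaling you actually observe.

```python
import math

def option_1_wall_clock(t_single: float, n_models: int, n_instances: int) -> float:
    """Option 1: independent single-GPU instances, one model per GPU.
    Runs are embarrassingly parallel, so wall-clock time is just the
    number of sequential 'waves' of runs times the single-GPU time."""
    waves = math.ceil(n_models / n_instances)
    return waves * t_single

def option_2_wall_clock(t_single: float, n_models: int, n_gpus: int,
                        efficiency: float = 0.9) -> float:
    """Option 2: one multi-GPU instance, models trained one after another,
    each run using all GPUs. 'efficiency' models the loss from an input
    pipeline that cannot keep every GPU fully fed (1.0 = perfectly linear)."""
    t_multi = t_single / (n_gpus * efficiency)  # per-model time on n_gpus
    return n_models * t_multi

if __name__ == "__main__":
    T = 48.0     # assumed single-GPU training time in hours (placeholder)
    MODELS = 4   # number of random seeds to train
    GPUS = 4

    print(f"Option 1 (4 x 1-GPU instances): {option_1_wall_clock(T, MODELS, GPUS):.1f} h")
    print(f"Option 2 (1 x 4-GPU instance, 90% scaling): "
          f"{option_2_wall_clock(T, MODELS, GPUS, 0.9):.1f} h")
    # With T = 48 h this prints ~48.0 h vs ~53.3 h: under near-linear scaling
    # the two options finish in roughly the same wall-clock time, but option 1
    # avoids any multi-GPU scaling loss because each run has a GPU to itself.
```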