The question here is more about:
- training dataset size
- setup complexity
- hardware costs
Honestly, I can train on more than 1000h of audio in < 18h on a desktop at home with 2x RTX 2080 Ti.
Our cluster in Berlin is made of nodes with 8 GPUs each.
It saw little usage, considering the setup described above, and it was a real maintenance burden.
I remember someone opening a PR to add Horovod support, but it was not very well integrated; the PR fell into limbo, we never heard back from the author, and it went nowhere.
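For context, wiring Horovod into a training loop looks roughly like the sketch below. This is a generic illustration using Horovod's PyTorch API with a placeholder model and optimizer, not the code from that PR:

```python
import torch
import horovod.torch as hvd

hvd.init()                               # one process per GPU
torch.cuda.set_device(hvd.local_rank())  # pin this process to its GPU

model = torch.nn.Linear(161, 29).cuda()  # placeholder model
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.01 * hvd.size())  # scale LR by worker count

# Wrap the optimizer so gradients are all-reduced across workers,
# and make sure every worker starts from the same weights.
optimizer = hvd.DistributedOptimizer(optimizer,
                                     named_parameters=model.named_parameters())
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
```

You would launch it with something like `horovodrun -np 2 python train.py` for the 2-GPU desktop case.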
I don’t think your switch will handle that traffic. What GPUs do you have? FTR, PCIe monitoring on the RTX 2080 Ti here shows transfers of ~8GB/s.
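If you want to reproduce that measurement, NVML exposes PCIe throughput counters. A minimal sketch using the pynvml package (assuming it is installed), roughly equivalent to watching `nvidia-smi dmon -s t`:

```python
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

for _ in range(5):
    # NVML reports PCIe throughput in KB/s, sampled over a short window.
    tx = pynvml.nvmlDeviceGetPcieThroughput(handle, pynvml.NVML_PCIE_UTIL_TX_BYTES)
    rx = pynvml.nvmlDeviceGetPcieThroughput(handle, pynvml.NVML_PCIE_UTIL_RX_BYTES)
    print(f"PCIe TX {tx / 1e6:.2f} GB/s, RX {rx / 1e6:.2f} GB/s")
    time.sleep(1)

pynvml.nvmlShutdown()
```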
We looked at network-based interconnects at some point, but we concluded that to be efficient you actually need 40Gb-class networking; even 10GbE was not fast enough, and the extra GPU power would go largely unused because of data transfers.
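A quick back-of-envelope makes the point. The 100M-parameter model size below is a made-up example, not a number from this thread; it assumes an idealized data-parallel step that exchanges the full fp32 gradients once, with no overlap or compression:

```python
# Why 10GbE chokes gradient exchange while PCIe (~8 GB/s, as measured
# above) does not. Hypothetical 100M-parameter fp32 model.
params = 100e6
grad_bytes = params * 4        # fp32 gradients -> ~400 MB per step

links = {
    "10GbE":           10e9 / 8,  # ~1.25 GB/s
    "40Gb":            40e9 / 8,  # ~5 GB/s
    "PCIe (measured)": 8e9,       # ~8 GB/s
}

for name, bandwidth in links.items():
    print(f"{name:>16}: {grad_bytes / bandwidth * 1e3:6.1f} ms per step")
```

Under those assumptions, gradient exchange alone costs ~320 ms per step over 10GbE versus ~50 ms over PCIe, which is why the extra GPUs end up waiting on the network.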