Distributed Training with Horovod

Hello,

I am working on training/fine-tuning DeepSpeech 0.7.0 (CUDA 10.0, cuDNN 7.6.5, TensorFlow 1.15.2, Ubuntu 16.04 with Python 3.6.7) on a machine with 2 RTX 2080 Ti 11 GB GPUs. I have 2 such machines available. I was interested in knowing if anyone here has attempted distributed training with Horovod. If yes, can you please share your experiences and what changes you had to make?

thanks

This is not something we have experience with, but in the past someone started (but never completed) a PR adding that kind of support. Let’s hope this person reads Discourse :slight_smile:


Sounds like fun! Any chance you can put both cards into one machine?

The PR is this one: https://github.com/mozilla/DeepSpeech/pull/1501

It’s very old, so our master code is going to be very different, but maybe it can serve as an implementation guide or at the very least as inspiration :slight_smile:
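For anyone who wants to attempt this, the general TF 1.x recipe from the Horovod docs is roughly what such a PR would have to thread through our training loop. Here is a minimal, self-contained sketch of that recipe (a toy model for illustration, not DeepSpeech code; the hvd.* calls are Horovod’s public API):

    import tensorflow as tf
    import horovod.tensorflow as hvd

    hvd.init()

    # Pin each process to one local GPU.
    config = tf.ConfigProto()
    config.gpu_options.visible_device_list = str(hvd.local_rank())

    # Toy model (fit y = 2x) just to show the wiring.
    x = tf.constant([[1.0], [2.0], [3.0]])
    y = tf.constant([[2.0], [4.0], [6.0]])
    w = tf.Variable(tf.zeros([1, 1]))
    loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))

    # Scale the learning rate by the worker count, then wrap the optimizer
    # so gradients are averaged across all ranks via allreduce.
    opt = tf.train.GradientDescentOptimizer(0.01 * hvd.size())
    opt = hvd.DistributedOptimizer(opt)
    step = tf.train.get_or_create_global_step()
    train_op = opt.minimize(loss, global_step=step)

    # Broadcast rank 0's initial variables so all workers start identical;
    # only rank 0 writes checkpoints.
    hooks = [hvd.BroadcastGlobalVariablesHook(0),
             tf.train.StopAtStepHook(last_step=100)]
    ckpt = './ckpt' if hvd.rank() == 0 else None

    with tf.train.MonitoredTrainingSession(checkpoint_dir=ckpt,
                                           config=config,
                                           hooks=hooks) as sess:
        while not sess.should_stop():
            sess.run(train_op)

Launching across two machines with 2 GPUs each would then look something like horovodrun -np 4 -H machine1:2,machine2:2 python train.py (the hostnames are placeholders, and passwordless SSH between the machines is assumed).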

Thanks @lissyx and @reuben. I’ll see if I can use this as a base to pull in the changes.

@utunga: I have 2 RTX 2080 Ti cards in each machine. The reason for not putting all of them in the same machine is heat from these GPUs (each computer has a GIGABYTE X399 AORUS PRO motherboard with 4 GPU slots, an AMD Ryzen Threadripper 1920X, and a CPU cooler). I don’t have external cooling units for the GPUs themselves, so installing all 4 on one motherboard is likely to produce intense heat since they sit very close to each other. Not sure if anyone has tried this. Suggestions welcome.

Thanks


Understood. Hey, perhaps a bit off the point, but I’d love to hear how long it takes to do a full DeepSpeech train on your audio data with those two puppies going…

Obviously it depends on how much audio you’re training with, your early stopping parameters, etc., but I’m guessing it would be only a handful of hours (to be deliberately vague).

Partly I’m wondering how productive it would be for you to spend time getting all four cards into the same training run, as opposed to doing some hyperparameter optimization, etc.

I’m also curious because we’re thinking of getting new hardware for our training… and wondering what your experience with the 2080s has been.

Thanks!

I have a similar setup of 2x RTX 2080 Ti, and for ~1000h of French, with automatic mixed precision disabled, it’s around 18h.

@lissyx, if I may ask, for how many epochs did you train?

The hardware setup itself has not been the smoothest: I initially got the NVIDIA Founders Edition of the RTX 2080 Ti, and 4 of those GPUs together did not do well on the same motherboard.
I am planning on working with automatic mixed precision, so I’m hoping that will speed things up a bit.
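For reference, in stock TensorFlow 1.15 automatic mixed precision is a graph rewrite applied to the optimizer. A minimal sketch of that underlying mechanism (as opposed to going through DeepSpeech’s own training flag):

    import tensorflow as tf

    # Build the optimizer as usual...
    opt = tf.train.AdamOptimizer(learning_rate=0.001)

    # ...then let TF rewrite eligible ops to float16 (using the 2080 Ti's
    # Tensor Cores), with automatic loss scaling handled for you.
    opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)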


I don’t remember exactly, but I think it was around 30 or 50 epochs. Anyway, that gives you a ballpark.