Distributed Training with Horovod

Hello,

I am working on training/fine-tuning DeepSpeech 0.7.0 (CUDA 10.0, cuDNN 7.6.5, TensorFlow 1.15.2, Ubuntu 16.04 with Python 3.6.7) on a machine with 2 RTX 2080 Ti 11 GB GPUs. I have 2 such machines available. I was interested in knowing if anyone here has attempted distributed training with Horovod. If yes, can you please share your experiences and what changes you had to make?

thanks

This is not something we have experience with, but in the past someone started (but never completed) a PR adding that kind of support. Let’s hope this person reads Discourse :slight_smile:


Sounds like fun! Any chance you can put both cards into one machine?

The PR is this one: https://github.com/mozilla/DeepSpeech/pull/1501

It’s very old, so our master code is going to be very different, but maybe it can serve as an implementation guide or at the very least as inspiration :slight_smile:
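For anyone who wants to attempt this, the general TF 1.x recipe from the Horovod docs is roughly what such a PR would have to thread through our training loop. Here is a minimal, self-contained sketch of that recipe (a toy model for illustration, not DeepSpeech code; the hvd.* calls are Horovod’s public API):

    import tensorflow as tf
    import horovod.tensorflow as hvd

    hvd.init()

    # Pin each process to one local GPU.
    config = tf.ConfigProto()
    config.gpu_options.visible_device_list = str(hvd.local_rank())

    # Toy model (fit y = 2x) just to show the wiring.
    x = tf.constant([[1.0], [2.0], [3.0]])
    y = tf.constant([[2.0], [4.0], [6.0]])
    w = tf.Variable(tf.zeros([1, 1]))
    loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))

    # Scale the learning rate by the worker count, then wrap the optimizer
    # so gradients are averaged across all ranks via allreduce.
    opt = tf.train.GradientDescentOptimizer(0.01 * hvd.size())
    opt = hvd.DistributedOptimizer(opt)
    step = tf.train.get_or_create_global_step()
    train_op = opt.minimize(loss, global_step=step)

    # Broadcast rank 0's initial variables so all workers start identical;
    # only rank 0 writes checkpoints.
    hooks = [hvd.BroadcastGlobalVariablesHook(0),
             tf.train.StopAtStepHook(last_step=100)]
    ckpt = './ckpt' if hvd.rank() == 0 else None

    with tf.train.MonitoredTrainingSession(checkpoint_dir=ckpt,
                                           config=config,
                                           hooks=hooks) as sess:
        while not sess.should_stop():
            sess.run(train_op)

Launching across two machines with 2 GPUs each would then look something like horovodrun -np 4 -H machine1:2,machine2:2 python train.py (the hostnames are placeholders, and passwordless SSH between the machines is assumed).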

Thanks @lissyx and @reuben. I’ll see if I can use this as a base to pull in the changes.

@utunga: I have 2 RTX 2080 Ti cards in each machine. The reason for not putting all of them in the same machine is heat from these GPUs (each computer has a GIGABYTE X399 AORUS PRO motherboard with 4 GPU slots, an AMD Ryzen Threadripper 1920X, and a CPU cooler). I don’t have external cooling units for the GPUs themselves, so installing all 4 on one motherboard is likely to produce intense heat since they sit very close to each other. Not sure if anyone has tried this. Suggestions welcome.

Thanks


Understood. Hey, perhaps a bit off the point, but I’d love to hear how long it takes to do a full DeepSpeech train on your audio data with those two puppies going…

Obviously it depends on how much audio you’re training with, your early stopping parameters, etc., but I’m guessing it would be only a handful of hours (to be deliberately vague).

Partly I’m wondering how productive it would be for you to spend time getting all four cards into the same training run, as opposed to doing some hyperparameter optimization, etc.

I’m also curious because we’re thinking of getting new hardware for our training… and wondering what your experience with the 2080s has been.

Thanks!

I have a similar setup of 2x RTX 2080 Ti, and for ~1000h of French, with automatic mixed precision disabled, it’s around 18h.

@lissyx, if I may ask, for how many epochs did you train?

The hardware setup itself has not been the smoothest: I initially got the NVIDIA Founders Edition of the RTX 2080 Ti, and 4 of those GPUs together did not do well on the same motherboard.
I am planning on working with automatic mixed precision, so I’m hoping that will speed things up a bit.
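For reference, in stock TensorFlow 1.15 automatic mixed precision is a graph rewrite applied to the optimizer. A minimal sketch of that underlying mechanism (as opposed to going through DeepSpeech’s own training flag):

    import tensorflow as tf

    # Build the optimizer as usual...
    opt = tf.train.AdamOptimizer(learning_rate=0.001)

    # ...then let TF rewrite eligible ops to float16 (using the 2080 Ti's
    # Tensor Cores), with automatic loss scaling handled for you.
    opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)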


I don’t remember exactly, but I think it was around 30 or 50 epochs. Anyway, that gives you a ballpark.