Training model with NVIDIA A100

Hi,

Has anyone managed to run DeepSpeech model training on NVIDIA A100 GPUs?

We got a server with 8 NVIDIA A100 40GB CoWoS HBM2 PCIe 4.0 GPUs (passive cooling), but we haven't managed to run training on the GPUs yet.

Hi Betim,

I tried training it on an A100 a while back, but I guess the A100 does not support CUDA 10.0 (I could be wrong, it's been a while since I last checked :p). If you can get CUDA 10.0 running on it, then it should work.

Maybe NVIDIA provides TensorFlow r1.15 for those GPUs?

So we have to uninstall the DeepSpeech TensorFlow and install NVIDIA's r1.15 build?

There is no such thing as DeepSpeech TensorFlow. We rely on upstream TensorFlow. I read somewhere that NVIDIA is providing a TensorFlow r1.15 package for RTX 3xxx, so maybe this could apply to your case as well.

The description at this GitHub link seems to match what we're talking about: https://github.com/NVIDIA/tensorflow.

@lissyx could you check it and tell me whether we're pointing in the same direction?

I’m sorry, but if you don’t ask me a clear question, I don’t have time to dig into NVIDIA’s repos, and I can’t speak for them nor provide support for their work.

@betim The start of their README is pretty clear to me: it seems to address exactly this use case.

@betim I can confirm that I am able to train a model on the 3000 series with NVIDIA’s version. I guess it also works with the A100, since both are Ampere-based GPUs.
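
To make the CUDA point from earlier in the thread concrete, here is a minimal sketch of the version reasoning. It assumes the usual toolkit history: CUDA 11.0 was the first release able to target compute capability 8.0 (A100), and CUDA 11.1 the first to target 8.6 (RTX 30xx, A40). Upstream TensorFlow r1.15 was built against CUDA 10.0, which is likely why the stock packages have trouble on Ampere. The table and the `cuda_can_target` helper are illustrative names, not part of DeepSpeech or TensorFlow.

```python
# First CUDA toolkit release that can compile for each compute capability.
# Illustrative table only; check NVIDIA's release notes for other GPUs.
MIN_CUDA_FOR_CC = {
    (7, 0): (9, 0),   # Volta, e.g. V100
    (7, 5): (10, 0),  # Turing, e.g. RTX 20xx
    (8, 0): (11, 0),  # Ampere, A100
    (8, 6): (11, 1),  # Ampere, RTX 30xx and A40
}


def parse_version(text):
    """Turn a string such as '11.0' into a comparable tuple (11, 0)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def cuda_can_target(cuda_version, compute_capability):
    """Return True if this CUDA release can natively target the given GPU."""
    required = MIN_CUDA_FOR_CC[parse_version(compute_capability)]
    return parse_version(cuda_version) >= required


if __name__ == "__main__":
    # A100 (8.0) with the CUDA 10.0 that upstream TensorFlow r1.15 expects
    print("CUDA 10.0 -> A100:", cuda_can_target("10.0", "8.0"))
    # A100 (8.0) with a CUDA 11.x toolkit, as used by NVIDIA's own builds
    print("CUDA 11.0 -> A100:", cuda_can_target("11.0", "8.0"))
```

This only checks native compilation targets. It says nothing about whether a particular TensorFlow build was actually compiled for those targets.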

NVIDIA also offers a Docker container with their TensorFlow build:
https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow
Maybe you want to give it a try instead of building TensorFlow on your own.
I also cannot test A100 GPUs at the moment.

Hi Carl, can you please share which CUDA and cuDNN versions you are using? And also, what is your TensorFlow version? We are planning to use an NVIDIA A40, so your info would greatly help us!
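
For anyone answering a question like this, a small script can collect most of the relevant details in one go. The sketch below is only an assumption about what is useful to report: it prints the installed TensorFlow version, whether that build was compiled with CUDA, and the GPU name and driver version from `nvidia-smi` when that tool is on the PATH. It does not report the cuDNN version, and `collect_versions` is a hypothetical helper name.

```python
import shutil
import subprocess


def collect_versions():
    """Gather TensorFlow and GPU driver details; missing pieces stay empty."""
    info = {"tensorflow": None, "built_with_cuda": None, "gpus": []}

    # TensorFlow may not be installed in every environment, so import lazily.
    try:
        import tensorflow as tf
        info["tensorflow"] = tf.__version__
        info["built_with_cuda"] = tf.test.is_built_with_cuda()
    except ImportError:
        pass

    # nvidia-smi ships with the NVIDIA driver; skip it if it is not available.
    if shutil.which("nvidia-smi"):
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,driver_version",
             "--format=csv,noheader"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        if result.returncode == 0:
            info["gpus"] = [line.strip()
                            for line in result.stdout.splitlines()
                            if line.strip()]
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print("{}: {}".format(key, value))
```

Pasting the output of a script like this alongside the question makes it easier for others to compare setups.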