Hi,
Has anyone managed to run DeepSpeech model training on NVIDIA A100 GPUs?
We have a server with 8 NVIDIA A100 40GB GPUs (CoWoS HBM2, PCIe 4.0, passive cooling), but we haven't managed to run training on the GPUs yet.
Hi Betim,
I tried training on an A100 a while back, but I believe the A100 does not support CUDA 10.0 (I could be wrong; it's been a while since I last checked :p). If you can get it running with CUDA 10.0, then it should work.
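The CUDA 10.0 point comes down to which CUDA release first added each GPU architecture. The sketch below encodes a few known minimums (Ampere A100 is compute capability 8.0 and first appears in CUDA 11.0; RTX 30xx is 8.6 and first appears in CUDA 11.1). It only checks native code generation and deliberately ignores PTX JIT compilation by the driver; the table and the `cuda_has_native_support` helper are illustrative, not part of DeepSpeech or TensorFlow.

```python
# Minimum CUDA toolkit release that can generate native (SASS) code for
# each GPU compute capability. Only a few architectures are listed here.
MIN_CUDA_FOR_ARCH = {
    (7, 0): (9, 0),    # Volta, e.g. V100
    (7, 5): (10, 0),   # Turing, e.g. RTX 20xx
    (8, 0): (11, 0),   # Ampere, A100
    (8, 6): (11, 1),   # Ampere, RTX 30xx
}


def cuda_has_native_support(compute_capability, cuda_version):
    """Return True if cuda_version is at least the first CUDA release for compute_capability.

    Both arguments are (major, minor) tuples, so plain tuple comparison works.
    """
    required = MIN_CUDA_FOR_ARCH.get(compute_capability)
    if required is None:
        raise KeyError(f"unknown compute capability {compute_capability}")
    return cuda_version >= required


if __name__ == "__main__":
    # Stock TensorFlow 1.15 wheels are built against CUDA 10.0.
    for name, cc in [("A100", (8, 0)), ("RTX 3090", (8, 6)), ("V100", (7, 0))]:
        print(name, "with CUDA 10.0:", cuda_has_native_support(cc, (10, 0)))
```

With CUDA 10.0 this reports no native support for either Ampere card, which is why a TensorFlow r1.15 build against CUDA 11 is needed.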
Maybe NVIDIA provides TensorFlow r1.15 for those GPUs?
So do we have to uninstall the DeepSpeech TensorFlow and install NVIDIA's r1.15?
There is no such thing as DeepSpeech TensorFlow. We rely on upstream TensorFlow. I read somewhere that NVIDIA is providing a TensorFlow r1.15 package for RTX 3xxx, so maybe this could apply to your case as well.
The description at this GitHub link seems to match what we're talking about: https://github.com/NVIDIA/tensorflow.
@lissyx, could you take a quick look and tell me whether we're pointing in the same direction?
I'm sorry, but if you don't ask me a clear question, I don't have time to dig into NVIDIA's repos, and I can't speak for them or provide support for their work.
@betim And the start of their README is pretty clear to me: it seems to address exactly this use case.
@betim I can confirm that I can train a model on a 3000-series GPU with NVIDIA's version. I assume it also works on the A100, since both are Ampere-based GPUs.
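After switching to NVIDIA's TensorFlow build, a quick sanity check is whether TensorFlow was built with CUDA and can actually see the GPUs. A minimal sketch, assuming it runs in the same Python environment used for DeepSpeech training; `describe_tensorflow_gpus` is a hypothetical helper, while `tf.config.experimental.list_physical_devices` and `tf.test.is_built_with_cuda` are standard TensorFlow calls available in r1.15.

```python
def describe_tensorflow_gpus():
    """Return a dict describing the TensorFlow build and its visible GPUs.

    Returns None if TensorFlow is not installed in this environment.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    gpus = tf.config.experimental.list_physical_devices("GPU")
    return {
        "version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpus": [gpu.name for gpu in gpus],
    }


if __name__ == "__main__":
    info = describe_tensorflow_gpus()
    if info is None:
        print("TensorFlow is not installed in this environment")
    else:
        print("TensorFlow", info["version"], "built with CUDA:", info["built_with_cuda"])
        print("Visible GPUs:", info["gpus"] or "none")
```

On the 8-GPU A100 server described above, a working setup should list eight GPU devices; an empty list usually points to a CUDA/driver mismatch rather than a DeepSpeech problem.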
NVIDIA also offers a Docker container with their TensorFlow build:
https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow
Maybe you want to give it a try instead of building TensorFlow on your own.
I also can not test A100 GPUs at the moment.