What is the latest version of CUDA and cuDNN that DeepSpeech will work with?

from what i can tell, CUDA 10.1 and cuDNN 7.6 are the most recent versions that will work with DS 0.9.3. could someone confirm this, please?

also, does DS alpha 0.10 now support CUDA 11 and cuDNN 8.x?

thanks, julian

This is what is documented: https://deepspeech.readthedocs.io/en/v0.9.3/USING.html?highlight=CUDA#cuda-dependency-inference

No, the 0.10 alpha does not support CUDA 11 or cuDNN 8.x.

My (unofficial) builds of DeepSpeech for Jetson/Xavier run with CUDA 10.2 and cuDNN 8.

The Jetson/Xavier platforms have CUDA compute capabilities 5.3, 6.2 and 7.2. I don’t know whether that carries over to other platforms or compute capabilities, though.

dominik - many thanks for this. i will give it a try.

i have just installed ubuntu 20 on a new machine (with an RTX 3060 card). i am not sure what OS the Jetson/Xavier platform prefers or which one you are using (i realise it is ARM, so perhaps completely different), but do you happen to know whether i should downgrade to ubuntu 18 in order to make deepspeech + gpu work (on 64-bit AMD)?

i say this because it looks like cuda 10 does not have versions for ubuntu 20 (at least not that i could see). the gpu part is critical as i have colossal volumes (= thousands of hours) of recorded media to transcribe. getting DS to work with CPU is easy, but getting it to work with a GPU is something i have never yet achieved. this all assumes that inferencing can take advantage of GPUs, not just training.

thx, julian

Hi Julian,

Yes, Jetson/Xavier devices are arm64/aarch64 architecture. They run “Linux for Tegra” which is basically Ubuntu 18.04 (customized for the Nvidia Tegra embedded device platform). On my Xavier-AGX I see a performance increase of 10-25% when using GPU instead of CPU for inference.

If you want to do batch processing, use something like DeepSpeech-server. That way you avoid loading and initialising the model for every item you transcribe.
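To illustrate the "load once, reuse" idea without a server, here is a minimal sketch using the `deepspeech` Python package (`Model`, `enableExternalScorer` and `stt` are its documented calls). The helper names `load_model`, `read_wav` and `transcribe_all` are made up for this example, and it assumes the files are already 16-bit mono WAV at the model's sample rate. It is not how DeepSpeech-server is implemented, only the same principle in a few lines.

```python
import wave

import numpy as np


def load_model(model_path, scorer_path=None):
    """Load a DeepSpeech model once. The import is deferred so the rest of
    this sketch can be tried without the deepspeech package installed."""
    from deepspeech import Model  # from the `deepspeech` or `deepspeech-gpu` wheel

    model = Model(model_path)
    if scorer_path:
        model.enableExternalScorer(scorer_path)
    return model


def read_wav(path):
    """Read a 16-bit mono WAV file into the int16 buffer that Model.stt() takes.
    Assumption: the file already matches the model's sample rate."""
    with wave.open(str(path), "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError(f"{path}: expected 16-bit mono audio")
        frames = wav.readframes(wav.getnframes())
    return np.frombuffer(frames, dtype=np.int16)


def transcribe_all(model, wav_paths):
    """Run every file through the same, already-initialised model."""
    return {str(path): model.stt(read_wav(path)) for path in wav_paths}
```

You would call `load_model(...)` a single time with the paths to your `.pbmm` model and `.scorer` files, then pass the result to `transcribe_all` together with the list of recordings. The expensive initialisation then happens once rather than once per file.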

Use transcribe.py, then, and follow the training setup guidelines. You will need CUDA 10.0 with the official TensorFlow.

I know that NVIDIA provides r1.15 TensorFlow packages that enable the use of newer CUDA versions.

thanks for this, dominik. it looks like i have jumped the gun by getting a 30 series card (which was not easy to come by!) because according to the nvidia docs i have now found, only CUDA 11.1 will recognise it. it is probably not possible to say when DeepSpeech will support 11.1.

apologies if this is a stupid question, but do you know what GTX or RTX card your gpu is similar to? i am trying to gauge whether it is even worth the awful hassle of installing CUDA and cuDNN and maybe i should just go for the most powerful cpu that i can afford and forget about the gpu.

thanks for the tip about DS server.

thanks, lissyx. i will look into the transcribe.py script and the nvidia r1.15 packages.

According to this Wikipedia article, CUDA SDK 10.0 – 10.2 supports compute capability 3.0 – 7.5, so you might get lucky with an RTX 20x0.
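The range check behind that statement can be written out as a small sketch. The 3.0 to 7.5 bounds are the ones quoted above. The per-card compute capabilities are my own additions from NVIDIA's published tables (7.2 for Xavier, 7.5 for Turing/RTX 20x0, 8.6 for the Ampere RTX 3060), and `supported_by_cuda_10` is just a name made up for the example.

```python
# Compute capability range for CUDA SDK 10.0 - 10.2, as quoted above.
CUDA_10_MIN_CC = (3, 0)
CUDA_10_MAX_CC = (7, 5)

# Compute capabilities of the GPUs mentioned in this thread
# (values added for illustration, taken from NVIDIA's published tables).
GPUS = {
    "Jetson Xavier (Volta)": (7, 2),
    "RTX 20x0 (Turing)": (7, 5),
    "RTX 3060 (Ampere)": (8, 6),
}


def supported_by_cuda_10(cc):
    """True if a (major, minor) compute capability falls in the CUDA 10.x range."""
    return CUDA_10_MIN_CC <= cc <= CUDA_10_MAX_CC


for name, cc in GPUS.items():
    verdict = "within" if supported_by_cuda_10(cc) else "outside"
    print(f"{name}: compute capability {cc[0]}.{cc[1]} is {verdict} the CUDA 10.x range")
```

Tuples compare element by element, so (8, 6) lands above the (7, 5) upper bound, which is why a 30 series card falls outside what CUDA 10.x covers.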

Xavier-AGX’s GPU is Volta architecture, so a direct comparison with desktop graphics card GPUs is a bit tricky. Overall performance is somewhere in the GTX 10x0 ballpark (more on the 1060 side than the 1080).

That would have been my recommendation, too.

While our CPU transcription is quite fast, at high volume you will benefit much more from a GPU’s capabilities, and it will still be several orders of magnitude more efficient.

With only a few concurrent streams, it’s hard to make proper use of a GPU, but with many local files it’s easy.