I am experiencing very long inference times on my desktop, e.g. 200 ms for an OGG file that is less than 1 s long and contains no speech. For a regular 20 s OGG file it takes at least 6 seconds.
I’m using the prebuilt model in Rust with the deepspeech-rs binding.
I want to use my GPU. I’m already using the native_client.amd64.cuda.linux model. What else can I do?
lissyx
Following the build instructions of the Rust crate should be enough; if you use the CUDA-enabled libdeepspeech.so (the one you linked), it should work transparently. cc @est31
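For reference, here is a minimal sketch of what a deepspeech-rs call looks like; the point is that this code is identical for the CPU and the CUDA builds, only the libdeepspeech.so you link against changes. The Model::load_from_files signature has changed across crate releases (older ones also take extra arguments such as an alphabet path and beam width), and the model path is a placeholder, so check your crate version:

```rust
// Minimal sketch of a deepspeech-rs call, assuming roughly the newer API
// where Model::load_from_files takes a single model path. The model file
// name is a placeholder; point it at the graph shipped with the native
// client you downloaded.
use std::path::Path;

use deepspeech::Model;

fn transcribe(samples: &[i16]) -> String {
    let mut model = Model::load_from_files(Path::new("output_graph.pbmm"))
        .expect("failed to load model");

    // `samples` must be 16 kHz mono signed 16-bit PCM.
    model
        .speech_to_text(samples)
        .expect("inference failed")
}
```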
Thanks for your response… hmm. Can I somehow check, if my project indeed uses the GPU? I’m not sure, because the inference times remain the same.
lissyx
When running on a GPU, TensorFlow should print a lot of information to stdout/stderr. Have you set up the system properly so that the Rust crate uses the CUDA version?
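One way to check is to make sure TensorFlow's native logging is not being filtered and then watch stderr while the model loads; when a GPU is actually used, TensorFlow prints device-creation messages (the exact wording varies by version). A minimal sketch, reusing the placeholder model path from above:

```rust
// Sketch: make sure TensorFlow's C++ logging is not suppressed before the
// model is loaded, then watch stderr for GPU device-creation messages
// (wording varies by TensorFlow version, typically something like
// "Created TensorFlow device ... -> physical GPU").
use std::path::Path;

use deepspeech::Model;

fn main() {
    // 0 = show everything (INFO and up); must be set before libdeepspeech
    // initializes TensorFlow.
    std::env::set_var("TF_CPP_MIN_LOG_LEVEL", "0");

    let _model = Model::load_from_files(Path::new("output_graph.pbmm"))
        .expect("failed to load model");
    // If nothing on stderr mentions a GPU device at this point, inference is
    // almost certainly running on the CPU.
}
```

Running nvidia-smi in a second terminal while inference is in progress is another quick check: the process should show up there with GPU memory allocated.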
lissyx
Content or no content, the system still has to analyze the audio. 200 ms for 1 s of audio is 5× real time; I don’t think that qualifies as a “long inference time”. Can you clarify your expectations?
Likewise, 6 s of inference for a 20 s audio file (OGG is not supported, so you or the crate must be converting it to WAV at some point) is much faster than real time.
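To make the real-time arithmetic concrete, here is a sketch that times one transcription and divides by the audio duration. It assumes the OGG has already been converted to 16 kHz mono 16-bit WAV outside the program (e.g. with sox or ffmpeg) and uses the hound crate purely to read the WAV; neither the conversion step nor hound is part of DeepSpeech or deepspeech-rs:

```rust
// Sketch: measure the real-time factor of a single transcription.
use std::path::Path;
use std::time::Instant;

use deepspeech::Model;
use hound::WavReader;

fn main() {
    let mut reader = WavReader::open("clip.wav").expect("failed to open WAV");
    let spec = reader.spec();
    assert_eq!(spec.sample_rate, 16_000, "model expects 16 kHz audio");
    assert_eq!(spec.channels, 1, "model expects mono audio");

    let samples: Vec<i16> = reader
        .samples::<i16>()
        .map(|s| s.expect("bad sample"))
        .collect();
    let audio_secs = samples.len() as f64 / spec.sample_rate as f64;

    let mut model = Model::load_from_files(Path::new("output_graph.pbmm"))
        .expect("failed to load model");

    let start = Instant::now();
    let text = model.speech_to_text(&samples).expect("inference failed");
    let inference_secs = start.elapsed().as_secs_f64();

    // 6 s of inference for 20 s of audio gives a real-time factor of 0.3:
    // the engine processes audio roughly 3.3x faster than it is spoken.
    println!("{}", text);
    println!("real-time factor: {:.2}", inference_secs / audio_secs);
}
```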
lissyx
FTR: I’m confident it works because I have code doing that …
You know it works because you are also using deepspeech-rs?
TensorFlow doesn’t output a lot of information for me. Just that the model is loaded (I don’t remember the exact message off the top of my head).
I set up the Rust crate as described in its README and then switched the native client to the one specified above. Can I do more? deepspeech-rs only wraps the DeepSpeech API, so my code should not need to change.
How is 6 s of inference for a 20 s audio file faster than real time? Only if you “stream” the inference, like Google Translate does? Is that possible with DeepSpeech?
I would run this on a server. With those inference times, how many calls per second could you handle? Two?
Thanks for your response.
lissyx
Yes.
But we don’t have anything really efficient yet.
Real time means we process the audio faster than it “arrives”.
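For context, the streaming flow mirrors the engine's C API (CreateStream / FeedAudioContent / FinishStream). The Rust method names below (create_stream, feed_audio, finish) follow what newer deepspeech-rs releases expose; older crate versions may name things differently or not expose streaming at all, so treat this as illustrative:

```rust
// Illustrative streaming sketch: feed audio as it "arrives" instead of
// waiting for the whole file, then finalize the decode at the end.
use std::path::Path;

use deepspeech::Model;

fn transcribe_in_chunks(chunks: &[Vec<i16>]) -> String {
    let mut model = Model::load_from_files(Path::new("output_graph.pbmm"))
        .expect("failed to load model");

    let mut stream = model.create_stream().expect("failed to create stream");
    for chunk in chunks {
        // Each chunk is 16 kHz mono i16 PCM, e.g. from a microphone or socket.
        stream.feed_audio(chunk);
    }
    // Decoding state is finalized once the stream is finished.
    stream.finish().expect("decoding failed")
}
```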
lissyx
Again, just use the CUDA libdeepspeech.so instead of the default one. You can see that in the Dockerfile of the project above.
lissyx
Basically what you need is batching, and that requires a lot of work; it has been started but is far from complete. cc @kdavis
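Until batching is available, one workaround on a server (not something DeepSpeech itself provides) is to serialize all requests through a single worker thread that owns the model, so the GPU stays loaded without concurrent sessions competing for it. A rough sketch; the Request type and channel wiring are made up for illustration:

```rust
// Sketch of a stopgap while proper batching is not available: one worker
// thread owns the Model and processes requests one at a time; callers send
// audio over a channel and receive the transcript back.
use std::path::Path;
use std::sync::mpsc;
use std::thread;

use deepspeech::Model;

struct Request {
    samples: Vec<i16>,
    reply: mpsc::Sender<String>,
}

fn spawn_worker() -> mpsc::Sender<Request> {
    let (tx, rx) = mpsc::channel::<Request>();
    thread::spawn(move || {
        // Load the model once; it stays resident for all requests.
        let mut model = Model::load_from_files(Path::new("output_graph.pbmm"))
            .expect("failed to load model");
        for req in rx {
            let text = model.speech_to_text(&req.samples).unwrap_or_default();
            // Ignore send errors: the caller may have given up waiting.
            let _ = req.reply.send(text);
        }
    });
    tx
}

fn main() {
    let worker = spawn_worker();

    // One illustrative request (1 s of silence); a real server would build
    // these from incoming connections.
    let (reply_tx, reply_rx) = mpsc::channel();
    worker
        .send(Request { samples: vec![0i16; 16_000], reply: reply_tx })
        .expect("worker gone");
    println!("{}", reply_rx.recv().expect("no reply"));
}
```

Throughput is still bounded by the single-stream real-time factor, so this only smooths out contention; true batching is what would raise the calls-per-second number.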