DeepSpeech inference with multiple GPUs

Hello,
I am currently trying to run inference on a large number of files using a trained model. During inference, the DeepSpeech Python client uses only one GPU out of three. How can I extend it to use all of them in parallel?

We don’t have support for batch inference in the library currently; your best option is evaluate.py / transcribe.py.

The inference process is bottlenecked by the decoder, which is CPU-only. Using all GPUs won’t gain you much performance, which is why evaluate.py and transcribe.py only use a single GPU.
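That said, if you want to keep all three GPUs busy anyway, one workaround is to split the file list into shards and run one inference process per GPU, pinned via CUDA_VISIBLE_DEVICES; this also spreads the CPU-bound decoding across cores. This is just a minimal sketch, not a library feature: it assumes the deepspeech-gpu Python package with the 0.7+ API (Model, enableExternalScorer, stt), and the model/scorer filenames and file list are placeholders you would replace with your own.

```python
import os
import wave
import numpy as np
from multiprocessing import get_context

GPUS = ["0", "1", "2"]                      # one worker per GPU
MODEL = "deepspeech-0.9.3-models.pbmm"      # assumed model filename
SCORER = "deepspeech-0.9.3-models.scorer"   # assumed scorer filename
AUDIO_FILES = ["clip1.wav", "clip2.wav"]    # replace with your file list

def transcribe_shard(gpu_id, files):
    # Pin this worker to one GPU before TensorFlow is initialised
    os.environ["CUDA_VISIBLE_DEVICES"] = gpu_id
    from deepspeech import Model  # import only after setting the env var

    ds = Model(MODEL)
    ds.enableExternalScorer(SCORER)
    results = []
    for path in files:
        with wave.open(path, "rb") as w:
            audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
        results.append((path, ds.stt(audio)))
    return results

if __name__ == "__main__":
    # Round-robin the files into one shard per GPU
    shards = [(gpu, AUDIO_FILES[i::len(GPUS)]) for i, gpu in enumerate(GPUS)]
    # "spawn" keeps each worker's TensorFlow state independent of the parent
    with get_context("spawn").Pool(len(GPUS)) as pool:
        for shard in pool.starmap(transcribe_shard, shards):
            for path, transcript in shard:
                print(path, transcript)
```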

Is there any way of getting the metadata with evaluate.py or transcribe.py?

There are always ways, but this code lives in libdeepspeech, so it would be quite some work.

The full metadata is already returned by the bindings; it’s just reduced in native_client/ctcdecode/__init__.py to (confidence, transcript) tuples. You should be able to edit that file to expose that info to Python.
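For reference, here is the shape of that change as a small self-contained sketch. The BeamResult stand-in and its .confidence / .tokens / .timesteps attributes are assumptions for whatever the SWIG wrapper actually returns in your checkout; the point is only that the final list comprehension is where the metadata gets dropped, so that is the line to change.

```python
from dataclasses import dataclass
from typing import List

# Stand-in for the per-beam object the SWIG wrapper returns; the real
# attribute names may differ, check the wrapper class in your checkout.
@dataclass
class BeamResult:
    confidence: float
    tokens: List[int]     # label indices into the alphabet
    timesteps: List[int]  # frame index where each token was emitted

def decode_tokens(tokens: List[int]) -> str:
    # Placeholder for alphabet.Decode(): maps label indices to characters
    alphabet = "abcdefghijklmnopqrstuvwxyz '"
    return "".join(alphabet[t] for t in tokens)

def postprocess(beam_results: List[BeamResult]):
    # What native_client/ctcdecode/__init__.py does today (roughly):
    # return [(res.confidence, decode_tokens(res.tokens)) for res in beam_results]

    # Keeping the timing metadata is just a matter of not dropping it here:
    return [
        (res.confidence, decode_tokens(res.tokens), list(res.timesteps))
        for res in beam_results
    ]

if __name__ == "__main__":
    fake = [BeamResult(-4.2, [7, 8], [12, 19])]  # illustrative values only
    print(postprocess(fake))  # [(-4.2, 'hi', [12, 19])]
```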
