I am currently trying to run inference on a large number of files using a trained model. During inference, the deepspeech Python client uses only a single GPU out of 3. How can I extend it to use all of them in parallel?
We don’t have support for batch inference in the library currently; your best solution is to run multiple inference processes in parallel, one per GPU.
The inference process is bottlenecked by the decoder, which is CPU only. Using all GPUs won’t gain you much performance, which is why
transcribe.py only uses a single GPU.
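If you still want to keep all three GPUs busy, a common workaround is to shard the file list and launch one process per GPU, pinning each one with `CUDA_VISIBLE_DEVICES`. This is a minimal sketch, not part of the library; the `transcribe.py` command line shown is an assumption you’d adapt to your actual invocation:

```python
import os
import subprocess

def shard(files, n):
    """Split `files` into n roughly equal shards, round-robin."""
    return [files[i::n] for i in range(n)]

def run_on_gpus(files, n_gpus=3):
    """Launch one inference process per GPU, each seeing only its shard."""
    procs = []
    for gpu_id, chunk in enumerate(shard(files, n_gpus)):
        if not chunk:
            continue
        # Pin this process to a single GPU.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
        # Hypothetical command line -- replace with how you call transcribe.py.
        cmd = ["python", "transcribe.py"] + chunk
        procs.append(subprocess.Popen(cmd, env=env))
    for p in procs:
        p.wait()
```

Keep in mind the point above: since the decoder is CPU-bound, the speedup depends mostly on how many spare CPU cores you have, not on the number of GPUs.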
Is there any way of getting the metadata with transcribe.py?
There are always ways, but this code lives in
libdeepspeech, so it’s quite some work to do.
The full metadata is already returned by the bindings, it’s just processed in
native_client/ctcdecode/__init__.py to return just
(confidence, transcript) tuples. You should be able to edit that file to expose that information to Python.