Hello,
I am currently trying to run inference on a large number of files using a trained model. During inference, the DeepSpeech Python client only uses a single GPU out of 3. How can I extend it to use all of them in parallel?
We don’t currently have support for batch inference in the library; your best option is evaluate.py / transcribe.py.
The inference process is bottlenecked by the decoder, which is CPU-only. Using all GPUs won’t gain you much performance, which is why evaluate.py and transcribe.py only use a single GPU.
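If you do want to drive the Python client over many files yourself, sharding the file list across CPU worker processes is what actually helps, since the decoder is where the time goes. A rough sketch, with placeholder model/scorer paths and assuming 16 kHz, 16-bit mono WAV input:

```python
# Sketch: parallel batch transcription with the `deepspeech` Python package.
# Paths are placeholders; adjust to your model and data.
import glob
import wave
from multiprocessing import Pool

import numpy as np
from deepspeech import Model

MODEL_PATH = "output_graph.pbmm"    # placeholder
SCORER_PATH = "kenlm.scorer"        # placeholder

def init_worker():
    # Load one model instance per worker process.
    global MODEL
    MODEL = Model(MODEL_PATH)
    MODEL.enableExternalScorer(SCORER_PATH)

def transcribe_one(wav_path):
    # Assumes 16 kHz, 16-bit mono WAV files.
    with wave.open(wav_path, "rb") as w:
        audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
    return wav_path, MODEL.stt(audio)

if __name__ == "__main__":
    files = sorted(glob.glob("audio/*.wav"))
    # The decoder is CPU-bound, so CPU processes are where parallelism pays off.
    with Pool(processes=4, initializer=init_worker) as pool:
        for path, transcript in pool.imap_unordered(transcribe_one, files):
            print(path, transcript)
```

This is just a sketch of the idea, not a replacement for transcribe.py, which already handles batching for you.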
Is there any way of getting the metadata with evaluate.py or transcribe.py?
There are always ways, but this code lives in libdeepspeech, so it’s quite some work to do.
The full metadata is already returned by the bindings; it’s just processed in native_client/ctcdecode/__init__.py to return plain (confidence, transcript) tuples. You should be able to edit that file to get that info exposed to Python.
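For reference, a minimal sketch of what that edit could look like. The field names below (confidence, tokens, timesteps) are assumptions taken from the decoder's Output struct, so verify them against your checkout before relying on this:

```python
# In native_client/ctcdecode/__init__.py, ctc_beam_search_decoder currently
# collapses each swig result roughly like this:
#     beam_results = [(res.confidence, alphabet.Decode(res.tokens))
#                     for res in beam_results]
# Returning richer objects instead keeps the metadata. Field names are
# assumptions based on the decoder's Output struct -- check output.h:
beam_results = [{
    "confidence": res.confidence,
    "transcript": alphabet.Decode(res.tokens),
    "tokens": list(res.tokens),
    "timesteps": list(res.timesteps),  # frame index per emitted token
} for res in beam_results]
```

Any code in evaluate.py / transcribe.py that expects (confidence, transcript) tuples would need to be adjusted to the new return shape as well.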