I am currently trying to run inference on a large number of files using a trained model. During inference, the deepspeech Python client uses only a single GPU out of 3. How can I extend it to use all of them in parallel?
We don’t currently have support for batch inference in the library. Your best solution is
The inference process is bottlenecked by the decoder, which is CPU-only. Using all GPUs won’t gain you much performance, which is why transcribe.py only uses a single GPU.
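
If you still want to experiment with all 3 GPUs, one generic approach (not a feature of the library) is to split the file list into shards and start one worker process per GPU, pinning each with the CUDA_VISIBLE_DEVICES environment variable. The sketch below only shows the sharding and launching; `worker_cmd` is a hypothetical command for whatever script you use to transcribe the files passed as arguments. Given the CPU-bound decoder, the speedup may be small.

```python
import os
import subprocess


def shard_files(files, num_shards):
    """Split the file list round-robin into num_shards lists."""
    return [files[i::num_shards] for i in range(num_shards)]


def worker_env(gpu_id, base_env=None):
    """Copy the environment and expose only one GPU to the child process."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


def launch_workers(files, num_gpus, worker_cmd):
    """Start one process per non-empty shard and wait for all of them.

    worker_cmd is a hypothetical command (list of strings) that transcribes
    the file paths appended to it; it is an assumption, not a library API.
    Returns the exit codes of the workers.
    """
    procs = []
    for gpu_id, shard in enumerate(shard_files(files, num_gpus)):
        if not shard:
            continue  # fewer files than GPUs
        procs.append(
            subprocess.Popen(worker_cmd + shard, env=worker_env(gpu_id))
        )
    return [p.wait() for p in procs]
```

For example, `launch_workers(wav_files, 3, ["python", "my_worker.py"])` would start up to three processes, where `my_worker.py` is a placeholder for your own transcription script. Each process still runs its decoder on the CPU, so it is worth checking CPU utilization before expecting a 3x gain.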