Running multiple inferences in parallel on a GPU

Probably use some sort of queue with consumer workers: from your Flask application, append the file name to the queue, and have one of the consumers pick it up and run the inference on that file. I don't have hands-on experience with this exact setup, but that's how I'd do it.
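
Roughly something like the sketch below: a plain `queue.Queue`, a few daemon threads that share one model on the GPU, and a Flask route that only enqueues the file name. The model (`torch.nn.Identity`), the `/infer` route, and `load_tensor` are placeholders I made up; swap in your real model and preprocessing.

```python
import queue
import threading

import torch
from flask import Flask, jsonify, request

app = Flask(__name__)
job_queue = queue.Queue()     # file names waiting for inference
NUM_WORKERS = 2               # consumer threads sharing one GPU

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Identity().eval().to(device)   # stand-in for your real model

def load_tensor(file_name):
    """Placeholder: load `file_name` from disk and preprocess it into a tensor."""
    return torch.zeros(1, 3, 224, 224, device=device)

def worker():
    while True:
        file_name = job_queue.get()          # blocks until a job arrives
        try:
            with torch.no_grad():
                output = model(load_tensor(file_name))
            print(f"done {file_name}: {tuple(output.shape)}")
        finally:
            job_queue.task_done()

# Start the consumers once, when the app starts.
for _ in range(NUM_WORKERS):
    threading.Thread(target=worker, daemon=True).start()

@app.route("/infer", methods=["POST"])
def infer():
    file_name = request.json["file"]         # e.g. {"file": "image_001.png"}
    job_queue.put(file_name)                 # producer side: just enqueue
    return jsonify({"status": "queued", "file": file_name})

if __name__ == "__main__":
    app.run()
```

If the jobs need to survive a restart or run on a separate machine, the same pattern works with a proper task queue (Celery, RQ, etc.) instead of an in-process `queue.Queue`.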