Hello, I am wondering if it’s possible to split the GPU memory up into fractions so that I can run multiple DeepSpeech instances.
In TensorFlow you can configure the session object to only use a fraction of the available memory. Example here. Any way to configure the same thing in DeepSpeech?
For reference, I am using the Python DeepSpeech client.
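For anyone landing here, the TensorFlow technique being referred to looks roughly like this. This is a minimal sketch of the standard TF1 session API, not anything DeepSpeech-specific, and the 0.25 fraction is just an example value:

```python
import tensorflow as tf

# Ask TensorFlow to pre-allocate only a fraction of the GPU's memory
# for this process, instead of grabbing nearly all of it (the default).
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.25)
config = tf.ConfigProto(gpu_options=gpu_options)

with tf.Session(config=config) as sess:
    # ... build the graph and run inference as usual; this process is
    # now capped at ~25% of the GPU's memory, so about four such
    # processes could share one card.
    pass
```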
Not out of the box. However, it looks like one could do so with a few code changes using the technique you reference[1] along with, for the case of multiple GPUs on a single machine, use of CUDA_VISIBLE_DEVICES[2].
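For the multi-GPU case, the idea with CUDA_VISIBLE_DEVICES is to pin each process to one card so the instances don’t contend for the same GPU. A sketch of how that might look from Python (note the environment variable has to be set before anything in the process initializes CUDA, so it must come before the deepspeech/tensorflow imports; the device index "1" is just an example):

```python
import os

# Expose only GPU 1 to this process. Must be set before any library
# initializes CUDA, so keep this above the deepspeech import.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import deepspeech  # this instance now only sees (and allocates on) GPU 1
```

Equivalently, you can set the variable in the shell when launching each process, giving each instance a different device index.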
@kdavis thanks for the tip. I was looking through the code, but it seems like I would have to make this change in DeepSpeech.py, and that’s more for training purposes. Is there a way to do it in the Python native client? Seems like maybe it would be in deepspeech.h or deepspeech.cc.