I have been attempting multi-process inference on a multi-GPU setup and have run into some issues, so I was hoping to get advice on how to solve them. First, let me clarify that I am working with DeepSpeech 0.6.1 and Python 3.6.9.
On my single-GPU setup I managed to run inference in parallel without TensorFlow taking up the whole GPU's memory for each inference process.
However, when I follow the same instructions and rebuild DeepSpeech on a multi-GPU setup, allow_growth no longer seems to take effect: all of the GPU memory gets taken up and TensorFlow raises out-of-memory errors.
Below is an image of what watch nvidia-smi shows with only one process running (so no parallelising has happened yet, but it is running as a separate process from the main process). The code I am running follows: first the main script, then transcription_gpu.py.
import torch
import torch.multiprocessing as tmp
from transcription_gpu import run_transcription
import argparse
import time

PROCESS_NUM = 2

if __name__ == '__main__':
    before = time.time()
    # Setup a number of processes
    processes = [tmp.Process(target=run_transcription, args=()) for _ in range(PROCESS_NUM)]
    # Run processes
    for p in processes:
        print("I am a process!")
        p.start()
    # Exit the completed processes
    for p in processes:
        p.join()
    after = time.time()
    print("Processing Time: ", after - before)
# transcription_gpu.py
import scipy.io.wavfile as wav
import sys
import os
import time

from deepspeech import Model


def run_transcription():
    BEAM_WIDTH = 500
    LM_WEIGHT = 1.50
    VALID_WORD_COUNT_WEIGHT = 2.10
    N_FEATURES = 26
    N_CONTEXT = 9
    MODEL_ROOT_DIR = 'models/deepspeech-0.6.1-models/'

    ds = Model(
        MODEL_ROOT_DIR + 'output_graph.pb',
        BEAM_WIDTH)
    ds.enableDecoderWithLM(
        MODEL_ROOT_DIR + 'lm.binary',
        MODEL_ROOT_DIR + 'trie',
        LM_WEIGHT,
        VALID_WORD_COUNT_WEIGHT)

    before = time.time()
    fs, audio = wav.read("audio/test.wav")
    transcript = ds.stt(audio)
    print(transcript)
    after = time.time()
    print("Transcription Time: ", after - before)
lissyx ((slow to reply) [NOT PROVIDING SUPPORT]):
It’s unclear: are we talking about inference? training? What did you rebuild? How do you set allow_growth?
lissyx ((slow to reply) [NOT PROVIDING SUPPORT]):
We already have similar code in evaluate_tflite.py
"It's unclear: are we talking about inference? training? What did you rebuild? How do you set allow_growth?"
I stated at the beginning of my post that I mean inference. I rebuilt DeepSpeech 0.6.1 by altering tfmodelstate.cc, as indicated in the post I linked that describes such a solution.
"We already have similar code in evaluate_tflite.py"
Does this work for multi-GPU setups?
lissyx ((slow to reply) [NOT PROVIDING SUPPORT]):
I don’t see anything multi-GPU specific in your code.
That looks okay, but if you say it's not working, maybe there are details to investigate.
There's a TF_FORCE_GPU_ALLOW_GROWTH environment variable; maybe it would work?
Sharing GPUs between processes is kind of complicated with TensorFlow …
This is exactly my point: I am unclear how to achieve this with DeepSpeech, i.e. whether I need to edit the DeepSpeech code itself or change the way I am using the package in my Python code; that is really the point of my post.
Where do I set this force variable? The session gets created by the methods imported from DeepSpeech, so I don't know whether I even have access to it to apply that variable.
Would it be better to create a pool that randomly allocates work to different GPUs? I guess I am mainly wondering why the allow_growth setting works fine on my single-GPU setup but fails on my multi-GPU setup even with only one process running.
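For reference, one way to try the TF_FORCE_GPU_ALLOW_GROWTH suggestion (only a sketch; it assumes the TensorFlow runtime bundled in the deepspeech 0.6.1 package reads this variable when it initialises its GPU allocator, which is not verified here) is to set it in the process environment before the Model is constructed, either with export TF_FORCE_GPU_ALLOW_GROWTH=true in the shell or at the top of the worker code:

import os

# Sketch: the variable must be in the environment before the deepspeech Model
# (and therefore the TensorFlow session) is created in this process. Whether the
# TensorFlow build inside deepspeech 0.6.1 honours it is an assumption.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

from deepspeech import Model  # imported after setting the variable

MODEL_ROOT_DIR = 'models/deepspeech-0.6.1-models/'
ds = Model(MODEL_ROOT_DIR + 'output_graph.pb', 500)  # beam width as in the code above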
The easiest way is probably to spawn several independent processes, each one with the CUDA_VISIBLE_DEVICES environment variable limiting it to see a single GPU.
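A minimal sketch of that approach, assuming two GPUs (indices 0 and 1 are placeholders) and the transcription_gpu.run_transcription function from above; the standard multiprocessing module with the 'spawn' start method is used here so each child starts with a fresh interpreter and no inherited CUDA context:

import os
import multiprocessing as mp

def worker(gpu_id):
    # CUDA_VISIBLE_DEVICES must be set before anything initialises CUDA in this
    # process, hence importing the transcription code inside the worker.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    from transcription_gpu import run_transcription
    run_transcription()

if __name__ == '__main__':
    ctx = mp.get_context('spawn')
    # One process per GPU; each process only ever sees its own device.
    processes = [ctx.Process(target=worker, args=(gpu,)) for gpu in (0, 1)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()

Each process then effectively behaves like the single-GPU case, since CUDA in that process only enumerates the one visible device.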