Hi,
I have been attempting multi-process inference on a multi-GPU setup and have run into some issues; I was hoping to get some advice on how to solve them. First, let me clarify that I am working with DeepSpeech 0.6.1 on Python 3.6.9.
Following the advice on this thread: Running multiple inferences in parallel on a GPU
I managed to run inference in parallel on a single GPU without TensorFlow taking up the whole GPU memory for each inference process.
However, when I follow the same instructions and rebuild DeepSpeech on a multi-GPU setup, allow_growth no longer seems to work: all of the GPU memory gets taken up and TensorFlow raises out-of-memory errors.
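(To clarify what I mean by allow_growth: it is the TensorFlow GPU option from the linked thread, which in plain TensorFlow 1.x Python would be set roughly as below. In my case it is baked into the rebuilt libdeepspeech rather than set from Python, so this snippet is only for illustration.)

import tensorflow as tf

# Rough illustration only: with allow_growth the session allocates GPU memory
# on demand instead of reserving almost all of it up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)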
Below is an image of what watch nvidia-smi shows. This is with only 1 process running (so no parallelising has happened yet, but it is running as a separate process from the main one):
My code consists of two files. The main script is as follows:
import torch
import torch.multiprocessing as tmp
from transcription_gpu import run_transcription
import argparse
import time

PROCESS_NUM = 2

if __name__ == '__main__':
    before = time.time()
    # Set up a number of processes
    processes = [tmp.Process(target=run_transcription, args=()) for x in range(1, PROCESS_NUM+1)]
    # Run processes
    for p in processes:
        print("I am a process!")
        p.start()
    # Exit the completed processes
    for p in processes:
        p.join()
    after = time.time()
    print("Processing Time: ", after-before)
And transcription_gpu.py, which each process runs, is:

import scipy.io.wavfile as wav
import sys
import os
import time

from deepspeech import Model


def run_transcription():
    BEAM_WIDTH = 500
    LM_WEIGHT = 1.50
    VALID_WORD_COUNT_WEIGHT = 2.10
    N_FEATURES = 26
    N_CONTEXT = 9
    MODEL_ROOT_DIR = 'models/deepspeech-0.6.1-models/'

    # Load the acoustic model and enable the external language model (0.6.1 API)
    ds = Model(
        MODEL_ROOT_DIR + 'output_graph.pb',
        BEAM_WIDTH)
    ds.enableDecoderWithLM(
        MODEL_ROOT_DIR + 'lm.binary',
        MODEL_ROOT_DIR + 'trie',
        LM_WEIGHT,
        VALID_WORD_COUNT_WEIGHT)

    before = time.time()
    fs, audio = wav.read("audio/test.wav")
    transcript = ds.stt(audio)
    print(transcript)
    after = time.time()
    print("Transcription Time: ", after-before)