How to restrict Transcribe.py from consuming whole GPU memory

Hi All,

I have lots of audio files to transcribe and I am using Transcribe.py for transcription. It works fine, but I want to speed up the process with multiprocessing. I have successfully implemented multiprocessing on CPU, which cut my transcription time by 54%, but when I switch over to GPU, each audio transcription blocks the whole GPU memory. I want to avoid that by setting a limit on how much GPU memory each job can take. Is there a way to do that? Has anyone faced a similar problem where you used threading/multiprocessing/async to process more than one audio file in parallel with transcribe.py?

Thanks!

Multiprocessing with GPUs is complicated with TensorFlow; this is not something we have experience with. Maybe try setting allow_growth to limit the usage and stop TensorFlow from locking all the memory?
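For reference, a minimal sketch of what "set allow growth" means, using the TF1-style API that deepspeech_training uses. This is a config fragment only; where exactly you wire it into DeepSpeech's session setup is an assumption you'd need to verify against your checkout:

```python
import tensorflow.compat.v1 as tfv1

# allow_growth=True tells TensorFlow to start with a small GPU
# allocation and grow it on demand, instead of grabbing all
# GPU memory up front.
gpu_options = tfv1.GPUOptions(allow_growth=True)
session_config = tfv1.ConfigProto(gpu_options=gpu_options)
```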

But without more details on your setup (we have guidelines for posting for a reason), it’s hard to help.

I did not fully understand your use case, but maybe you can use DeepSpeech-Server or deepspeech-websocket-server. The model is loaded once, and then you can perform inference on multiple audio files.
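To illustrate the server approach: once a deepspeech-server instance is running, clients just POST audio to it, so the model-loading cost is paid once. A minimal client sketch, assuming the default POST /stt route and port from the deepspeech-server README (adjust to your actual config):

```python
import urllib.request
import urllib.error

def transcribe_remote(wav_path, url="http://localhost:8080/stt"):
    """Send a WAV file to a running deepspeech-server instance.

    Returns the transcript as text, or None if the file is missing
    or the server is unreachable.
    """
    try:
        with open(wav_path, "rb") as f:
            req = urllib.request.Request(url, data=f.read(), method="POST")
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.read().decode("utf-8")
    except (urllib.error.URLError, OSError):
        return None
```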

Thanks Dominik, good point. I think lazyguy has loads of smaller audios, so loading the model takes up more time than the processing itself. Right?

And how do you multiprocess for CPUs?

Changing the current solution will be hard. Have you thought about merging the audios and splitting the results afterwards? Not the best solution, but better than wasting GPU power.

Hi Lissy, your suggestion worked like a charm :stuck_out_tongue: but the “allow_growth” flag in the config does not work for some reason. However, I looked into config.py in training/deepspeech_training/utils and set per_process_gpu_memory_fraction=0.1, which limits TensorFlow to using only 10% of the GPU.

Pasting the exact code snippet (from config.py) that worked for me:

c.session_config = tfv1.ConfigProto(
    allow_soft_placement=True,
    log_device_placement=FLAGS.log_placement,
    inter_op_parallelism_threads=FLAGS.inter_op_parallelism_threads,
    intra_op_parallelism_threads=FLAGS.intra_op_parallelism_threads,
    gpu_options=tfv1.GPUOptions(per_process_gpu_memory_fraction=0.1))

I think in the next release you should add a flag letting the user set how much GPU memory they want DS to consume.

PS: I am using DS==0.7.1

Hope it helps more users who are looking for something similar!


Thanks a lot! I’ll look into it as well!

Hello, yes I have a lot of audios, some lengthy and some not. You are right, loading the model again and again is costing time.

Well, for CPU (and GPU as well), what I am doing is executing transcribe.py via subprocess and then multiprocessing that call. CPU does not have an OOM issue like the GPU, so it was working fine; now, with the GPU memory restriction in place, the same logic works on GPU too. Transcription got 2x faster, multiprocessing 3 audios at a time.
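The pattern described above (subprocess per file, several in flight at once) can be sketched like this. The transcribe.py invocation and its --src flag are placeholders, not the tool's actual CLI; adjust the command to match your setup, and note this only avoids GPU OOM once per_process_gpu_memory_fraction is capped as described earlier:

```python
import subprocess
from multiprocessing import Pool

# Hypothetical command; replace with your real transcribe.py invocation.
TRANSCRIBE_CMD = ["python", "transcribe.py"]

def transcribe_one(audio_path, cmd=None):
    """Run one transcription as a separate OS process.

    Each worker spawns its own transcribe.py process, so with the GPU
    memory fraction capped at 0.1 per process, three of these can share
    one GPU. Returns (audio_path, return_code).
    """
    result = subprocess.run(
        (cmd or TRANSCRIBE_CMD) + [audio_path],
        capture_output=True, text=True)
    return audio_path, result.returncode

if __name__ == "__main__":
    audio_files = ["a.wav", "b.wav", "c.wav"]  # your real files here
    # 3 worker processes -> 3 audios transcribed in parallel.
    with Pool(processes=3) as pool:
        for path, rc in pool.imap_unordered(transcribe_one, audio_files):
            print(path, "->", rc)
```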