How to restrict Transcribe.py from consuming whole GPU memory

Hi All,

I have lots of audio files to transcribe and I am using Transcribe.py for transcription. It works fine, but I want to speed up the process with multiprocessing. I have successfully implemented multiprocessing on CPU, which cut my transcription time by 54%, but when I switch over to GPU, each audio transcription blocks the whole GPU memory. I want to avoid that by setting a limit on how much GPU memory each job can take. Is there a way to do that? Has anyone faced a similar problem where you used threading/multiprocessing/async to process more than one audio file in parallel with transcribe.py?

Thanks!

Multiprocessing with GPUs is complicated with TensorFlow; this is not something we have experience with. Maybe try setting allow_growth to limit the usage and stop TensorFlow from locking all the memory?
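For reference, a minimal sketch of what "set allow growth" means, using the TF1-style API that deepspeech_training uses. This is a config fragment only; where exactly you wire it into DeepSpeech's session setup is an assumption you'd need to verify against your checkout:

```python
import tensorflow.compat.v1 as tfv1

# allow_growth=True tells TensorFlow to start with a small GPU
# allocation and grow it on demand, instead of grabbing all
# GPU memory up front.
gpu_options = tfv1.GPUOptions(allow_growth=True)
session_config = tfv1.ConfigProto(gpu_options=gpu_options)
```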

But without more details on your setup (we have guidelines for posting for a reason), it’s hard to help.

I did not fully understand your use case, but maybe you can use DeepSpeech-Server or deepspeech-websocket-server. The model is loaded once, and then you can perform inference on multiple audio files.
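To illustrate the server approach: once a deepspeech-server instance is running, clients just POST audio to it, so the model-loading cost is paid once. A minimal client sketch, assuming the default POST /stt route and port from the deepspeech-server README (adjust to your actual config):

```python
import urllib.request
import urllib.error

def transcribe_remote(wav_path, url="http://localhost:8080/stt"):
    """Send a WAV file to a running deepspeech-server instance.

    Returns the transcript as text, or None if the file is missing
    or the server is unreachable.
    """
    try:
        with open(wav_path, "rb") as f:
            req = urllib.request.Request(url, data=f.read(), method="POST")
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.read().decode("utf-8")
    except (urllib.error.URLError, OSError):
        return None
```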

Thanks Dominik, good point. I think lazyguy has loads of smaller audios, so loading the model takes up more time than the processing itself. Right?

And how do you multiprocess for CPUs?

Changing the current solution will be hard. Have you thought about merging the audios and splitting the results afterwards? Not the best solution, but better than wasting GPU power.

Hi Lissy, your suggestion worked like a charm :stuck_out_tongue: but the “allow_growth” flag in the config does not work for some reason. However, I looked into config.py in training/deepspeech_training/utils and set per_process_gpu_memory_fraction=0.1, which limits TensorFlow to using only 10% of the GPU.

Pasting the exact code snippet (from config.py) that worked for me:

c.session_config = tfv1.ConfigProto(
    allow_soft_placement=True,
    log_device_placement=FLAGS.log_placement,
    inter_op_parallelism_threads=FLAGS.inter_op_parallelism_threads,
    intra_op_parallelism_threads=FLAGS.intra_op_parallelism_threads,
    gpu_options=tfv1.GPUOptions(per_process_gpu_memory_fraction=0.1))

I think in the next release you should add a flag letting the user set how much GPU memory they want DS to consume.

PS: I am using DS==0.7.1

Hope it helps more users who are looking for something similar!


Thanks a lot! I’ll look into it as well!

Hello, yes I have a lot of audios, some lengthy and some not. You are right, loading the model again and again is costing time.

Well, for CPU (and GPU as well), what I am doing is executing transcribe.py via subprocess and then multiprocessing that call. CPU does not have an OOM issue like the GPU, so it was working fine; now, with the GPU memory restriction in place, the same logic works on GPU too. Transcription got 2x faster, multiprocessing 3 audios at a time.
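The pattern described above (subprocess per file, several in flight at once) can be sketched like this. The transcribe.py invocation and its --src flag are placeholders, not the tool's actual CLI; adjust the command to match your setup, and note this only avoids GPU OOM once per_process_gpu_memory_fraction is capped as described earlier:

```python
import subprocess
from multiprocessing import Pool

# Hypothetical command; replace with your real transcribe.py invocation.
TRANSCRIBE_CMD = ["python", "transcribe.py"]

def transcribe_one(audio_path, cmd=None):
    """Run one transcription as a separate OS process.

    Each worker spawns its own transcribe.py process, so with the GPU
    memory fraction capped at 0.1 per process, three of these can share
    one GPU. Returns (audio_path, return_code).
    """
    result = subprocess.run(
        (cmd or TRANSCRIBE_CMD) + [audio_path],
        capture_output=True, text=True)
    return audio_path, result.returncode

if __name__ == "__main__":
    audio_files = ["a.wav", "b.wav", "c.wav"]  # your real files here
    # 3 worker processes -> 3 audios transcribed in parallel.
    with Pool(processes=3) as pool:
        for path, rc in pool.imap_unordered(transcribe_one, audio_files):
            print(path, "->", rc)
```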