Hello, I am wondering if it’s possible to split the GPU memory up into fractions so that I can run multiple DeepSpeech instances.
In TensorFlow you can configure the session object to only use a fraction of the available memory. Example here. Any way to configure the same thing in DeepSpeech?
For reference, I am using the Python DeepSpeech client.
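For anyone landing here, the TensorFlow technique being referred to looks roughly like this. This is a minimal sketch of the standard TF1 session API, not anything DeepSpeech-specific, and the 0.25 fraction is just an example value:

```python
import tensorflow as tf

# Ask TensorFlow to pre-allocate only a fraction of the GPU's memory
# for this process, instead of grabbing nearly all of it (the default).
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.25)
config = tf.ConfigProto(gpu_options=gpu_options)

with tf.Session(config=config) as sess:
    # ... build the graph and run inference as usual; this process is
    # now capped at ~25% of the GPU's memory, so about four such
    # processes could share one card.
    pass
```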
Not out of the box. However, it looks like one could do so with a few code changes using the technique you reference[1] along with, for the case of multiple GPUs on a single machine, use of CUDA_VISIBLE_DEVICES[2].
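For the multi-GPU case, the idea with CUDA_VISIBLE_DEVICES is to pin each process to one card so the instances don’t contend for the same GPU. A sketch of how that might look from Python (note the environment variable has to be set before anything in the process initializes CUDA, so it must come before the deepspeech/tensorflow imports; the device index "1" is just an example):

```python
import os

# Expose only GPU 1 to this process. Must be set before any library
# initializes CUDA, so keep this above the deepspeech import.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import deepspeech  # this instance now only sees (and allocates on) GPU 1
```

Equivalently, you can set the variable in the shell when launching each process, giving each instance a different device index.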
@kdavis thanks for the tip. I was looking through the code, but it seems like I would have to make this change in DeepSpeech.py, and that’s more for training purposes. Is there a way to do it in the Python native client? Seems like maybe it would be in deepspeech.h or deepspeech.cc.