Multiple simultaneous inferences with a single video card

Greetings,

I wonder if anybody has attempted running multiple simultaneous inferences using a single video card.
My test scenario is to run two instances of a C++/DeepSpeech-based application (with a GStreamer pipeline) and a test script with two threads feeding RAM-buffered audio (10 seconds of voice) to each instance. The first instance grabs the video card/GPU, consumes all of its memory, and works normally. The second instance cannot allocate any memory on the video card and simply chokes and dies.
So I guess using a single GPU card is not an option if more than one inference is desired? I saw an earlier post, "Running multiple inferences in parallel on a single machine?", and it looks like tying each process to a CPU core is only marginally better.
Would love to hear if somebody else has had success with something like this.
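One thing worth checking: the GPU build of DeepSpeech runs on TensorFlow, and TensorFlow by default reserves nearly all GPU memory for the first process that initializes it, which matches the "second instance can't allocate anything" symptom. A minimal sketch of a workaround, assuming your DeepSpeech build uses a TensorFlow runtime recent enough (>= 1.14) to honor the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable (this is an assumption about your build, not something confirmed in this thread):

```python
import os

# Assumption: the embedded TensorFlow runtime reads this env var at startup.
# It must be set BEFORE the DeepSpeech library initializes the GPU, so set it
# in the launching environment (or at the very top of the driver script).
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# ... now import/launch the DeepSpeech-based app; with allow-growth enabled,
# each process only allocates GPU memory as it actually needs it, so two
# instances can coexist on one card (until their combined use exceeds VRAM).
print(os.environ["TF_FORCE_GPU_ALLOW_GROWTH"])
```

For a C++ app launched from the shell, the equivalent is exporting the variable before starting each instance. Note this only relaxes the up-front reservation; the two processes still compete for the same physical memory.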

I know @josh_meyer worked on something like that.

It can, if you batch.
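To illustrate the batching idea: instead of two processes each owning the GPU, one process collects requests from multiple clients and runs them through the model as a single batched call, so the GPU is initialized once. The sketch below uses a hypothetical `infer_batch` stand-in (the public DeepSpeech API is per-stream, so a real batched backend is an assumption here, not something DeepSpeech ships):

```python
import queue

# Hypothetical stand-in for one batched GPU inference call over several
# audio buffers; a real backend would feed all buffers through the model
# in a single forward pass.
def infer_batch(audio_buffers):
    return [f"transcript-{len(buf)}" for buf in audio_buffers]

requests = queue.Queue()   # (request_id, audio_bytes) from client threads
results = {}

def serve(batch_size=2, timeout=0.1):
    """Drain the queue, grouping up to batch_size requests per GPU call."""
    pending = []
    while True:
        try:
            pending.append(requests.get(timeout=timeout))
        except queue.Empty:
            break
        if len(pending) == batch_size:
            ids, bufs = zip(*pending)
            results.update(zip(ids, infer_batch(list(bufs))))
            pending = []
    if pending:  # flush any leftover partial batch
        ids, bufs = zip(*pending)
        results.update(zip(ids, infer_batch(list(bufs))))

# Two "clients" (your two instances) enqueue audio; one worker serves both.
requests.put(("a", b"\x00" * 16000))
requests.put(("b", b"\x00" * 32000))
serve()
print(results["a"], results["b"])  # → transcript-16000 transcript-32000
```

The trade-off is latency: requests wait briefly to be grouped, in exchange for one process (and one GPU memory footprint) serving all clients.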

Where is the need for using GPU coming from here?