Running multiple inferences in parallel on a GPU

Probably use some sort of queue with consumer workers: from your Flask application, append the file name to the queue, and have one of the consumers pick it up and run the inference on that file. I don't have hands-on experience with this exact setup, but that's how I'd do it.
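
Roughly something like the sketch below: a plain `queue.Queue`, a few daemon threads that share one model on the GPU, and a Flask route that only enqueues the file name. The model (`torch.nn.Identity`), the `/infer` route, and `load_tensor` are placeholders I made up; swap in your real model and preprocessing.

```python
import queue
import threading

import torch
from flask import Flask, jsonify, request

app = Flask(__name__)
job_queue = queue.Queue()     # file names waiting for inference
NUM_WORKERS = 2               # consumer threads sharing one GPU

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Identity().eval().to(device)   # stand-in for your real model

def load_tensor(file_name):
    """Placeholder: load `file_name` from disk and preprocess it into a tensor."""
    return torch.zeros(1, 3, 224, 224, device=device)

def worker():
    while True:
        file_name = job_queue.get()          # blocks until a job arrives
        try:
            with torch.no_grad():
                output = model(load_tensor(file_name))
            print(f"done {file_name}: {tuple(output.shape)}")
        finally:
            job_queue.task_done()

# Start the consumers once, when the app starts.
for _ in range(NUM_WORKERS):
    threading.Thread(target=worker, daemon=True).start()

@app.route("/infer", methods=["POST"])
def infer():
    file_name = request.json["file"]         # e.g. {"file": "image_001.png"}
    job_queue.put(file_name)                 # producer side: just enqueue
    return jsonify({"status": "queued", "file": file_name})

if __name__ == "__main__":
    app.run()
```

If the jobs need to survive a restart or run on a separate machine, the same pattern works with a proper task queue (Celery, RQ, etc.) instead of an in-process `queue.Queue`.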