Multiprocessing on CPU does not scale

Because of constraints on my side, I have to run inference on CPU. To make use of multiple cores, I am running “evaluate_tflite.py” with Python's multiprocessing module, using Process.

I have tested the performance on two CPUs, and here are the results:


Intel i7-7700, 8 processes: 763 s

AMD Ryzen 2700x, 16 processes: 791 s


Intel i7-7700, 1 process: 1586 s

AMD Ryzen 2700x, 1 process: 1273 s


Clearly, it does not scale: I get a ~1.6x speedup with 16 processes vs. 1 process on AMD, and a ~2.1x speedup with 8 processes vs. 1 process on Intel. What might be the issue here? As far as I can see, all CPU threads are 100% loaded, so I am not sure whether there is an I/O or memory bottleneck. Removing the LM for inference did not help (it even increased the inference time by ~20%!).
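For reference, the speedup figures can be reproduced from the timings above. The sketch below also computes parallel efficiency, defined here as speedup divided by the number of processes; that metric and the helper name `scaling` are illustrative additions and are not part of evaluate_tflite.py.

```python
# Wall-clock times in seconds, copied from the measurements above.
timings = {
    "Intel i7-7700": {"single": 1586, "parallel": 763, "processes": 8},
    "AMD Ryzen 2700x": {"single": 1273, "parallel": 791, "processes": 16},
}


def scaling(single_s, parallel_s, processes):
    """Return (speedup, efficiency) for one CPU.

    speedup    = single-process time / multi-process time
    efficiency = speedup / number of processes (1.0 would be perfect scaling)
    """
    speedup = single_s / parallel_s
    efficiency = speedup / processes
    return speedup, efficiency


for cpu, t in timings.items():
    speedup, efficiency = scaling(t["single"], t["parallel"], t["processes"])
    print(
        f"{cpu}: {speedup:.2f}x speedup, "
        f"{efficiency:.0%} efficiency on {t['processes']} processes"
    )
```

This gives about 2.08x (26% efficiency) on the Intel CPU and about 1.61x (10% efficiency) on the AMD CPU. Note that both process counts equal the number of logical (SMT) threads, which is twice the number of physical cores on these two CPUs, so efficiency measured against physical cores would be roughly double these values.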

Obviously the use of GPU would solve the problem, but CPU is all I’ve got for inference.

So, again, we don’t yet have a good setup for that use case of batched inference. That being said, your results do not match mine, where the scaling was much more linear.

It would help to know more about your environment and your testing process. There are several ways to use evaluate_tflite, so without those details it’s basically impossible to analyze anything here.
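As a starting point for the environment details, something like the following could be run on each machine and pasted into the thread. It is only a minimal sketch using the Python standard library; the function name `environment_report` is made up for illustration, and it does not capture the evaluate_tflite invocation or the model and dataset used, which would still need to be described by hand.

```python
import os
import platform
import sys


def environment_report():
    """Collect basic facts about the machine and the Python interpreter."""
    return {
        "platform": platform.platform(),    # OS name and version string
        "machine": platform.machine(),      # CPU architecture, e.g. x86_64
        "python": sys.version.split()[0],   # interpreter version
        "logical_cpus": os.cpu_count(),     # logical CPUs, including SMT threads
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```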