Because of limitations on my side, I have to run inference on the CPU. To make use of multiple CPU threads, I run “evaluate_tflite.py” with multiprocessing, using Process.
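For reference, the general pattern looks roughly like the sketch below. This is not the actual code from “evaluate_tflite.py”: the names `shard`, `worker`, and `run` are hypothetical, and the worker body is a placeholder standing in for loading the model and transcribing each file.

```python
import multiprocessing as mp


def shard(items, n_shards):
    """Split items into n_shards round-robin slices."""
    return [items[i::n_shards] for i in range(n_shards)]


def worker(files, results):
    # Hypothetical stand-in: a real worker would load the TFLite model once
    # and run inference on each file. Here we only record the path length.
    for path in files:
        results.put((path, len(path)))


def run(files, n_procs):
    """Process files in n_procs separate processes and collect the results."""
    results = mp.Queue()
    procs = [
        mp.Process(target=worker, args=(chunk, results))
        for chunk in shard(files, n_procs)
    ]
    for p in procs:
        p.start()
    # Drain the queue before join() so workers never block on a full pipe.
    collected = [results.get() for _ in files]
    for p in procs:
        p.join()
    return dict(collected)
```

Something like `run(wav_paths, n_procs=8)` would then be called from under an `if __name__ == "__main__":` guard, where `wav_paths` is an assumed list of input files.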
I have tested the performance on two CPUs, and here are the results:
Intel i7-7700, 8 processes: 763 s
AMD Ryzen 2700x, 16 processes: 791 s
Intel i7-7700, 1 process: 1586 s
AMD Ryzen 2700x, 1 process: 1273 s
Clearly, it does not scale: I get a ~1.6x speedup with 16 processes vs. 1 process on the AMD, and a ~2.1x speedup with 8 processes vs. 1 process on the Intel. What might be the issue here? As far as I can see, all CPU threads are 100% loaded, so I am not sure whether there is an I/O or memory bottleneck. Removing the LM for inference did not help (it even increased the inference time by ~20%!).
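To make the arithmetic explicit, here is a small sketch that derives the speedup and the parallel efficiency (speedup divided by the number of processes) from the timings above. The helper names `speedup` and `efficiency` are just for illustration.

```python
# Wall-clock times in seconds, copied from the measurements in this post.
timings = {
    "Intel i7-7700": {"n_procs": 8, "t_single": 1586, "t_multi": 763},
    "AMD Ryzen 2700x": {"n_procs": 16, "t_single": 1273, "t_multi": 791},
}


def speedup(t_single, t_multi):
    """How many times faster the multi-process run is."""
    return t_single / t_multi


def efficiency(t_single, t_multi, n_procs):
    """Fraction of ideal linear scaling actually achieved."""
    return speedup(t_single, t_multi) / n_procs


for cpu, t in timings.items():
    s = speedup(t["t_single"], t["t_multi"])
    e = efficiency(t["t_single"], t["t_multi"], t["n_procs"])
    print(f"{cpu}: {s:.2f}x speedup, {e:.0%} efficiency")
# Intel i7-7700: 2.08x speedup, 26% efficiency
# AMD Ryzen 2700x: 1.61x speedup, 10% efficiency
```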
Obviously, using a GPU would solve the problem, but a CPU is all I’ve got for inference.