Those are very reasonable suggestions. I’m planning to:
Test streaming
Test (in a hacky way) TF using oneDNN for inference, just to see if there is any improvement
If there is improvement, then perhaps build DS with TF and oneDNN. Maybe I could share a patch (changes to BUILD) with you if the results are good.
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
22
That would be welcome, but adding that kind of complexity will have to be balanced against maintainability, as well as impact on deployment, so that’s why you need to go through (2).
Again, building tensorflow with AVX2 enabled, for example, can give a nice speedup, but:
it’s not constant across all Intel CPUs; we saw important variations depending on each CPU
it generates only AVX2-enabled code, and thus will completely fail to run on a CPU without it; we had AVX2 enabled at first, but too many people were blocked by its absence on their (otherwise powerful enough) CPUs, so we had to go back and stick to AVX only.
You can build tensorflow for AVX but still benefit from AVX2 in oneDNN. This is because the JIT code is generated at inference time, and only when the CPU it runs on actually supports AVX2 (or other newer ISAs). So even if TF itself is built for AVX only, with oneDNN enabled the oneDNN kernels will still use AVX2 internally on capable CPUs.
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
24