Can anyone share any information on past efforts or future directions regarding quantization – does anyone plan to pursue it?
Thanks!
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
Thanks for bringing up that question. I have been exploring quantization for our model using TensorFlow’s tooling. While quantize_weights was an easy one to get working with good results (both in terms of memory and disk usage, and in WER impact), quantize_nodes has been another story. We had to fight issues within TensorFlow’s tooling related to the bidirectional recurrent layers we use, which were breaking those tools. Once fixed, there was an unexpected (though documented as a risk) slowdown instead of a speedup, on the order of 10×.
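For reference, both transforms are applied through TensorFlow’s Graph Transform Tool (TF 1.x). A minimal sketch of what that looks like from Python – the file name and the input/output node names here are placeholders, not our actual model’s:

```python
import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph

# Load a frozen GraphDef (placeholder file name).
graph_def = tf.GraphDef()
with tf.gfile.GFile("frozen_model.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# quantize_weights stores weights as 8-bit and dequantizes at load time:
# smaller file, little accuracy impact. This is the transform that worked
# well for us.
transformed = TransformGraph(
    graph_def,
    ["input_node"],    # hypothetical input node name
    ["output_node"],   # hypothetical output node name
    ["quantize_weights"],
)

# quantize_nodes additionally rewrites the computation itself to 8-bit;
# this is the one that broke on the bidirectional RNN layers and, once
# patched, ran ~10x slower rather than faster:
# transformed = TransformGraph(graph_def, ["input_node"], ["output_node"],
#                              ["quantize_weights", "quantize_nodes"])

with tf.gfile.GFile("frozen_model_quantized.pb", "wb") as f:
    f.write(transformed.SerializeToString())
```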
So far, work on that kind of quantization has been put on hold, and I have concentrated on leveraging tfcompile, which proved to be much more efficient and easier to do. We still plan on working on quantization.
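For the curious, tfcompile is driven from a Bazel BUILD file via the tf_library macro, which ahead-of-time compiles a frozen graph into a C++ class. Roughly something like this – all target, file, and class names below are hypothetical:

```python
# BUILD file sketch for AOT-compiling a frozen graph with tfcompile.
load("//tensorflow/compiler/aot:tfcompile.bzl", "tf_library")

tf_library(
    name = "speech_model",              # hypothetical Bazel target name
    graph = "frozen_model.pb",          # frozen GraphDef to compile
    config = "model.config.pbtxt",      # describes the graph's feeds/fetches
    cpp_class = "SpeechModel",          # name of the generated C++ class
)
```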
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
That being said, I’m curious about your results. All the experience we could gather trying to leverage TensorFlow’s tools ended with failures somewhere around the bidirectional RNN layer, or at best gave poor results :).
@reuben is working on a streamable model, which requires removing the bidirectional recurrent part of the model. We expect this will make things easier for that use case, so if you run into issues, maybe it’s worth waiting for that to land? I don’t have any ETA yet, though.
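To illustrate why the bidirectional layer is the blocker for streaming, here is a rough TF 1.x sketch (layer sizes and feature dimensions are illustrative, not our actual model):

```python
import tensorflow as tf

# [batch, time, features]; 26 input features is just an example value.
inputs = tf.placeholder(tf.float32, [None, None, 26])

# Bidirectional layer: the backward cell consumes the sequence in reverse,
# so no output can be produced until the whole utterance is available.
fw = tf.nn.rnn_cell.LSTMCell(2048)
bw = tf.nn.rnn_cell.LSTMCell(2048)
bi_out, _ = tf.nn.bidirectional_dynamic_rnn(fw, bw, inputs, dtype=tf.float32)

# Unidirectional replacement: output at step t depends only on steps <= t,
# so audio can be fed and decoded chunk by chunk (a streamable model).
uni = tf.nn.rnn_cell.LSTMCell(2048)
uni_out, _ = tf.nn.dynamic_rnn(uni, inputs, dtype=tf.float32)
```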