Thanks for bringing that up. I have been exploring quantization approaches for our model using TensorFlow's tooling, and while
quantize_weights was easy to get working with good results (both in terms of memory and disk usage),
quantize_nodes has been another story. We had to fight issues within TensorFlow's tooling related to the bidirectional recurrent layers we use, which were breaking those tools. Once that was fixed, we hit a slowdown instead of a speedup, on the order of 10x; unexpected, although the documentation does mention it as a risk.
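For context on why quantize_weights alone already helps so much on size: it stores each float32 weight array as 8-bit integers plus a scale and offset, roughly a 4x reduction. Here is a minimal NumPy sketch of that idea (an illustration of the general technique, not the TensorFlow tool itself; all names are mine):

```python
import numpy as np

def quantize(w):
    """Affine-quantize a float32 array to uint8 with a scale and offset."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0 or 1.0  # guard against a constant array
    q = np.round((w - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    """Recover approximate float32 values from the quantized form."""
    return q.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, scale, lo = quantize(w)

print(w.nbytes // q.nbytes)  # uint8 storage is a quarter of float32 -> 4
# Rounding error is bounded by half a quantization step
max_err = float(np.max(np.abs(dequantize(q, scale, lo) - w)))
print(max_err <= scale)
```

This is also why it barely hurts accuracy for weights: the per-element error stays within half a quantization step of the original value.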
So far, work on that kind of quantization has been put on hold, and I have concentrated on leveraging tfcompile instead, which proved to be much more efficient and easier to get working. We still plan on working on that.
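For anyone wanting to reproduce the tfcompile side: it is ahead-of-time compilation of the graph into native code, normally driven through the tf_library Bazel macro. A rough sketch of what the BUILD rule looks like (file names, target name, and class name here are placeholders, not our actual setup):

```starlark
load("//tensorflow/compiler/aot:tfcompile.bzl", "tf_library")

tf_library(
    name = "model_aot",
    # Frozen GraphDef to compile
    graph = "frozen_model.pb",
    # pbtxt listing the graph's feeds and fetches
    config = "model.config.pbtxt",
    # C++ class generated for invoking the compiled graph
    cpp_class = "Model",
)
```

The generated C++ class is then linked into the application directly, which is where the efficiency win comes from: no graph interpreter at runtime.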