Can anyone share any information on past efforts or future directions regarding quantization – does anyone plan to pursue it?
Thanks!
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
Thanks for bringing up that question. I have been exploring quantization for our model using TensorFlow’s tooling. While quantize_weights was an easy one to get working with good results (both in terms of memory and disk usage, and in WER impact), quantize_nodes has been another story. We had to fight issues within TensorFlow’s tooling related to the bidirectional recurrent layers we use, which were breaking those tools. Once fixed, there was an unexpected (though documented as a risk) slowdown instead of a speedup, on the order of 10×.
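For reference, both transforms are applied through TensorFlow’s Graph Transform Tool (TF 1.x). A minimal sketch of what that looks like from Python – the file name and the input/output node names here are placeholders, not our actual model’s:

```python
import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph

# Load a frozen GraphDef (placeholder file name).
graph_def = tf.GraphDef()
with tf.gfile.GFile("frozen_model.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# quantize_weights stores weights as 8-bit and dequantizes at load time:
# smaller file, little accuracy impact. This is the transform that worked
# well for us.
transformed = TransformGraph(
    graph_def,
    ["input_node"],    # hypothetical input node name
    ["output_node"],   # hypothetical output node name
    ["quantize_weights"],
)

# quantize_nodes additionally rewrites the computation itself to 8-bit;
# this is the one that broke on the bidirectional RNN layers and, once
# patched, ran ~10x slower rather than faster:
# transformed = TransformGraph(graph_def, ["input_node"], ["output_node"],
#                              ["quantize_weights", "quantize_nodes"])

with tf.gfile.GFile("frozen_model_quantized.pb", "wb") as f:
    f.write(transformed.SerializeToString())
```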
So far, work on that kind of quantization has been put on hold, and I have concentrated on leveraging tfcompile, which proved to be much more efficient and easier to do. We still plan on working on quantization.
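For the curious, tfcompile is driven from a Bazel BUILD file via the tf_library macro, which ahead-of-time compiles a frozen graph into a C++ class. Roughly something like this – all target, file, and class names below are hypothetical:

```python
# BUILD file sketch for AOT-compiling a frozen graph with tfcompile.
load("//tensorflow/compiler/aot:tfcompile.bzl", "tf_library")

tf_library(
    name = "speech_model",              # hypothetical Bazel target name
    graph = "frozen_model.pb",          # frozen GraphDef to compile
    config = "model.config.pbtxt",      # describes the graph's feeds/fetches
    cpp_class = "SpeechModel",          # name of the generated C++ class
)
```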
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
That being said, I’m curious about your results. All the experience we could gather trying to leverage TensorFlow’s tools ended with failures somewhere around the bidirectional RNN layer, or at best gave poor results :).
@reuben is working on a streamable model, which requires removing the bidirectional recurrent part of the model. We expect this will make things easier for that use case, so if you run into issues, maybe it’s worth waiting for that to land? I don’t have any ETA yet, though.
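To illustrate why the bidirectional layer is the blocker for streaming, here is a rough TF 1.x sketch (layer sizes and feature dimensions are illustrative, not our actual model):

```python
import tensorflow as tf

# [batch, time, features]; 26 input features is just an example value.
inputs = tf.placeholder(tf.float32, [None, None, 26])

# Bidirectional layer: the backward cell consumes the sequence in reverse,
# so no output can be produced until the whole utterance is available.
fw = tf.nn.rnn_cell.LSTMCell(2048)
bw = tf.nn.rnn_cell.LSTMCell(2048)
bi_out, _ = tf.nn.bidirectional_dynamic_rnn(fw, bw, inputs, dtype=tf.float32)

# Unidirectional replacement: output at step t depends only on steps <= t,
# so audio can be fed and decoded chunk by chunk (a streamable model).
uni = tf.nn.rnn_cell.LSTMCell(2048)
uni_out, _ = tf.nn.dynamic_rnn(uni, inputs, dtype=tf.float32)
```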