Quantizing DeepSpeech

(Parviz Palangpour) #1

I noticed there has been some activity regarding TF’s 8-bit quantization, primarily by ‘lissyx’: https://github.com/mozilla/DeepSpeech/issues?utf8=✓&q=quantize

It appears the branches were not merged into master. I can also see that some earlier work resulted in a significant increase in WER: https://github.com/mozilla/DeepSpeech/issues/133#issuecomment-263559215

Can anyone share any information on the past effort or future directions regarding quantization – does anyone plan to pursue it?


(Lissyx) #2

Thanks for bringing up that question. I have been exploring ways to quantize our model using TensorFlow’s tooling. quantize_weights was easy to get working, with good results (in terms of memory and disk usage, and in WER impact), but quantize_nodes has been another story. We had to fight issues in TensorFlow’s tooling related to the bidirectional recurrent layers we use, which broke those tools. Once that was fixed, there was an unexpected (though documented as a risk) slowdown of about 10 times instead of a speedup.
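For context on why the weight-only approach is cheap: TensorFlow’s quantize_weights transform (Graph Transform Tool) converts large float constants into eight-bit values plus a min/max range, followed by a conversion back to float, so storage shrinks while the arithmetic itself stays in float. The sketch below illustrates that min/max linear scheme in plain Python. It is not TensorFlow’s implementation, and the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_weights(weights: List[float]) -> Tuple[List[int], float, float]:
    """Map floats linearly onto 0..255 using the tensor's own min/max range.

    Hypothetical helper illustrating the general idea, not TensorFlow's code.
    """
    lo, hi = min(weights), max(weights)
    # Guard against a constant tensor (zero range) to avoid dividing by zero.
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    # Clamp to the byte range in case floating-point rounding lands just outside it.
    codes = [min(255, max(0, round((w - lo) / scale))) for w in weights]
    return codes, lo, scale


def dequantize_weights(codes: List[int], lo: float, scale: float) -> List[float]:
    """Recover approximate floats; each value is off by at most about scale / 2."""
    return [lo + c * scale for c in codes]


if __name__ == "__main__":
    w = [-0.5, -0.1, 0.0, 0.25, 0.5]
    codes, lo, scale = quantize_weights(w)
    approx = dequantize_weights(codes, lo, scale)
    print(codes)  # one byte per weight instead of four
    print(max(abs(a - b) for a, b in zip(w, approx)))
```

Because only storage is quantized, the rounding error per weight is bounded by half a quantization step, which helps explain why the WER impact stayed small. Quantizing the computations themselves (quantize_nodes) is a different and harder problem.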

For now, work on that kind of quantization has been put on hold, and I have concentrated on leveraging tfcompile, which proved to be much more efficient and easy to do. We still plan on working on quantization :slight_smile:

(Dedoogong) #3

I’m trying to convert the model to TensorRT’s UFF format instead of using TF’s quantization.
Can you let me know the input/output node names?

Thank you!

(Lissyx) #4

It’s input_node and output_node, as you can read in DeepSpeech.py. And there is no need to spam multiple threads with the same question…

(Lissyx) #5

That being said, I’m curious about your results. In all our experience trying to leverage TensorFlow’s tools, things would somehow fail around the bidirectional RNN layer, or at best give poor results :).

@reuben is working on a streamable model. This requires removing the bidirectional recurrent part of the model, and we expect that to make this use case easier, so if you run into issues, it may be worth waiting for that to land. I don’t have an ETA yet, though.
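Why dropping the bidirectional layer matters for streaming: a forward-only recurrence depends only on past frames, so audio can be fed in chunks while the hidden state is carried between calls. The backward direction of a bidirectional RNN starts from the last frame, so it cannot emit anything until the whole utterance has arrived. The sketch below uses a toy scalar RNN with hypothetical weights, in plain Python and not DeepSpeech’s actual layer. It checks that chunked processing with a carried state gives the same outputs as processing the full sequence at once.

```python
import math
from typing import List, Tuple

# Toy scalar RNN weights (hypothetical values, chosen only for illustration).
W_IN, W_REC, BIAS = 0.8, 0.5, 0.1


def rnn_forward(frames: List[float], h: float = 0.0) -> Tuple[List[float], float]:
    """Unidirectional RNN: each output depends only on the current and past frames."""
    outputs = []
    for x in frames:
        h = math.tanh(W_IN * x + W_REC * h + BIAS)
        outputs.append(h)
    # Return the final hidden state so the caller can resume on the next chunk.
    return outputs, h


def rnn_streaming(chunks: List[List[float]]) -> List[float]:
    """Process audio chunk by chunk, carrying the hidden state across calls."""
    h = 0.0
    outputs: List[float] = []
    for chunk in chunks:
        out, h = rnn_forward(chunk, h)
        outputs.extend(out)  # outputs for this chunk are available immediately
    return outputs


if __name__ == "__main__":
    frames = [0.2, -0.4, 0.9, 0.1, -0.3, 0.7]
    full, _ = rnn_forward(frames)
    streamed = rnn_streaming([frames[:2], frames[2:5], frames[5:]])
    print(full == streamed)  # True: same computations in the same order
```

A backward pass has no equivalent of the carried state here: its first step needs the final frame, so no output is available until the last chunk has been received.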