Worse evaluation results with evaluate_tflite than evaluate

I wanted to check how fast evaluate_tflite is, and it turned out to be a couple of orders of magnitude slower than evaluate. But what surprised me the most was the much worse inference quality: with evaluate_tflite I got 40% WER, while with evaluate I got 20% WER. Is this a known issue?

I don’t see any place in the evaluate_tflite script where I could specify the model, though; the model variable is not used in tflite_worker.

So I assume the beam width is perhaps much smaller than what I use with evaluate.py? (There I have the default beam_width of 1024.)
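For reference, a minimal sketch of how the beam width could be set explicitly through the DeepSpeech Python bindings; the model path here is just a placeholder:

```python
from deepspeech import Model

# Placeholder path to an exported TFLite graph.
ds = Model('output_graph.tflite')

# Match the beam width used with evaluate.py (default 1024 there) so the
# CTC decoder explores the same number of hypotheses in both evaluations.
ds.setBeamWidth(1024)
```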

? The model is specified here, via args.model: DeepSpeech/evaluate_tflite.py at master · mozilla/DeepSpeech · GitHub

Please be specific about what you tested; evaluate_tflite uses the Python bindings and spawns multiple processes. If you are comparing it to GPU-backed, big-batch evaluation, it’s not surprising that it is slower.
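Roughly, each worker process decodes utterances one at a time through the bindings, which is inherently slower than batched GPU inference. The sketch below is simplified and the names are illustrative, not the exact code in evaluate_tflite.py:

```python
import wave
import numpy as np
from deepspeech import Model

def worker_decode(model_path, wav_paths):
    """Sketch of a per-process worker: load one Model, decode files one by one."""
    ds = Model(model_path)
    results = []
    for wav_path in wav_paths:
        with wave.open(wav_path, 'rb') as fin:
            # 16-bit PCM samples, as expected by the bindings.
            audio = np.frombuffer(fin.readframes(fin.getnframes()), np.int16)
        # Single-utterance inference: no batching across files.
        results.append(ds.stt(audio))
    return results
```

Several such workers are spawned with multiprocessing, but each call still processes a single utterance at a time.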

Again, without more details:

  • I can’t tell if it’s expected in your case
  • our testing showed no meaningful differences

Thank you!

I have a German test set of roughly 30 hours, which is used for both evaluate.py and evaluate_tflite.py.

Sorry, my bad. I meant the scorer argument.

args.scorer is the next one …

And are you sure you are running the exact same comparison?

Yes, it is the exact same comparison. But the scorer is not used in the tflite_worker function.

I’ll double-check it, because if your tests indicate similar performance, it must be a mistake on my side.

Sorry, but your message was very unclear. That looks like a bug you could send a fix for; it’s easy to fix: we are missing a call to enable the external scorer.

Can you file a bug at least, and make a PR if you can? Since you are working on that, you can verify whether it works or not.

If we are missing the scorer, it may very well explain your discrepancy.
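For context, a minimal sketch of what the missing call might look like in the worker’s setup, assuming the scorer path is passed down; this is simplified, not the actual tflite_worker body:

```python
from deepspeech import Model

def setup_worker_model(model_path, scorer_path=None):
    """Sketch: create the Model and, crucially, enable the external scorer."""
    ds = Model(model_path)
    if scorer_path:
        # The call that appears to be missing: without it the decoder runs
        # without the external language model, which explains a higher WER.
        ds.enableExternalScorer(scorer_path)
        # Optionally, the LM weights can be tuned as well:
        # ds.setScorerAlphaBeta(lm_alpha, lm_beta)
    return ds
```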

Great, I’ll come back with a PR.