I have managed to build the Android project and was working on some voice samples. I have seen that it gives low-quality results on the phone (using the .tflite model), even though the same voice .wav files give “good” results in Colab with the loaded wheel file. In Colab, .pb or .pbmm files were used.
So my question is whether there is an implementation where we can use the .pb file directly on the Android phone, without worrying about the transcription time?
Thanks in advance
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
2
No. That’s not possible.
Our tests showed a slight, not-that-bad degradation with quantized TFLite, around +2 points (from 8.22% to 10.1%).
Do you have more insights on the .wav files you use?
I have checked the files with these online tools (metadata, spectrum) and they seem OK.
The strange thing is that with the .pb files (in Colab) the results are extremely good on the .wav files generated by my phone. But with the .tflite file and the TFLite wheel in Colab I get weird results.
I also want to mention that the .tflite file from version 0.6.0 is extremely bad, so I worked with the file from the previous version, 0.6.0-alpha.15!
I got these files from this page: releases
I have also built the android_speech project and saw that the online transcription results are very good. I have seen that there is .ogg/Opus encoding. So I want to ask: which file do you use for online transcription, .pb or .tflite?
Thank you
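For reference, the Colab-side check can be reproduced with the deepspeech CLI that ships in the wheel; the file names below are placeholders, and getting the TFLite wheel from the releases page (rather than PyPI) is an assumption about how it was obtained:
$ pip install deepspeech==0.6.0
$ deepspeech --model output_graph.pbmm --lm lm.binary --trie trie --audio phone_recording.wav
# For the TFLite comparison, install the TFLite wheel from the releases page
# instead and point --model at output_graph.tflite.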
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
4
Please be more descriptive than “extremely good” and “weird”.
Also, how do you run the comparison? We need many more details.
Can you explain a little bit more what you mean here? What exactly gives you good results? And bad ones?
Please read the code; online transcription is not done through DeepSpeech.
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
5
I’m not sure what exactly you mean. There are a lot of variables, from clean sound, to no noise, no transformation artifacts, your accent and your way of speaking, etc.
I can give you the transcriptions of the .wav file generated by my phone, from Colab and from the phone, with the .pb and .tflite files, to clarify what is ‘weird’ and what is ‘very good’ if you want…
But first my main question… why is transcription with the .pb file not possible?
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
7
Because:
- it would not allow running in realtime
- it is not supported by TensorFlow; only the TFLite runtime works on Android, as far as I can tell
No, sorry! This is not accurate: ‘it is not supported by TensorFlow; only the TFLite runtime works on Android’.
Check this link to see my project with TensorFlow’s DeepLabv3+ inside Android, where a .pb file is used. Also, my GitHub account is full of Android projects with .pb files. So this is possible.
‘it would not allow running in realtime’… If a whole picture can be converted and processed in less than 1 second on today’s phones, then I believe a spectrogram will be easy as well.
So do you think I have to alter some C++ files to load the .pb file inside Android?
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
9
As far as I can tell, this was not working properly: it was far too slow, and much more painful and fragile to cross-compile. The TFLite runtime was much more efficient.
Are we comparing models of the same complexity?
You would need to rebuild quite a lot of things, and in unsupported ways … Android builds define runtime=tflite: DeepSpeech/native_client/BUILD at master · mozilla/DeepSpeech · GitHub
So if you follow the docs and don’t pass --define=runtime=tflite, then it should try to build the TensorFlow runtime instead. But I won’t have time to help you get that built … It’d be much more productive if we identified what goes wrong in your case, because that does not reflect our experience …
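As a rough sketch of what that switch looks like (the configs and target below follow the native_client docs, but treat them as assumptions; only --define=runtime=tflite is confirmed by the BUILD file above):
$ bazel build --config=monolithic --config=android --config=android_arm64 \
      --define=runtime=tflite \
      //native_client:libdeepspeech.so
# Dropping --define=runtime=tflite would ask Bazel to link the full TensorFlow
# runtime instead, which is the unsupported path being discussed.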
Now, can you please answer all my previous questions? I still don’t get what you meant with your references to 0.6.0-alpha.15. Was it the model? The lib? Both?
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
12
So, those are not really intended for general use; I’ve put them online so that people hacking on androidspeech / mozillaspeechlibrary and a few other projects can use them.
That’s just one example … I’m not sure it is really enough to draw any conclusion.
So please, why did you tell me DeepSpeech is not used for online transcription? From the code I see that a ByteArrayOutputStream with some tags is passed to this endpoint, “https://speaktome-2.services.mozilla.com/”, and the response contains the transcription.
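For concreteness, a rough sketch of that request (the content type and file name here are placeholders, not the library’s actual tags):
$ curl -X POST "https://speaktome-2.services.mozilla.com/" \
       -H "Content-Type: audio/opus" \
       --data-binary @recording.opus
# The response body carries the transcription produced server-side.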
So what do you use for online transcription? Some other model?
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
14
Because that’s the truth?
The Speak To Me infrastructure has several implementations, and the default one is not DeepSpeech. I’m not in charge of the whole infra, so I can’t give more details, but I think it’s Google STT by default.
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
16
I’m not the one who did the release, so I don’t know exactly how the export was done. Maybe @reuben could give more details, but v0.6.0-alpha.15 was just a re-export of the v0.5.1 checkpoint, and we made changes to the graph that should have made v0.6.0 much better than v0.6.0-alpha.15.
Though @reuben is on PTO until next year, so don’t expect news soon.
Please note that we are looking into switching to TFLite for all runtimes, so getting it to work well is important.
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
17
There’s definitely something going on here …
$ ~/tmp/deepspeech/0.6.0/tfpb/deepspeech --model ~/tmp/deepspeech/0.6.0/eng/deepspeech-0.6.0-models/output_graph.pbmm --lm limited_lm.binary --trie limited_lm.trie --audio deepspeech_dump_all.wav -t
TensorFlow: v1.14.0-21-ge77504a
DeepSpeech: v0.6.0-0-g6d43e21
2019-12-19 16:08:41.433573: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
turn the bedroom lamp light on
cpu_time_overall=7.25584
$ ~/tmp/deepspeech/0.6.0/tflite/deepspeech --model ~/tmp/deepspeech/0.6.0/eng/deepspeech-0.6.0-models/output_graph.tflite --lm limited_lm.binary --trie limited_lm.trie --audio deepspeech_dump_all.wav -t
TensorFlow: v1.14.0-21-ge77504a
DeepSpeech: v0.6.0-0-g6d43e21
INFO: Initialized TensorFlow Lite runtime.
on the bedroom light on
cpu_time_overall=12.07772
And without LM:
$ ~/tmp/deepspeech/0.6.0/tflite/deepspeech --model ~/tmp/deepspeech/0.6.0/eng/deepspeech-0.6.0-models/output_graph.tflite --audio deepspeech_dump_all.wav -t
TensorFlow: v1.14.0-21-ge77504a
DeepSpeech: v0.6.0-0-g6d43e21
INFO: Initialized TensorFlow Lite runtime.
no the bederorond lighte pon
cpu_time_overall=12.68055
$ ~/tmp/deepspeech/0.6.0/tfpb/deepspeech --model ~/tmp/deepspeech/0.6.0/eng/deepspeech-0.6.0-models/output_graph.pbmm --audio deepspeech_dump_all.wav -t
TensorFlow: v1.14.0-21-ge77504a
DeepSpeech: v0.6.0-0-g6d43e21
2019-12-19 16:11:19.134152: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
ter de bedroom along light on
cpu_time_overall=8.49712