How can I speed up recognition?
A two-second audio file takes 5 seconds to recognize.
I need the application to recognize speech online. It’s acceptable if the application skips half a second of audio every 5 seconds.
I made a new application.
Added the line “implementation 'org.mozilla.deepspeech:libdeepspeech:0.6.0-alpha.4'” to the “dependencies {” block in “app/build.gradle”.
Partially migrated the code from “native_client/java”.
Launched it on the phone.
Downloaded the model files from https://github.com/mozilla/DeepSpeech/releases/tag/v0.5.1.
Copied them to “/sdcard/deepspeech/” via “Device File Explorer”.
Don’t forget to add the line “<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />” to AndroidManifest.xml, and to grant the permission to the application on the phone.
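For reference, a minimal sketch of where that dependency line lives (version as used in this thread; the rest of the file is omitted):

```groovy
// app/build.gradle (module level): only the relevant block shown
dependencies {
    implementation 'org.mozilla.deepspeech:libdeepspeech:0.6.0-alpha.4'
}
```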
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
We can’t help you without more context. I can tell you we are 2x faster than real time on Android devices. That all depends on the devices …
Again, without more context on your hardware, I can’t tell if it’s normal or not.
For v0.6.0a4 you need to re-export the 0.5.1 checkpoints.
What do you need? The phone model?
Phone model: Huawei ALE-L21.
Here are its specifications: “Huawei P8lite - Full phone specifications” (I don’t know how accurate they are).
And can you explain in detail how to do it (step by step)?
lissyx:
Please read the documentation?
lissyx:
Ok, given the specs, I’m not surprised by your performance. Sadly, there’s not much that can be done here. It’s a 2015 SoC, a 1.2 GHz Cortex-A53, and it likely lacks accelerators that TFLite can leverage.
Reducing the model complexity would be the only solution, but it’s a non-trivial task to train a model with similar quality and fewer nodes.
I understand that you may be getting tired of this thread.
English is not my native language, so I have to use https://translate.google.com.
Also, the documentation is very short. It would be nice to expand it with additional examples.
Can you still write out how to do it? How do I create a model for v0.6.0a4 (re-export the 0.5.1 checkpoints)?
For example, the phone’s built-in recognition copes with this perfectly. I understand that built-in recognition has advantages over DeepSpeech, but I still want to speed up recognition.
The audio to be recognized will be of good quality, without noise.
How can I speed up recognition? Can I somehow modify an already-trained model? It doesn’t matter if the quality gets a little worse.
Or do I need to re-train the model?
Thanks.
lissyx:
English is not my native language either, so I do understand the extra mental load.
We can’t know that if you don’t tell us what you find missing or complicated.
I’m sorry but at some point, when it’s obviously documented in the help, there’s nothing more I can do than to tell you to read it. Unless you tell me what is unclear in the doc, I can’t figure it out for you. There’s --checkpoint_dir, --export_dir, that’s all you need. There are several examples of that all throughout the forum.
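As a sketch, the re-export step could look like the following. The checkpoint and export paths are hypothetical placeholders, and the `--export_tflite` flag is an assumption (the thread only confirms `--checkpoint_dir` and `--export_dir`); the command must be run from a checkout of the DeepSpeech training code at the matching version.

```shell
# Hypothetical paths; adjust to where you unpacked the 0.5.1 checkpoint.
CHECKPOINT_DIR=./deepspeech-0.5.1-checkpoint   # unpacked 0.5.1 checkpoint
EXPORT_DIR=./exported-model                    # where the re-exported graph goes

# Build the export command; run it inside the DeepSpeech repo checkout.
CMD="python3 DeepSpeech.py --checkpoint_dir $CHECKPOINT_DIR --export_dir $EXPORT_DIR --export_tflite"
echo "$CMD"
```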
lissyx:
Are you aware the built-in recognition runs online, while our system runs offline? At some point, you need to understand that if the model is too complicated to fit in the computing budget of your CPU, there’s nothing that can be trivially done to improve speed.
That does not really matter
The only solution is to reduce the model complexity; the current models use n_hidden=2048. The temporal complexity of the model depends quadratically on this. So roughly, if it takes 5 seconds to run inference on a 2-second audio file, you are about 2.5x slower than real time. From there you should be able to infer the complexity of the model you require.
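The arithmetic above can be made concrete. Assuming cost scales with the square of n_hidden, as stated, and a measured real-time factor of 5 s of compute per 2 s of audio, a rough upper bound on the width of a model that could run in real time on this phone would be:

```python
import math

n_hidden = 2048        # width of the released models
rtf = 5.0 / 2.0        # measured real-time factor: 5 s compute per 2 s audio

# Cost ~ n_hidden**2, so to cut the cost by a factor of rtf,
# scale n_hidden down by sqrt(rtf).
target = n_hidden / math.sqrt(rtf)
print(round(target))  # -> 1295
```

This is only a back-of-the-envelope bound; it ignores everything except the quadratic term, and a model of that width would still need to be trained from scratch.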
But re-training from scratch with a smaller model dimension is going to be a complicated process: you need thousands of hours of audio, and several attempts to adjust the parameters.
On what kind of computer, and roughly how long, would such training take? Say, using the Common Voice data for English.
A rough answer in tens of hours is fine.