I want to run DeepSpeech (native_client/java) on my mobile phone using Android Studio (Windows 7, 64-bit)

Hey.
I want to run DeepSpeech (native_client/java) on my mobile phone, building it with Android Studio on Windows 7 (64-bit).

I downloaded https://github.com/mozilla/DeepSpeech/tree/v0.6.0-alpha.4 into the folder D:\Android\Projects\Deepspeech
by executing the command “git clone https://github.com/mozilla/DeepSpeech.git ./”
and then “git checkout v0.6.0-alpha.4”.

Then I downloaded https://github.com/mozilla/tensorflow/tree/r1.14 (remote branch “r1.14”) into the folder D:\Android\Projects\tensorflow.

It is not entirely clear what to do next.
Can you describe the next steps in detail?
What to install? What to do, how, and with what? Etc.

I think such instructions would then help many people.

The model I want to use is already available (the pre-trained DeepSpeech model). For now, I will not train my own.

We provide libdeepspeech JNI bindings that you can use in your application.

We already have docs covering how to rebuild / test, as well as the Gradle setup to use the bindings.

And there’s code using the bindings in a demo APK in this subdirectory …

Everything is here, just read it, apply it, and send a PR if there are docs improvements required.

And it’s available on bintray: https://bintray.com/alissy/org.mozilla.deepspeech/libdeepspeech

So as long as you have the jcenter repo configured (which seems to be the default), the documented implementation line is all you need …
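For reference, a minimal sketch of what the Gradle configuration could look like under those assumptions (jcenter as the repository, and the artifact coordinates used later in this thread; adjust the version to the release you target):

repositories {
    jcenter()
}

dependencies {
    implementation 'org.mozilla.deepspeech:libdeepspeech:0.6.0-alpha.4'
}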

It worked.

How can I speed up recognition?
A two-second audio file takes 5 seconds to recognize.
I need the application to recognize speech online. It won’t be a problem if the application skips half a second every 5 seconds.

Are there new models for version “0.6.0-alpha.4”, or should I load the old ones from https://github.com/mozilla/DeepSpeech/releases/tag/v0.5.1?

It worked.

Made a new application.
Added the line “implementation 'org.mozilla.deepspeech:libdeepspeech:0.6.0-alpha.4'” to the “dependencies {” block of the file “app/build.gradle”.

Partially migrated code from “native_client/java”.
Launched it on the phone.
Downloaded the model files from https://github.com/mozilla/DeepSpeech/releases/tag/v0.5.1
and added them to “/sdcard/deepspeech/” through “Device File Explorer”.

Do not forget to add the line <uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" /> to AndroidManifest.xml, and to grant the permission to the application on the phone.
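For illustration, here is a minimal sketch of the kind of code migrated from native_client/java, assuming the 0.6.x-style Java bindings where the DeepSpeechModel constructor takes a model path and a beam width, and stt() takes a buffer of 16 kHz, 16-bit mono samples; the exact constructor signature changed between releases, so the demo app in native_client/java is the reference for the version you actually use. The path, beam width, and buffer handling below are placeholders.

import org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel;

public final class SttHelper {
    // Placeholder path matching the files pushed via Device File Explorer
    private static final String MODEL_PATH = "/sdcard/deepspeech/output_graph.tflite";
    private static final int BEAM_WIDTH = 50; // illustrative value

    public static String recognize(short[] samples) {
        // Constructor signature is version-dependent; this follows the 0.6.x bindings
        DeepSpeechModel model = new DeepSpeechModel(MODEL_PATH, BEAM_WIDTH);
        try {
            // samples: 16 kHz, 16-bit, mono PCM audio
            return model.stt(samples, samples.length);
        } finally {
            model.freeModel();
        }
    }
}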

We can’t help you without more context. I can tell you we are 2x faster than real time on Android devices. That all depends on the devices …

Again, without more context on your hardware, I can’t tell if it’s normal or not.

For v0.6.0a4 you need to re-export the 0.5.1 checkpoints.

What do you need? The phone model?
Phone model: Huawei ALE-L21.
Here are its specifications: https://www.gsmarena.com/huawei_p8lite-7201.php (I don’t know how accurate they are).

And can you explain in detail, step by step, how to re-export the checkpoints?

Please read the documentation?

Ok, given the specs, I’m not surprised about your performance. Sadly, there’s not much that can be done here. It’s a 2015 SoC, 1.2 GHz Cortex-A53, and it likely lacks accelerators that TFLite can leverage.

Reducing the model complexity would be the only solution, but training a model with similar quality and fewer nodes is a non-trivial task.

I understand that you are already tired of this.
English is not my native language and I have to use https://translate.google.com.
And the documentation is very brief. It would be nice to expand it with additional examples.

Can you still write how to do it? How do I create a model for v0.6.0a4 (re-export the 0.5.1 checkpoints)?

Thank you.

For comparison, the built-in recognition on the phone handles this perfectly. I understand that the built-in recognition has advantages over DeepSpeech, but I still want to speed up recognition.

The audio to be recognized will be of good quality, without noise.
How can I speed up recognition? Can I somehow rework the trained model? It does not matter if the recognition quality gets a little worse.
Or do I need to re-train the model?

Thank you.

English is not my native language either, so I do understand the extra mental load.

We can’t know that if you don’t tell us what you find missing / complicated.

I’m sorry but at some point, when it’s obviously documented in the help, there’s nothing more I can do than to tell you to read it. Unless you tell me what is unclear in the doc, I can’t figure it out for you. There’s --checkpoint_dir, --export_dir, that’s all you need. There are several examples of that all throughout the forum.

Are you aware the built-in recognition runs online, while our system runs offline? At some point, you need to understand that if the model is too complicated to fit in the computing budget of your CPU, there’s nothing that can be trivially done to improve speed.

That does not really matter.

The only solution is to reduce the model complexity: n_hidden=2048 with the current models. The temporal complexity of the model depends quadratically on this. So roughly, if it takes 5 seconds to run inference on a 2-second audio file, you are about 2.5x slower than realtime. From there you should be able to infer the complexity of the model you require.
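As a rough worked example under that quadratic assumption: being 2.5x slower than realtime with n_hidden=2048 suggests you would need roughly 2048 / sqrt(2.5) ≈ 1300 hidden units to approach realtime on the same hardware, and a value like 800 would be about (2048 / 800)² ≈ 6.5 times cheaper. These are back-of-the-envelope figures; actual speed also depends on the rest of the pipeline.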

But re-training from scratch with a smaller model dimension is going to be a complicated process: you need thousands of hours of audio, and several attempts to adjust the parameters.

The documentation describes the parameter, but there is no example of its use, or at most one very small example.

How do I re-export the checkpoints into the phone model (output_graph.tflite) using this --checkpoint_dir option?

Here is a sample command. How do I correctly specify the output_graph.tflite file to re-export checkpoints for v0.6.0a4?

python DeepSpeech.py output_graph.tflite --checkpoint_dir /FOLDER_NAME

Should I set --n_hidden to 800?
Are the remaining parameters correct?

python3 DeepSpeech.py --n_hidden 800 --checkpoint_dir path/to/checkpoint/folder --epochs 3 --train_files my-train.csv --dev_files my-dev.csv --test_files my_dev.csv --learning_rate 0.0001

Roughly, on what kind of computer and how long would such training take, say using the Common Voice English data?
An estimate in tens of hours is fine.

Thank you.

Well, you should file an issue / improve the docs if you want.

Run python DeepSpeech.py --help and you will learn about --export_dir and --export_tflite.
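For instance, a re-export command could look something like this (paths are placeholders, and --n_hidden must match the value the 0.5.1 checkpoints were trained with, i.e. 2048 for the released ones):

python3 DeepSpeech.py --checkpoint_dir path/to/deepspeech-0.5.1-checkpoint --export_dir path/to/export --export_tflite --n_hidden 2048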

You’ll likely need more than just a dozen hours to get a production-grade model. Also, training will require a couple of powerful GPUs.

For English, we use more than 3000 hours of audio, for instance.