lissyx:
The only solution is to reduce the model complexity; the current models use n_hidden=2048. The temporal complexity of the model depends roughly quadratically on this parameter. So, as a rough guide, if inference on a 2-second audio file takes 5 seconds, you are about 2.5x slower than real time. From there you should be able to infer the model size you require.
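As a back-of-the-envelope sketch of that scaling, the following assumes the roughly quadratic dependence on n_hidden described above; the 2.5x baseline is the illustrative figure from the reply, and the function name is hypothetical:

```python
# Rough estimate of how inference time scales with n_hidden,
# assuming time grows quadratically with the layer width.
# Baseline: n_hidden=2048 running at ~2.5x slower than real time.

def realtime_factor(n_hidden, base_n_hidden=2048, base_rtf=2.5):
    """Predicted real-time factor (processing time / audio duration)."""
    return base_rtf * (n_hidden / base_n_hidden) ** 2

# Under these assumptions, n_hidden=800 would give
# 2.5 * (800/2048)^2, i.e. well under 1.0x (faster than real time).
print(realtime_factor(800))
```

This is only an estimate for picking a candidate width; actual speed depends on hardware, audio length, and the rest of the pipeline.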
But re-training from scratch with a smaller model dimension is going to be a complicated process: you need thousands of hours of audio, and several attempts to tune the parameters.
So should I set --n_hidden to 800? Are the remaining parameters correct?
python3 DeepSpeech.py --n_hidden 800 --checkpoint_dir path/to/checkpoint/folder --epochs 3 --train_files my-train.csv --dev_files my-dev.csv --test_files my_dev.csv --learning_rate 0.0001
Roughly, on what kind of machine could such training run, and how long would it take, say using the Common Voice English data? An estimate in tens of hours is enough.
Thanks.