Install DeepSpeech programmatically on Windows

Say I want to install Python + DeepSpeech on a Windows machine without the user having to do anything. How would I do this? Could I not just unpack an archive containing the Python interpreter and the deepspeech module to some directory? Or am I overlooking something?

(Sorry, I’m not a Python expert… )

Many greetings

Inference? Training?

I need just inference, not training

You don’t need to use the Python bindings; you can use others as well, including .NET or C++. What are you building exactly?

Oh, yeah that sounds better. I have a Java application that just wants to run audio samples through a standard English DeepSpeech model.

So, we have Java bindings, but they are currently limited to Android. We would welcome any effort to lift that limitation, but we are not Java-ists, so we need some help there :)

I see, thanks. Making a Java binding sounds complicated, I don’t know if I’ll have the time…

Can’t we make a simple deepspeech-recognize.exe using the C bindings that you pass an audio file to on the command line?

The Java bindings already exist; it’s just that we only build and ship them on Android, and thus we don’t know if or how well they work on other platforms.

You can, but you will have much less efficient use of the API …

Again, what are you building exactly?

I am building desktop chat bots that are supposed to constantly listen to speech input. I have a (rudimentary) VAD and send audio clips to either wit.ai or locally installed speech engines.
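For readers unfamiliar with this kind of setup, a rudimentary VAD can be as simple as an energy threshold over fixed-size frames. The sketch below is only an assumption about what such a VAD might look like, not the poster's actual code; the class name `EnergyVad`, the frame size and the threshold are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical energy-threshold VAD over 16-bit PCM samples. */
class EnergyVad {
    private final int frameSize;     // samples per frame, e.g. 320 = 20 ms at 16 kHz
    private final double threshold;  // RMS level at or above which a frame counts as speech

    EnergyVad(int frameSize, double threshold) {
        this.frameSize = frameSize;
        this.threshold = threshold;
    }

    /** Root-mean-square amplitude of samples[from, to). */
    static double rms(short[] samples, int from, int to) {
        double sum = 0;
        for (int i = from; i < to; i++) {
            sum += (double) samples[i] * samples[i];
        }
        return Math.sqrt(sum / (to - from));
    }

    /** Returns [start, end) sample ranges of consecutive frames judged to be speech. */
    List<int[]> segments(short[] samples) {
        List<int[]> result = new ArrayList<>();
        int start = -1;
        for (int pos = 0; pos + frameSize <= samples.length; pos += frameSize) {
            boolean speech = rms(samples, pos, pos + frameSize) >= threshold;
            if (speech && start < 0) {
                start = pos;                         // speech begins
            } else if (!speech && start >= 0) {
                result.add(new int[] {start, pos});  // speech ends
                start = -1;
            }
        }
        if (start >= 0) {
            // Close a segment still open at the end; a trailing partial frame is ignored.
            result.add(new int[] {start, samples.length / frameSize * frameSize});
        }
        return result;
    }
}
```

Each returned range could then be cut out of the capture buffer and handed to whichever engine is configured.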

“You can, but you will have much less efficient use of the API …”

You mean performance-wise?

Well, kind of. Given your description, it makes 100% sense to use the streaming API, which is going to be super inefficient if you constantly fork a new process.
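To illustrate the point about streaming: with an in-process engine the model is loaded once, and audio chunks are fed into a stream as they arrive, so decoding can overlap with capture. The sketch below only shows that create / feed / finish shape. `SpeechStream`, `SpeechEngine` and the fake engine are hypothetical stand-ins, not the DeepSpeech bindings.

```java
import java.util.List;

/** Hypothetical stand-in for one streaming recognition session. */
interface SpeechStream {
    void feed(short[] chunk);   // push audio as it arrives
    String finish();            // end of utterance: return the transcript
}

/** Hypothetical stand-in for a model that is loaded once and reused. */
interface SpeechEngine {
    SpeechStream createStream();
}

class StreamingDemo {
    /** Feeds every chunk of one utterance into a single stream, then finishes it. */
    static String recognize(SpeechEngine engine, List<short[]> chunks) {
        SpeechStream stream = engine.createStream();
        for (short[] chunk : chunks) {
            stream.feed(chunk);   // a real engine could start decoding here
        }
        return stream.finish();
    }

    /** Fake engine for illustration only: reports how many samples it was fed. */
    static SpeechEngine fakeEngine() {
        return () -> new SpeechStream() {
            private int samples = 0;
            public void feed(short[] chunk) { samples += chunk.length; }
            public String finish() { return samples + " samples"; }
        };
    }
}
```

Forking a fresh process per clip gives up both halves of this: the model has to be reloaded each time, and nothing can be decoded until the whole clip has been written out.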

IMHO you’ll spend more time writing your small C / C++ binary and making it work efficiently than you would helping us with the Java bindings and integrating them properly into your app.

OK, I am compiling the Java bindings on Linux (my development platform) as a first test. This is what I get:

A problem occurred configuring project ':app'.

SDK location not found. Define location with sdk.dir in the local.properties file or with an ANDROID_HOME environment variable.

Well, I told you we only support and build targeting Android … That’s the point where we need help … The last time I wrote non-Android Java code was 10+ years ago.

I see. Well, I am still in favor of the “deepspeech.exe” idea. It could be a process that keeps running, receives audio file paths on STDIN and outputs the results on STDOUT.
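On the Java side, the long-running process idea could look roughly like the sketch below: start the worker once, then exchange one line per request. The worker binary itself and its one-path-in, one-line-out protocol are assumptions taken from the description above; `LineWorkerClient` is a hypothetical name, and no such executable ships with DeepSpeech as far as this thread indicates.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Client for a hypothetical worker that reads one audio path per line and answers with one line. */
class LineWorkerClient implements AutoCloseable {
    private final BufferedWriter toWorker;
    private final BufferedReader fromWorker;
    private final Process process;   // null when wired to plain streams

    LineWorkerClient(OutputStream workerStdin, InputStream workerStdout, Process process) {
        this.toWorker = new BufferedWriter(new OutputStreamWriter(workerStdin, StandardCharsets.UTF_8));
        this.fromWorker = new BufferedReader(new InputStreamReader(workerStdout, StandardCharsets.UTF_8));
        this.process = process;
    }

    /** Starts the worker once; the command line is whatever the worker binary needs. */
    static LineWorkerClient launch(List<String> command) throws IOException {
        Process p = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT) // keep worker logs out of the protocol
                .start();
        return new LineWorkerClient(p.getOutputStream(), p.getInputStream(), p);
    }

    /** Sends one path (which must not contain a newline) and blocks until one reply line arrives. */
    String transcribe(String audioPath) throws IOException {
        toWorker.write(audioPath);
        toWorker.write('\n');
        toWorker.flush();
        String reply = fromWorker.readLine();
        if (reply == null) {
            throw new IOException("worker closed its output (crashed or exited)");
        }
        return reply;
    }

    @Override
    public void close() throws IOException {
        toWorker.close();            // EOF on stdin tells the worker to exit
        if (process != null) {
            process.destroy();
        }
    }
}
```

This covers whole clips only; it does not expose partial results or let audio be fed while the user is still speaking.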

I can’t force you, but as I said, this is going to be suboptimal, complicated, and inefficient. We could really use the help of people who are at ease with Java …

Not sure what would be inefficient about it? Don’t see why.

I am at ease with Java but I normally use my own build tools (not Gradle)…

Because you have to re-expose the streaming API over stdin/stdout? Tell me how that is not going to be inefficient?
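To make this objection concrete: sending file paths needs only one text line per request, but streaming over a pipe needs at least begin, audio-chunk and finish messages, each framed so that binary audio can be told apart from control messages. A sketch of what such framing might look like on the Java side follows; the tags and the byte layout are hypothetical, and the worker would need matching code on its end.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical framing: 1 tag byte, 4-byte big-endian length, then the payload. */
class StreamFrames {
    static final byte BEGIN = 1;   // open a new recognition stream
    static final byte AUDIO = 2;   // payload = raw 16-bit PCM bytes
    static final byte FINISH = 3;  // close the stream; the worker replies with text

    static final class Frame {
        final byte tag;
        final byte[] payload;
        Frame(byte tag, byte[] payload) {
            this.tag = tag;
            this.payload = payload;
        }
    }

    /** Writes one frame and flushes so the worker sees it immediately. */
    static void write(OutputStream out, byte tag, byte[] payload) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeByte(tag);
        data.writeInt(payload.length);
        data.write(payload);
        data.flush();
    }

    /** Reads exactly one frame; throws EOFException if the stream ends mid-frame. */
    static Frame read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte tag = data.readByte();
        int length = data.readInt();
        if (length < 0) {
            throw new IOException("negative frame length");
        }
        byte[] payload = new byte[length];
        data.readFully(payload);
        return new Frame(tag, payload);
    }
}
```

Even this minimal version leaves out error replies, intermediate results and versioning, which is the extra layer being warned about.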

Please, what do you not understand about “we need your help because we only support Android”? Nobody said that outside of Android you should be forced to use Gradle: I DON’T KNOW THE JAVA ECOSYSTEM.

STDIN/STDOUT should be trivial in overhead compared to the actual recognition…

OK, I get it, you want help with Java :) I’m not a JNI guru either, so I’m not sure I’m the right guy. Maybe I’ll have a flash of motivation and try to sort this out.

Thank you for all your answers.

https://deepspeech.readthedocs.io/en/latest/Java-API.html

I’m not talking about time or space complexity, but about the mess of re-implementing the API over stdin/stdout.

It’s your project, but it will also be an extra layer that is painful to deal with if you have to debug recognition itself.

Seriously, the JNI part should not be different. It’s really about building / packaging / using outside of Android.

Well, there are other considerations too… JNI is risky as it can crash the JVM if anything is implemented badly. External processes ease the mind.
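The isolation argument can be sketched as follows: if the native code lives in a child process, a crash surfaces in Java as an I/O failure on the pipe, and the application can start a fresh worker and retry. `Worker` and `RestartingWorker` are hypothetical names, and the retry-once policy is just one possible choice.

```java
import java.io.IOException;
import java.util.function.Supplier;

/** Hypothetical request/response worker, e.g. a child process behind stdin/stdout. */
interface Worker {
    String request(String input) throws IOException;
}

/** Replaces the worker when a request fails, so a native crash costs one retry rather than the JVM. */
class RestartingWorker implements Worker {
    private final Supplier<Worker> factory;
    private Worker current;
    private int restarts = 0;

    RestartingWorker(Supplier<Worker> factory) {
        this.factory = factory;
        this.current = factory.get();
    }

    @Override
    public String request(String input) throws IOException {
        try {
            return current.request(input);
        } catch (IOException firstFailure) {
            // Assume the child died: start a fresh one and retry once.
            // A second failure propagates to the caller.
            current = factory.get();
            restarts++;
            return current.request(input);
        }
    }

    int restarts() {
        return restarts;
    }
}
```

With JNI there is no equivalent recovery point, since a segfault in native code takes the whole JVM down with it.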