Compiling for Windows

Awesome! Looks like it might interest @reuben :slight_smile:

Nice! How did you build without removing the mmap code?

I removed the empty selects for Windows in tensorflow/core/BUILD here

I’m reading the Python and Node clients to better understand how the audio buffer should be passed to C++.

A little step forward; now I’m getting:

TensorFlow: b’unknown’
DeepSpeech: v0.4.0-alpha.0-54-gccd75f2
TF result code : 0 Inference time 17s Result: a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a

Is it possible that mixing versions of the model and the lm.binary can cause weird outputs like this? I have to use the lm.binary from master and the model from DeepSpeech 0.3.0. I’m also worried about the inference time for a 3s audio; in the TensorFlow configuration I disabled JIT, so I’ll rebuild with JIT enabled and see if I get better inference times.

Did you make sure you re-exported the model from the v0.3.0 checkpoints? Your binaries built from master rely on the new ctcdecoder, and that model was not exported with it, so you would get weird outputs like that.

Make sure you enabled all the SSE variants, as well as AVX / AVX2 / FMA, as far as your CPU supports them.
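For reference, on Windows with the MSVC toolchain the gcc-style -msse4.2 / -mavx / -mfma options are not understood by cl.exe (it only warns that it is ignoring them), so the way to actually get these instructions is to pass the MSVC arch flags through bazel. A rough example, where the target name is an assumption and should be adjusted to whatever you are building:

bazel build -c opt --copt=/arch:AVX2 //native_client:libdeepspeech.so

Use --copt=/arch:AVX instead on CPUs without AVX2 (e.g. the FX 8350); MSVC has no separate SSE4 or FMA switches, and /arch:AVX2 already enables FMA.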

FWIW I was able to build the C++ client (native_client/client.cc) by removing the libsox usage as well as the getopt stuff in args.h, and linking against libdeepspeech.lib as well as the TensorFlow libs. Just feeding samples directly from the WAV file to the model worked fine with proper results. So I assume your problem is the mismatched model and code. You can get a master-ready model from here: https://github.com/reuben/DeepSpeech/releases/tag/v0.2.0-prod-ctcdecode
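For anyone trying to reproduce this, below is a minimal sketch of that kind of stripped-down client. It assumes a canonical 16 kHz, mono, 16-bit PCM WAV with a plain 44-byte header, and the DS_* signatures from the v0.4.0-alpha era deepspeech.h; the cepstrum/context/beam-width values and LM weights are placeholders, so check them against the header and client.cc you actually built against.

// Stripped-down client sketch: no libsox, no getopt, raw samples straight into the model.
#include <cstdio>
#include <cstdlib>
#include <vector>
#include "deepspeech.h"

int main(int argc, char** argv) {
  if (argc < 6) {
    fprintf(stderr, "usage: %s model.pb alphabet.txt lm.binary trie audio.wav\n", argv[0]);
    return 1;
  }

  ModelState* ctx = nullptr;
  // 26 cepstral coefficients, 9 context frames, beam width 500 (placeholders).
  if (DS_CreateModel(argv[1], 26, 9, argv[2], 500, &ctx)) {
    fprintf(stderr, "Could not create model\n");
    return 1;
  }
  // LM weights are placeholders; take the real ones from the stock client.cc.
  DS_EnableDecoderWithLM(ctx, argv[2], argv[3], argv[4], 1.50f, 2.10f);

  // Read raw 16-bit samples, skipping the canonical 44-byte WAV header.
  FILE* wav = fopen(argv[5], "rb");
  if (!wav) { fprintf(stderr, "Could not open %s\n", argv[5]); return 1; }
  fseek(wav, 0, SEEK_END);
  long bytes = ftell(wav) - 44;
  fseek(wav, 44, SEEK_SET);
  std::vector<short> samples(bytes / sizeof(short));
  fread(samples.data(), 1, bytes, wav);
  fclose(wav);

  char* text = DS_SpeechToText(ctx, samples.data(),
                               static_cast<unsigned int>(samples.size()), 16000);
  printf("Result: %s\n", text ? text : "(no result)");
  free(text);
  DS_DestroyModel(ctx);
  return 0;
}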

Thanks @reuben, it worked with your model!!!

I don’t know why I can’t upload images :confused:

Result: “and you always want to see it in the superlative degree of”
Actual audio “and you always want to see it in the superlative degree”

Still compiling and trying to enable all my available instruction sets; bazel is ignoring my flags to enable SSE and FMA. I’ll post results soon.

On my laptop with an i5-7200U 2.5GHz, I enabled AVX and AVX2 and now it is taking 3s for 6s of audio, with a 30% peak of processor usage and just 30MB of RAM usage :D. My desktop is an AMD FX 8350, which does not support AVX2 :confused:, so I’ll test a little more with different instruction sets on my AMD.

So twice as fast as real time, right? 30MB seems too low, are you measuring properly?

Yes, with DecoderWithLM disabled it is 2x faster than real time on my laptop and uses just 30MB. In this article reuben mentions that the usage went down from 4GB to 20MB, which makes sense because my 30MB includes the views of the basic .NET app. Amazing.

If I enable DecoderWithLM, the RAM usage goes up to 2GB and the time to transcribe decreases a little, to around 2.3x faster than real time with the same CPU usage.

A different thing is happening on my AMD: in the first attempt it was taking 17s for 3s of audio, now it is taking 8s for 6s of audio, and it is faster with AVX disabled; I don’t know why. It was pure luck how I found out it is faster with AVX disabled: I hadn’t noticed that the build said AVX was ignored, then while testing it showed the warning that the CPU supports AVX, but the time decreased a lot.

Here are a few audios using my AMD FX8350, without AVX and with DecoderWithLM disabled.
Audio info: 16000 Hz sample rate, mono, 16-bit

Inference time 00:00:07.9358317s Result: once there was a young rat named darthur who never could make up it his mind Audio duration: 00:00:06.2500000s

Correct: Once there was a young rat named Arthur who never could make up his mind.

Inference time 00:00:06.8116012s Result: whenever his friends asked him if he would like to go ou with them Audio duration: 00:00:05.3750000s

Correct: Whenever his friends asked him if he would like to go out with them,

Inference time 00:00:08.2685040s Result: he would only answer i don’t know he wouldn’t say yes or kno either Audio duration: 00:00:06.5000000s

Correct: he would only answer, “I don’t know;” he wouldn’t say yes or no either.

Inference time 00:00:07.4153615s Result: he would always shirk make in a choice his aunt telen said to him Audio duration: 00:00:05.8750000s

Correct: He would always shirk making a choice. His Aunt Helen said to him,

Inference time 00:00:08.4571579s Result: now look here no one is going to care for you if you carry on like this Audio duration: 00:00:06.6250000s

Correct: “Now look here! No one is going to care for you if you carry on like this.”

Inference time 00:00:05.7075355s Result: you have no more mind than a blade of grass Audio duration: 00:00:04.5000000s

Correct: You have no more mind than a blade of grass

Inference time 00:00:06.8209409s Result: one rainy day the rats heard a great noise in the loft Audio duration: 00:00:05.3750000s

Correct: One rainy day the rats heard a great noise in the loft.

Inference time 00:00:06.4166841s Result: since read and green light when mixed formello Audio duration: 00:00:05.1250000s

Correct: since red and green light when mixed form yellow.

Inference time 00:00:08.0259637s Result: this is a very common type of bol one jhoin mainly read and yellow Audio duration: 00:00:06.3750000s

Correct: This is a very common type of bow, one showing mainly red and yellow,

Inference time 00:00:05.1169112s Result: with little or no green or blue Audio duration: 00:00:04.1250000s

Correct: with little or no green or blue.

AMD FX8350 without AVX and with DecoderWithLM enabled, 2GB RAM usage

Inference time 00:00:07.1140949s Result: once there was a young rat named arthur who never could make up his mind Audio duration: 00:00:06.2500000s

Inference time 00:00:06.1330356s Result: whenever his friends asked him if he would like to go out with them Audio duration: 00:00:05.3750000s

Inference time 00:00:07.3586543s Result: he would only answer i don’t know he wouldn’t say yes or no either Audio duration: 00:00:06.5000000s

Inference time 00:00:06.6646632s Result: he would always shirk making a choice his aunt ellen said to him Audio duration: 00:00:05.8750000s

Inference time 00:00:07.5113763s Result: now look here no one is going to care for you if you carry on like this Audio duration: 00:00:06.6250000s

Inference time 00:00:05.1292894s Result: you have no more mind than a blade of grass Audio duration: 00:00:04.5000000s

Inference time 00:00:06.0509971s Result: one rainy day the rat heard a great noise in the loft Audio duration: 00:00:05.3750000s

Inference time 00:00:05.6844169s Result: since red and green light when mixed for yellow Audio duration: 00:00:05.1250000s

Inference time 00:00:07.1047552s Result: this is a very common type of bow one chainmail red and yellow Audio duration: 00:00:06.3750000s

Inference time 00:00:04.5409824s Result: with little or no green or blue Audio duration: 00:00:04.1250000s


The results for my laptop with the i5-7200U, with DecoderWithLM enabled (2GB RAM usage) and AVX and AVX2 enabled

Inference time 00:00:03.1217653s Result: once there was a young rat named arthur who never could make up his mind Audio duration: 00:00:06.2500000s

Inference time 00:00:02.6623424s Result: once there was a young rat named arthur who never could make up his mind Audio duration: 00:00:06.2500000s

Inference time 00:00:02.2677272s Result: whenever his friends asked him if he would like to go out with them Audio duration: 00:00:05.3750000s

Inference time 00:00:02.7466476s Result: he would only answer i don’t know he wouldn’t say yes or no either Audio duration: 00:00:06.5000000s

Inference time 00:00:02.4971673s Result: he would always shirk making a choice his aunt ellen said to him Audio duration: 00:00:05.8750000s

Inference time 00:00:02.8055185s Result: now look here no one is going to care for you if you carry on like this Audio duration: 00:00:06.6250000s

Inference time 00:00:01.8914715s Result: you have no more mind than a blade of grass Audio duration: 00:00:04.5000000s

Inference time 00:00:02.2422975s Result: one rainy day the rat heard a great noise in the loft Audio duration: 00:00:05.3750000s

Inference time 00:00:02.1280377s Result: since red and green light when mixed for yellow Audio duration: 00:00:05.1250000s

Inference time 00:00:02.6395063s Result: this is a very common type of bow one chainmail red and yellow Audio duration: 00:00:06.3750000s

Inference time 00:00:01.6756065s Result: with little or no green or blue Audio duration: 00:00:04.1250000s

Impressive results :slight_smile:, hopefully soon I’ll have time to finish the streaming mode and play with transcriptions using the Windows audio output.

Update:

Having a hard time with the smart pointer ctx and the (StreamingState** retval) parameter used from C# on SetupStreaming; I think I’ll have to make a few changes to DeepSpeech.cc.
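For what it’s worth, one way an out parameter like StreamingState** retval is sometimes avoided in P/Invoke scenarios is a small flat wrapper on the native side that returns the pointer directly, so the C# side only sees an IntPtr return value. Here is a rough sketch of what such an addition to DeepSpeech.cc could look like; the DS_SetupStream signature and the DEEPSPEECH_EXPORT macro are taken from the v0.4.0-alpha era header and are assumptions here, and this is not necessarily the change carlfm01 ended up making:

// Hypothetical flat wrapper next to the existing DS_SetupStream: returns the
// StreamingState* directly instead of through an out parameter.
extern "C" DEEPSPEECH_EXPORT
StreamingState* DS_SetupStreamFlat(ModelState* aCtx,
                                   unsigned int aPreAllocFrames,
                                   unsigned int aSampleRate)
{
  StreamingState* state = nullptr;
  if (DS_SetupStream(aCtx, aPreAllocFrames, aSampleRate, &state) != 0) {
    return nullptr;  // setup failed; the error code is dropped in this sketch
  }
  return state;
}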

@carlfm01 I think it would be great if you posted a concise guide on how you managed to compile on Windows: versions used, commands run, etc.

I suspect that, as usual, it’s the same dependencies as TensorFlow?

Hi, yes, I’ll add the docs for compiling from source. I’m waiting for a TensorFlow upstream response on a current issue that causes the compilation to fail on CUDA compute capability 6.0+.
With https://github.com/mozilla/tensorflow/pull/94 and a patch it did the trick, but I need to wait for the upstream response, then I’ll add the way we did it to the docs.

After reading this thread I managed to duplicate Carlos’s process and compile DeepSpeech using bazel 0.15 by editing the TensorFlow BUILD file.

I have a question: has anyone managed to run make install for deepspeech afterwards? I have not been able to make deepspeech or build the Python bindings.

On the MinGW prompt I just get the message:

$ make deepspeech
make: ‘deepspeech’ is up to date.

I’m glad to read that you successfully built it. I tried with SWIG but failed with errors related to missing .i files, and then I stopped spending time on it. Maybe someone with more experience can help us with it.
@lissyx I think it’s time to review the changes in my TensorFlow PR; there has been no response from upstream and it looks like the Windows BUILD is getting attention. What do you think?

Hi Carlos,

Thanks for the reply. May I ask how you tested the compiled result? Just by directly importing the libdeepspeech.so file into a C++ script? Would you happen to have a working example you could share?

Thank you,

Kuhan

I won’t do so until I have time to actually do the Windows builds, sorry.