Compiling the 0.4.1 Client on Windows

(Giff) #1

Hi Everyone,

I’m building a VR experience for a trade show in July, and part of the experience will involve a conversation with someone on a smart phone (e.g. https://www.talktothereasons.com). I’d love to use DeepSpeech in a conversation engine for this: listening to the mic and letting the user say one of two choices. I think the current trained English model would work just fine, since I can use a fuzzy string comparison against each predetermined answer to make the choice. I’ve already tested 0.4.1 on an Ubuntu VM and it performs well. But we’ll need to get the client built for Windows to move forward.
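One way to do that fuzzy comparison is sketched below with the standard library's `difflib.SequenceMatcher`. This is a minimal sketch under assumptions: `similarity`, `pick_choice`, the lower-casing, and the 0.5 cut-off are hypothetical choices, not part of DeepSpeech, and the transcript is simply whatever string the client returns.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1], ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def pick_choice(transcript: str, choices: List[str], threshold: float = 0.5) -> Optional[str]:
    """Return the predetermined answer closest to the transcript, or None if none is close enough.

    `threshold` is a hypothetical cut-off; tune it against real recordings.
    """
    best = max(choices, key=lambda choice: similarity(transcript, choice))
    return best if similarity(transcript, best) >= threshold else None


if __name__ == "__main__":
    answers = ["yes I'll come to the party", "no I can't make it"]  # hypothetical answers
    print(pick_choice("yes i will go to the party", answers))
```

Returning `None` below the threshold gives the conversation engine a way to re-prompt instead of guessing when the recognizer output matches neither answer well.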

I’d be happy to use 0.5.0 as that already has a Windows client, but unfortunately it looks like y’all don’t have a trained model ready for it yet. So I’ll need to do my best to compile 0.4.1 on Windows using Visual Studio.

I’ll be carefully following this guide, as well as the thread here. Hopefully it will be as simple as grabbing the solution from 0.5.0 and compiling the 0.4.1 code with it, but I know better than to be so optimistic.

This seems like a really friendly and helpful community of developers here. I’ll do my best to give a good faith effort, and I hope y’all won’t mind giving the odd tip if I run into trouble. Thanks in advance!

(Carlos Fonseca) #2

Hello @thegiffman,

You can start testing with my fork (I’m almost sure master there is at 0.4.1), together with branch r1.12 of Mozilla’s TensorFlow.

TF: https://github.com/mozilla/tensorflow/tree/r1.12
My fork with 0.4.1: https://github.com/carlfm01/DeepSpeech

It should work following the same guide you mentioned. Let me know if it works. Any feedback is much appreciated.

(Carlos Fonseca) #3

If you just need the client and don’t need to compile or modify it, you can test with my old clients.

If you need to use the .NET client with 0.4.1, please use it from: https://github.com/carlfm01/DeepSpeech

(Giff) #4

Works great! Exactly what I need to save me days of work and uncertainty.

Thanks a million. Can I buy you a drink or something? :wink:

(Giff) #5

Hi @carlfm01,

A few questions on your client. It’s actually great for me to use the precompiled binary, as that really is all I need. But a few things aren’t as expected.

  1. When I run “DeepSpeechConsole.exe”, the process hangs after printing the output and never terminates. This seems a bit unusual.
  2. I do get a transcription back, but it’s not the same one I get from the Linux version of 0.4.1; it seems to be of poorer quality. Any idea what would cause that?

It may still be fine for my needs, but I thought I’d ask.

(Carlos Fonseca) #6

Are you using the compiled .NET client? I’m afraid that you will need to compile with the current master to avoid:

Using the same format? 16kHz, 16 bit depth and mono?
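To rule out a format mismatch before comparing the two clients, Python's standard `wave` module can read the header and confirm the file is 16 kHz, 16-bit, mono. This is a minimal sketch: `check_wav_format` and the constant names are hypothetical, and it only inspects the WAV header, not the audio itself.

```python
import wave
from typing import BinaryIO, List, Union

EXPECTED_RATE_HZ = 16000   # 16 kHz sample rate
EXPECTED_SAMPLE_WIDTH = 2  # 16-bit samples are 2 bytes wide
EXPECTED_CHANNELS = 1      # mono


def check_wav_format(source: Union[str, BinaryIO]) -> List[str]:
    """Return a list of format problems; an empty list means 16 kHz, 16-bit, mono."""
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected {EXPECTED_RATE_HZ}")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append(f"sample width is {8 * wav.getsampwidth()} bits, expected 16")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
    return problems
```

Calling `check_wav_format("sample.wav")` (file name hypothetical) returns an empty list when the file matches; otherwise each entry describes one mismatch. The `wave` module only handles uncompressed PCM, so a `wave.Error` on open is itself a hint that the file needs converting first.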

And yes, the Windows client is slightly worse, by almost 1%.

Did the test here:

(Giff) #7

OK - that confirms my issues on both counts. Neither of them is necessarily a deal-breaker, though it’s strange to see a quality drop. It was the same WAV file; I wonder why that is.
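To put a number on a quality gap with the same WAV file, each client's transcript can be scored against a hand-typed reference. The sketch below computes word error rate (word-level edit distance divided by the number of reference words). It assumes the "almost 1%" figure mentioned earlier is a WER-style comparison, which the thread does not state, and `word_error_rate` is a hypothetical helper, not a DeepSpeech function.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the reference words processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # delete ref[i-1]
                         cur[j - 1] + 1,      # insert hyp[j-1]
                         prev[j - 1] + cost)  # match or substitute
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat"  # hypothetical hand-typed reference
    linux_out = "the cat sat on the mat"  # hypothetical Linux client output
    windows_out = "the cat sit on mat"    # hypothetical Windows client output
    print(word_error_rate(reference, linux_out), word_error_rate(reference, windows_out))
```

Scoring both outputs against the same reference makes it easier to tell a real regression from a one-off difference, and to check whether changing a decoder setting closes the gap.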

(Carlos Fonseca) #8

Please check that BEAM_WIDTH is the same for both.

(Carlos Fonseca) #9

This is an issue; also, upstream it is set to 200.
