I’m building a VR experience for a trade show in July, and part of the experience will involve a conversation with someone on a smartphone (e.g. https://www.talktothereasons.com). I’d love to use DeepSpeech in a conversation engine for this: listening to the mic and letting the user say one of two choices. I think the current trained English model would work just fine; I can run a fuzzy string comparison against the predetermined answers and make the choice that way. I’ve already tested 0.4.1 on an Ubuntu VM and it performs well. But we’ll need to get the client built for Windows to move forward.
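For concreteness, here’s the kind of fuzzy matching I have in mind. This is just a minimal sketch using Python’s standard-library difflib; the threshold and the choice strings are placeholders, not anything from the actual experience:

```python
# Minimal sketch: pick whichever predetermined answer the transcript
# most resembles. The threshold and choices below are placeholders.
from difflib import SequenceMatcher

def pick_choice(transcript, choices, threshold=0.6):
    """Return the best-matching choice, or None if nothing is close enough."""
    def score(choice):
        return SequenceMatcher(None, transcript.lower(), choice.lower()).ratio()

    best = max(choices, key=score)
    return best if score(best) >= threshold else None

# e.g. a slightly mangled transcription still resolves to the right answer
print(pick_choice("opened the door", ["open the door", "walk away"]))
```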
I’d be happy to use 0.5.0, as that already has a Windows client, but unfortunately it looks like y’all don’t have a trained model ready for it yet. So I’ll need to do my best to compile 0.4.1 on Windows using Visual Studio.
I’ll be carefully following this guide, as well as the thread here. Hopefully it will be as simple as grabbing the solution from 0.5.0 and compiling the 0.4.1 code with it, but I know better than to be so optimistic.
This seems like a really friendly and helpful community of developers here. I’ll make a good-faith effort, and I hope y’all won’t mind offering the odd tip if I run into trouble. Thanks in advance!
A few questions about your client. It’s actually great for me to use the precompiled binary, as that really is all I need. But a few things aren’t behaving as expected.
When I run DeepSpeechConsole.exe, the process hangs after printing its output and never terminates. This seems a bit unusual.
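In case it matters, I can work around the hang on my side by launching the console app as a child process and killing it after a timeout. A rough sketch of what I mean; all the paths are placeholders for my setup, and I’m writing the flags from memory, so treat those as guesses at the console app’s interface:

```python
# Rough workaround: run the console client as a child process and kill it
# if it hasn't exited shortly after producing output. All paths below are
# placeholders, and the flags are my guess at the console app's interface.
import subprocess

cmd = [
    r"C:\deepspeech\DeepSpeechConsole.exe",       # placeholder path
    "--model", r"C:\deepspeech\output_graph.pb",  # placeholder model path
    "--audio", r"C:\deepspeech\test.wav",         # placeholder audio path
]

try:
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    print(result.stdout)
except subprocess.TimeoutExpired as exc:
    # subprocess.run kills the child on timeout; salvage any captured output.
    output = exc.stdout or b""
    print(output.decode(errors="replace") if isinstance(output, bytes) else output)
```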
I do get a transcription back, but it’s not the same one I get from the Linux version of 0.4.1; the quality seems poorer. Any idea what would cause that?
It may still be fine for my needs, but I thought I’d ask.
OK, that confirms my issues on both counts. Neither of them is necessarily a deal-breaker, though it’s strange to see a quality drop. It was the same WAV file. I wonder why that is?
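One thing I’ll rule out on my end is an audio-format mismatch, since as far as I understand the English model expects 16 kHz, 16-bit, mono WAV input. A quick standard-library sanity check (the path is a placeholder):

```python
# Quick check that the WAV matches what the English model expects:
# mono, 16 kHz sample rate, 16-bit samples. Path is a placeholder.
import wave

with wave.open(r"C:\deepspeech\test.wav", "rb") as w:
    print("channels:    ", w.getnchannels())           # expect 1
    print("sample rate: ", w.getframerate())           # expect 16000
    print("sample width:", w.getsampwidth(), "bytes")  # expect 2 (16-bit)
```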