Using DeepSpeech in a Windows environment

Thank you, that was exactly it. After quite a while it worked!
I fixed some config issues in the Visual Studio projects and it is now recognizing speech, although it takes up 2 GB of RAM and fails to recognize a lot of words.
I tested by recording audio from an input device playing the show The Office (English, American accent); it easily fails on about 20% to 40% of the words.

Do you mean from the speakers via the microphone, or from the Windows output? WPF or console? Test with LibriVox recordings. The Windows solution has been shown to score a WER of 8.87%.
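
For context on the WER figure: word error rate is usually computed as the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer's output, divided by the number of reference words. Below is a minimal Python sketch of that calculation. It is not the script behind the 8.87% figure; the function name and the lowercase/whitespace tokenization are assumptions for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are compared case-insensitively after splitting on
    whitespace; real evaluations may normalize text differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against four reference words.
    print(word_error_rate("the quick brown fox", "the quack brown"))
```

Running this on your own recordings with a hand-typed reference transcript gives a number that can be compared across clean and noisy audio.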

Using WPF, I tried from the Windows output; I'm still trying to fix sound from the microphone.
When I played arctic_a0024.wav it transcribed it perfectly, but that was a perfect recording, which is not possible in my application since it will be recording from a microphone.
Also, it takes quite a while to transcribe live audio. Is this normal behavior?

Remember that the model is not good at handling noise yet. Maybe the audio contains laughs or claps?

Yes, if you disabled AVX2.

Yes, it does have laughs, claps, and random background noise, which really hurts the recognition; on clean audio it's pretty good.
Unfortunately my CPU doesn't have AVX2.
Would the dataset from Mozilla Common Voice be better at handling background noise?

I know it is trained using Common Voice data, but I'm not sure whether the data used contains noise.
Questions about speech corpora for pre-trained model

Interesting. I'm going to keep at it, but I don't think this is a viable solution for me at the moment because it picks up a lot of background noise, although it is very promising.
I will dedicate this week to trying to fine-tune, and I'll try noise gates tomorrow.
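
In its simplest form, a noise gate mutes frames whose energy falls below a threshold. Here is a rough Python sketch of that idea, assuming 16-bit mono PCM samples at 16 kHz (so 320 samples is 20 ms). The function name, frame size, and threshold value are hypothetical and would need tuning per microphone; none of this is part of DeepSpeech.

```python
import math
from typing import List, Sequence


def noise_gate(samples: Sequence[int],
               frame_size: int = 320,
               threshold: float = 500.0) -> List[int]:
    """Zero out frames whose RMS amplitude is below `threshold`.

    Assumptions: `samples` are signed 16-bit PCM values; `frame_size`
    and `threshold` are illustrative defaults, not tuned values.
    """
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")

    gated: List[int] = []
    for start in range(0, len(samples), frame_size):
        frame = list(samples[start:start + frame_size])
        # Root-mean-square amplitude as a simple loudness estimate.
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= threshold:
            gated.extend(frame)               # loud enough: keep as is
        else:
            gated.extend([0] * len(frame))    # too quiet: mute the frame
    return gated


if __name__ == "__main__":
    quiet = [10, -10] * 160      # one 20 ms frame of low-level noise
    loud = [4000, -4000] * 160   # one 20 ms frame of speech-level signal
    out = noise_gate(quiet + loud)
    print(out[:4], out[320:324])
```

A hard gate like this can clip the start of words, so real gates usually add attack and release smoothing. It also does nothing about noise that overlaps the speech itself, such as laughter over dialogue.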

Thanks a lot, Carlos Fonseca. Warm regards.

FYI, Windows builds on TaskCluster got merged this morning. I’ll add upload to the NuGet Gallery soon …

@Bazellete You should now be able to use prebuilt binaries from nuget.org. Please test, it’s still brand new.

Please define “it takes quite a while”; it’s not very clear …

For what it’s worth, I’ve found DeepSpeech performs decently on interviews, news broadcasts, documentaries, etc where speech tends to be natural, but very poorly in situations with “acted speech” such as dramas or sitcoms. I think it’s because most people are recording in a neutral, non-emotional tone.

lissyx, they worked after a few issues. Much, much easier than compiling the whole thing, thanks.
My bad about it taking a while: for some reason I thought the WPF example would transcribe in real time, but you need to press stop to actually transcribe.
Yes dabinat, it does work quite well in those scenarios. I am trying to use Voicemeeter Banana and Audacity to minimize background noise, with some success, but it is still not good enough.

The native client doesn’t support stream decoding yet. I think I read somewhere that @reuben was hitting issues trying to make a streaming decoder.

Could you give more information?

Had to install .NET 4.6.2 (not really an issue)
References were not added automatically
There was something else I don’t remember; I will edit the post if I remember

Please send patches. cc @carlfm01

I don’t know what you mean. Do you mean that when you install the NuGet package, DeepSpeechClient.dll is not added to the references? Please explain a little more.

Exactly that: I installed the NuGet package and it wasn’t added to the references, so I had to add the reference manually.

Working on it. I’ll add 4.5, 4.6 and 4.7.

It’s landed but it broke :smiley: