I’m building a VR experience for a trade show in July, and part of the experience will involve a conversation with someone on a smartphone (e.g. https://www.talktothereasons.com). I’d love to use DeepSpeech in a conversation engine for this: listening to the mic and letting the user say one of two choices. I think the current trained English model would work just fine; I can run a fuzzy string comparison against the predetermined answers and make the choice that way. I’ve already tested 0.4.1 on an Ubuntu VM and it performs well. But we’ll need to get the client built for Windows to move forward.
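For concreteness, here’s the kind of fuzzy matching I have in mind. This is just a minimal sketch using Python’s standard-library difflib; the threshold and the choice strings are placeholders, not anything from the actual experience:

```python
# Minimal sketch: pick whichever predetermined answer the transcript
# most resembles. The threshold and choices below are placeholders.
from difflib import SequenceMatcher

def pick_choice(transcript, choices, threshold=0.6):
    """Return the best-matching choice, or None if nothing is close enough."""
    def score(choice):
        return SequenceMatcher(None, transcript.lower(), choice.lower()).ratio()

    best = max(choices, key=score)
    return best if score(best) >= threshold else None

# e.g. a slightly mangled transcription still resolves to the right answer
print(pick_choice("opened the door", ["open the door", "walk away"]))
```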
I’d be happy to use 0.5.0, as that already has a Windows client, but unfortunately it looks like y’all don’t have a trained model ready for it yet. So I’ll need to do my best to compile 0.4.1 on Windows using Visual Studio.
I’ll be carefully following this guide, as well as the thread here. Hopefully it will be as simple as grabbing the solution from 0.5.0 and compiling the 0.4.1 code with it, but I know better than to be so optimistic.
This seems like a really friendly and helpful community of developers here. I’ll make a good-faith effort, and I hope y’all won’t mind offering the odd tip if I run into trouble. Thanks in advance!
A few questions about your client. It’s actually great for me to use the precompiled binary, as that really is all I need. But a few things aren’t behaving as expected.
When I run DeepSpeechConsole.exe, the process hangs after printing its output and never terminates. This seems a bit unusual.
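In case it matters, I can work around the hang on my side by launching the console app as a child process and killing it after a timeout. A rough sketch of what I mean; all the paths are placeholders for my setup, and I’m writing the flags from memory, so treat those as guesses at the console app’s interface:

```python
# Rough workaround: run the console client as a child process and kill it
# if it hasn't exited shortly after producing output. All paths below are
# placeholders, and the flags are my guess at the console app's interface.
import subprocess

cmd = [
    r"C:\deepspeech\DeepSpeechConsole.exe",       # placeholder path
    "--model", r"C:\deepspeech\output_graph.pb",  # placeholder model path
    "--audio", r"C:\deepspeech\test.wav",         # placeholder audio path
]

try:
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    print(result.stdout)
except subprocess.TimeoutExpired as exc:
    # subprocess.run kills the child on timeout; salvage any captured output.
    output = exc.stdout or b""
    print(output.decode(errors="replace") if isinstance(output, bytes) else output)
```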
I do get a transcription back, but it’s not the same one I get from the Linux version of 0.4.1; the quality seems poorer. Any idea what would cause that?
It may still be fine for my needs, but I thought I’d ask.
OK, that confirms my issues on both counts. Neither of them is necessarily a deal-breaker, though it’s strange to see a quality drop. It was the same WAV file. I wonder why that is?
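One thing I’ll rule out on my end is an audio-format mismatch, since as far as I understand the English model expects 16 kHz, 16-bit, mono WAV input. A quick standard-library sanity check (the path is a placeholder):

```python
# Quick check that the WAV matches what the English model expects:
# mono, 16 kHz sample rate, 16-bit samples. Path is a placeholder.
import wave

with wave.open(r"C:\deepspeech\test.wav", "rb") as w:
    print("channels:    ", w.getnchannels())           # expect 1
    print("sample rate: ", w.getframerate())           # expect 16000
    print("sample width:", w.getsampwidth(), "bytes")  # expect 2 (16-bit)
```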