Implementing streaming speech recognition using C++

ntnzelenin · November 1, 2018, 9:07pm

Hi, I’m trying to implement this Python code in C++:

subproc = subprocess.Popen(shlex.split('rec -q -V0 -e signed -L -c 1 -b 16 -r 16k -t raw - gain -2'),
                           stdout=subprocess.PIPE,
                           bufsize=0)
try:
    while True:
        data = subproc.stdout.read(512)
        model.feedAudioContent(sctx, np.frombuffer(data, np.int16))

It is from the article https://hacks.mozilla.org/2018/09/speech-recognition-deepspeech/

I’m not very experienced with C++, but I tried several approaches.
First, I tried to use a pipe and fgets():

char *buffer = new char[buffer_size];
shared_ptr<FILE> pipe(popen("rec -q -V0 -e signed -L -c 1 -b 16 -r 16k -t raw - gain -2", "r"), pclose);
if (!pipe) throw std::runtime_error("popen() failed!");
while (n < 300) {
    fgets(buffer, buffer_size, pipe.get());
    DS_FeedAudioContent(ctx, (short*) buffer, buffer_size / 2);
}
cout << DS_FinishStream(ctx) << endl;

The data read this way at least “seems” correct, but the output is just a random set of letters. As far as I know, fgets() stops at a newline or EOF, so some data could be missed.

I also tried to use ifstream:

ifstream is("rec -q -V0 -e signed -L -c 1 -b 16 -r 16k -t raw - gain -2", ifstream::binary);
...
is.read(buffer, buffer_size);

I also tried to use ALSA to read from the mic:

...
int err;
if ((err = snd_pcm_readi (capture_handle, buffer, buffer_size)) !=
  buffer_size) {
    fprintf (stderr, "read from audio interface failed (%d)\n",
          err);
    exit (1);
}
...

I also tried moving it all to a separate thread in case it missed some data while feeding audio to the model. But nothing worked, and DS_FinishStream returns only gibberish.
Maybe I’m making some stupid mistakes. If so, could somebody please point out what they are? And if these small pieces of code are not enough to understand the problem, I’ll provide the rest.

I’m using Ubuntu and run my program this way:

./build/client --model ../models_0.3/output_graph.pbmm --alphabet ../models_0.3/alphabet.txt --lm ../models_0.3/lm.binary --trie ../models_0.3/trie

I tested the Python code from the article with the same arguments and it worked fine.
Many thanks in advance.

reuben · November 1, 2018, 9:18pm

fgets is for text. Since you’re dealing with binary data, use fread.

The context argument to DS_FinishStream should be the same as the one you pass to DS_FeedAudioContent, it’s a handle to the streaming context.
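
A minimal sketch of the fread approach, for illustration only. The helper name stream_samples, its parameters, and the callback are hypothetical and not part of DeepSpeech; it assumes a POSIX system (popen/pclose) and that the command writes raw 16-bit samples in the machine's native byte order.

```cpp
#include <stdio.h>   // popen, pclose, fread (popen/pclose are POSIX)

#include <cstddef>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, not a DeepSpeech API: runs `command`, reads its stdout
// as raw 16-bit samples, and passes each chunk to `on_samples`.
// Returns the total number of samples delivered.
std::size_t stream_samples(
    const std::string& command,
    std::size_t chunk_samples,
    const std::function<void(const short*, unsigned int)>& on_samples) {
  // The pipe is closed with pclose when `pipe` goes out of scope.
  std::unique_ptr<FILE, int (*)(FILE*)> pipe(popen(command.c_str(), "r"), pclose);
  if (!pipe) throw std::runtime_error("popen() failed");

  std::vector<short> buffer(chunk_samples);
  std::size_t total = 0;
  for (;;) {
    // fread is binary-safe: unlike fgets, it does not stop at newline bytes.
    // It counts in elements of sizeof(short), so `got` is a sample count.
    std::size_t got = fread(buffer.data(), sizeof(short), buffer.size(), pipe.get());
    if (got == 0) break;  // EOF or read error
    // Pass only the samples actually read, not the full buffer size.
    on_samples(buffer.data(), static_cast<unsigned int>(got));
    total += got;
  }
  return total;
}
```

In the client from the question, the callback would presumably forward each chunk with DS_FeedAudioContent(ctx, samples, count), followed by a single DS_FinishStream(ctx) on the same ctx. Note that rec runs until it is stopped, so a real loop needs its own exit condition.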

ntnzelenin · November 1, 2018, 9:30pm

Wow! Works like a charm, thanks so much!!
