Implementing streaming speech recognition using C++

Hi, I’m trying to reimplement this Python code in C++:

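```python
import shlex
import subprocess
import numpy as np
# `model` and `sctx` (the streaming context) come from the article's setup code, not shown here
subproc = subprocess.Popen(shlex.split('rec -q -V0 -e signed -L -c 1 -b 16 -r 16k -t raw - gain -2'),
                           stdout=subprocess.PIPE,
                           bufsize=0)
try:
    while True:
        data = subproc.stdout.read(512)  # 512 bytes = 256 16-bit samples
        model.feedAudioContent(sctx, np.frombuffer(data, np.int16))
except KeyboardInterrupt:  # Ctrl-C stops recording
    print(model.finishStream(sctx))
    subproc.terminate()
```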

It is from the article https://hacks.mozilla.org/2018/09/speech-recognition-deepspeech/
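For reference, `np.frombuffer(data, np.int16)` just reinterprets the raw bytes from `rec` as 16-bit signed samples (`-e signed -b 16`, little-endian because of `-L`). A minimal C++ sketch of that step, assuming a little-endian host such as x86 Ubuntu; the helper name `bytes_to_samples` is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: a C++ counterpart of np.frombuffer(data, np.int16).
// Assumes a little-endian host, matching rec's -L output.
std::vector<std::int16_t> bytes_to_samples(const char* data, std::size_t n_bytes) {
    // Each sample is 2 bytes; a trailing odd byte (if any) is ignored here.
    std::vector<std::int16_t> samples(n_bytes / sizeof(std::int16_t));
    if (!samples.empty()) {
        // memcpy avoids the alignment/aliasing pitfalls of casting char* to short*.
        std::memcpy(samples.data(), data, samples.size() * sizeof(std::int16_t));
    }
    return samples;
}
```

So each 512-byte read in the Python loop corresponds to 256 samples passed to the model.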

I’m not very experienced in C++, and I have tried several approaches.
First, I tried using popen() and fgets():

```cpp
char *buffer = new char[buffer_size];
shared_ptr<FILE> pipe(popen("rec -q -V0 -e signed -L -c 1 -b 16 -r 16k -t raw - gain -2", "r"), pclose);
if (!pipe) throw std::runtime_error("popen() failed!");
while (n < 300) {
    fgets(buffer, buffer_size, pipe.get());
    DS_FeedAudioContent(ctx, (short*) buffer, buffer_size / 2);
}
cout << DS_FinishStream(ctx) << endl;
```

The data read this way at least “seems” correct, but I get only a random string of letters. Also, as far as I know, fgets() stops reading at a newline or at EOF, so some data could be missed.

I also tried ifstream:

```cpp
ifstream is("rec -q -V0 -e signed -L -c 1 -b 16 -r 16k -t raw - gain -2", ifstream::binary);
...
is.read(buffer, buffer_size);
```
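Note that `std::ifstream` treats its argument as a file name, so the snippet above tries to open a file literally named `rec -q -V0 ...` rather than running `rec`. One alternative, sketched here under the assumption that the program is launched as `rec -q -V0 -e signed -L -c 1 -b 16 -r 16k -t raw - gain -2 | ./build/client ...`, is to read raw bytes from standard input with `std::istream::read` and use `gcount()` to learn how many bytes actually arrived. The helper name `read_chunk` is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>  // std::cin, for the real `rec ... | ./build/client ...` case
#include <sstream>   // std::istringstream, handy for testing without a microphone

// Hypothetical helper: reads up to `size` bytes of raw PCM from `in`.
// Returns the number of bytes actually stored in `buffer`; 0 means end of stream.
std::size_t read_chunk(std::istream& in, char* buffer, std::size_t size) {
    in.read(buffer, static_cast<std::streamsize>(size));
    // A short final read sets failbit/eofbit, but gcount() still reports
    // how many bytes were copied, so the tail of the stream is not lost.
    return static_cast<std::size_t>(in.gcount());
}
```

In the client this would be called as `read_chunk(std::cin, buffer, buffer_size)` in a loop, passing `n / 2` samples to the model each time.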

I also tried using ALSA to read from the microphone:

```cpp
...
int err;
if ((err = snd_pcm_readi(capture_handle, buffer, buffer_size)) != buffer_size) {
    fprintf(stderr, "read from audio interface failed (%d)\n", err);
    exit(1);
}
...
```
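A separate detail in the ALSA snippet: the third argument of `snd_pcm_readi` is a count of frames, not bytes, and the return value is the number of frames read (or a negative error code such as `-EPIPE` on overrun). For mono 16-bit audio one frame is 2 bytes, so if `buffer_size` is a byte count, the call asks for twice as many frames as `buffer` can hold. A minimal sketch of the conversion; the helper name `bytes_to_frames` is hypothetical:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: how many ALSA frames fit in a buffer of n_bytes.
// One frame = one sample per channel; with -c 1 -b 16 that is 2 bytes.
constexpr std::size_t bytes_to_frames(std::size_t n_bytes,
                                      unsigned channels,
                                      unsigned bytes_per_sample) {
    return n_bytes / (channels * bytes_per_sample);
}

// The read would then look roughly like:
//   snd_pcm_sframes_t frames =
//       snd_pcm_readi(capture_handle, buffer, bytes_to_frames(buffer_size, 1, 2));
//   if (frames < 0) { /* handle the error */ }
```

For mono audio the frame count is also the sample count to pass on to the model.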

I also tried moving it all to a separate thread, in case some data was being missed while feeding audio to the model. But nothing worked: DS_FinishStream returns only gibberish.
Maybe I’m making some silly mistakes. If so, could somebody please point them out? If these small pieces of code are not enough to understand the problem, I’ll provide the rest.

I’m using Ubuntu and run my program like this:

```
./build/client --model ../models_0.3/output_graph.pbmm --alphabet ../models_0.3/alphabet.txt --lm ../models_0.3/lm.binary --trie ../models_0.3/trie
```

I tested the Python code from the article with the same arguments, and it worked fine.
Many thanks in advance!


fgets is for text; since you’re dealing with binary data, use fread.
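To see why `fgets` garbles binary audio: it stops right after any byte equal to `'\n'` (0x0A), which can appear anywhere in 16-bit samples, it appends a `'\0'`, and it does not report how many bytes it read, so the original loop always feeds `buffer_size / 2` samples regardless. `fread` returns the exact count. A small self-contained demonstration, using `tmpfile()` in place of the `popen()` pipe; the helper names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: writes n raw bytes to a temporary file and rewinds it,
// standing in for the popen() pipe that delivers rec's output.
std::FILE* make_stream(const unsigned char* bytes, std::size_t n) {
    std::FILE* f = std::tmpfile();
    if (!f) return nullptr;
    std::fwrite(bytes, 1, n, f);
    std::rewind(f);
    return f;
}

// fgets: reads at most size-1 bytes, stopping after '\n' or at EOF, then appends '\0'.
// The only way to guess the length afterwards is strlen(), which stops at the first 0x00.
std::size_t bytes_seen_by_fgets(std::FILE* f, char* buf, int size) {
    if (!std::fgets(buf, size, f)) return 0;
    return std::strlen(buf);
}

// fread: copies up to `size` bytes and returns exactly how many it copied.
std::size_t bytes_read_by_fread(std::FILE* f, char* buf, std::size_t size) {
    return std::fread(buf, 1, size, f);
}
```

With the little-endian samples `0x1234, 10, 256, 0x0302` (bytes `34 12 0A 00 00 01 02 03`), `fgets` stops after 3 of the 8 bytes because the ordinary sample value 10 contains a newline byte, while `fread` returns all 8.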

The context argument to DS_FinishStream should be the same one you pass to DS_FeedAudioContent; it’s a handle to the streaming context.
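Putting both points together, the loop reads fixed-size chunks with `fread` and passes the same context handle to every feed call and to the final finish call. Below is a self-contained sketch of that structure; `StreamContext`, `feed_audio` and `finish_stream` are hypothetical stand-ins for the DeepSpeech streaming context, `DS_FeedAudioContent` and `DS_FinishStream` (check `deepspeech.h` for the real signatures), and a `FILE*` from `tmpfile()` stands in for the `popen()` pipe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for the streaming context: it just collects samples.
struct StreamContext {
    std::vector<std::int16_t> received;
};

// Stand-in for DS_FeedAudioContent(ctx, samples, n_samples).
void feed_audio(StreamContext* ctx, const std::int16_t* samples, std::size_t n_samples) {
    ctx->received.insert(ctx->received.end(), samples, samples + n_samples);
}

// Stand-in for DS_FinishStream(ctx): here it only reports how many samples arrived.
std::size_t finish_stream(StreamContext* ctx) {
    return ctx->received.size();
}

// Reads 16-bit samples from `pipe` in chunks of 256 (512 bytes, as in the Python code)
// and feeds every chunk to the same context, which is then finished once.
std::size_t stream_from(std::FILE* pipe, StreamContext* ctx) {
    std::int16_t chunk[256];
    std::size_t n;
    // With an element size of 2, fread counts whole samples, not bytes.
    while ((n = std::fread(chunk, sizeof(std::int16_t), 256, pipe)) > 0) {
        feed_audio(ctx, chunk, n);
    }
    return finish_stream(ctx);
}
```

With a live `rec` pipe the loop never reaches end of file, so a real client needs its own stop condition (for example a chunk counter that is actually incremented) before finishing the stream.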


Wow! Works like a charm, thanks so much!!