Hey everyone! First of all, thanks to the community, which is amazing! I have found many tutorials and tips here.
I have been developing a push-to-talk (PTT) app that recognizes voice commands while a person holds a button. I used the NodeJS example as a starting point. At this point I have a good recognizer, but the processing time of this method is too long:
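Roughly, the part I am timing looks like this (a simplified sketch; the model paths and parameters here are placeholders for my real setup):

```js
const DeepSpeech = require('deepspeech');

// Placeholder paths/values; my real setup loads my own lm and trie.
const model = new DeepSpeech.Model('output_graph.pbmm', 500);
model.enableDecoderWithLM('lm.binary', 'trie', 0.75, 1.85);

const modelStream = model.createStream();

// Called for every audio chunk coming from the mic; this is the slow method.
function onChunk(chunk) {
  model.feedAudioContent(modelStream, chunk.slice(0, chunk.length / 2));
}
```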
I generate my own lm and trie from the short commands. I'm running on CPU, and a chunk of 8 KBytes (not 8 MBytes) takes 3.5 seconds. My questions are:
How much time (seconds/milliseconds) would I gain by using a GPU (RTX 2070)?
Can I speed it up with any command or argument? I did not see anything in the documentation.
Is the new version 0.7 faster than 0.6?
Any tips?
Thanks again!
lissyx
8 MBytes of audio in 3.5 seconds? How much audio time is that? I feel like it's much more than realtime.
Maybe not that much, depending on where the bottleneck is. The GPU itself will produce inference much faster, but you still have some CPU-bound operations, plus the memory transfers.
Could you please share more context on:
what you are working on,
your expectations in processing time.
Have you tried changing the chunk size? 8 MB is really quite a lot compared to what we usually do, especially for streaming.
It is a simple audio command for a video game, such as 'attack tower', 'spawn warrior'…
Right now my expectation is to reduce the time to one second or less.
I have tried to reduce the chunk size using the mic lib (https://www.npmjs.com/package/mic) with the arecord flag --buffer-size=1048, but I did not succeed. I had to look inside the lib, and I think what I am doing is not correct:
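For reference, this is roughly how I wire the mic lib into DeepSpeech (a sketch; mic itself does not expose arecord's --buffer-size option, which is why I had to edit the lib, and the PTT-release handler is only illustrative):

```js
const mic = require('mic');
// `model` is the DeepSpeech model set up as in my first snippet.

// 16 kHz, 16-bit mono signed PCM: the format the model expects.
const micInstance = mic({
  rate: '16000',
  channels: '1',
  bitwidth: '16',
  encoding: 'signed-integer',
});
const micStream = micInstance.getAudioStream();

let modelStream = model.createStream();

// Each chunk delivered by arecord is fed straight into the stream.
micStream.on('data', (chunk) => {
  model.feedAudioContent(modelStream, chunk.slice(0, chunk.length / 2));
});

// Illustrative PTT-release handler: decode and start a fresh stream.
function onPttRelease() {
  const text = model.finishStream(modelStream);
  console.log('Recognized:', text);
  modelStream = model.createStream();
}

micInstance.start();
```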
I too found that feedAudioContent is slow in the NodeJS client lib. While trying out this deepspeech node angular UI app, I found that CPU usage is quite high during inference… Upon profiling, I found that feedAudioContent is taking a lot of time. It was a little high on Mac, but was eating the CPU on Linux.
The CPU usage does go high while inferencing. I think it's the same feedAudioContent issue. I found this only in Node; I tried the Python example code and it did not have this problem.
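The kind of measurement I did looks like this (a hypothetical harness, just timing each call with process.hrtime; names reused from the snippets above):

```js
// Time every feedAudioContent call to see where the CPU goes.
micStream.on('data', (chunk) => {
  const start = process.hrtime.bigint();
  model.feedAudioContent(modelStream, chunk.slice(0, chunk.length / 2));
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`feedAudioContent: ${chunk.length} bytes in ${ms.toFixed(1)} ms`);
});
```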
lissyx
Please be more descriptive about the issue. This is completely not what we experience using the Streaming API under NodeJS (daily, on an RPi4).
So far, the only figures in this thread point to misuse of the API.
You are right, but it is not the full audio. The audio arrives through a callback, and each time a chunk is received I process it with that function. When the last sound arrives, I call let text = englishModel.finishStream(modelStream); and I get the full text.
I followed the example found in the DeepSpeech examples repo; mine is very similar. I have pulled out the main methods used by the lib in which I have the problem.
When the mic is capturing, the .on("data") event receives the audio buffer data. When the stream has finished, .on("finish") is called, and at that point I decode the audio. DeepSpeech usually works pretty well.
Usually, when I release the PTT button, the last words are not yet in the model, and I have to wait extra time for the data to be processed by englishModel.feedAudioContent.
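The release path in my code looks roughly like this (a sketch, reusing the names from the earlier snippets):

```js
// On PTT release the mic stream ends: everything received has been fed,
// then the stream is closed. The extra wait I see happens around here.
micStream.on('finish', () => {
  const text = model.finishStream(modelStream);
  console.log('Decoded:', text);
  modelStream = model.createStream(); // ready for the next press
});
```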
320ms is the optimal chunk size for latency, as it matches how often the model can process input. Chunk sizes smaller than that make no difference; the audio will still be processed only after 320ms have accumulated. Chunk sizes larger than that will increase latency.
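For 16 kHz, 16-bit mono input (the format the model expects), a 320 ms chunk works out like this (a small worked example):

```js
// 320 ms of 16 kHz, 16-bit (2-byte) mono PCM:
const SAMPLE_RATE = 16000;   // samples per second
const BYTES_PER_SAMPLE = 2;  // 16-bit signed PCM
const CHUNK_MS = 320;

const samples = SAMPLE_RATE * CHUNK_MS / 1000;  // 5120 samples
const chunkBytes = samples * BYTES_PER_SAMPLE;  // 10240 bytes
console.log(samples, chunkBytes);
```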
lissyx
Could you please be more descriptive? What's “globally better”?
lissyx
What's your NodeJS version as well?
lissyx
That seems quite low end, but it should not be that slow.
Could you please reproduce the speed measurement using:
NodeJS bindings,
Python bindings,
C++ bindings.
Using this:
$ npm install deepspeech@0.6.1
$ time deepspeech [...]
$ pip3 install deepspeech==0.6.1
$ time deepspeech [...]
$ wget https://github.com/mozilla/DeepSpeech/releases/download/v0.6.1/native_client.amd64.cpu.linux.tar.xz && tar xf native_client.amd64.cpu.linux.tar.xz
$ time ./deepspeech [...]
The time from when I start sending chunks to englishModel.feedAudioContent(modelStream, chunk.slice(0, chunk.length / 2)) until all the received chunks are processed is lower.