How to use DeepSpeech for a speech-to-text server (in NodeJS)

Hi all,

I tested the DeepSpeech NodeJS binding, setting up a small open-source project to transcribe speech files:

That’s fine for single-user (one request at a time) / standalone applications running on a host with a few CPU cores (I tested on my multi-core laptop).

But reading the docs, https://deepspeech.readthedocs.io/en/r0.9/NodeJS-API.html, I don’t understand how DeepSpeech behaves under the hood in terms of multithreading.

My questions:

  1. Are the API functions detailed in the mentioned documentation “synchronous” (in JavaScript terms)?

    In other words: does the function Model.stt(aBuffer) block the NodeJS main thread? Yes/No?

  2. On a multi-core CPU (without GPU), how could I set up a server architecture (by “server” I mean any system able to fulfill concurrent requests, exploiting all available cores)?
    If all functions are synchronous, I could try to create a multi-threaded NodeJS server using “worker threads”. Does that make sense?

Thanks
Giorgio

I found that wav inference with stt indeed blocks the Node.js event loop for the duration of the inference, so nothing else happens while transcribing.
The same is not true for the stream class: the stream transcription functions are by nature non-blocking. So if you plan on transcribing multiple things at the same time, I would recommend looking at streaming transcription.
At least that’s my experience. But worker threads should work too, in theory.

Johannes

Yes, it’s the same everywhere: it blocks the process. We don’t want to bake complicated threading support into the library for this.

While it’s blocking, it can still make use of several threads in both implementations: TensorFlow and TFLite.

Spawn several processes and that’s it. But as I said, the library is already able to leverage several threads anyway.

Thanks all

@JGKK
Today I did some tests.
See the simple script testPerformances.sh, which roughly measures latencies, resource usage, and CPU core usage.

I’m perplexed about results:

When I run the deepspeech_cli.sh command-line program, I see > 200% CPU for 1.72 s and 2176 KB (about 2.2 MB) of RAM consumption (if I understand correctly):

	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.72
	Maximum resident set size (kbytes): 2176

Notes:

  1. 1.72 s seems to me a super fast latency, especially because the deepspeech CLI program has to load the model into memory.

    BTW I noticed that the “first time” the program is invoked it takes > 21 seconds. Each call after that is really fast. I do not understand how things behave. It seems to me that the model is loaded into memory like a shared library. Is that so? If yes, that’s good news (see later).

  2. If I run

    /usr/bin/time --verbose pidstat 1 -u -e node deepSpeechTranscriptSpawn.js
    

    where deepSpeechTranscriptSpawn.js is a NodeJS program that just spawns a process running the previous deepspeech CLI program. In this case pidstat measures just 4–5% CPU (instead of the expected 200%). Here I do not understand.

  3. Unfortunately, just today I was no longer able to run the NodeJS binding (using my simple wrapper deepSpeechTranscriptNative.js), because the program crashes. That’s weird, because I ran the code successfully one month ago. What changed is that I updated NodeJS to the latest version, v16.0.0.
    OK, that’s another story.

    UPDATE:
    I just opened the issue: https://github.com/mozilla/DeepSpeech/issues/3642

See below full stdout:

testPerformances.sh
test elapsed time / resource usage (using /usr/bin/time)
and CPU core usage (using pidstat) in different cases

Test 1: deepspeech cli command

Linux 5.8.0-50-generic (giorgio-HP-Laptop-17-by1xxx) 	03/05/2021 	_x86_64_	(8 CPU)
Loading model from file models/deepspeech-0.9.3-models.pbmm
TensorFlow: v2.3.0-6-g23ad988
DeepSpeech: v0.9.3-0-gf2e9c85
2021-05-03 10:34:49.272662: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Loaded model in 0.0153s.
Loading scorer from files models/deepspeech-0.9.3-models.scorer
Loaded scorer in 0.000314s.
Running inference.

10:34:49      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
10:34:50     1000    156502  159,00   54,00    0,00    0,00  213,00     2  deepspeech
why should one halt on the way
Inference took 1.454s for 2.735s audio file.

Average:     1000    156502  159,00   54,00    0,00    0,00  213,00     -  deepspeech
	Command being timed: "pidstat 1 -u -e deepspeech --model models/deepspeech-0.9.3-models.pbmm --scorer models/deepspeech-0.9.3-models.scorer --audio audio/4507-16021-0012.wav"
	User time (seconds): 0.00
	System time (seconds): 0.00
	Percent of CPU this job got: 0%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.72
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 2176
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 110
	Voluntary context switches: 3
	Involuntary context switches: 0
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

Test 2: node deepSpeechTranscriptSpawn

Linux 5.8.0-50-generic (giorgio-HP-Laptop-17-by1xxx) 	03/05/2021 	_x86_64_	(8 CPU)

10:34:55      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
10:34:56     1000    156532    3,00    2,00    0,00    0,00    5,00     2  node
why should one halt on the way

Average:     1000    156532    3,00    2,00    0,00    0,00    5,00     -  node
	Command being timed: "pidstat 1 -u -e node deepSpeechTranscriptSpawn.js"
	User time (seconds): 0.00
	System time (seconds): 0.00
	Percent of CPU this job got: 0%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.70
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 2228
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 111
	Voluntary context switches: 3
	Involuntary context switches: 1
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

Test 3: node deepSpeechTranscriptNative

Linux 5.8.0-50-generic (giorgio-HP-Laptop-17-by1xxx) 	03/05/2021 	_x86_64_	(8 CPU)
node:internal/modules/cjs/loader:943
  throw err;
  ^

Error: Cannot find module '/home/giorgio/DeepSpeechJs/node_modules/deepspeech/lib/binding/v0.9.3/linux-x64/node-v93/deepspeech.node'
Require stack:
- /home/giorgio/DeepSpeechJs/node_modules/deepspeech/index.js
- /home/giorgio/DeepSpeechJs/deepSpeechTranscriptNative.js
    at Function.Module._resolveFilename (node:internal/modules/cjs/loader:940:15)
    at Function.Module._load (node:internal/modules/cjs/loader:773:27)
    at Module.require (node:internal/modules/cjs/loader:1012:19)
    at require (node:internal/modules/cjs/helpers:93:18)
    at Object.<anonymous> (/home/giorgio/DeepSpeechJs/node_modules/deepspeech/index.js:24:17)
    at Module._compile (node:internal/modules/cjs/loader:1108:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1137:10)
    at Module.load (node:internal/modules/cjs/loader:988:32)
    at Function.Module._load (node:internal/modules/cjs/loader:828:14)
    at Module.require (node:internal/modules/cjs/loader:1012:19) {
  code: 'MODULE_NOT_FOUND',
  requireStack: [
    '/home/giorgio/DeepSpeechJs/node_modules/deepspeech/index.js',
    '/home/giorgio/DeepSpeechJs/deepSpeechTranscriptNative.js'
  ]
}
	Command being timed: "pidstat 1 -u -e node deepSpeechTranscriptNative.js"
	User time (seconds): 0.00
	System time (seconds): 0.00
	Percent of CPU this job got: 1%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.09
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 2176
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 109
	Voluntary context switches: 2
	Involuntary context switches: 0
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

@lissyx

Spawn several processes and that’s it. But as I said, the library is already able to leverage several threads anyway.

Well, if I understand correctly, the DeepSpeech API uses threads under the hood, but the NodeJS API functions block the main NodeJS thread.

So I imagine two solutions (to manage a server that has to handle multiple concurrent requests, where each request is a CPU-bound task blocking the main thread):

  1. Using nodejs “worker threads”
  2. Forking multiple external (from the main nodejs thread) OS processes.

OK! Now my doubt/question is about how much RAM each process/thread uses for the DeepSpeech Model. Running the deepspeech CLI after the first run seems to use just a few MB.

So if the DeepSpeech Model has a reasonable size (MB), it could be passed as workerData to a worker thread (the exact object to pass is TBD), and maybe spawning a NodeJS thread for each request is the way to go to maximize runtime performance. Do you agree?

Thanks
Giorgio

@solyarisoftware I am not working on that project anymore, I don’t have time to help more.

No problem / nothing personal :slight_smile:

Of course these open points remain up for further discussion with anyone interested.