Thanks all
@JGKK
Today I did some tests,
See the simple script testPerformances.sh, which roughly measures latency/resource usage and CPU core usage.
I’m perplexed by the results:
When I run the deepspeech_cli.sh command line program, I see > 200% CPU for 1.72 seconds and 2176 KB (~2.2 MB) of RAM consumption (if I understand correctly):
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.72
Maximum resident set size (kbytes): 2176
Notes:
- 1.72 seconds seems like super fast latency to me, especially considering the deepspeech CLI program has to load the model into memory.
BTW I noticed that the first time the program is invoked it takes > 21 seconds; each call after that is really fast. I do not fully understand this behavior. My guess is that the model file is memory-mapped, so after the first run it stays in the OS page cache, much like a shared library. Is it like that? If yes, that's good news (see later).
- If I run
/usr/bin/time --verbose pidstat 1 -u -e node deepSpeechTranscriptSpawn.js
where deepSpeechTranscriptSpawn.js is a Node.js program that just spawns a process running the previous deepspeech CLI command (a simplified sketch of this wrapper is shown right after these notes), pidstat measures just 4-5% CPU (instead of the expected ~200%). Here I do not understand: maybe pidstat is sampling only the node parent process, while the actual CPU work happens in the spawned deepspeech child?
- Unfortunately, just today I was no longer able to run the Node.js binding (using my simple wrapper deepSpeechTranscriptNative.js) because the program crashes. That's weird, because I ran the code successfully one month ago. What changed is that I updated Node.js to the latest version, v16.0.0, which uses ABI v93; deepspeech 0.9.3 apparently ships no prebuilt binding for that ABI, hence the MODULE_NOT_FOUND error in Test 3 below. Ok, that's another story.
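For reference, deepSpeechTranscriptSpawn.js boils down to something like this simplified sketch (the real file may differ in details; paths are the ones used in the tests below):

```javascript
// deepSpeechTranscriptSpawn.js (simplified sketch): spawn the deepspeech
// CLI as a child process and collect the transcript from its stdout.
const { spawn } = require('child_process')

const child = spawn('deepspeech', [
  '--model', 'models/deepspeech-0.9.3-models.pbmm',
  '--scorer', 'models/deepspeech-0.9.3-models.scorer',
  '--audio', 'audio/4507-16021-0012.wav'
])

let transcript = ''
child.stdout.on('data', chunk => { transcript += chunk })

child.on('close', code => {
  if (code !== 0) {
    console.error(`deepspeech exited with code ${code}`)
    return
  }
  console.log(transcript.trim())
})
```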
UPDATE:
I just opened the issue: https://github.com/mozilla/DeepSpeech/issues/3642
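For context, deepSpeechTranscriptNative.js is essentially the standard deepspeech npm usage, along these lines (a simplified sketch; skipping a fixed 44-byte WAV header assumes a canonical header and is just a simplification here):

```javascript
// deepSpeechTranscriptNative.js (simplified sketch): transcribe a WAV
// file in-process through the deepspeech Node.js binding.
const DeepSpeech = require('deepspeech')
const fs = require('fs')

const model = new DeepSpeech.Model('models/deepspeech-0.9.3-models.pbmm')
model.enableExternalScorer('models/deepspeech-0.9.3-models.scorer')

// stt() wants raw 16-bit 16 kHz mono PCM; skip the canonical 44-byte
// WAV header (real-world WAVs may need proper parsing).
const wav = fs.readFileSync('audio/4507-16021-0012.wav')
console.log(model.stt(wav.slice(44)))
```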
See the full stdout below:
testPerformances.sh
test elapsed time / resources (using /usr/bin/time)
and CPU core usage (using pidstat) in different cases
Test 1: deepspeech cli command
Linux 5.8.0-50-generic (giorgio-HP-Laptop-17-by1xxx) 03/05/2021 _x86_64_ (8 CPU)
Loading model from file models/deepspeech-0.9.3-models.pbmm
TensorFlow: v2.3.0-6-g23ad988
DeepSpeech: v0.9.3-0-gf2e9c85
2021-05-03 10:34:49.272662: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Loaded model in 0.0153s.
Loading scorer from files models/deepspeech-0.9.3-models.scorer
Loaded scorer in 0.000314s.
Running inference.
10:34:49 UID PID %usr %system %guest %wait %CPU CPU Command
10:34:50 1000 156502 159,00 54,00 0,00 0,00 213,00 2 deepspeech
why should one halt on the way
Inference took 1.454s for 2.735s audio file.
Average: 1000 156502 159,00 54,00 0,00 0,00 213,00 - deepspeech
Command being timed: "pidstat 1 -u -e deepspeech --model models/deepspeech-0.9.3-models.pbmm --scorer models/deepspeech-0.9.3-models.scorer --audio audio/4507-16021-0012.wav"
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.72
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2176
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 110
Voluntary context switches: 3
Involuntary context switches: 0
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Test 2: node deepSpeechTranscriptSpawn
Linux 5.8.0-50-generic (giorgio-HP-Laptop-17-by1xxx) 03/05/2021 _x86_64_ (8 CPU)
10:34:55 UID PID %usr %system %guest %wait %CPU CPU Command
10:34:56 1000 156532 3,00 2,00 0,00 0,00 5,00 2 node
why should one halt on the way
Average: 1000 156532 3,00 2,00 0,00 0,00 5,00 - node
Command being timed: "pidstat 1 -u -e node deepSpeechTranscriptSpawn.js"
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.70
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2228
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 111
Voluntary context switches: 3
Involuntary context switches: 1
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Test 3: node deepSpeechTranscriptNative
Linux 5.8.0-50-generic (giorgio-HP-Laptop-17-by1xxx) 03/05/2021 _x86_64_ (8 CPU)
node:internal/modules/cjs/loader:943
throw err;
^
Error: Cannot find module '/home/giorgio/DeepSpeechJs/node_modules/deepspeech/lib/binding/v0.9.3/linux-x64/node-v93/deepspeech.node'
Require stack:
- /home/giorgio/DeepSpeechJs/node_modules/deepspeech/index.js
- /home/giorgio/DeepSpeechJs/deepSpeechTranscriptNative.js
at Function.Module._resolveFilename (node:internal/modules/cjs/loader:940:15)
at Function.Module._load (node:internal/modules/cjs/loader:773:27)
at Module.require (node:internal/modules/cjs/loader:1012:19)
at require (node:internal/modules/cjs/helpers:93:18)
at Object.<anonymous> (/home/giorgio/DeepSpeechJs/node_modules/deepspeech/index.js:24:17)
at Module._compile (node:internal/modules/cjs/loader:1108:14)
at Object.Module._extensions..js (node:internal/modules/cjs/loader:1137:10)
at Module.load (node:internal/modules/cjs/loader:988:32)
at Function.Module._load (node:internal/modules/cjs/loader:828:14)
at Module.require (node:internal/modules/cjs/loader:1012:19) {
code: 'MODULE_NOT_FOUND',
requireStack: [
'/home/giorgio/DeepSpeechJs/node_modules/deepspeech/index.js',
'/home/giorgio/DeepSpeechJs/deepSpeechTranscriptNative.js'
]
}
Command being timed: "pidstat 1 -u -e node deepSpeechTranscriptNative.js"
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 1%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.09
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2176
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 109
Voluntary context switches: 2
Involuntary context switches: 0
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
@lissyx
> Make several process and that’s it. But as I said, the library is already able to leverage several threads anyway.
Well, if I understand correctly, the deepspeech API uses threads under the hood, but the Node.js API functions block the main Node.js thread.
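To make this concrete, here is a tiny sketch (it assumes the binding loads, which is exactly what is broken for me on Node v16 right now) showing how the synchronous stt() call starves the event loop:

```javascript
// blockingDemo.js (sketch): ticks print every ~100 ms until stt()
// starts, then stop for the whole inference (~1.4 s here), because the
// synchronous call blocks the main Node.js thread.
const DeepSpeech = require('deepspeech')
const fs = require('fs')

const model = new DeepSpeech.Model('models/deepspeech-0.9.3-models.pbmm')
model.enableExternalScorer('models/deepspeech-0.9.3-models.scorer')

const timer = setInterval(() => console.log('event loop tick'), 100)

setTimeout(() => {
  const wav = fs.readFileSync('audio/4507-16021-0012.wav')
  console.log(model.stt(wav.slice(44))) // no ticks during this call
  clearInterval(timer)
}, 300)
```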
So I imagine two solutions (for a server that has to manage multiple concurrent requests, where each request is a CPU-bound task blocking the main thread):
- using Node.js "worker threads"
- forking multiple OS processes, external to the main Node.js process
Ok! Now my doubt/question is about the RAM each process/thread uses for the DeepSpeech model. Running the deepspeech CLI after the first run seems to use just a few MB.
So if the DeepSpeech model has a reasonable size (MB), it could be passed as workerData
to a worker thread (pure object TBD), and maybe spawning a Node.js worker thread for each request is the way to go to maximize runtime performance. Do you agree?
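Something like this minimal worker_threads sketch is what I have in mind. Note that workerData is copied via structured clone, so I suspect a loaded Model object (a native handle) would not survive the copy; in this sketch I therefore pass only the model path and load the model inside the worker, which is the part still to be verified:

```javascript
// sttWorker.js (sketch): one worker thread per request; the model is
// loaded inside the worker, so the blocking stt() call never touches
// the main thread.
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads')

if (isMainThread) {
  // main thread: spawn a worker for each incoming transcription request
  function transcribe(audioFile) {
    return new Promise((resolve, reject) => {
      const worker = new Worker(__filename, {
        workerData: {
          modelPath: 'models/deepspeech-0.9.3-models.pbmm',
          scorerPath: 'models/deepspeech-0.9.3-models.scorer',
          audioFile
        }
      })
      worker.once('message', resolve)
      worker.once('error', reject)
    })
  }

  transcribe('audio/4507-16021-0012.wav')
    .then(console.log)
    .catch(console.error)
} else {
  // worker thread: load the model here and run the blocking inference
  const DeepSpeech = require('deepspeech')
  const fs = require('fs')

  const model = new DeepSpeech.Model(workerData.modelPath)
  model.enableExternalScorer(workerData.scorerPath)

  const wav = fs.readFileSync(workerData.audioFile)
  parentPort.postMessage(model.stt(wav.slice(44))) // 44-byte header skip, as above
}
```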
Thanks
Giorgio