Oh yeah, sure:

Get an audio stream from the client and send it over the WebSocket:
this.ws = new WebSocket("ws://localhost:3000");
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const rec = new MediaRecorder(stream);
rec.ondataavailable = (e) => {
  // e.data contains only the data recorded since the last
  // ondataavailable event, not the whole recording
  this.ws.send(e.data);
};
// Emits an ondataavailable event every 1000 ms
rec.start(1000);
Server side: receive each audio chunk via the WebSocket and push it into a stream that feeds ffmpeg's stdin:
const stream = new Duplex({
  read(size) {
    // No-op: data is pushed in from the websocket message handler below
  },
  write() {}
});
const ffmpeg = spawn('ffmpeg', [
  '-hide_banner',
  '-nostats',
  '-i', '-',                        // read input from stdin
  '-vn',                            // ignore any video stream
  '-acodec', 'pcm_s16le',           // decode to 16-bit signed PCM
  '-ac', '1',                       // mono (spawn args must be strings)
  '-ar', String(AUDIO_SAMPLE_RATE), // e.g. 16000
  '-f', 's16le',                    // raw samples, no container
  'pipe:1'                          // write output to stdout
]);
stream.pipe(ffmpeg.stdin);
websocket.on('message', (data) => {
  stream.push(data);
});
Everything from there on is equivalent to the DeepSpeech example: ffmpeg.stdout is piped into the VAD and processed by the model.