Hey guys, check out my newly released app, open API (mind the speed for a free service) using the DeepSpeech pretrained model. Voice Activity Detection is implemented, as well as client-side audio resampling from the mic to WAV 16 kHz mono 16-bit.
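The resampling can be done with an OfflineAudioContext rendered at the target rate; here is a rough sketch, where resampleTo16k is just an illustrative helper taking a recorded AudioBuffer, not necessarily the app's actual code:

// sketch: resample a recorded AudioBuffer to 16 kHz mono 16-bit PCM
async function resampleTo16k(audioBuffer) {
  // an offline context at the target rate does the rate conversion
  // (multi-channel input is downmixed to the mono destination)
  const offlineCtx = new OfflineAudioContext(1, Math.ceil(audioBuffer.duration * 16000), 16000);
  const source = offlineCtx.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(offlineCtx.destination);
  source.start();
  const rendered = await offlineCtx.startRendering();
  // convert float samples in [-1, 1] to signed 16-bit integers
  const floats = rendered.getChannelData(0);
  const pcm16 = new Int16Array(floats.length);
  for (let i = 0; i < floats.length; i++) {
    const s = Math.max(-1, Math.min(1, floats[i]));
    pcm16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm16; // wrap in a WAV header before uploading
}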
Looks cool! How did you implement the VAD in JavaScript?
lissyx
@im_alex That does look nice, and it behaves as expected in my case with my English accent. There's some latency; I see you are running the CPU implementation with the pbmm model, maybe switching to TFLite could help there?
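For reference, with the 0.9.x Node.js bindings the TFLite runtime ships as a separate package with the same API, so the switch is roughly just this (sketch, assuming the 0.9.3 release artifacts):

// sketch: install deepspeech-tflite instead of deepspeech,
// then point the Model at the .tflite export
const DeepSpeech = require('deepspeech-tflite');

const model = new DeepSpeech.Model('deepspeech-0.9.3-models.tflite');
model.enableExternalScorer('deepspeech-0.9.3-models.scorer');

// audioBuffer must be 16 kHz mono 16-bit PCM samples
function transcribe(audioBuffer) {
  return model.stt(audioBuffer);
}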
For the VAD I used the Web Audio API: an AudioContext and an AnalyserNode, something like this:
// create the audio context
const audioCtx = new (window.AudioContext || window.webkitAudioContext)();
// analyser node for the audio context
const analyser = audioCtx.createAnalyser();
// request microphone access
navigator.mediaDevices.getUserMedia({ audio: true, video: false })
  .then((stream) => {
    // stream source for the analyser
    const source = audioCtx.createMediaStreamSource(stream);
    // the analyser can now read data from the source
    source.connect(analyser);
    // set the fftSize
    analyser.fftSize = 2048;
    // frequencyBinCount is half the fftSize
    const bufferLength = analyser.frequencyBinCount;
    // byte array the analyser writes the frequency data into
    const dataArray = new Uint8Array(bufferLength);
    // read the current frequency data as often as you want to analyze voice;
    // I used 5 ms in my app
    setInterval(() => {
      // fills dataArray with values from 0-255 (0 === total silence)
      analyser.getByteFrequencyData(dataArray);
      // if all elements === 0 then there is no voice
      handleMicSilence(dataArray);
    }, 5);
  });
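handleMicSilence then just checks whether every bin is zero; a minimal sketch (the thresholding and debouncing details are up to you):

function handleMicSilence(dataArray) {
  // any non-zero frequency bin means the mic is picking something up
  const speaking = dataArray.some((value) => value > 0);
  if (speaking) {
    // voice detected: e.g. start or keep recording
  } else {
    // total silence: e.g. stop recording after enough silent checks in a row
  }
}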