DeepSpeech Live RESTful API - using React and Node

Hey guys, check out my newly released app: an open API (mind the speed, it's a free service) using the DeepSpeech pretrained model. Voice activity detection is implemented, as well as client-side audio resampling from the mic to 16 kHz mono 16-bit WAV.
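For the resampling, the rough idea is to render the recorded audio through an OfflineAudioContext at 16 kHz and convert the float samples to 16-bit PCM before writing the WAV header. A simplified sketch, not the exact code from the repo (resampleTo16k and floatTo16BitPCM are just illustrative names):

// resample a recorded AudioBuffer to 16 kHz mono
async function resampleTo16k(audioBuffer) {
  const targetRate = 16000;
  const frames = Math.ceil(audioBuffer.duration * targetRate);
  // 1 channel === mono output
  const offlineCtx = new OfflineAudioContext(1, frames, targetRate);
  const source = offlineCtx.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(offlineCtx.destination);
  source.start(0);
  // rendered buffer holds Float32 samples in [-1, 1] at 16 kHz
  const rendered = await offlineCtx.startRendering();
  return rendered.getChannelData(0);
}

// convert Float32 samples to 16-bit signed PCM for the WAV payload
function floatTo16BitPCM(float32Samples) {
  const pcm = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}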

live demo

Check out and star the GitHub repo

Author:
Alex Lizarraga
Portfolio


Looks cool, how did you implement the VAD in JavaScript?

@im_alex That does look nice, and it behaves as expected with my English accent. There's some latency though; I see you are running the CPU implementation with the .pbmm model, maybe switching to the .tflite model could help there?

For the VAD I used the Web Audio API: an AudioContext plus an AnalyserNode, something like this:

// create an AudioContext
let audioCtx = new (window.AudioContext || window.webkitAudioContext)();
// analyser node for the audio context
let analyser = audioCtx.createAnalyser();

// request microphone access
navigator.mediaDevices.getUserMedia({ audio: true, video: false })
.then((stream) => {
    // analyze sound waves from the stream source
    let source = audioCtx.createMediaStreamSource(stream);
    // the analyser can now read data from the source
    source.connect(analyser);

    // set the fftSize
    analyser.fftSize = 2048;

    // get the buffer length from the analyser
    let bufferLength = analyser.frequencyBinCount;

    // create a Uint8Array to hold the frequency data
    let dataArray = new Uint8Array(bufferLength);
    // call this to read the current frequency data into dataArray
    analyser.getByteFrequencyData(dataArray);

    // now handle the dataArray, which holds values from 0-255 (0 === total silence)
    // if all elements === 0 then there is no voice
    // call analyser.getByteFrequencyData(dataArray) as often as you want to analyze voice frequency
    // I used 5ms in my app
    handleMicSilence(dataArray);
});
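handleMicSilence itself can be as simple as counting consecutive silent reads. This is just a rough sketch, not the exact code from my app (the threshold and the stop logic are placeholders), assuming analyser and dataArray from the snippet above are in scope:

// rough sketch: count consecutive silent reads and cut the
// recording after enough of them (threshold is an example value)
let silentTicks = 0;

function handleMicSilence(dataArray) {
  // every bin at 0 means the analyser sees total silence
  const isSilent = dataArray.every((value) => value === 0);
  silentTicks = isSilent ? silentTicks + 1 : 0;
  // ~150ms of silence at a 5ms polling interval
  if (silentTicks > 30) {
    silentTicks = 0;
    // stop recording here and send the utterance to the API
  }
}

// poll the analyser on a timer (I used 5ms)
setInterval(() => {
  analyser.getByteFrequencyData(dataArray);
  handleMicSilence(dataArray);
}, 5);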

Latency is definitely a problem; I'll test with the TFLite and GPU implementations as well, thanks for the feedback.

Interesting implementation, thanks for sharing.