Oh, sorry, I just found out I was recording video instead of audio in my MediaRecorder, so it couldn’t be converted to wav.
I also found out that MediaRecorder doesn’t support the audio/wav MIME type, only audio/webm. Can DeepSpeech process it?
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
5
No, but you can still send audio/webm and have sox process it, I guess? I use lower-level audio access and just get raw floating-point samples from the browser, so it’s not very different …
I still cannot get rid of this error (sox FAIL formats) no matter what I do
I transformed the recorded blobs from the MediaRecorder to wav and sent them to the server
I added an input description to my sox call
Did you try the web microphone example? It does transcription through socket.io/nodejs probably in the way you want.
You actually don’t need to use sox at all; it’s better to downsample the audio in the client before sending it through socket.io, so less processing happens on the server and sox isn’t needed.
lissyx
Could you elaborate on your example here? I can’t find it exactly in your code. Here, we get raw audio out of the browser; it’s not just downsampling, it’s an actual conversion from 32-bit float to signed 16-bit PCM WAV.
In the server code that data needs no further processing; sox isn’t required because the audio is already in the right format.
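That client-side step can be sketched roughly like this (a minimal sketch, not the actual example code; the function names are mine, and a real downsampler should low-pass filter before decimating to avoid aliasing):

```javascript
// The browser delivers 32-bit float samples (typically at 44.1 or 48 kHz);
// DeepSpeech expects 16 kHz signed 16-bit PCM. Names are illustrative.

// Naive downsampler: picks every Nth sample (a real implementation
// should low-pass filter first).
function downsample(float32Samples, inputRate, outputRate) {
  const ratio = inputRate / outputRate;
  const outLength = Math.floor(float32Samples.length / ratio);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    out[i] = float32Samples[Math.floor(i * ratio)];
  }
  return out;
}

// Convert samples in [-1, 1] to signed 16-bit PCM.
function floatTo16BitPCM(float32Samples) {
  const out = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
  }
  return out;
}
```

The resulting Int16Array buffer can then be sent over socket.io as-is; the server only needs to hand it to the model.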
lissyx
Right, yes, that’s another solution, but it requires a bit more work than just leaving the work to sox. Though, you say it’s downsampling when you are actually performing a conversion.
Nothing complicated in itself, but when you try to keep examples simple, it’s nice to avoid it.
Maybe @Oleksii_Davydenko should try it to eliminate the risk that sox is broken somehow.
I did write a JavaScript microphone / hotword / downsampling library which does all this conversion/downsampling pretty cleanly. BumbleBee includes that downsampler library along with Porcupine.
const BumbleBee = require('bumblebee-hotword');
let bumblebee = new BumbleBee();
bumblebee.setWorkersPath('/bumblebee-workers');
bumblebee.addHotword('bumblebee');
bumblebee.on('data', function(data) {
  // DATA TO SEND TO DEEPSPEECH
});
bumblebee.start();
I’m still not sure what causes this, but I struggled with it a lot as well and ended up using ffmpeg to transcode the audio server-side.
(The problem might be that you create a blob with an audio/wav header when it is in fact audio/webm, since the MediaRecorder API can’t create wav, and sox, if I’m not mistaken, can’t handle audio/webm; not too sure about that, though.)
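One way to check that theory, instead of trusting the MIME type set on the blob, is to sniff the first bytes of the recording; a rough sketch (the helper name is mine):

```javascript
// Detect the real container format from the first bytes of the recording,
// rather than trusting the Blob's MIME type. WAV files start with the
// ASCII tag "RIFF"; WebM (an EBML/Matroska container) starts with the
// magic bytes 0x1A 0x45 0xDF 0xA3.
function sniffContainer(bytes) {
  if (bytes.length >= 4 &&
      bytes[0] === 0x52 && bytes[1] === 0x49 &&
      bytes[2] === 0x46 && bytes[3] === 0x46) {
    return 'wav';
  }
  if (bytes.length >= 4 &&
      bytes[0] === 0x1A && bytes[1] === 0x45 &&
      bytes[2] === 0xDF && bytes[3] === 0xA3) {
    return 'webm';
  }
  return 'unknown';
}
```

In the browser you can get the bytes with new Uint8Array(await blob.arrayBuffer()); a blob labelled audio/wav that sniffs as webm would explain the sox failure.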
However, the solution introduced by Dan:
… is another good idea, since it also reduces network traffic thanks to the smaller sample rate.
I actually use both solutions:
Downsampling in the frontend for finished recordings (record before sending to server)
Downsampling server-side for continuously streaming and transcribing the audio in real time
If you want to play around, I’ve created yet another example app that lets you experiment with both and choose different language models (WIP!)
lissyx
I don’t want to diverge too much, but that’s a nice example. Though, I see you are still on 0.6.0; you should move to 0.6.1, which has nice bugfixes (the model did not change, only the lib + exported tflite model).