DeepSpeech sox FAIL formats error when transcribing .wav

I am trying to set up a transcription server in Node.js, following this example: https://github.com/mozilla/DeepSpeech-examples/blob/r0.6/nodejs_wav/index.js
My client takes audio from a MediaRecorder and sends it as an ArrayBuffer to the server through Socket.IO:

sendAudioToTranscriptionServer() {
    let blob = new Blob(recordedChunks, { 'type': 'audio/wav' });
    blob.arrayBuffer().then(ab => {
        ioClient.emit("transcribable-audio", ab);
    });
}

The server then tries to transcribe the audio:

socket.on("transcribable-audio", (arrayBuffer) => {
	//let buffer = Buffer.from(new Uint8Array(arrayBuffer));
    	let buffer = new Uint8Array(arrayBuffer);
	let audioStream = new MemoryStream();
	try {
    	bufferToStream(buffer).
    	pipe(Sox({
	    global: {
	 	'no-dither': true,
	    },
	    output: {
	    	bits: 16,
	    	rate: desiredSampleRate,
	    	channels: 1,
	    	encoding: 'signed-integer',
	    	endian: 'little',
	    	compression: 0.0,
	    	type: 'raw'
	    }
    	})).
        pipe(audioStream);
	    console.log('pipe created');
    	audioStream.on('finish', () => {
	        let audioBuffer = audioStream.toBuffer();
	        const audioLength = (audioBuffer.length / 2) * (1 / desiredSampleRate);
	        console.log('audio length', audioLength);
	        let result = model.stt(audioBuffer.slice(0, audioBuffer.length / 2));
	        console.log('result:', result);
	        socket.emit("transcription-received", result);
    	});
    } catch (ex) {
	    console.error(ex.message);
	}
});

But I keep getting an “unhandled stream error in pipe”:
Error: sox FAIL formats

What could be the reason for this? How can I get more information on the error?

I don’t see any input description in your sox call?

Here is my code doing something similar:

        var audioStream = new MemoryStream();
        bufferToStream(rawAudio).
          pipe(Sox({
            input: {
              bits: 32,
              rate: 44100,
              channels: 1,
              encoding: 'floating-point',
              endian: 'little',
              type: 'raw',
            },
            output: {
              bits: 16,
              rate: sampleRate,
              channels: 1,
              encoding: 'signed-integer',
              endian: 'little',
              compression: 0.0,
              type: 'wavpcm',
            }
          })).
          pipe(audioStream);

Input type may vary, but your output also looks a bit strange; type: 'raw', for instance.

Oh, sorry, I just found out I was recording video instead of audio with my MediaRecorder, so it couldn’t be converted to wav.
I also found out that MediaRecorder doesn’t support the audio/wav MIME type, only audio/webm. Can DeepSpeech process it?
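For what it’s worth, the supported types can be checked from the browser console with MediaRecorder.isTypeSupported; a quick sketch (the exact results depend on the browser):

// Which container/codec combos this browser's MediaRecorder can actually produce.
['audio/wav', 'audio/webm', 'audio/webm;codecs=opus', 'audio/ogg;codecs=opus']
    .forEach(type => console.log(type, MediaRecorder.isTypeSupported(type)));
// Typically 'audio/wav' is false and 'audio/webm;codecs=opus' is true, so the
// Blob above contains webm data even if it is labelled 'audio/wav'.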

No, but you can still send audio/webm and have sox process it, I guess? I use lower-level audio access and just get raw FP from the browser, so it’s not very different …

I still cannot get rid of this error (sox FAIL formats) no matter what I do.
I transformed the recorded blobs from the MediaRecorder to wav and sent them to the server.
I added an input description to my sox call:

bufferToStream(buffer).
pipe(Sox({
    global: {
        'no-dither': true,
    },
    input: {
        bits: 32,
        rate: 44100,
        channels: 1,
        encoding: 'floating-point',
        endian: 'little',
        type: 'wav',
    },
    output: {
        bits: 16,
        rate: desiredSampleRate,
        channels: 1,
        encoding: 'signed-integer',
        endian: 'little',
        compression: 0.0,
        type: 'wavpcm'
    }
})).
pipe(audioStream);

How can I check what kind of formatting discrepancy I have?

Without more information from sox it’s hard …

Should that be raw instead of wav?

Have you tried dumping the audio feed before passing it to sox, and calling sox manually from the command line?

Are you sure sox is running properly?
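For example, something along these lines (the file path is just a placeholder) takes sox-as-a-library out of the equation and lets you see the real error message on the command line:

// Hypothetical debugging step: write the incoming buffer to disk so sox can
// be run on it by hand.
const fs = require('fs');

function dumpIncomingAudio(arrayBuffer) {
    fs.writeFileSync('/tmp/debug-input.wav', Buffer.from(arrayBuffer));
    // Then, from a shell, see what sox makes of it:
    //   soxi /tmp/debug-input.wav
    //   sox -V /tmp/debug-input.wav -b 16 -r 16000 -c 1 -e signed-integer -t raw /tmp/out.raw
    // If soxi already rejects the header, the data is not what its label says
    // (e.g. webm bytes in a file named .wav).
}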

Did you try the web microphone example? It does transcription through socket.io/nodejs probably in the way you want.

You actually don’t need to use sox at all; it’s better to downsample the audio in the client before sending it through socket.io, because that puts less processing on the server and then sox isn’t needed.

Could you elaborate on your example here? I can’t find it exactly in your code. Here, we get raw audio out of the browser; it’s not just down-sampling, it’s an actual conversion from 32-bit float to 16-bit signed PCM wave :slight_smile:

In the web microphone example, a web worker is used to downsample from 32-bit float (recorded in the browser) to 16-bit PCM:

This is the code for the web worker, which I didn’t actually write; it’s from another speech / wake word project called Porcupine:

https://github.com/mozilla/DeepSpeech-examples/blob/r0.6/web_microphone_websocket/public/downsampling_worker.js
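Conceptually, the worker does two things: it resamples from the browser’s native rate (often 44.1 or 48 kHz) down to 16 kHz, and it converts each Float32 sample in [-1, 1] to a signed 16-bit integer. A minimal sketch of that idea (not the actual worker code, which filters/averages rather than picking nearest samples):

// Illustration only: naive float32 -> int16 downsampling.
function downsampleToInt16(float32Samples, inputRate, outputRate = 16000) {
    const ratio = inputRate / outputRate;
    const outLength = Math.floor(float32Samples.length / ratio);
    const out = new Int16Array(outLength);
    for (let i = 0; i < outLength; i++) {
        // Nearest-sample decimation; the real worker does proper averaging.
        const sample = float32Samples[Math.floor(i * ratio)];
        // Clamp to [-1, 1] and scale to the signed 16-bit range.
        const clamped = Math.max(-1, Math.min(1, sample));
        out[i] = clamped < 0 ? clamped * 0x8000 : clamped * 0x7FFF;
    }
    return out;
}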

The web worker is used by this code:

https://github.com/mozilla/DeepSpeech-examples/blob/r0.6/web_microphone_websocket/src/App.js

createAudioProcessor(audioContext, audioSource) {
	let processor = audioContext.createScriptProcessor(4096, 1, 1);
	
	const sampleRate = audioSource.context.sampleRate;
	
	let downsampler = new Worker(DOWNSAMPLING_WORKER);
	downsampler.postMessage({command: "init", inputSampleRate: sampleRate});
	downsampler.onmessage = (e) => {
		if (this.socket.connected) {
			this.socket.emit('stream-data', e.data.buffer);
		}
	};
	
	processor.onaudioprocess = (event) => {
		var data = event.inputBuffer.getChannelData(0);
		downsampler.postMessage({command: "process", inputFrame: data});
	};
	
	processor.shutdown = () => {
		processor.disconnect();
		this.onaudioprocess = null;
	};
	
	processor.connect(audioContext.destination);
	
	return processor;
}

In particular, this is the code that receives the 16-bit integer data and sends it to the socket.io server:

downsampler.onmessage = (e) => {
		if (this.socket.connected) {
			this.socket.emit('stream-data', e.data.buffer);
		}
	};

In the server code that data doesn’t need to be processed further; sox isn’t required because it’s already in the right format.
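On the server that amounts to something like the sketch below. The method names (createStream / feedAudioContent / finishStream) follow the 0.6-era Node API, and the event names 'stream-data', 'stream-end' and 'recognize' are just placeholders matching the client code above; the exact calls are in the web_microphone_websocket server.

// Rough sketch only, assuming `model` is the loaded DeepSpeech model.
let sctx = model.createStream();

socket.on('stream-data', (data) => {
    // Already 16 kHz, 16-bit signed, mono PCM: feed it straight in, no sox step.
    model.feedAudioContent(sctx, Buffer.from(data));
});

socket.on('stream-end', () => {
    const text = model.finishStream(sctx);   // final transcription
    socket.emit('recognize', { text });
    sctx = model.createStream();             // ready for the next utterance
});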

Right, yes, that’s another solution, but it requires a bit more work than just leaving the work to sox. Though you say it’s downsampling when you are actually performing a conversion.

Nothing complicated in itself, but when you try to keep examples simple, it’s nice to avoid it.

Maybe @Oleksii_Davydenko should try it to eliminate the risk that sox is broken somehow.

I did write a JavaScript microphone / hotword / downsampling library which does all this conversion/downsampling pretty cleanly. BumbleBee includes that downsampler library along with Porcupine.

https://github.com/jaxcore/bumblebee-hotword

const BumbleBee = require('bumblebee-hotword');

let bumblebee = new BumbleBee();
bumblebee.setWorkersPath('/bumblebee-workers');
bumblebee.addHotword('bumblebee');

bumblebee.on('data', function(data) {
	// DATA TO SEND TO DEEPSPEECH
});

bumblebee.start();
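Assuming the 'data' event delivers 16 kHz, 16-bit PCM chunks (as with the downsampling worker above), forwarding them to the transcription server is just an emit; the event name here is whatever your server listens on:

bumblebee.on('data', function(data) {
    // Hypothetical wiring: ship the downsampled chunk to the socket.io server.
    if (socket.connected) {
        socket.emit('stream-data', data.buffer);
    }
});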

I’m still not sure what causes this, but I struggled with it a lot as well and ended up using ffmpeg to transcode the audio server-side.

(The problem might be caused by the fact that you create a Blob with an audio/wav header when it is in fact audio/webm, since the MediaRecorder API can’t create wav, and sox, if I’m not mistaken, can’t handle audio/webm; not too sure about that, though.)
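A sketch of that server-side transcoding, assuming ffmpeg is on the PATH and the chunks arrive as webm/opus (not the exact code I use, and the flags may need tweaking):

// Pipe a webm buffer through ffmpeg and get back raw 16 kHz, 16-bit signed,
// mono PCM that DeepSpeech can consume.
const { spawn } = require('child_process');

function transcodeWebmToPCM(webmBuffer, callback) {
    const ffmpeg = spawn('ffmpeg', [
        '-i', 'pipe:0',          // read the webm container from stdin
        '-f', 's16le',           // output raw signed 16-bit little-endian PCM
        '-acodec', 'pcm_s16le',
        '-ar', '16000',          // 16 kHz sample rate expected by the model
        '-ac', '1',              // mono
        'pipe:1'                 // write the raw PCM to stdout
    ]);

    const chunks = [];
    ffmpeg.stdout.on('data', (chunk) => chunks.push(chunk));
    ffmpeg.on('close', () => callback(Buffer.concat(chunks)));

    ffmpeg.stdin.write(webmBuffer);
    ffmpeg.stdin.end();
}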

However, the solution introduced by Dan:

… is another good idea, since it also reduces network traffic thanks to the smaller sample rate.

I actually use both solutions:

  1. Downsampling in the frontend for finished recordings (record before sending to server)
  2. Downsampling server-side for continuously streaming and transcribing the audio in real time

If you want to play around, I’ve created yet another example app which lets you experiment with both and choose different language models (WIP!):

Gitlab

I don’t want to diverge too much, but that’s a nice example. Though I see you are still on 0.6.0; you should move to 0.6.1, which has nice bugfixes (the model did not change, only the lib + exported tflite model).
