I need ideas for denoising audio so I can improve model accuracy. I have already tried RNNoise: the model output gets better for some audios and worse for others (which isn't helping). Can anyone suggest anything else I can try? It would be really helpful.
Following are the details I’m working with:
Acoustic Model : 0.7.1 released by Mozilla
Scorer : Custom
Data : Conversational data. Customer support.
I split each call into smaller chunks using VAD, and those chunks are then fed to the model. What I've observed is that the model does well when the audio is short. So I'm hoping that denoising will help with the longer audio files, but I'm not sure where to start. Thanks!
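For reference, the chunking step can be sketched with a crude energy-gate VAD. This is a stand-in for a real VAD (e.g. webrtcvad); the frame length, energy threshold, and silence span below are illustrative values, not the poster's actual settings:

```python
import numpy as np

def split_on_silence(samples, sample_rate=16000, frame_ms=30,
                     energy_threshold=0.01, min_silence_frames=10):
    """Split a mono float waveform into speech chunks using a crude
    per-frame RMS energy gate (illustrative stand-in for a real VAD)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    chunks, current, silent_run = [], [], 0
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = np.sqrt(np.mean(frame ** 2))
        if rms >= energy_threshold:
            current.append(frame)       # speech frame: keep building chunk
            silent_run = 0
        elif current:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # enough consecutive silence: close the current chunk
                chunks.append(np.concatenate(current))
                current, silent_run = [], 0
            else:
                current.append(frame)   # short pause inside an utterance
    if current:
        chunks.append(np.concatenate(current))
    return chunks
```

Each returned chunk can then be fed to the model independently, which is what keeps the per-inference audio duration short.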
Just an idea, but you could denoise all your training data as a sort of data augmentation.
Then retrain with both the original and the denoised data.
You should do the same with all new incoming data.
This trains your model to be robust to both the original and the cleaned audio.
If you already have enough labelled noisy data, you could also try training on the noisy data without any denoising. The network should figure it out on its own; it's always a matter of how much training data you have.
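The retraining recipe above could be wired up as a simple manifest-doubling step. This is only a sketch: `build_augmented_manifest` and the `_denoised` filename convention are made up for illustration, and the actual denoising pass (e.g. running RNNoise offline over the files) is assumed to have happened separately:

```python
def build_augmented_manifest(rows, denoised_suffix="_denoised"):
    """Given (wav_path, transcript) rows, return the rows doubled with
    their denoised copies, assuming the denoised files sit next to the
    originals with a suffix (a hypothetical naming convention)."""
    out = []
    for wav, text in rows:
        out.append((wav, text))                      # original noisy audio
        stem, _, ext = wav.rpartition(".")
        out.append((f"{stem}{denoised_suffix}.{ext}", text))  # denoised copy, same label
    return out
```

The doubled list can then be written out as the training CSV, so the model sees every utterance in both its original and cleaned form.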
Hello, yes, I am using the standard English release. What I am trying to figure out is whether denoising a longer chunk can help with longer audio inputs. What I'm seeing is that denoising corrects some words, but it also causes a few originally correct words to be predicted wrongly, so the overall gain in accuracy is diminished.
Thanks, that looks like something I can try. I'm also thinking of taking the longer audio chunks (the ones that aren't doing well) and continuing to train the Mozilla-released model on those.
For inference, I filter my input data. I experimented with denoising (RNNoise) and found it slow and not much of an improvement. It does help a little, but normalizing gave the biggest accuracy improvement. After that, high-pass and low-pass filters are nearly cost-free and showed very slight improvements as well. The order of filtering also has an effect.
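As a sketch of that ordering (normalize first, then band-limit) using SciPy Butterworth filters; the cutoff frequencies and filter order here are illustrative guesses, not the values from the post:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def preprocess(samples, sample_rate=16000, hp_hz=100.0, lp_hz=7000.0):
    """Peak-normalize, then band-limit to roughly the speech range.
    Cutoffs and filter order are illustrative, not tuned values."""
    peak = np.max(np.abs(samples))
    if peak > 0:
        samples = samples / peak          # normalize first (biggest win per the post)
    sos_hp = butter(4, hp_hz, btype="highpass", fs=sample_rate, output="sos")
    sos_lp = butter(4, lp_hz, btype="lowpass", fs=sample_rate, output="sos")
    samples = sosfilt(sos_hp, samples)    # strip low-frequency rumble/hum
    samples = sosfilt(sos_lp, samples)    # strip high-frequency hiss
    return samples
```

Swapping the normalize and filter steps changes the result (filtering shifts the peak), which is one way the ordering effect shows up.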
Hi, the chunks are >10 sec. And the results I get are not totally wrong. It's like this:
The words enclosed in [ ] are the wrong predictions; the correct word is shown right after each bracketed word.
[digit]try pushing any buttons on the side and see if for any light comes up on the [keep or]keypad if you hear a beep [on]tone out of it. yes there is a [bed]beep. i have light flashing.
There are other cases as well, where the words don't make sense, but those are mostly the longer audios. For shorter audios, the prediction generally makes good sense when you read it.
I was surprised by how much better my WER got after I tuned my microphone. Make sure it isn't getting distorted by being set too high. I'm on Ubuntu, and I played around with the PulseEffects WebRTC settings; those can really help.