Here is an example of dataset clips:
https://drive.google.com/open?id=1qlKYT6W4izLPzhw5UTFDZY1ftOxIsbOJ
I have applied rnnoise to the German dataset spoken by @mrthorstenm.
In addition, I apply filtering with sox: a highpass at 100 Hz and a lowpass at 7000 Hz.
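sox's highpass and lowpass effects are two-pole filters by default (Q = 0.707, a Butterworth response). To illustrate what a 100 Hz / 7000 Hz chain does to the signal, here is a minimal pure-Python sketch of such a two-pole biquad using the RBJ Audio EQ Cookbook formulas. It is an assumption-based illustration, not sox's actual implementation, it is far too slow for bulk use, and the function names are hypothetical.

```python
import math


def biquad_coeffs(kind, cutoff_hz, sample_rate, q=0.707):
    """Normalized (b0, b1, b2, a1, a2) for an RBJ-cookbook high/low-pass biquad."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2.0 * q)
    if kind == "lowpass":
        b0 = (1.0 - cos_w0) / 2.0
        b1 = 1.0 - cos_w0
    elif kind == "highpass":
        b0 = (1.0 + cos_w0) / 2.0
        b1 = -(1.0 + cos_w0)
    else:
        raise ValueError(f"unknown filter kind: {kind}")
    b2 = b0
    a0 = 1.0 + alpha
    a1 = -2.0 * cos_w0
    a2 = 1.0 - alpha
    # Divide everything by a0 so the recursion below can assume a0 == 1
    return b0 / a0, b1 / a0, b2 / a0, a1 / a0, a2 / a0


def biquad_filter(samples, coeffs):
    """Apply a biquad (direct form I) to a sequence of float samples."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x  # shift input history
        y2, y1 = y1, y  # shift output history
        out.append(y)
    return out


def band_limit(samples, sample_rate, low_cut=100.0, high_cut=7000.0):
    """Highpass at low_cut, then lowpass at high_cut (same order as the sox chain)."""
    hp = biquad_coeffs("highpass", low_cut, sample_rate)
    lp = biquad_coeffs("lowpass", high_cut, sample_rate)
    return biquad_filter(biquad_filter(samples, hp), lp)
```

For instance, a constant DC offset fed through band_limit decays to zero because of the 100 Hz highpass, while the 7000 Hz lowpass alone passes it unchanged. Note that a 7000 Hz cutoff only makes sense when the sample rate is above 14 kHz (here 22050 Hz).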
Listening to @snake’s examples, I think it wouldn’t hurt to apply rnnoise there.
The optimization @dkreutz made to my recordings noticeably improved the quality. Many of my recordings had random noise or reverb problems before.
How can I use rnnoise? Running rnnoise_demo common_voice_ru_18849003.wav out.wav doesn’t seem to work for me:
> ffmpeg -i ./common_voice_ru_18849003.wav
Input #0, wav, from './common_voice_ru_18849003.wav':
  Metadata:
    encoder : Lavf57.83.100
  Duration: 00:00:02.81, bitrate: 353 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s
> ffmpeg -i out.wav
out.wav: Invalid data found when processing input
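rnnoise_demo requires raw audio, not WAV: it reads headerless 16-bit mono PCM at 48 kHz and writes the same headerless format, which is why out.wav is not a valid WAV file. I have created a Python script for rnnoise and sox filtering as part of a preprocessing toolchain (not the nicest programming, but it does its job):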
from pathlib import Path
import subprocess

src = "/path/to/your/wav/files/"
rnn = "/path/to/rnnoise_demo"
# collect the file list up front, because the loop overwrites files in this directory
paths = sorted(Path(src).glob("*.wav"))
i = 0
for filepath in paths:
    i += 1
    print(str(filepath))
    # stereo to mono and upsample to 48000 Hz (rnnoise works on 48 kHz audio)
    subprocess.run(["sox", str(filepath), "48k.wav", "remix", "-", "rate", "48000"], check=True)
    # convert wav to headerless 16-bit signed raw PCM
    subprocess.run(["sox", "48k.wav", "-c", "1", "-r", "48000", "-b", "16", "-e", "signed-integer", "-t", "raw", "temp.raw"], check=True)
    # apply rnnoise
    subprocess.run([rnn, "temp.raw", "rnn.raw"], check=True)
    # convert raw back to wav; raw input needs its type, rate, channels and encoding spelled out
    subprocess.run(["sox", "-t", "raw", "-r", "48k", "-c", "1", "-b", "16", "-e", "signed-integer", "rnn.raw", "-t", "wav", "rnn.wav"], check=True)
    # apply high/low pass filter, change sr to 22050 Hz and overwrite the original file
    subprocess.run(["sox", "rnn.wav", str(filepath), "remix", "-", "highpass", "100", "lowpass", "7000", "rate", "22050"], check=True)
    # if i == 10:  # for testing stop after ten files
    #     break
print("total: ", i)
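Because rnnoise_demo has no notion of a WAV header, it also treats the input file's header bytes as samples, and its output has no header at all. If you prefer to handle the header in Python instead of with sox, the standard wave module is enough, provided the audio is already 48 kHz mono 16-bit (resampling still has to happen first, e.g. with sox). A minimal sketch; the helper names are hypothetical, and the format (native little-endian 16-bit mono at 48 kHz) is an assumption taken from the sox arguments in the script.

```python
import wave

# Assumed rnnoise_demo format: headerless 16-bit signed PCM, mono, 48 kHz
RNNOISE_RATE = 48000
SAMPLE_WIDTH = 2  # bytes per sample (16-bit)


def wav_is_rnnoise_ready(path):
    """True if the WAV is mono, 16-bit, 48 kHz, so its samples can be fed to rnnoise_demo as-is."""
    with wave.open(path, "rb") as w:
        return (w.getnchannels(), w.getsampwidth(), w.getframerate()) == (1, SAMPLE_WIDTH, RNNOISE_RATE)


def wav_to_raw(wav_path, raw_path):
    """Strip the WAV header and write only the PCM samples."""
    if not wav_is_rnnoise_ready(wav_path):
        raise ValueError(f"{wav_path}: convert to 48 kHz mono 16-bit first (e.g. with sox)")
    with wave.open(wav_path, "rb") as w:
        pcm = w.readframes(w.getnframes())
    with open(raw_path, "wb") as f:
        f.write(pcm)


def raw_to_wav(raw_path, wav_path):
    """Wrap headerless PCM samples (e.g. rnnoise_demo output) in a WAV header."""
    with open(raw_path, "rb") as f:
        pcm = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(SAMPLE_WIDTH)
        w.setframerate(RNNOISE_RATE)
        w.writeframes(pcm)
```

For example, wav_to_raw("in48k.wav", "temp.raw"), then rnnoise_demo temp.raw rnn.raw, then raw_to_wav("rnn.raw", "rnn.wav") should give a file that ffmpeg can open.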
I use pydub and https://github.com/Shb742/rnnoise_python for running rnnoise in bulk from scripts, but your approach also translates easily to bash.
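Whatever wrapper or language is used, rnnoise itself processes audio in fixed frames of 480 samples (10 ms at 48 kHz), so a bulk script ultimately has to cut the signal into such frames. A minimal sketch of that framing step on 16-bit mono PCM bytes; the names are hypothetical, and it does not use or reproduce the API of the rnnoise_python wrapper linked above.

```python
import struct

FRAME_SIZE = 480  # rnnoise frame length: 10 ms at 48 kHz


def pcm_frames(pcm, frame_size=FRAME_SIZE):
    """Yield little-endian 16-bit mono PCM as lists of frame_size ints, zero-padding the last frame."""
    n_samples = len(pcm) // 2  # ignore a trailing odd byte, if any
    samples = struct.unpack(f"<{n_samples}h", pcm[: n_samples * 2])
    for start in range(0, n_samples, frame_size):
        frame = list(samples[start:start + frame_size])
        frame.extend([0] * (frame_size - len(frame)))
        yield frame
```

Zero-padding the final frame keeps every chunk at the size rnnoise expects; the padded tail can be trimmed again after denoising.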
Thanks, man. Your script is great. It improved the alignment a bit, but there is still no diagonal line. I just wonder why there is a diagonal for validation and training but not for test. Isn’t a diagonal in validation proof of a good model?
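To put a number on "is there a diagonal line", one option is to check, for each decoder step, whether the attention peak sits close to where a straight diagonal would put it. The metric below is a hypothetical illustration, not the alignment score of any particular TTS library; it assumes the attention matrix is given as a list of rows, one per decoder step, each holding weights over the encoder steps.

```python
def diagonality(attention, tolerance=0.1):
    """Fraction of decoder steps whose attention peak lies near the ideal diagonal.

    attention: list of rows, one per decoder step, each a list of weights over
    encoder steps. tolerance: allowed distance from the diagonal, as a fraction
    of the number of encoder steps.
    """
    n_dec = len(attention)
    n_enc = len(attention[0])
    hits = 0
    for t, row in enumerate(attention):
        # encoder position the model attends to most at decoder step t
        peak = max(range(n_enc), key=row.__getitem__)
        # where a straight line from (0, 0) to (n_dec - 1, n_enc - 1) would be
        expected = t * (n_enc - 1) / max(n_dec - 1, 1)
        if abs(peak - expected) <= tolerance * n_enc:
            hits += 1
    return hits / n_dec
```

Computing this for the train/validation alignments and for the test alignments turns "diagonal vs. no diagonal" into a comparable number (close to 1.0 for a clean diagonal).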