TestFigures doesn't align

Here is an example of the dataset clips:
https://drive.google.com/open?id=1qlKYT6W4izLPzhw5UTFDZY1ftOxIsbOJ

I have applied rnnoise to the German dataset spoken by @mrthorstenm.
In addition, I applied filtering with sox: a highpass at 100 Hz and a lowpass at 7000 Hz.
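
A minimal sketch of the sox filtering step described above, assuming sox is installed and on the PATH. The helper name `sox_bandlimit_cmd` and the file names are placeholders made up for illustration; `highpass` and `lowpass` are standard sox effects.

```python
def sox_bandlimit_cmd(src, dst, low_hz=100, high_hz=7000):
    """Build a sox command that keeps roughly the low_hz..high_hz band."""
    return [
        "sox", str(src), str(dst),
        "highpass", str(low_hz),   # attenuate rumble below low_hz
        "lowpass", str(high_hz),   # attenuate hiss above high_hz
    ]


if __name__ == "__main__":
    # Hypothetical file names; this only prints the command, it does not run sox.
    print(" ".join(sox_bandlimit_cmd("in.wav", "out.wav")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to actually run the filter.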

Listening to @snake's examples, I think it wouldn’t hurt to apply rnnoise there as well.


The optimization @dkreutz applied to my recordings noticeably improved their quality. Many of my recordings had random noise or reverb problems before.

How can I use rnnoise? Running rnnoise_demo common_voice_ru_18849003.wav out.wav doesn’t seem to work for me:

>ffmpeg -i ./common_voice_ru_18849003.wav
Input #0, wav, from './common_voice_ru_18849003.wav':
  Metadata:
    encoder         : Lavf57.83.100
  Duration: 00:00:02.81, bitrate: 353 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s

> ffmpeg -i out.wav
out.wav: Invalid data found when processing input

rnnoise_demo requires raw audio (headerless 16-bit PCM), not wav. I have created a Python script that does rnnoise and sox filtering as a preprocessing toolchain (not the nicest code, but it does its job):

from pathlib import Path
import subprocess

src = "/path/to/your/wav/files/"
rnn = "/path/to/rnnoise_demo"

count = 0

for filepath in sorted(Path(src).glob("*.wav")):
    count += 1
    wav = str(filepath)
    print(wav)
    # mix down to mono and resample to 48000 Hz
    subprocess.run(["sox", wav, "48k.wav", "remix", "-", "rate", "48000"], check=True)
    # convert wav to raw (headerless 16-bit signed mono PCM)
    subprocess.run(["sox", "48k.wav", "-c", "1", "-r", "48000", "-b", "16", "-e", "signed-integer", "-t", "raw", "temp.raw"], check=True)
    # apply rnnoise
    subprocess.run([rnn, "temp.raw", "rnn.raw"], check=True)
    # convert raw back to wav; raw input has no header, so its format is given explicitly
    subprocess.run(["sox", "-t", "raw", "-c", "1", "-r", "48k", "-b", "16", "-e", "signed-integer", "rnn.raw", "-t", "wav", "rnn.wav"], check=True)
    # apply high/low pass filter, change sr to 22050 Hz and overwrite the original file
    # (check=True above stops the script before this point if an earlier step failed)
    subprocess.run(["sox", "rnn.wav", wav, "highpass", "100", "lowpass", "7000", "rate", "22050"], check=True)
    #if count == 10: # for testing stop after ten files
    #    break

print("total:", count)
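
Since rnnoise_demo reads and writes headerless PCM, the WAV header has to be stripped before denoising and restored afterwards. Below is a minimal sketch of just that step using only Python's standard `wave` module, as a possible alternative to the two sox conversion calls in the script. The function names are made up for illustration, and the sketch assumes the file is already 48 kHz, mono, 16-bit (it does not resample).

```python
import wave


def wav_to_raw(wav_path, raw_path):
    """Write the PCM frames of a WAV file without the header; return its format."""
    with wave.open(str(wav_path), "rb") as w:
        params = (w.getnchannels(), w.getsampwidth(), w.getframerate())
        frames = w.readframes(w.getnframes())
    with open(raw_path, "wb") as f:
        f.write(frames)
    return params  # (channels, bytes per sample, sample rate)


def raw_to_wav(raw_path, wav_path, channels=1, sampwidth=2, rate=48000):
    """Wrap headerless PCM in a WAV header (defaults: mono, 16-bit, 48 kHz)."""
    with open(raw_path, "rb") as f:
        frames = f.read()
    with wave.open(str(wav_path), "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sampwidth)
        w.setframerate(rate)
        w.writeframes(frames)
```

This is also why ffmpeg -i out.wav reports invalid data when out.wav comes straight from rnnoise_demo: the file has no header to parse until something like `raw_to_wav` (or sox) adds one.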

I use pydub and https://github.com/Shb742/rnnoise_python
for running rnnoise in bulk from scripts, but your approach also translates easily to bash.


Thanks man. Your script is great. It improved the alignment a bit, but there is still no diagonal line. I just wonder why there is a diagonal for val and train, but not for test. Isn’t a diagonal for validation proof of a good model?