LJSpeech dataset audio improvements

I was looking at the two main issues with the LJSpeech dataset: noise and sibilants. I think they can be fixed, so I tested some batch processing on the first 7 files. I think the original is pretty reverb-y as well, so I toned that down; maybe it could go further. From a quick skim it seems like it’s all recorded in the same room and is mostly uniform, so I don’t think processing will be destructive, but I haven’t dived too deep. Any thoughts on this?

with processing:


Interesting. It’s subtle (at least listening on my device), but it does sound like you removed the room reverb that’s noticeable in a few places. How did you do it, and can it be applied en masse, e.g. via a script?


That is great! Maybe you can also try a lowpass and silence shortening.
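
For anyone who wants to try the lowpass idea in a script rather than a plug-in, here is a minimal sketch of a one-pole low-pass filter in plain Python. It is only an illustration, not the chain used for the samples above: the function name `one_pole_lowpass` and the 8000 Hz cutoff in the usage line are assumptions picked for the example, and a one-pole filter is much gentler (6 dB/octave) than what a dedicated EQ plug-in would normally apply. The samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math


def one_pole_lowpass(samples, sample_rate, cutoff_hz):
    """Apply a simple one-pole (6 dB/octave) low-pass filter to float samples."""
    if not 0 < cutoff_hz < sample_rate / 2:
        raise ValueError("cutoff_hz must be between 0 and the Nyquist frequency")

    # Smoothing coefficient from the usual RC-filter discretisation.
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)

    out = []
    prev = 0.0  # filter state: the previous output sample
    for x in samples:
        # Move the output a fraction alpha of the way toward the input.
        prev = prev + alpha * (x - prev)
        out.append(prev)
    return out


# Hypothetical usage: LJSpeech clips are 22050 Hz; the cutoff is a guess to tune by ear.
filtered = one_pole_lowpass([0.0, 0.5, -0.5, 0.25], sample_rate=22050, cutoff_hz=8000)
```

The cutoff would need tuning by ear; set too low, it dulls the sibilants along with the noise.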

I’ve been producing a few podcasts, so I set up a nice chain/workflow for dialogue. It can be applied en masse, but it’s done with audio plug-ins rather than a script. I’m using RX7 for its batch processing function, with 3rd-party plug-ins from my production workflow loaded inside it.


Good idea on the low pass! I’m using audio plug-ins rather than a script, so I’m not sure if there’s a reliable plug-in for trimming silence in this workflow.
The treatment is applied fairly lightly here, as I wasn’t sure about the variance in acoustics over the set. I will analyze the whole dataset for changes, then I could batch-treat sections. Any other issues you can think of to address?

Not really, but I suggested the lowpass because she recorded in a typical room so it should help. Silence shortening can be done with Wavepad :smiley:
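
If a scripted alternative to Wavepad is ever needed, silence shortening can be roughed out like this. This is a sketch under assumptions, not how Wavepad does it: the name `shorten_silence`, the 0.01 amplitude threshold, and the 0.2 s maximum pause are all made up for illustration. It works sample by sample on floats in the range -1.0 to 1.0, whereas a more careful tool would measure energy over short frames. It also only makes sense after denoising, since a noisy pause never drops below the threshold.

```python
def shorten_silence(samples, sample_rate, threshold=0.01, max_silence_s=0.2):
    """Cap every run of quiet samples at max_silence_s seconds."""
    max_run = int(round(max_silence_s * sample_rate))
    out = []
    run = 0  # length of the current run of quiet samples
    for x in samples:
        if abs(x) < threshold:
            run += 1
            if run > max_run:
                continue  # drop quiet samples beyond the allowed pause length
        else:
            run = 0  # a loud sample ends the quiet run
        out.append(x)
    return out


# Hypothetical usage: a 2 s pause at 1000 Hz is trimmed down to 0.5 s.
clip = [0.5] * 10 + [0.0] * 2000 + [0.5] * 10
trimmed = shorten_silence(clip, sample_rate=1000, max_silence_s=0.5)
```

Pauses shorter than the limit are left untouched, so normal speech rhythm is preserved and only the long gaps shrink.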

Great, I’ll treat them first, then use Wavepad once the noise is removed.