So I was looking at the two main issues with the LJSpeech dataset: noise and sibilants. I think both can be fixed, so I tested some batch processing on the first 7 files. The original is also pretty reverb-y, so I toned that down too; it could maybe go further. From a quick skim, it seems to be recorded in the same room and is mostly uniform, so I don’t think the processing will be destructive, but I haven’t dug in too deeply. Any thoughts on this?
Interesting. It’s subtle (at least on my device), but it does sound like you removed the room reverb that’s noticeable in a few places. How did you do it, and can it be applied en masse, e.g. via a script?
I’ve been producing a few podcasts, so I’ve set up a nice chain/workflow for dialogue. It can be applied en masse, though it’s done with audio plug-ins rather than a script. I’m using RX7 for its batch processing function, but I run third-party plug-ins from my production workflow within it.
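For comparison with the plug-in route, a script-based batch pass could look roughly like the sketch below. It is a hypothetical illustration, not the RX7 chain described above: `batch_process` is a made-up helper that assumes 16-bit mono PCM WAV input and takes an arbitrary per-file `process` function as a placeholder for whatever treatment is wanted.

```python
import array
import sys
import wave
from pathlib import Path
from typing import Callable, List, Sequence

# Hypothetical per-file treatment: (samples, sample_rate) -> processed samples.
Process = Callable[[List[int], int], Sequence[float]]


def batch_process(in_dir: str, out_dir: str, process: Process) -> int:
    """Run `process(samples, sample_rate)` on every .wav in in_dir.

    Assumes 16-bit mono PCM input and writes results under the same file
    names to out_dir. Returns the number of files written.
    """
    src, dst = Path(in_dir), Path(out_dir)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(src.glob("*.wav")):
        with wave.open(str(path), "rb") as reader:
            params = reader.getparams()
            if params.sampwidth != 2 or params.nchannels != 1:
                raise ValueError(f"{path.name}: expected 16-bit mono PCM")
            samples = array.array("h", reader.readframes(params.nframes))
        if sys.byteorder == "big":  # WAV sample data is little-endian
            samples.byteswap()
        processed = process(samples.tolist(), params.framerate)
        # Round and clip back into the int16 range before writing.
        clipped = array.array(
            "h", (max(-32768, min(32767, int(round(x)))) for x in processed)
        )
        if sys.byteorder == "big":
            clipped.byteswap()
        with wave.open(str(dst / path.name), "wb") as writer:
            writer.setnchannels(1)
            writer.setsampwidth(2)
            writer.setframerate(params.framerate)
            writer.writeframes(clipped.tobytes())
        count += 1
    return count
```

A call such as `batch_process("wavs", "wavs_processed", lambda s, sr: [x * 0.9 for x in s])` (directory names hypothetical) would apply a simple gain to every file; the clipping step keeps any treatment that overshoots from wrapping around.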
Good idea on the low-pass! Since I’m using audio plug-ins rather than a script, I’m not sure there’s a reliable plug-in for trimming silence in this workflow.
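If a script step were ever added alongside the plug-in chain, silence trimming and a gentle low-pass are both simple to express. The sketch below is a minimal, hypothetical illustration in plain Python: `trim_silence` drops leading and trailing samples whose absolute amplitude stays below a fixed threshold (the default is an assumed value, not tuned on LJSpeech), and `one_pole_lowpass` is a basic first-order filter with a caller-chosen cutoff. A real pipeline would more likely use frame-based energy for trimming and a steeper filter.

```python
import math
from typing import List, Sequence


def trim_silence(samples: Sequence[float], threshold: float = 500.0,
                 pad: int = 0) -> List[float]:
    """Strip leading/trailing samples with |x| < threshold.

    `pad` keeps that many extra samples on each side so word onsets and
    tails are not clipped. Returns [] if nothing reaches the threshold.
    """
    loud = [i for i, x in enumerate(samples) if abs(x) >= threshold]
    if not loud:
        return []
    start = max(loud[0] - pad, 0)
    end = min(loud[-1] + pad + 1, len(samples))
    return list(samples[start:end])


def one_pole_lowpass(samples: Sequence[float], cutoff_hz: float,
                     sample_rate: int) -> List[float]:
    """First-order IIR low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    a = dt / (rc + dt)  # smoothing factor derived from the RC time constant
    out: List[float] = []
    y = 0.0
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out
```

A single pole only rolls off at about 6 dB per octave, so it works as a gentle top-end taming rather than a brick-wall filter; the threshold, padding and cutoff would all need checking by ear.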
The treatment is applied fairly lightly here, as I wasn’t sure how much the acoustics vary across the set. I’ll analyze the whole dataset for changes and could then batch-treat sections. Can you think of any other issues to address?
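One rough way to look for acoustic changes across the set before batch-treating sections is to compare each clip’s noise floor. The sketch below is a simple frame-RMS heuristic, not how RX7 analyzes audio: it takes the 10th-percentile frame level of each clip as its noise floor (frame size, quantile and the 3 dB tolerance are all assumptions) and flags clips that sit far from the dataset median. Clips are passed in as a hypothetical name-to-samples mapping of 16-bit values.

```python
import math
import statistics
from typing import Dict, List, Sequence


def frame_rms_db(samples: Sequence[float], frame: int = 1024) -> List[float]:
    """RMS level of each non-overlapping frame, in dBFS for 16-bit audio."""
    levels = []
    for i in range(0, len(samples) - frame + 1, frame):
        chunk = samples[i:i + frame]
        rms = math.sqrt(sum(x * x for x in chunk) / frame)
        levels.append(20 * math.log10(max(rms, 1e-9) / 32768.0))
    return levels


def noise_floor_db(samples: Sequence[float], frame: int = 1024,
                   quantile: float = 0.1) -> float:
    """Estimate the noise floor as a low quantile of the frame levels."""
    levels = sorted(frame_rms_db(samples, frame))
    if not levels:
        raise ValueError("clip shorter than one frame")
    return levels[int(quantile * (len(levels) - 1))]


def flag_outliers(clips: Dict[str, Sequence[float]],
                  tolerance_db: float = 3.0) -> List[str]:
    """Names of clips whose noise floor differs from the median by > tolerance."""
    floors = {name: noise_floor_db(s) for name, s in clips.items()}
    median = statistics.median(list(floors.values()))
    return sorted(n for n, f in floors.items() if abs(f - median) > tolerance_db)
```

Flagged clips would be candidates for a separate treatment pass; a noise floor is only one signal, so reverb or tonal changes would still need listening or other measures.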