So I was looking at the two main issues with the LJSpeech dataset: noise and sibilants. I think both can be fixed, so I tested some batch processing on the first 7 files. I think the original is pretty reverb-y as well, so I toned that down, though it could maybe go further. From a quick skim it seems like everything was recorded in the same room and is mostly uniform, so I don’t think the processing will be destructive, but I haven’t dug too deep. Any thoughts on this?
With processing:
https://tinyurl.com/yywus4nq
Original:
https://tinyurl.com/yxn5mpne
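
For anyone who wants to poke at the sibilant side in code, here is a rough sketch of what a split-band de-esser pass could look like in plain Python. It is only an illustration and not necessarily the chain behind the samples above. The function names (`one_pole_lowpass`, `deess`) and all parameter values (`split_hz`, `threshold`, `max_cut`, `release_hz`) are made up for the example and would need tuning by ear. A one-pole split is also very gentle, so a real de-esser would likely use steeper filters.

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """First-order low-pass filter (exponential smoothing)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, state = [], 0.0
    for x in samples:
        state += alpha * (x - state)
        out.append(state)
    return out

def deess(samples, sample_rate=22050, split_hz=5000.0,
          threshold=0.05, max_cut=0.3, release_hz=50.0):
    """Split-band de-esser sketch: turn down the high band only while it is loud.

    All defaults are assumed starting points, not measured values.
    Samples are expected as floats in the range [-1.0, 1.0].
    """
    # Split into a low band and the residual high band (low + high == input).
    low = one_pole_lowpass(samples, split_hz, sample_rate)
    high = [x - l for x, l in zip(samples, low)]
    # Envelope follower: smooth the rectified high band.
    env = one_pole_lowpass([abs(h) for h in high], release_hz, sample_rate)
    out = []
    for l, h, e in zip(low, high, env):
        # Above the threshold, scale the high band down, but never below max_cut.
        gain = 1.0 if e <= threshold else max(max_cut, threshold / e)
        out.append(l + gain * h)
    return out
```

The default `sample_rate` matches the 22.05 kHz rate of the LJSpeech clips. Because the low band is never touched and the high band is left alone whenever its envelope stays under the threshold, material without strong sibilance should pass through unchanged, which is the property you want before running something like this over the whole set.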