Hello, I just want to share the dataset I'm currently using. It contains 50 h of reviewed speech and 50 h of aligned but not yet reviewed speech. To review it I'm using a DeepSpeech model: where the transcription matches the DS prediction, I mark the clip as valid.
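To make the review rule concrete, here is a rough sketch of the matching step. It assumes both strings are normalized (lowercased, punctuation stripped) before comparison; the exact normalization used for this dataset is not stated, and `normalize` and `is_valid` are hypothetical helper names. The DeepSpeech prediction is taken as an already computed string.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace (assumed normalization)."""
    text = unicodedata.normalize("NFC", text).lower()
    # Replace every character that is neither a word character nor whitespace.
    # In Python 3, \w is Unicode-aware, so accented Spanish letters are kept.
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def is_valid(transcript: str, prediction: str) -> bool:
    """Mark a clip as valid when transcript and prediction match after normalization."""
    return normalize(transcript) == normalize(prediction)


if __name__ == "__main__":
    print(is_valid("¿Cómo estás?", "cómo estás"))
    print(is_valid("Hola mundo", "hola mundos"))
```

An exact match is strict, so some correct clips will probably be rejected when the model makes a small mistake. That seems an acceptable trade if the goal is a clean reviewed subset.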
Most of the time we have a limited amount of data to train on (in this case for Spanish). The idea of this dataset is to use it as a base and then try to adapt the model to a new voice with far less data.
It uses the LJSpeech format!
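For anyone who hasn't used the LJSpeech layout before, here is a minimal loading sketch. It assumes the usual structure: a `metadata.csv` with pipe-separated fields (clip id first, transcription last) and a `wavs/` folder with one `<id>.wav` per clip. The exact number of columns in this dataset is an assumption, and `load_metadata` is a hypothetical helper name.

```python
import csv
from pathlib import Path


def load_metadata(dataset_dir):
    """Return (wav_path, text) pairs from an LJSpeech-style metadata.csv."""
    root = Path(dataset_dir)
    entries = []
    with open(root / "metadata.csv", encoding="utf-8", newline="") as f:
        # LJSpeech uses "|" as the delimiter and no quoting.
        reader = csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        for row in reader:
            if not row:
                continue  # skip blank lines
            # First field is the clip id; the last field is taken as the text,
            # which covers both the 2-column and 3-column variants.
            clip_id, text = row[0], row[-1]
            entries.append((root / "wavs" / f"{clip_id}.wav", text))
    return entries
```

Calling `load_metadata("path/to/dataset")` should give a list you can feed to whatever training pipeline you use.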
Enjoy and please share any feedback.