I’m trying to train a custom voice with 24 hours of audiobook recordings and the associated transcription in LJSpeech format for the dataloader (I’ve posted a sample on my GitHub: hosting/mozilla-tts/custom-dataset-sample at master · dubreuia/hosting · GitHub). I used the AnalyzeDataset and CheckSpectrograms notebooks to check my data first, and it looks good to me.
After 60K iterations (see screenshots), I get nothing usable: the audio is blank or faintly humming. I continued training to 100K, to no avail. I’ve also trained on LJSpeech, and there I get good results at 100K, as expected.
I suspect the sample rate might be wrong, because I split the audio from an mp3 file and then converted it to wav. However, I used both notebooks to tune my config, so I’m fairly sure it is correct.
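
One way to rule the sample rate in or out is to read it straight from the WAV headers and compare it with the value in the training config. The following is a minimal sketch using only Python's standard library. The function names are made up for illustration, and it assumes the clips are plain PCM WAV files sitting directly in one folder (the standard `wave` module rejects other encodings, such as float WAV).

```python
import wave
from collections import Counter
from pathlib import Path


def read_sample_rate(wav_path):
    """Return the sample rate (in Hz) stored in the header of a PCM WAV file."""
    with wave.open(str(wav_path), "rb") as wav_file:
        return wav_file.getframerate()


def count_sample_rates(wav_dir):
    """Count how many .wav files directly inside wav_dir use each sample rate."""
    wav_paths = sorted(Path(wav_dir).glob("*.wav"))
    return Counter(read_sample_rate(path) for path in wav_paths)


def find_mismatches(wav_dir, expected_rate):
    """List the names of .wav files whose header rate differs from expected_rate."""
    wav_paths = sorted(Path(wav_dir).glob("*.wav"))
    return [path.name for path in wav_paths if read_sample_rate(path) != expected_rate]
```

For example, `count_sample_rates("wavs")` gives a quick overview of the rates present in a hypothetical `wavs` folder, and `find_mismatches("wavs", 22050)` lists any clip that does not match a config sample rate of 22050 Hz (22050 is only a placeholder here; use whatever value the config actually contains). A header that matches the config does not prove the audio was resampled correctly, but a mismatch would be a clear sign that the mp3-to-wav conversion needs another look.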