I think you can try training on a different dataset one last time (Nancy, or English from MAILABS). If it works, that means your dataset is causing the problem. If not, the problem is something within the hparams. There really is no reason why it should not work: I have been able to get okay speech from a very hard dataset that had a lot of transcription mistakes and very uneven pitch.