Firstly, thank you for the great TTS implementation. I was able to start training on my dataset without having any problems.
Straight to the point, I’m using a private Portuguese dataset formatted like LJSpeech and my results on tensorboard are anything but healthy:
A bit about config.json (only the parameters I changed from master branch):
- model: Tacotron2
- sample_rate: 44100
- mel_fmin: 95.0
- characters: “AÁÂÃÀBCÇDEÉÊFGHIÍJKLMNOÓÔÕPQRSTUÚVWXYZaáâãàbcçdeéêfghiíjklmnoóôpqrstuúvwxyz!’(),-.:;?” "
- phonemes: “ãẽĩõiyɨʉɯuɪʏʊeøɘəɵɤoɛœɜɞʌɔæɐaɶɑɒᵻʘɓǀɗǃʄǂɠǁʛpbtdʈɖcɟkɡqɢʔɴŋɲɳnɱmʙrʀⱱɾɽɸβfvθðszʃʒʂʐçʝxɣχʁħʕhɦɬɮʋɹɻjɰlɭʎʟˈˌːˑʍwɥʜʢʡɕʑɺɧɚ˞ɫ”
- gradual_training: [[0, 7, 32], [1, 5, 32], [50000, 3, 32], [130000, 2, 16], [290000, 1, 8]]
- phoneme_language: pt-br
I’m binge reading posts from discourse and I saw somewhere that, with gradual training, the alignments graph should start to look like a diagonal anywhere near to 10k steps. Any ideas about what could be wrong? (Or maybe I just need more training).