I am noticing that at 70K steps, the length of speech I can synthesize is shorter when I train with DDC than when I train without it. Without DDC, I can synthesize a whole page from a book; with it, at 70K steps I can barely synthesize two sentences, and the output collapses after the third. Is this expected? Does the attention take longer to align with DDC? I am using the original prenet because the bn prenet makes my speaker sound like they are dying. My dataset is very good, although with DDC I can only fit sentences of up to 200 characters.