I have trained a MelGAN vocoder on my own data, but when it is used for end-to-end TTS, some of the synthesized results (about 3% of utterances) have artifacts (noise). In detail, the mel spectrogram in the corresponding areas is discontinuous, as shown below:
Any suggestions on how to improve this?