I am testing the approach with Tacotron2 + Multiband Melgan in the dev branch (commit 8c07ae7).
I was wondering if anyone has an example of what healthy loss curves look like when training the vocoder.
In my case (with a smallish custom dataset), I observed some weird behavior the moment the discriminator is introduced (
Before this point, the generator losses were trending down, as expected. After this point, they blew up somewhat.
Is this expected? Is it just a matter of training for longer? Maybe setting a larger