Sometimes the audio files are too long (more than 30 sec for audio that is supposed to be 8-10 sec). After the actual content (say after 8-10 sec), random noise is generated for the remaining 15-20 sec. What could be the possible reason for this issue?
I’m guessing you may also see a message about max_decoder_steps.
You can think of it as the model being unable to output audio correctly: it doesn’t know when it is done, so it carries on until it reaches this maximum, at which point it is stopped.
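To make that stopping behaviour concrete, here is a toy sketch of a decoder loop. It is not the library's actual code: the function `decode`, the `stop_prob_at` callable, the 0.5 threshold and the cap of 500 steps are all assumptions chosen for illustration.

```python
def decode(stop_prob_at, max_decoder_steps=500, stop_threshold=0.5):
    """Toy decoder loop (hypothetical): one frame per step until stop or cap."""
    frames = []
    hit_cap = False
    step = 0
    while True:
        frames.append(step)  # placeholder for a predicted spectrogram frame
        step += 1
        if stop_prob_at(step) > stop_threshold:
            break  # the model signalled that the utterance is finished
        if step >= max_decoder_steps:
            hit_cap = True  # no stop signal ever fired, so we cut it off
            break
    return len(frames), hit_cap

# A model that learned to stop after 120 steps.
healthy = decode(lambda t: 1.0 if t >= 120 else 0.0)
# A model whose stop signal never fires.
stuck = decode(lambda t: 0.0)
print(healthy, stuck)
```

In the second case every frame after the real content is whatever the decoder keeps producing, which would be heard as noise until the cap is reached.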
This can happen for various reasons, but first it would be good to know about your setup.
Do you have a decent amount of audio?
What is the audio dataset?
Is the audio suitable for training TTS?
Are the transcriptions good quality/accurate?
If you’re getting those max_decoder_steps messages, at what stage do they appear?
If you are just casually interested, that’s a good place to start. If you want to go further, it can help to share more about your specific settings and install. The gatherup tool can help you format that kind of info nicely.
Audio dataset: LJ Speech.
Yes, I am getting this max_decoder_steps message during inference:
Decoder stopped with 'max_decoder_steps
(256000,)
I get the above message during audio synthesis.
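As a rough sanity check on the `(256000,)` shape above, the sample count can be converted to a duration and a frame count. This is only a sketch: LJ Speech is recorded at 22050 Hz, but the hop length of 256 samples per frame is an assumption about the audio config, so check it against your own settings.

```python
num_samples = 256000   # waveform shape reported in the log above
sample_rate = 22050    # LJ Speech sampling rate in Hz
hop_length = 256       # assumption: samples per spectrogram frame in the config

duration_sec = num_samples / sample_rate
num_frames = num_samples // hop_length

print(f"{duration_sec:.2f} s, {num_frames} frames")
```

Under that assumption the output is a round number of frames, which is what you would expect if the decoder ran until a fixed step limit rather than stopping on its own.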