My Tacotron test .wav is very long!

Hello,
I have just completed my first tiny, tiny run of 300 training steps. My output was 2.5 minutes of bleeps and bloops! After the initial relief that at least the audio wasn’t silent, I was wondering: is it normal for a sentence as short as ‘He is your’ to result in such long audio, even for a network that has barely started training?
If you have any thoughts, let me know!

Sounds like it’s a few thousand steps away from alignment.

Try training for another ~10,000 steps and you should see the first results.

Thank you for your quick reply, this gives me something to aim for :smiley:

Thank you, this also helps with my terminology!

300 steps is one epoch (out of 1000). Sounds like you have been waiting for a long time. Are you training on a CPU, by any chance?

Hey there, I believe I was looking at the global steps at the time, not epochs. I am currently altering my batch size to make those pass faster.
I am running a GTX 1060 btw
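
For anyone else mixing up global steps and epochs, here is a rough sketch of how the two relate to batch size. The dataset size of 9600 clips is a made-up number, chosen only so that a batch size of 32 gives the 300 steps per epoch mentioned above; the helper names are hypothetical too and do not come from any Tacotron codebase.

```python
import math

def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Training steps needed to see every sample once (last partial batch kept)."""
    return math.ceil(num_samples / batch_size)

def epochs_completed(global_step: int, num_samples: int, batch_size: int) -> float:
    """Fraction of epochs covered after `global_step` training steps."""
    return global_step / steps_per_epoch(num_samples, batch_size)

# Hypothetical dataset size, for illustration only.
num_samples = 9600

for batch_size in (32, 16):
    per_epoch = steps_per_epoch(num_samples, batch_size)
    done = epochs_completed(300, num_samples, batch_size)
    print(f"batch_size={batch_size}: {per_epoch} steps/epoch, "
          f"{done:.2f} epochs after 300 global steps")
```

Under these assumptions, halving the batch size doubles the number of steps in an epoch, so each step gets cheaper but an epoch does not necessarily finish sooner.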