My Tacotron test .wav is very long!

JDB · July 24, 2020, 10:21pm

Hello,
I have just completed my first tiny, tiny 300 training steps. My output was 2.5 minutes of bleeps and bloops! After the initial relief that at least the audio wasn’t silent, I was wondering: is it normal that a sentence like ‘He is your’ results in such long audio, even for a network that has barely started training?
If you have any thoughts, let me know!

baconator · July 25, 2020, 6:53am

Sounds like it’s a few thousand steps away from alignment.

sanjaesc · July 25, 2020, 5:26am

Try training for another ~10,000 steps and you should see the first results.

JDB · July 25, 2020, 6:25am

Thank you for your quick reply, this gives me something to aim for.

JDB · July 25, 2020, 6:32am

Thank you, this also helps with my terminology!

georroussos · July 25, 2020, 9:14am

300 steps is one epoch (out of 1000). Sounds like you have been waiting for a long time. Are you training on a CPU, by any chance?

JDB · July 25, 2020, 9:16am

Hey there, I believe I was looking at the global steps at the time, not epochs. I am currently altering my batch size to make the epochs pass faster.
I am running a GTX 1060, btw.
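
For anyone else mixing up the two counters, here is a rough sketch of how global steps, epochs, and batch size relate. The dataset size of 9600 clips is a made-up number, chosen only so that a batch size of 32 gives the 300 steps per epoch mentioned above. The function names are hypothetical and not part of any TTS library.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Steps needed to pass over every training sample once."""
    return math.ceil(num_samples / batch_size)


def epochs_completed(global_step: int, num_samples: int, batch_size: int) -> float:
    """Convert a global step counter into (fractional) epochs."""
    return global_step / steps_per_epoch(num_samples, batch_size)


# Assumption: 9600 training clips (hypothetical, not from this thread).
num_samples = 9600

for batch_size in (16, 32, 64):
    per_epoch = steps_per_epoch(num_samples, batch_size)
    done = epochs_completed(300, num_samples, batch_size)
    print(f"batch_size={batch_size}: {per_epoch} steps/epoch, "
          f"300 global steps = {done:.2f} epochs")
```

Under these assumptions, a larger batch size means fewer steps per epoch, but each step processes more data, so epochs ticking by faster does not necessarily mean the model reaches alignment sooner.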