Attention weights of Graves Attention decrease and eventually reach zero as training progresses

Hi, I added the GravesAttention class to our custom Tacotron-GST code, and when we train it, the attention weight values (alpha_t) decrease over iterations and eventually become a complete zero vector. Can anyone help with this issue, or has anyone experienced it?

Can you post TensorBoard figures?

It's not that all of the attention goes to zero; only the first step's value gets too big, so in the figure the other steps are visually dominated by it.

It looks like this after 100 steps.
[attention alignment plot after 100 steps]

And it used to look like this using Location attention instead of Graves attention.
[attention alignment plot with Location attention]
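By the way, one quick way to confirm the other steps aren't actually zero and are only being washed out by the large first-step value is to cap the color scale when plotting. A small matplotlib sketch; `alignment` stands in for whatever [decoder_steps, encoder_steps] array is being plotted:

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_alignment_clipped(alignment, percentile=99):
    """Plot an attention alignment with a capped color scale so a single
    very large value (e.g. the first step) does not hide the other steps.
    `alignment` is the [decoder_steps, encoder_steps] array already logged."""
    fig, ax = plt.subplots()
    im = ax.imshow(
        alignment,
        aspect="auto",
        origin="lower",
        interpolation="none",
        vmax=np.percentile(alignment, percentile),  # clip extreme values
    )
    ax.set_xlabel("Encoder step")
    ax.set_ylabel("Decoder step")
    fig.colorbar(im, ax=ax)
    return fig
```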

This is just step 100. Wait longer.

Okay, I will let you know tomorrow after training for 1 day. Thanks for the quick reply.

Hi, the training stopped automatically at iteration 1300, as the attention weights became NaN. At iteration 1300, the attention still looked like this.
[attention alignment plot at iteration 1300]
Maybe I implemented the attention in the wrong way.
Just to make sure one thing: do I need to normalize alpha_t (the attention weights)?
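For reference, my understanding of the original Graves (2013) GMM attention formulation is roughly the following. This is only a minimal PyTorch sketch I wrote to check my assumptions, not the mozilla/TTS code, and `query_dim` / `K` are placeholders:

```python
import torch
import torch.nn as nn


class GravesAttentionSketch(nn.Module):
    """Minimal sketch of GMM attention from Graves (2013).

    My understanding of the formulation only, not the mozilla/TTS
    implementation; dimensions and names are placeholders.
    """

    def __init__(self, query_dim, K=5):
        super().__init__()
        self.K = K
        # predicts (w_hat, b_hat, k_hat) for the K mixture components
        self.N_a = nn.Sequential(
            nn.Linear(query_dim, query_dim),
            nn.ReLU(),
            nn.Linear(query_dim, 3 * K),
        )

    def forward(self, query, memory, kappa_prev):
        # query:      [B, query_dim]  decoder state at step t
        # memory:     [B, T, D]       encoder outputs
        # kappa_prev: [B, K]          component positions from step t-1
        B, T, _ = memory.size()
        w_hat, b_hat, k_hat = self.N_a(query).chunk(3, dim=-1)  # each [B, K]

        w_t = torch.softmax(w_hat, dim=-1)       # mixture weights (the paper uses exp)
        beta_t = torch.exp(b_hat)                # component widths
        kappa_t = kappa_prev + torch.exp(k_hat)  # positions only move forward

        # alpha_t[j] = sum_k w_k * exp(-beta_k * (kappa_k - j)^2)
        j = torch.arange(T, device=memory.device, dtype=memory.dtype)  # [T]
        phi = w_t.unsqueeze(-1) * torch.exp(
            -beta_t.unsqueeze(-1) * (kappa_t.unsqueeze(-1) - j) ** 2
        )                                        # [B, K, T]
        alpha_t = phi.sum(dim=1)                 # [B, T]; bounded, not normalized

        context = torch.bmm(alpha_t.unsqueeze(1), memory).squeeze(1)   # [B, D]
        return context, alpha_t, kappa_t
```

In this formulation alpha_t is a sum of Gaussian bumps over the encoder steps, so it stays bounded but is never explicitly normalized, and that is exactly the part I'm unsure about.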

Did you implement your own? Because we already have it on master now.

Yes, I am already using that and trying to add it to our own custom TTS.

I updated the Graves attention. It looks much more reliable in my experiments.

You mean the ‘dev’ branch of the mozilla/TTS repo?

Have you tried GMM attention with multi-speaker TTS (especially Tacotron-GST)? Does the attention alignment work well?

I’ve not tried it with multi-speaker TTS.

@byuns9334 Did you manage to get GravesAttention working? I’m getting the same empty alignment plots after 80k steps.