Attention weights of Graves Attention decrease and eventually reach zero as training progresses

Hi, I added the GravesAttention class to our custom Tacotron-GST code, and when we train it, the attention weight values (alpha_t) decrease over iterations and eventually become a complete zero vector. Can anyone help with this issue, or has anyone experienced it?

Can you post TensorBoard figures?

It's not that all of the attention goes to zero; only the first step's value gets too big, so in the figure the other steps are visually dominated by it.

It looks like this after 100 steps.
[attention alignment plot after 100 steps]

And it used to look like this using Location attention instead of Graves attention.
[attention alignment plot with Location attention]
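By the way, one quick way to confirm the other steps aren't actually zero and are only being washed out by the large first-step value is to cap the color scale when plotting. A small matplotlib sketch; `alignment` stands in for whatever [decoder_steps, encoder_steps] array is being plotted:

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_alignment_clipped(alignment, percentile=99):
    """Plot an attention alignment with a capped color scale so a single
    very large value (e.g. the first step) does not hide the other steps.
    `alignment` is the [decoder_steps, encoder_steps] array already logged."""
    fig, ax = plt.subplots()
    im = ax.imshow(
        alignment,
        aspect="auto",
        origin="lower",
        interpolation="none",
        vmax=np.percentile(alignment, percentile),  # clip extreme values
    )
    ax.set_xlabel("Encoder step")
    ax.set_ylabel("Decoder step")
    fig.colorbar(im, ax=ax)
    return fig
```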

This is just step 100. Wait longer.

Okay, I will let you know tomorrow after training for 1 day. Thanks for the quick reply.

Hi, the training stopped automatically at iteration 1300, as the attention weights became NaN. At iteration 1300, the attention still looked like this.
[attention alignment plot at iteration 1300]
Maybe I implemented the attention in the wrong way.
Just to make sure one thing: do I need to normalize alpha_t (the attention weights)?
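For reference, my understanding of the original Graves (2013) GMM attention formulation is roughly the following. This is only a minimal PyTorch sketch I wrote to check my assumptions, not the mozilla/TTS code, and `query_dim` / `K` are placeholders:

```python
import torch
import torch.nn as nn


class GravesAttentionSketch(nn.Module):
    """Minimal sketch of GMM attention from Graves (2013).

    My understanding of the formulation only, not the mozilla/TTS
    implementation; dimensions and names are placeholders.
    """

    def __init__(self, query_dim, K=5):
        super().__init__()
        self.K = K
        # predicts (w_hat, b_hat, k_hat) for the K mixture components
        self.N_a = nn.Sequential(
            nn.Linear(query_dim, query_dim),
            nn.ReLU(),
            nn.Linear(query_dim, 3 * K),
        )

    def forward(self, query, memory, kappa_prev):
        # query:      [B, query_dim]  decoder state at step t
        # memory:     [B, T, D]       encoder outputs
        # kappa_prev: [B, K]          component positions from step t-1
        B, T, _ = memory.size()
        w_hat, b_hat, k_hat = self.N_a(query).chunk(3, dim=-1)  # each [B, K]

        w_t = torch.softmax(w_hat, dim=-1)       # mixture weights (the paper uses exp)
        beta_t = torch.exp(b_hat)                # component widths
        kappa_t = kappa_prev + torch.exp(k_hat)  # positions only move forward

        # alpha_t[j] = sum_k w_k * exp(-beta_k * (kappa_k - j)^2)
        j = torch.arange(T, device=memory.device, dtype=memory.dtype)  # [T]
        phi = w_t.unsqueeze(-1) * torch.exp(
            -beta_t.unsqueeze(-1) * (kappa_t.unsqueeze(-1) - j) ** 2
        )                                        # [B, K, T]
        alpha_t = phi.sum(dim=1)                 # [B, T]; bounded, not normalized

        context = torch.bmm(alpha_t.unsqueeze(1), memory).squeeze(1)   # [B, D]
        return context, alpha_t, kappa_t
```

In this formulation alpha_t is a sum of Gaussian bumps over the encoder steps, so it stays bounded but is never explicitly normalized, and that is exactly the part I'm unsure about.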

Did you implement your own? Because we already have it on master now.

Yes, I am already using that and trying to add it to our own custom TTS.

I updated the Graves attention. It looks much more reliable in my experiments.

You mean the ‘dev’ branch of the mozilla/TTS repo?

Have you tried GMM attention with multi-speaker TTS (especially Tacotron-GST)? Does the attention alignment work well?

I’ve not tried it with multi-speaker TTS.

@byuns9334 Did you manage to get GravesAttention working? I’m getting the same empty alignment plots after 80k steps.