I have found it here
What is the expected tensor size?
I've used it before, but it is not active in the current code base. If you want to try it, you need to run it over the attention alignment vector.
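To make the expected input concrete, here is a minimal sketch of an entropy term over attention alignments. It is not the function from the code base: the name `attention_entropy`, the `eps` value and the assumed layout [batch][decoder_steps][encoder_steps] (each innermost row being a distribution over encoder steps) are assumptions for illustration. It uses plain Python lists, so treat it as a reference for the math rather than something to drop into training.

```python
import math

def attention_entropy(alignments, eps=1e-8):
    """Mean entropy of the attention rows (hypothetical helper).

    alignments: nested list laid out as [batch][decoder_steps][encoder_steps];
    each innermost row is assumed to sum to 1 over the encoder steps.
    """
    total, rows = 0.0, 0
    for sample in alignments:          # loop over the batch
        for row in sample:             # loop over decoder steps
            # Entropy of one attention distribution; eps avoids log(0).
            total += -sum(p * math.log(p + eps) for p in row)
            rows += 1
    return total / rows

# One sample, two decoder steps, three encoder steps.
sharp = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]]   # each step attends to one token
flat = [[[1 / 3, 1 / 3, 1 / 3], [1 / 3, 1 / 3, 1 / 3]]]  # uniform attention

print(attention_entropy(sharp))  # close to 0
print(attention_entropy(flat))   # close to log(3)
```

Because the mean runs over every row of every sample, the whole batch can be passed in one call. With a tensor library the same reduction would normally be written as a single vectorized expression.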
I want to generate alignments for FastSpeech training.
So every text token should correspond to one or more mel spectrogram frames.
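To show what I mean, here is a rough sketch of how I understand durations are derived from an attention matrix: each mel frame is assigned to the token with the highest attention weight, and the frames per token are counted. The function name `durations_from_alignment` and the [mel_frames][text_tokens] layout are my own assumptions, not code from the repository.

```python
def durations_from_alignment(alignment):
    """Count mel frames per text token (hypothetical helper).

    alignment: nested list laid out as [mel_frames][text_tokens],
    holding the attention weights of a single utterance.
    """
    num_tokens = len(alignment[0])
    durations = [0] * num_tokens
    for frame in alignment:
        # Assign the frame to the token with the largest attention weight.
        best = max(range(num_tokens), key=lambda j: frame[j])
        durations[best] += 1
    return durations

# Four mel frames, three text tokens; the middle token never wins a frame.
alignment = [
    [0.9, 0.1, 0.0],
    [0.6, 0.4, 0.0],
    [0.1, 0.2, 0.7],
    [0.0, 0.1, 0.9],
]
durations = durations_from_alignment(alignment)
print(durations)  # [2, 0, 2]
```

The zero in the middle is the problem: a skipped token gets no duration at all.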
My alignment looks like this with multi-head attention.
It tends to skip whitespace and punctuation.
The default Tacotron is not perfect either.
So I want to try this function to make the strict token-to-token alignment.
Does it work on a tensor of shape [batch, width, height]?
Or do I need to run it for every sample, like [loss(i) for i in batch]?
Maybe I need something else?
Thanks for your attention.
Entropy loss would not do that. It works more like a guided attention loss, which forces the alignment to be more diagonal.
The alignment looks quite noisy. Please share the config.json so I can make a better guess.
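For comparison, here is a minimal sketch of the usual guided attention loss formulation, which penalizes attention mass far from the diagonal with the weight W[t][n] = 1 - exp(-((n/N - t/T)^2) / (2 g^2)). The function name, the [decoder_steps][encoder_steps] layout and the width g=0.2 are assumptions for illustration, and this is not the implementation from the code base.

```python
import math

def guided_attention_loss(alignment, g=0.2):
    """Mean of attention weights scaled by distance from the diagonal.

    alignment: nested list laid out as [decoder_steps][encoder_steps]
    for a single utterance. g controls how wide the allowed band is
    (0.2 is an assumed value, tune it for your data).
    """
    T = len(alignment)      # number of decoder steps
    N = len(alignment[0])   # number of encoder steps
    total = 0.0
    for t, row in enumerate(alignment):
        for n, a in enumerate(row):
            # Penalty is 0 on the diagonal and grows toward 1 away from it.
            w = 1.0 - math.exp(-((n / N - t / T) ** 2) / (2 * g * g))
            total += a * w
    return total / (T * N)

# A perfectly diagonal alignment versus an anti-diagonal one.
diagonal = [[1.0 if n == t else 0.0 for n in range(4)] for t in range(4)]
anti = [[1.0 if n == 3 - t else 0.0 for n in range(4)] for t in range(4)]

print(guided_attention_loss(diagonal))
print(guided_attention_loss(anti))
```

The diagonal case incurs no penalty while the anti-diagonal one does, which is the pressure toward monotonic alignment. Neither this nor the entropy term guarantees that every token receives at least one frame.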