Text-to-speech output: the last word is sometimes duplicated

Hi there team,

Thank you for the great work so far!
I would like to ask about an issue that occurs occasionally.

Text:
Good afternoon sir.

Output to WAV:
Good afternoon sir sir.

It duplicates the last word. This doesn’t always happen, but we’ve found some cases.

We are using Tacotron 1 with Griffin-Lim.
What would be the approach to address this problem?

Kindest regards,
Marjan Nikolovski

Could you post the attention plot here? I believe the issue is related to the stopnet.
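
For context on why the stopnet is a suspect, here is a rough toy sketch, not the project's actual code. It assumes the decoder emits one stop probability per step and that synthesis is cut at the first step whose probability exceeds a threshold. The names `first_stop_step` and `truncate_frames`, the 0.5 threshold, and the probability values are all made up for illustration. If the stopnet fires late, the decoder keeps producing frames and can re-attend to the last word before it is cut off.

```python
from typing import List, Sequence


def first_stop_step(stop_probs: Sequence[float], threshold: float = 0.5) -> int:
    """Return the first decoder step whose stop probability exceeds the
    threshold, or len(stop_probs) if the stopnet never fires."""
    for step, prob in enumerate(stop_probs):
        if prob > threshold:
            return step
    return len(stop_probs)


def truncate_frames(
    frames: List[str], stop_probs: Sequence[float], threshold: float = 0.5
) -> List[str]:
    """Keep only the frames generated before the stop step."""
    return frames[: first_stop_step(stop_probs, threshold)]


# Toy data: each "frame" is labeled with the word it belongs to (hypothetical).
frames = ["good", "afternoon", "sir", "sir", "<silence>"]

# Hypothetical stopnet that becomes confident only after the repeated word.
late_stop = [0.01, 0.02, 0.30, 0.45, 0.90]

# Hypothetical stopnet that fires right after the first "sir".
timely_stop = [0.01, 0.02, 0.10, 0.95, 0.99]

print(truncate_frames(frames, late_stop))    # repeated word survives
print(truncate_frames(frames, timely_stop))  # output ends after the first "sir"
```

In this toy case, lowering the threshold to 0.4 also removes the repeated word for the late stopnet. Whether that carries over to a real model depends on what the attention plot shows.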