Dev branch on LJSpeech pronunciation of 'echo'

maneeshkyadav · March 21, 2020, 6:23pm

I was able to get intelligible speech from the latest dev branch with LJSpeech using the template config (see the separate thread for some of the issues I was having). That config uses Tacotron2, and one thing that really stands out in the test audio is that ‘echo’ is pronounced ‘ECH OH’ instead of ‘ECK OH’ after 1000 epochs of training. One of the things that impressed me when I first tried this code base, before Tacotron2 had been implemented, was that it figured out the correct ‘ECK OH’ pronunciation.

I haven’t seen anyone else point this out. Is it just me? I am training without phonemes for now, but the older model was clever enough to figure this out from characters alone.

georroussos · March 23, 2020, 12:15pm

I have also noticed this at test time when training a multi-speaker model on a different dataset: “echo” is pronounced differently there too. However, I think there are ways to manipulate the pronunciation of specific words.
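
One such way, as a minimal sketch for character-based (non-phoneme) input: respell the problem words before the text reaches the model. The `RESPELLINGS` table and the `respell` helper below are hypothetical names made up for illustration and are not part of this code base. The spelling "ecko" is only a guess at what might nudge the model toward ‘ECK OH’, so it would need to be checked by ear.

```python
import re

# Hypothetical respelling table (an assumption, not from the project):
# maps a lowercase word to a spelling the model is more likely to read correctly.
RESPELLINGS = {"echo": "ecko"}

# Matches whole words made of letters and apostrophes.
_WORD_RE = re.compile(r"[A-Za-z']+")


def respell(text, table=RESPELLINGS):
    """Replace whole words found in `table`, leaving all other text untouched."""

    def swap(match):
        word = match.group(0)
        replacement = table.get(word.lower())
        if replacement is None:
            return word  # not in the table, keep the original word
        # Keep a leading capital so sentence-initial words still look normal.
        if word[0].isupper():
            return replacement.capitalize()
        return replacement

    return _WORD_RE.sub(swap, text)


if __name__ == "__main__":
    print(respell("The echo faded."))
```

The respelled string would then be passed to synthesis in place of the original sentence. Only exact whole-word matches are replaced, so inflected forms such as "echoes" need their own table entries.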