I was able to get interpretable speech from the latest dev branch with LJSpeech using the template config (see the separate thread regarding some of the issues I was having). That config uses Tacotron2, and one thing that really stands out in the test audio is that, even after 1000 epochs, ‘echo’ is pronounced ‘ECH OH’ instead of ‘ECK OH’. One of the things that impressed me when I first tried this code base, before Tacotron2 had been implemented, was how it figured out the correct ‘ECK OH’ pronunciation.
I haven’t noticed anyone else point this out. Is it just me? I am running without phonemes for now, but the older model was clever enough to figure this out.