What is the cause of this noise?

I am trying to follow the example on the GitHub wiki page, specifically this one.

It runs well and the result is almost perfect. Most sentences are read out cleanly, but there is one sentence that produces a small glitch. Any ideas?

I am not an expert in deep learning but I do have some background in machine learning. Thanks in advance!

That follow-on audio is kind of a doozy.

Try regenerating it with different punctuation?

Hi. There is no punctuation in the sentence. Sorry, I didn’t paste it the first time. The sentence is:

And it can be foreseen that the film will be nominated for an award in the performance or script category at the Oscar next year

Unexpectedly, if the first word “And” is removed, the result is OK:

it can be foreseen that the film will be nominated for an award in the performance or script category at the Oscar next year
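To make the two suggestions above (different punctuation, dropping the leading "And") easy to try systematically, here is a minimal sketch that only builds text variants. The function name `text_variants` and the list of conjunctions are my own assumptions, and no synthesis call is shown because the thread does not say which API is being used.

```python
import re

def text_variants(sentence):
    """Return de-duplicated rewrites of `sentence` to try when one input glitches.

    Hypothetical helper: the variants mirror the ideas in this thread
    (add terminal punctuation, drop a leading conjunction).
    """
    base = sentence.strip()
    candidates = [base]

    # Variant: add a full stop if the sentence has no terminal punctuation.
    if not base.endswith((".", "!", "?")):
        candidates.append(base + ".")

    # Variant: drop a leading conjunction such as "And".
    stripped = re.sub(r"^(and|but|so)\s+", "", base, flags=re.IGNORECASE)
    if stripped != base:
        stripped = stripped[0].upper() + stripped[1:]
        candidates.append(stripped)
        if not stripped.endswith((".", "!", "?")):
            candidates.append(stripped + ".")

    # Keep the original order while removing duplicates.
    seen = set()
    unique = []
    for text in candidates:
        if text not in seen:
            seen.add(text)
            unique.append(text)
    return unique
```

Each returned string can then be passed to whatever synthesis function you already use, so you can listen for which variants still glitch.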

There is nothing to do except train a new model, preferably with a better dataset.

I don’t know much about audio processing, but the noise sounds very different from a human voice, so maybe some post-processing could handle it? I am searching and learning about audio processing now.
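As a starting point for that kind of post-processing, here is a rough sketch that only locates suspicious windows in a waveform. It assumes the glitch is much louder than the surrounding speech, which may not be true for this audio. The function name `find_loud_windows`, the window size, and the ratio threshold are all made up for illustration.

```python
import math
import statistics

def find_loud_windows(samples, window=256, ratio=4.0):
    """Return (start, end) sample ranges whose RMS energy is far above typical.

    Assumption: `samples` is a sequence of floats, and a glitch shows up as a
    window whose RMS exceeds `ratio` times the median window RMS.
    """
    rms = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        rms.append(math.sqrt(sum(x * x for x in chunk) / window))

    if not rms:
        return []

    typical = statistics.median(rms)
    if typical == 0:
        return []  # Mostly silent signal; a ratio threshold is meaningless.

    return [
        (i * window, (i + 1) * window)
        for i, value in enumerate(rms)
        if value > ratio * typical
    ]
```

The flagged ranges could then be inspected by ear or attenuated. A glitch that is not louder than the speech around it would need a different cue, for example its frequency content.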