Tacotron 2: Echo in the synthesized voice

Hi guys and @erogol,
I trained a Tacotron 2 model on an Urdu dataset in LJSpeech format.
First I trained the model on 3 hours of data.
I trained on two separate datasets (3 hours), each with recordings of a different speaker, made in the same environment and at identical sampling rates. However, the results are not the same: the model trained on the male voice has an echo in its output, while the female voice is perfectly fine.
Previously I trained on 10 hours of data. That data also came from two different speakers and was recorded in the same manner. The issue was the same then: one voice was perfectly fine, but the other had the echo.
What could be the root cause of this echo?

Number of epochs: 2500
Sampling rate: 16000 Hz, 22050 Hz
Batch size: 32
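
Since two sampling rates are listed above, one basic sanity check is to confirm that every clip in a dataset really has the rate set in the training config. Below is a minimal sketch of such a check, not a diagnosis of the echo itself. It uses only Python's standard `wave` module and assumes the clips are uncompressed PCM WAV files; the function name `count_sample_rates` is a made-up helper, not part of any TTS library.

```python
import wave
from collections import Counter
from pathlib import Path


def count_sample_rates(wav_dir):
    """Return a Counter mapping sample rate (Hz) -> number of .wav files.

    Hypothetical helper: assumes every *.wav file directly inside
    `wav_dir` is an uncompressed PCM WAV that the `wave` module can read.
    """
    rates = Counter()
    for path in sorted(Path(wav_dir).glob("*.wav")):
        # wave.open reads only the header here; no audio is decoded.
        with wave.open(str(path), "rb") as wav_file:
            rates[wav_file.getframerate()] += 1
    return rates
```

For an LJSpeech-style layout, this could be called as `count_sample_rates("wavs")` once per speaker. If the result has more than one key, or a key that differs from the configured sampling rate, the clips would need resampling before training.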

