WaveRNN training results sound strange

I have been trying to train erogol's WaveRNN on GTA (ground-truth-aligned) outputs from TTS, which were generated with this [notebook].

After training for quite a long time, I used a WaveRNN checkpoint (trained for 876,000 steps, loss 4.018) with my trained Tacotron2 model. The result sounds strange. Could you give me some advice? Thanks. @erogol

WaveRNN_result.zip (192.4 KB)

Try training on LJSpeech with the same config to check that everything is working.

Okay. I will share any updates with you.

@erogol I looked through the config_libri360.json file and found that if `mode` is the string `mold` or `gauss`, the `mulaw` setting has no effect.
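For context, mu-law companding is a continuous, invertible transform applied to the waveform before it is quantized into integer classes, which is why, in a setup like this, it only matters when the targets are integer classes rather than continuous samples (`mold` or `gauss`). A minimal sketch, assuming the standard mu-law formula with mu = 2**bits - 1 and assuming, as in the dataset.py code quoted in this thread, that an integer `mode` means bits mode. `mulaw_encode`, `mulaw_decode` and `prepare_target` are hypothetical names, not the repository's API:

```python
import numpy as np

def mulaw_encode(x: np.ndarray, bits: int) -> np.ndarray:
    """Compress samples in [-1, 1] with mu-law, mu = 2**bits - 1 (assumed)."""
    mu = 2 ** bits - 1
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mulaw_decode(y: np.ndarray, bits: int) -> np.ndarray:
    """Invert mulaw_encode, mapping compressed values back to [-1, 1]."""
    mu = 2 ** bits - 1
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

def prepare_target(x: np.ndarray, mode, mulaw: bool) -> np.ndarray:
    """Hypothetical gate: mu-law is only applied when mode is an int (bits mode)."""
    if type(mode) is int and mulaw:
        return mulaw_encode(x, bits=mode)
    return x  # 'mold' / 'gauss': the mulaw flag is ignored

x = np.linspace(-1.0, 1.0, 9)
print(np.allclose(mulaw_decode(mulaw_encode(x, 10), 10), x))  # True
print(prepare_target(x, "mold", mulaw=True) is x)             # True
```

Mu-law spends more of the available levels on quiet samples, which reduces audible quantization noise at low bit depths.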

My second point of confusion: if I set `mode` to `mold`, I noticed this part of dataset.py:

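```python
if self.mode in ['gauss', 'mold']:
    # continuous-output modes: read the raw audio file
    x = self.ap.load_wav(f"{self.path}wavs/{file}.wav")
elif type(self.mode) is int:
    # bits mode: read pre-quantized integer samples
    x = np.load(f'{self.path}quant/{file}.npy')
```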

Should the wav file be the original recording or the Griffin-Lim (GL) wav generated by TTS?


These are the original wav files.
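In bits mode the dataset code quoted above loads pre-quantized integer samples from `quant/{file}.npy` instead. How those files are generated is not shown in this thread; a minimal sketch, assuming they are derived from the same original recordings by plain linear quantization of a float waveform in [-1, 1] to integer classes 0 .. 2**bits - 1. `float_to_label` and `label_to_float` are hypothetical helpers, and a synthetic tone stands in for a real recording:

```python
import os
import tempfile

import numpy as np

def float_to_label(x: np.ndarray, bits: int) -> np.ndarray:
    """Map samples in [-1, 1] to integer classes 0 .. 2**bits - 1 (linear)."""
    levels = 2 ** bits - 1
    return np.rint((np.clip(x, -1.0, 1.0) + 1.0) * levels / 2.0).astype(np.int64)

def label_to_float(labels: np.ndarray, bits: int) -> np.ndarray:
    """Inverse mapping from integer classes back to [-1, 1]."""
    levels = 2 ** bits - 1
    return 2.0 * labels / levels - 1.0

bits = 10
# stand-in for an original recording: 1 s of a 220 Hz tone at 22.05 kHz
wav = np.sin(2 * np.pi * 220 * np.arange(22050) / 22050)

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "quant"))
    path = os.path.join(root, "quant", "sample.npy")
    np.save(path, float_to_label(wav, bits))  # file the bits-mode loader would read
    labels = np.load(path)                     # same call as in dataset.py

# rounding error is at most half a quantization step
print(np.max(np.abs(label_to_float(labels, bits) - wav)) <= 1.0 / (2 ** bits - 1))
```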

Thanks @erogol. I started training WaveRNN on the LJSpeech dataset last Friday. I will share any progress with you. :smiley:

Hey @erogol. I tried training WaveRNN in bits mode with bits set to 10. The config file, training log, and samples are attached below.

wavernn_results.zip (915.7 KB)

The audio sounds somewhat grainy, especially the sample `630000_Ibu saya mengirim saya bingkisan.wav`. Would you mind sharing some advice on improving the quality? Thanks a lot.

In bits mode, some graininess is expected. However, training longer would also help.
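One way to see why some graininess is expected in bits mode: with linear quantization each output sample can only take one of 2**bits levels, so every sample carries a rounding error. For a full-scale sine the textbook signal-to-quantization-noise ratio is about 6.02 * bits + 1.76 dB, roughly 62 dB at bits = 10. A small sketch that measures this numerically, assuming plain linear quantization (it only shows the quantization floor, not errors from the network's own predictions; `quantize` and `sqnr_db` are hypothetical helpers):

```python
import numpy as np

def quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Round samples in [-1, 1] onto 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    return np.rint((x + 1.0) * levels / 2.0) * 2.0 / levels - 1.0

def sqnr_db(bits: int, n: int = 100_000) -> float:
    """Measured signal-to-quantization-noise ratio of a full-scale sine, in dB."""
    t = np.arange(n)
    x = np.sin(2 * np.pi * 0.01234567 * t)  # non-harmonic frequency spreads the error
    noise = quantize(x, bits) - x
    return 10 * np.log10(np.mean(x ** 2) / np.mean(noise ** 2))

for bits in (8, 10, 16):
    # measured value next to the 6.02 * bits + 1.76 dB rule of thumb
    print(bits, round(sqnr_db(bits), 1), round(6.02 * bits + 1.76, 1))
```

Each extra bit adds about 6 dB, so a 10-bit setting leaves a noise floor roughly 36 dB higher than 16-bit audio.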

Thanks @erogol. I will share any progress with you.