Training a universal vocoder

I meant adding a small penalty term proportional to the intensity of each pixel (spectrogram bin) to force the model to produce only the necessary sounds, but it won’t work since you say there is no noise in the training set.
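For reference, the idea above can be sketched as a simple L1 sparsity penalty on the predicted spectrogram, added to the main loss to discourage energy in bins that should be silent. This is just a sketch of the general technique, not code from the repo; the function name and the weight value are made up.

```python
import numpy as np

def magnitude_penalty(spec, weight=1e-4):
    """L1 penalty proportional to per-bin intensity.

    `spec` is a (n_mels, n_frames) spectrogram; the penalty grows with
    the total energy, so minimizing it pushes the model to keep quiet
    regions quiet.  `weight` is a hypothetical hyperparameter.
    """
    return weight * np.abs(spec).mean()
```

In practice you would add this term to the vocoder's training loss, with `weight` tuned so it does not suppress legitimate speech energy.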

Maybe it’s an issue of robustness to different recording conditions and audio preprocessing. Have you tried with an unseen LibriTTS speaker?

No, I didn’t try a speaker from LibriTTS. It may be better with those.

Maybe you can help me train the larger model. I can provide the config.json in that case.

Absolutely, I’d love to help! I’ll have a free GPU late this week (Thursday or Friday), or next week if you don’t mind waiting.

Can you create an issue on the repo so we can follow the progress there? I’ll post the config there to make it available to everyone who is interested.


@erogol Could you please share the model, config, and commit of the smaller model that you trained as well? If I’ve got time later this week, I’d like to try fine-tuning it on an augmented version of the dataset (artificial noise, gain/pitch changes, etc.) to see if I can make it more robust noise-wise.
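The noise and gain parts of that augmentation could look something like the sketch below (pitch shifting usually needs a resampling library such as librosa, so it is omitted here). The function names, SNR target, and gain range are all made up for illustration, not taken from the repo.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(wav, snr_db=20.0):
    """Mix white noise into `wav` at a target signal-to-noise ratio (dB)."""
    sig_power = np.mean(wav ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    return wav + rng.normal(0.0, np.sqrt(noise_power), wav.shape)

def random_gain(wav, low_db=-6.0, high_db=6.0):
    """Scale `wav` by a random gain drawn uniformly in dB."""
    gain = 10 ** (rng.uniform(low_db, high_db) / 20)
    return wav * gain
```

Applying these on the fly in the data loader, with randomized parameters per utterance, is the usual way to expose the vocoder to varied recording conditions without storing extra copies of the dataset.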