Training a universal vocoder

julian.weber · August 15, 2020, 8:32pm

I meant a small penalty term proportional to the intensity of each pixel, to force the model to produce only the necessary sounds, but it won’t work since you say there is no noise in the training set.
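To make the idea concrete, here is a minimal sketch of such a penalty. It assumes "pixel" means one time-frequency bin of a predicted linear-magnitude spectrogram where silence is 0; for a log-scaled spectrogram the values would need shifting first. The function names, the L1 reconstruction term, and the default `weight` are illustrative only and not taken from the repo.

```python
import numpy as np

def intensity_penalty(spec, weight=1e-3):
    """Mean absolute intensity over all spectrogram bins, scaled by `weight`.

    Assumes `spec` is a magnitude spectrogram in which silence is 0.
    """
    spec = np.asarray(spec, dtype=np.float64)
    return weight * np.mean(np.abs(spec))

def total_loss(pred, target, weight=1e-3):
    """L1 reconstruction loss plus the intensity penalty on the prediction.

    The L1 term is a stand-in for whatever loss the model really uses.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    reconstruction = np.mean(np.abs(pred - target))
    return reconstruction + intensity_penalty(pred, weight)
```

With a small `weight` the penalty only nudges the output toward silence where the reconstruction term is indifferent; a large value would start suppressing real speech energy.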

Maybe it’s an issue of robustness to different recording conditions and sound preprocessing. Have you tried with an unseen LibriTTS speaker?

erogol · August 17, 2020, 10:58am

No, I didn’t try a speaker from LibriTTS. It may work better with those.

erogol · August 17, 2020, 10:58am

Maybe you can help me train the larger model. I can provide the config.json in that case.

georroussos · August 17, 2020, 3:19pm

Absolutely, I’d love to help! I have a free GPU late this week (Thursday or Friday) or next week, if you don’t mind waiting.

erogol · August 18, 2020, 10:40am

Can you create an issue on the repo so we can follow the progress there? I’ll post the config there to make it available to everyone who is interested.

julian.weber · August 18, 2020, 11:39am

@erogol Could you please share the model, config, and commit of the smaller model that you trained as well? If I have time later this week, I’d like to try fine-tuning it with an augmented version of the dataset (artificial noise, gain/pitch changes, etc.) to see if I can make it more robust to noise.
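For reference, a rough sketch of the kind of augmentation I have in mind, covering only the noise and gain parts (pitch change needs a proper resampler, so it is left out). It assumes waveforms are float arrays in [-1, 1]; the function names and the SNR and gain ranges are placeholders I made up, not values from the repo or a tuned recipe.

```python
import numpy as np

def add_noise(wav, snr_db, rng):
    """Add white Gaussian noise so the result has roughly the given SNR in dB."""
    signal_power = np.mean(wav ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wav.shape)
    return wav + noise

def apply_gain(wav, gain_db):
    """Scale the waveform amplitude by a gain given in dB."""
    return wav * (10 ** (gain_db / 20))

def augment(wav, rng, snr_range=(10.0, 30.0), gain_range=(-6.0, 6.0)):
    """Apply a random gain, then random-SNR noise, then clip to [-1, 1]."""
    wav = np.asarray(wav, dtype=np.float64)
    out = apply_gain(wav, rng.uniform(*gain_range))
    out = add_noise(out, rng.uniform(*snr_range), rng)
    return np.clip(out, -1.0, 1.0)
```

Usage would be something like `augment(wav, np.random.default_rng(0))` on each clip before feature extraction, so the vocoder sees a different noise level and loudness each time.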