Can we use the pretrained models on the original PWGan repo?

georroussos · May 16, 2020, 5:18pm

Hi,
There are many pretrained vocoders on the original repo, and some of them are trained on LibriTTS. They are very good quality. I wonder if anyone has used them with Mozilla TTS as is.

erogol · May 17, 2020, 11:41am

No, they use a different normalization method for the network inputs. But if you train a new TTS model using mean-var normalization, then you can use their models, which is possible with the dev branch.
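For readers unfamiliar with the term, mean-var normalization can be sketched as follows. This is only an illustration, assuming per-bin statistics computed over the training features; the function names and toy values are hypothetical and are not taken from either repo.

```python
from statistics import fmean, pstdev

def compute_stats(frames):
    """Per-bin mean and standard deviation over all frames.

    frames is a list of frames, each a list of mel-bin values.
    """
    bins = list(zip(*frames))  # transpose: one tuple per mel bin
    means = [fmean(b) for b in bins]
    stds = [pstdev(b) or 1.0 for b in bins]  # fall back to 1.0 if a bin is constant
    return means, stds

def mean_var_normalize(frames, means, stds):
    """Subtract the per-bin mean and divide by the per-bin standard deviation."""
    return [[(x - m) / s for x, m, s in zip(frame, means, stds)] for frame in frames]

# Toy "log-mel" features: 3 frames, 2 bins (made-up numbers).
frames = [[-4.0, -2.0], [-6.0, -2.5], [-5.0, -3.0]]
means, stds = compute_stats(frames)
normalized = mean_var_normalize(frames, means, stds)
```

A vocoder trained on features normalized this way expects the same statistics at inference, which is why a TTS model trained with another scheme does not plug in directly.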

georroussos · May 17, 2020, 1:59pm

Ah, that is such a shame. Some of them are trained for 1M steps, and when I performed analysis and synthesis using the LibriTTS one, it sounded so good! I am now finetuning the universal WaveRNN you once trained, for 22050 Hz instead, and I will share it if anyone is interested. That will hopefully make it easier to apply to more models. On 16 kHz models it performs super nicely, but on 22 kHz there is some noise. I think I should finetune more. How much more do you think I should do? I am finetuning on LibriTTS.

erogol · May 17, 2020, 2:10pm

It is hard to guess. Just listen to the generated audio and judge the quality.

You can consider training PWGAN using my branch on LibriTTS. You can even finetune their released model for faster convergence.

Another option is to renormalize the spectrograms before PWGAN using their method at inference. That might also work.
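The renormalization idea could look roughly like the sketch below. Everything specific here is an assumption for illustration: that the TTS model emits values scaled symmetrically to [-max_norm, max_norm] from decibels, that the vocoder expects mean-var normalized log10 amplitudes, and the constants themselves. The real values would have to be read from the two configs and the vocoder's statistics file, and the mel filterbank settings would still need to match.

```python
def denormalize_symmetric(x, min_level_db=-100.0, max_norm=4.0):
    """Map a value in [-max_norm, max_norm] back to decibels in [min_level_db, 0].

    Assumed TTS-side scheme; the default constants are placeholders.
    """
    return (x + max_norm) / (2.0 * max_norm) * (-min_level_db) + min_level_db

def db_to_log10_amplitude(db, ref_level_db=20.0):
    """Undo an assumed reference-level offset and convert dB to log10 amplitude."""
    return (db + ref_level_db) / 20.0

def renormalize_frame(frame, means, stds):
    """Convert one TTS output frame to the (assumed) vocoder input scale."""
    out = []
    for x, m, s in zip(frame, means, stds):
        log_amp = db_to_log10_amplitude(denormalize_symmetric(x))
        out.append((log_amp - m) / s)  # mean-var normalization with vocoder stats
    return out

# Toy frame with 2 bins and made-up vocoder statistics.
frame = renormalize_frame([4.0, -4.0], means=[0.0, -4.0], stds=[1.0, 2.0])
```

Whether this is enough depends on the rest of the feature pipeline agreeing as well (sampling rate, FFT size, hop size, mel range), so it is a starting point rather than a guaranteed fix.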

georroussos · May 17, 2020, 2:23pm

I had the same idea, to finetune their model in order to make it compatible with your fork, but I think I messed it up. In short, I couldn’t find the correct yaml: when I cloned the repo at the latest commit, it gave me the yaml for MelGAN, and when I checked out the PWGAN commit, it gave me the ttsv1 and ttsv2 configs but not the melgan one in bin/configs, which is needed for feature extraction and, I guess, training. Because the melgan config would train MelGAN, wouldn’t it? Also, are these features for both PWGAN and MelGAN? I got confused. Otherwise yes, I can totally try it!! But I don’t know what the correct workflow and configs are. Their LibriTTS vocoder is extremely good.

erogol · May 17, 2020, 3:19pm

You can take any config from anywhere and run with my branch. Configs are compatible.

georroussos · May 17, 2020, 3:43pm

Cool! I will try using the config that came with the model

georroussos · May 18, 2020, 11:08am

Which commit do I use to train? fca88f9 doesn’t have the config for preprocessing and the latest one doesn’t have the tts configs. The one in configs is MelGAN.

I tried the configs from the original repo but they didn’t work.

erogol · May 18, 2020, 12:06pm

What do you mean by “didn’t work”? What was the error?

georroussos · May 18, 2020, 12:15pm

First I tried to extract the feats using this config https://github.com/erogol/ParallelWaveGAN/blob/tts/parallel_wavegan/configs/melgan.v3.long.tts.yaml

But this is for MelGAN training and I want to train PWGAN.

So I checked out fca88f9 to get the tts configs. But these configs are not like the one in the original PWGAN repo, https://github.com/kan-bayashi/ParallelWaveGAN/blob/master/egs/libritts/voc1/conf/parallel_wavegan.v1.long.yaml

erogol · May 18, 2020, 12:17pm

Have you tried the configs directly from the original repo?

georroussos · May 18, 2020, 12:20pm

Yep, the one here https://github.com/kan-bayashi/ParallelWaveGAN/blob/master/egs/libritts/voc1/conf/parallel_wavegan.v1.long.yaml which accompanies the LibriTTS model I want to use for finetuning, but it seems that the parameters in the yaml file are sprinkled all around and have different formatting

erogol · May 18, 2020, 12:21pm

Which parameter, for instance?

georroussos · May 18, 2020, 12:28pm

I think the config in the original repo doesn’t have the datasets section found in the tts configs, and the audio section in the tts configs has options that the other one doesn’t have. I tried to add the whole of it, but it didn’t work. The product of the upsample scales is computed using 4 values in the original config and 3 in your config. And the original has a section for STFT losses:

stft_loss_params:
    fft_sizes: [1024, 2048, 512]   # List of FFT size for STFT-based loss.
    hop_sizes: [120, 240, 50]      # List of hop size for STFT-based loss.
    win_lengths: [600, 1200, 240]  # List of window length for STFT-based loss.
    window: "hann_window"
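On the upsample scales: the number of values can differ between configs as long as their product equals the hop size used for feature extraction. A quick check, with a hypothetical helper name and illustrative values rather than the actual ones from either config, might look like this:

```python
from math import prod

def check_upsample_scales(upsample_scales, hop_size):
    """Return the product of the scales, or raise if it does not match hop_size."""
    total = prod(upsample_scales)
    if total != hop_size:
        raise ValueError(
            f"product of upsample_scales {upsample_scales} is {total}, "
            f"but hop_size is {hop_size}"
        )
    return total

# Four values and three values can both be valid for the same hop size.
check_upsample_scales([4, 4, 4, 4], hop_size=256)
check_upsample_scales([8, 8, 4], hop_size=256)
```

So a 3-value versus 4-value list is not an incompatibility by itself; a mismatch between the product and the hop size is.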

erogol · May 18, 2020, 12:32pm

What you can quite easily do is take the original config and change the fields as required by my fork.

georroussos · May 18, 2020, 12:39pm

I will try again. What config should I be using for feature extraction? The melgan one in configs?

erogol · May 18, 2020, 12:41pm

For feature extraction, the only thing that matters is the audio parameters. Just copy and paste them from the melgan config into the config you would like to use. You can figure all of this out by reading the feature extraction code.
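The copy-and-paste step can also be done programmatically. The sketch below works on plain dictionaries as they would come out of a YAML loader; the key names, the flat layout, and the values are assumptions for illustration, and a config that keeps these fields in a nested audio section would need the same copy applied to that sub-dictionary.

```python
import copy

# Assumed names of the audio-related fields; adjust to the actual configs.
AUDIO_KEYS = ("sampling_rate", "fft_size", "hop_size", "win_length",
              "num_mels", "fmin", "fmax")

def copy_audio_params(source_cfg, target_cfg, keys=AUDIO_KEYS):
    """Return a copy of target_cfg with the audio fields taken from source_cfg."""
    merged = copy.deepcopy(target_cfg)  # leave the original target untouched
    for key in keys:
        if key in source_cfg:
            merged[key] = source_cfg[key]
    return merged

# Made-up configs standing in for the melgan and PWGAN yaml files.
melgan_cfg = {"sampling_rate": 22050, "fft_size": 1024, "hop_size": 256,
              "win_length": 1024, "num_mels": 80, "fmin": 0, "fmax": 8000}
pwgan_cfg = {"sampling_rate": 24000, "fft_size": 2048, "hop_size": 300,
             "win_length": 1200, "num_mels": 80, "batch_size": 6}

merged_cfg = copy_audio_params(melgan_cfg, pwgan_cfg)
```

After a merge like this, settings that depend on the hop size, such as the upsample scales, still need to be adjusted by hand.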

georroussos · May 18, 2020, 1:02pm

I was able to get it to work after all. I took the PWGAN+TTS notebook (here: Google Colab) and @edresson1’s fork, since that is the one I work with, and I changed the synthesize function around. I was able to load both a vocoder trained using your fork and the vocoders from the original repo, using a franken’d config.

Sadly, the quality is not good. I think it is because of what you mentioned about normalization. The voice comes out hollow and distorted, both with the original repo vocoders and with the one I tried to finetune, even though the latter was producing good speech during eval after 30,000 steps of finetuning on LibriTTS. Now I will try to finetune the LJSpeech vocoder you trained for 40,000 steps and see if I get better results.
