Help with Random Window Discriminator

alexdemartos · October 26, 2020, 7:42pm

I read that you achieved better results with the RWD than with the Multi-Scale MelGAN Discriminator, and also that FullBand MelGAN was working better for you than MB-MelGAN. Is that right? Was it for a single-speaker corpus or a multi-speaker one?

I’ve tried to port your implementation to kan-bayashi’s PWG, but I am having trouble with the downsample factors. My hop length is 300, and this is what I’ve (unsuccessfully) tried:

hop_length: 300
uncond_disc_donwsample_factors: [8, 4]
cond_disc_downsample_factors: [[5, 5, 3, 2, 2], [5, 5, 3, 2], [5, 5, 3], [5, 3, 2], [5, 2, 2]]
cond_disc_out_channels: [[128, 128, 256, 256], [128, 256, 256], [128, 256], [128, 256], [128, 256]]
window_sizes: [600, 1200, 2400, 6000, 9000]
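For reference, here is a minimal sketch of the arithmetic behind these values. It assumes a constraint that is not stated in this post: that each conditional stack's factors must multiply to hop_length // k, where k = window_size // (2 * hop_length). The helper name expected_products is hypothetical, made up for this check.

```python
from math import prod

hop_length = 300
window_sizes = [600, 1200, 2400, 6000, 9000]
cond_disc_downsample_factors = [
    [5, 5, 3, 2, 2],
    [5, 5, 3, 2],
    [5, 5, 3],
    [5, 3, 2],
    [5, 2, 2],
]


def expected_products(hop_length, window_sizes):
    """Total downsampling each conditional stack should reach.

    ASSUMPTION: the base window is 2 * hop_length, and a window k times
    larger needs a total downsampling of hop_length // k.
    """
    base_window_size = 2 * hop_length
    products = []
    for ws in window_sizes:
        if ws % hop_length != 0:
            raise ValueError(f"window size {ws} is not a multiple of hop_length")
        k = ws // base_window_size
        products.append(hop_length // k)
    return products


expected = expected_products(hop_length, window_sizes)
actual = [prod(factors) for factors in cond_disc_downsample_factors]

for ws, e, a in zip(window_sizes, expected, actual):
    status = "ok" if e == a else "MISMATCH"
    print(f"window {ws}: expected product {e}, got {a} ({status})")
```

Under that assumption all five stacks come out consistent with a hop length of 300, so the factors themselves may not be the source of the problem.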

I am getting the following error:

"parallel_wavegan/losses/stft_loss.py", line 52, in forward
return torch.norm(y_mag - x_mag, p="fro") / torch.norm(y_mag, p="fro")

RuntimeError: The size of tensor a (151) must match the size of tensor b (38) at non-singleton dimension 1
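To get a feel for the shapes in that message, here is a rough sketch of the frame arithmetic. It assumes dimension 1 of the magnitude tensors counts STFT frames, and that the STFT is centered, so a signal of n samples gives 1 + n // hop_size frames. The hop size of 240 and the 36000-sample segment length are hypothetical values picked only because they reproduce the numbers in the error; neither is taken from my config above.

```python
def stft_frames(num_samples, hop_size):
    """Frame count of a centered STFT (padding of n_fft // 2 on both sides)."""
    return 1 + num_samples // hop_size


# ASSUMPTION: hypothetical STFT-loss hop size, not taken from the post.
hop_size = 240

# ASSUMPTION: hypothetical full training segment length in samples.
full_segment = 36000

# The largest entry in window_sizes from the config.
largest_window = 9000

frames_full = stft_frames(full_segment, hop_size)
frames_window = stft_frames(largest_window, hop_size)

print(f"{full_segment} samples -> {frames_full} frames")
print(f"{largest_window} samples -> {frames_window} frames")
```

If those assumptions hold, a full-length waveform and a waveform of about 9000 samples would give exactly this 151 vs. 38 mismatch, so it may be worth checking whether the two tensors reaching the STFT loss have the same length.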

Any advice? Thanks in advance.