I’m asking more about generating batched samples at once for faster generation. Specifically, I’m looking at Section 4.4, “Fused Subscale WaveRNN,” in the original WaveRNN paper: https://arxiv.org/pdf/1802.08435v2.pdf
I have already formatted the original audio into subtensors. Now I’m trying to modify fatchord’s WaveRNN to train on a single subtensor, since I know it can generate sequential audio. I saved one subtensor back out as audio; although the quality is slightly lower, it sounds similar to the original. I find it odd that the model can’t learn from a slightly lower-quality version of the same audio.
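For reference, here is roughly how I’m slicing the audio into subtensors, following the subscale idea of taking every B-th sample (the function names are just illustrative helpers, and `B` stands for the subscale factor from the paper; this is a sketch, not fatchord’s actual code):

```python
import numpy as np

def to_subtensors(audio, B):
    # Split a 1-D signal into B interleaved subtensors:
    # subtensor k holds samples k, k+B, k+2B, ...
    T = len(audio) // B * B          # trim so the length divides evenly
    return [audio[k:T:B] for k in range(B)]

def from_subtensors(subs):
    # Inverse: re-interleave the B subtensors into one signal.
    B = len(subs)
    out = np.empty(len(subs[0]) * B, dtype=subs[0].dtype)
    for k, s in enumerate(subs):
        out[k::B] = s
    return out
```

Each subtensor is the original waveform downsampled by a factor of B, which is why a single subtensor played back on its own sounds like a lower-quality version of the same audio.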