Subscaling WaveRNN

Hi @erogol! I know that your WaveRNN is based on fatchord’s implementation. By any chance, have you taken a look at subscaling and how it fits into the WaveRNN model?

Any insights you may have will be appreciated as I’m currently implementing it.

By sub-scaling, do you mean quantization?

I’m talking more about generating a batch of samples at once for faster generation. In particular, I’m looking at section 4.4, Fused Subscale WaveRNN, in the original WaveRNN paper: https://arxiv.org/pdf/1802.08435v2.pdf

So far I have reformatted the original audio into subtensors. I am now trying to modify fatchord’s WaveRNN so that it trains on a subtensor, since I know it can already generate sequential audio. I saved a sample of one subtensor’s audio: although it is a bit lower in quality, it sounds similar to the original. So I find it odd that the model can’t learn from a slightly lower-quality version of the same audio.
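For reference, here is a minimal sketch of the subtensor layout being discussed, assuming the folding described in the paper: a waveform is split into B interleaved subtensors, where subtensor s holds samples s, s + B, s + 2B, and so on. The function names `fold_subscale` and `unfold_subscale` and the value B = 4 are hypothetical and chosen only for illustration; they are not taken from fatchord’s code.

```python
from typing import List, Sequence


def fold_subscale(samples: Sequence[int], batch_factor: int) -> List[List[int]]:
    """Split a waveform into `batch_factor` interleaved subtensors.

    Subtensor s holds samples s, s + B, s + 2B, ... (B = batch_factor).
    Trailing samples that do not fill a whole group of B are dropped.
    """
    if batch_factor < 1:
        raise ValueError("batch_factor must be >= 1")
    usable = len(samples) - len(samples) % batch_factor
    return [list(samples[s:usable:batch_factor]) for s in range(batch_factor)]


def unfold_subscale(subtensors: Sequence[Sequence[int]]) -> List[int]:
    """Interleave the subtensors back into a single waveform."""
    return [x for frame in zip(*subtensors) for x in frame]


# Toy "waveform": sample values equal their own index, so the layout is visible.
wave = list(range(16))
subs = fold_subscale(wave, batch_factor=4)   # hypothetical B = 4
restored = unfold_subscale(subs)

print(subs[0])   # every 4th sample, starting at index 0
print(subs[1])   # every 4th sample, starting at index 1
print(restored == wave)
```

Each subtensor on its own is the original signal decimated by B with no low-pass filter, which is consistent with a single subtensor sounding like a lower-quality version of the same audio. Note also that, as I read the paper, each subtensor is generated conditioned on the previously generated subtensors, so training on one subtensor in isolation is a simplification of that scheme.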

I’ve not tried that before. I’d suggest betting on the newer GAN-based vocoders rather than working on the intricacies of WaveRNN or WaveNet, but it’s your call in the end.