I’m asking more about generating batched samples at once for faster generation. Specifically, I’m looking at Section 4.4, “Fused Subscale WaveRNN,” in the original WaveRNN paper: https://arxiv.org/pdf/1802.08435v2.pdf
I have already formatted the original audio into subtensors. Now I’m trying to modify fatchord’s WaveRNN to train on a single subtensor, since I know it can generate sequential audio. I saved one subtensor back out as audio; although the quality is slightly lower, it sounds similar to the original. I find it odd that the model can’t learn from a slightly lower-quality version of the same audio.
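For reference, here is roughly how I’m slicing the audio into subtensors, following the subscale idea of taking every B-th sample (the function names are just illustrative helpers, and `B` stands for the subscale factor from the paper; this is a sketch, not fatchord’s actual code):

```python
import numpy as np

def to_subtensors(audio, B):
    # Split a 1-D signal into B interleaved subtensors:
    # subtensor k holds samples k, k+B, k+2B, ...
    T = len(audio) // B * B          # trim so the length divides evenly
    return [audio[k:T:B] for k in range(B)]

def from_subtensors(subs):
    # Inverse: re-interleave the B subtensors into one signal.
    B = len(subs)
    out = np.empty(len(subs[0]) * B, dtype=subs[0].dtype)
    for k, s in enumerate(subs):
        out[k::B] = s
    return out
```

Each subtensor is the original waveform downsampled by a factor of B, which is why a single subtensor played back on its own sounds like a lower-quality version of the same audio.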