Shape mismatch for mel_input and decoder_output

alchemi5t · November 1, 2019, 5:41am

@erogol
I was trying out the latest branch with the sound norm, and for some reason the mel_input and decoder_output shapes don’t match. The mel_input is exactly 3.5x longer than the decoder_output along the time axis. Any ideas on how to fix this?

RuntimeError: input and target shapes do not match: input [32 x 102 x 80], target [32 x 357 x 80] at /pytorch/aten/src/THCUNN/generic/MSECriterion.cu:12
RuntimeError: input and target shapes do not match: input [32 x 66 x 80], target [32 x 231 x 80] at /pytorch/aten/src/THCUNN/generic/MSECriterion.cu:12

alchemi5t · November 1, 2019, 5:46am

Never mind, figured it out. The problem was the “r” value set in gradual training: I can’t use r=7. I’ll post here if I figure out why it caused the problem.
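
For anyone hitting the same error, here is a quick arithmetic check on the two shapes from the traceback. It is only a sketch: it assumes the target length was padded to a multiple of r=7 and that the decoder emits a fixed number of frames per step, neither of which is confirmed in this thread. The helper name `frames_per_step` is made up for illustration and is not part of the TTS code.

```python
def frames_per_step(target_frames, output_frames, data_r):
    """Estimate how many frames the decoder emitted per step.

    Assumption (not confirmed by the thread): the target was padded to a
    multiple of data_r, so the decoder ran target_frames // data_r steps.
    """
    if target_frames % data_r != 0:
        raise ValueError("target length is not a multiple of data_r")
    steps = target_frames // data_r
    return output_frames / steps


# (target length, decoder output length) taken from the two error messages
shapes = [(357, 102), (231, 66)]

for target, output in shapes:
    ratio = target / output
    emitted = frames_per_step(target, output, data_r=7)
    print(f"target={target} output={output} ratio={ratio} frames/step={emitted}")
```

Both batches give a ratio of 3.5 and 2.0 frames per step. That would be consistent with the decoder still emitting 2 frames per step while the targets were prepared for r=7, but this is a guess from the numbers, not a confirmed cause.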
