Hello TTS team, thanks a lot for open-sourcing this project!
I'm a newbie to TTS and deep learning, so I'm trying to get the hang of things by going through the full training and inference process on the LJSpeech sample dataset. I ran the training script using this notebook - https://gist.github.com/erogol/97516ad65b44dbddb8cd694953187c5b. Training went well; I'm using the Tacotron2 model, and I stopped training at the 10K checkpoint. Now I want to go through the inference process of generating a wave file for an input sentence. I know 10K iterations are not enough, but I just want to iron out the end-to-end process before committing to a full training run (even if all I get is static or gibberish). So I'm now trying to run inference with this notebook - https://colab.research.google.com/github/tugstugi/dl-colab-notebooks/blob/master/notebooks/Mozilla_TTS_WaveRNN.ipynb
I'm ending up with the following error:
(this is the cell I’m executing; the last one)
align, spec, stop_tokens, wav = tts(model, SENTENCE, CONFIG, use_cuda, ap, speaker_id=0, use_gl=False, figures=False)
(error below)
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
1370 ret = torch.addmm(bias, input, weight.t())
1371 else:
-> 1372 output = input.matmul(weight.t())
1373 if bias is not None:
1374 output += bias
RuntimeError: size mismatch, m1: [1 x 560], m2: [80 x 256] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:290
=====
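Digging into the numbers a bit: 560 is exactly 80 × 7, so my guess is that the Tacotron2 decoder is emitting r = 7 mel frames per step (with num_mels = 80), while the layer in the traceback expects a single 80-dim frame. Here is a minimal sketch of what I think is happening; the tensor names and the standalone Linear layer are made up for illustration and are not the actual notebook code:

import torch

# Hypothetical reconstruction of the mismatch: with num_mels = 80 and a
# reduction factor r = 7, one decoder step packs 80 * 7 = 560 values,
# but the linear layer in the traceback expects inputs of size 80.
num_mels, r = 80, 7
decoder_step = torch.randn(1, num_mels * r)   # shape [1, 560], matching m1
proj = torch.nn.Linear(num_mels, 256)         # weight.t() is [80, 256], matching m2

# proj(decoder_step) raises the same size-mismatch RuntimeError;
# splitting the step into individual 80-dim frames makes the shapes agree:
frames = decoder_step.view(-1, num_mels)      # shape [7, 80]
out = proj(frames)                            # shape [7, 256]

So it looks like a num_mels / r mismatch between the checkpoint I trained and what the inference notebook expects.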
I'm pretty much using the standard config files and the sample dataset (LJSpeech). What config changes should I make (to the TTS or WaveRNN configs), or how else can I get past this error, given the 10K-checkpoint Tacotron2 model I have?
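In case it helps, this is the quick check I'm running to compare the two configs. It's just a sketch: the paths are placeholders for where my checkpoints live, and I'm assuming both config.json files expose an "audio" section the way Mozilla TTS's config.json does:

import json

# Sketch: compare the audio settings of the TTS checkpoint and the vocoder.
# Paths are placeholders; the "audio" section layout is an assumption.
with open("tts_model/config.json") as f:
    tts_cfg = json.load(f)
with open("wavernn_model/config.json") as f:
    voc_cfg = json.load(f)

for key in sorted(set(tts_cfg["audio"]) | set(voc_cfg["audio"])):
    a, b = tts_cfg["audio"].get(key), voc_cfg["audio"].get(key)
    marker = "" if a == b else "  <-- mismatch"
    print(f"{key}: tts={a}  vocoder={b}{marker}")

Any help is appreciated. Thanks!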