Hi,
I trained Tacotron2 on a private dataset.
I am using the pretrained WaveRNN model.
While testing the trained Tacotron2, it throws the error below:
Using model: Tacotron2
Setting up Audio Processor…
| > bits:None
| > sample_rate:22050
| > num_mels:80
| > min_level_db:-100
| > frame_shift_ms:12.5
| > frame_length_ms:50
| > ref_level_db:20
| > num_freq:1025
| > power:1.5
| > preemphasis:0.98
| > griffin_lim_iters:60
| > signal_norm:True
| > symmetric_norm:False
| > mel_fmin:0.0
| > mel_fmax:8000.0
| > max_norm:1.0
| > clip_norm:True
| > do_trim_silence:True
| > n_fft:2048
| > hop_length:275
| > win_length:1102
Traceback (most recent call last):
  File "test.py", line 66, in <module>
    model.load_state_dict(cp['model'])
  File "/home/ubuntu/drive_a/mayank/Test/vin_test_3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Tacotron2:
Missing key(s) in state_dict: “encoder.convolutions.0.net.0.weight”, “encoder.convolutions.0.net.0.bias”, “encoder.convolutions.0.net.1.weight”, “encoder.convolutions.0.net.1.bias”, “encoder.convolutions.0.net.1.running_mean”, “encoder.convolutions.0.net.1.running_var”, “encoder.convolutions.1.net.0.weight”, “encoder.convolutions.1.net.0.bias”, “encoder.convolutions.1.net.1.weight”, “encoder.convolutions.1.net.1.bias”, “encoder.convolutions.1.net.1.running_mean”, “encoder.convolutions.1.net.1.running_var”, “encoder.convolutions.2.net.0.weight”, “encoder.convolutions.2.net.0.bias”, “encoder.convolutions.2.net.1.weight”, “encoder.convolutions.2.net.1.bias”, “encoder.convolutions.2.net.1.running_mean”, “encoder.convolutions.2.net.1.running_var”, “decoder.prenet.layers.0.linear_layer.weight”, “decoder.prenet.layers.0.bn.weight”, “decoder.prenet.layers.0.bn.bias”, “decoder.prenet.layers.0.bn.running_mean”, “decoder.prenet.layers.0.bn.running_var”, “decoder.prenet.layers.1.linear_layer.weight”, “decoder.prenet.layers.1.bn.weight”, “decoder.prenet.layers.1.bn.bias”, “decoder.prenet.layers.1.bn.running_mean”, “decoder.prenet.layers.1.bn.running_var”, “decoder.attention_layer.query_layer.linear_layer.weight”, “decoder.attention_layer.inputs_layer.linear_layer.weight”, “decoder.attention_layer.v.linear_layer.weight”, “decoder.attention_layer.v.linear_layer.bias”, “decoder.attention_layer.location_layer.location_conv.weight”, “decoder.attention_layer.location_layer.location_dense.linear_layer.weight”, “decoder.attention_rnn_init.weight”, “decoder.go_frame_init.weight”, “decoder.decoder_rnn_inits.weight”, “postnet.convolutions.0.net.0.weight”, “postnet.convolutions.0.net.0.bias”, “postnet.convolutions.0.net.1.weight”, “postnet.convolutions.0.net.1.bias”, “postnet.convolutions.0.net.1.running_mean”, “postnet.convolutions.0.net.1.running_var”, “postnet.convolutions.1.net.0.weight”, “postnet.convolutions.1.net.0.bias”, “postnet.convolutions.1.net.1.weight”, 
“postnet.convolutions.1.net.1.bias”, “postnet.convolutions.1.net.1.running_mean”, “postnet.convolutions.1.net.1.running_var”, “postnet.convolutions.2.net.0.weight”, “postnet.convolutions.2.net.0.bias”, “postnet.convolutions.2.net.1.weight”, “postnet.convolutions.2.net.1.bias”, “postnet.convolutions.2.net.1.running_mean”, “postnet.convolutions.2.net.1.running_var”, “postnet.convolutions.3.net.0.weight”, “postnet.convolutions.3.net.0.bias”, “postnet.convolutions.3.net.1.weight”, “postnet.convolutions.3.net.1.bias”, “postnet.convolutions.3.net.1.running_mean”, “postnet.convolutions.3.net.1.running_var”, “postnet.convolutions.4.net.0.weight”, “postnet.convolutions.4.net.0.bias”, “postnet.convolutions.4.net.1.weight”, “postnet.convolutions.4.net.1.bias”, “postnet.convolutions.4.net.1.running_mean”, “postnet.convolutions.4.net.1.running_var”.
Unexpected key(s) in state_dict: “coarse_decoder.prenet.linear_layers.0.linear_layer.weight”, “coarse_decoder.prenet.linear_layers.1.linear_layer.weight”, “coarse_decoder.attention_rnn.weight_ih”, “coarse_decoder.attention_rnn.weight_hh”, “coarse_decoder.attention_rnn.bias_ih”, “coarse_decoder.attention_rnn.bias_hh”, “coarse_decoder.attention.query_layer.linear_layer.weight”, “coarse_decoder.attention.inputs_layer.linear_layer.weight”, “coarse_decoder.attention.v.linear_layer.weight”, “coarse_decoder.attention.v.linear_layer.bias”, “coarse_decoder.attention.location_layer.location_conv1d.weight”, “coarse_decoder.attention.location_layer.location_dense.linear_layer.weight”, “coarse_decoder.decoder_rnn.weight_ih”, “coarse_decoder.decoder_rnn.weight_hh”, “coarse_decoder.decoder_rnn.bias_ih”, “coarse_decoder.decoder_rnn.bias_hh”, “coarse_decoder.linear_projection.linear_layer.weight”, “coarse_decoder.linear_projection.linear_layer.bias”, “coarse_decoder.stopnet.1.linear_layer.weight”, “coarse_decoder.stopnet.1.linear_layer.bias”, “encoder.convolutions.0.convolution1d.weight”, “encoder.convolutions.0.convolution1d.bias”, “encoder.convolutions.0.batch_normalization.weight”, “encoder.convolutions.0.batch_normalization.bias”, “encoder.convolutions.0.batch_normalization.running_mean”, “encoder.convolutions.0.batch_normalization.running_var”, “encoder.convolutions.0.batch_normalization.num_batches_tracked”, “encoder.convolutions.1.convolution1d.weight”, “encoder.convolutions.1.convolution1d.bias”, “encoder.convolutions.1.batch_normalization.weight”, “encoder.convolutions.1.batch_normalization.bias”, “encoder.convolutions.1.batch_normalization.running_mean”, “encoder.convolutions.1.batch_normalization.running_var”, “encoder.convolutions.1.batch_normalization.num_batches_tracked”, “encoder.convolutions.2.convolution1d.weight”, “encoder.convolutions.2.convolution1d.bias”, “encoder.convolutions.2.batch_normalization.weight”, “encoder.convolutions.2.batch_normalization.bias”, 
“encoder.convolutions.2.batch_normalization.running_mean”, “encoder.convolutions.2.batch_normalization.running_var”, “encoder.convolutions.2.batch_normalization.num_batches_tracked”, “decoder.attention.query_layer.linear_layer.weight”, “decoder.attention.inputs_layer.linear_layer.weight”, “decoder.attention.v.linear_layer.weight”, “decoder.attention.v.linear_layer.bias”, “decoder.attention.location_layer.location_conv1d.weight”, “decoder.attention.location_layer.location_dense.linear_layer.weight”, “decoder.prenet.linear_layers.0.linear_layer.weight”, “decoder.prenet.linear_layers.1.linear_layer.weight”, “postnet.convolutions.0.convolution1d.weight”, “postnet.convolutions.0.convolution1d.bias”, “postnet.convolutions.0.batch_normalization.weight”, “postnet.convolutions.0.batch_normalization.bias”, “postnet.convolutions.0.batch_normalization.running_mean”, “postnet.convolutions.0.batch_normalization.running_var”, “postnet.convolutions.0.batch_normalization.num_batches_tracked”, “postnet.convolutions.1.convolution1d.weight”, “postnet.convolutions.1.convolution1d.bias”, “postnet.convolutions.1.batch_normalization.weight”, “postnet.convolutions.1.batch_normalization.bias”, “postnet.convolutions.1.batch_normalization.running_mean”, “postnet.convolutions.1.batch_normalization.running_var”, “postnet.convolutions.1.batch_normalization.num_batches_tracked”, “postnet.convolutions.2.convolution1d.weight”, “postnet.convolutions.2.convolution1d.bias”, “postnet.convolutions.2.batch_normalization.weight”, “postnet.convolutions.2.batch_normalization.bias”, “postnet.convolutions.2.batch_normalization.running_mean”, “postnet.convolutions.2.batch_normalization.running_var”, “postnet.convolutions.2.batch_normalization.num_batches_tracked”, “postnet.convolutions.3.convolution1d.weight”, “postnet.convolutions.3.convolution1d.bias”, “postnet.convolutions.3.batch_normalization.weight”, “postnet.convolutions.3.batch_normalization.bias”, 
“postnet.convolutions.3.batch_normalization.running_mean”, “postnet.convolutions.3.batch_normalization.running_var”, “postnet.convolutions.3.batch_normalization.num_batches_tracked”, “postnet.convolutions.4.convolution1d.weight”, “postnet.convolutions.4.convolution1d.bias”, “postnet.convolutions.4.batch_normalization.weight”, “postnet.convolutions.4.batch_normalization.bias”, “postnet.convolutions.4.batch_normalization.running_mean”, “postnet.convolutions.4.batch_normalization.running_var”, “postnet.convolutions.4.batch_normalization.num_batches_tracked”.
size mismatch for embedding.weight: copying a param with shape torch.Size([129, 512]) from checkpoint, the shape in current model is torch.Size([62, 512]).
size mismatch for decoder.linear_projection.linear_layer.weight: copying a param with shape torch.Size([560, 1536]) from checkpoint, the shape in current model is torch.Size([80, 1536]).
size mismatch for decoder.linear_projection.linear_layer.bias: copying a param with shape torch.Size([560]) from checkpoint, the shape in current model is torch.Size([80]).
size mismatch for decoder.stopnet.1.linear_layer.weight: copying a param with shape torch.Size([1, 1584]) from checkpoint, the shape in current model is torch.Size([1, 1104]).
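From the missing/unexpected key pairs it looks like the checkpoint was saved by a different revision of the code than the one I'm testing with (e.g. the current model expects `net.0`/`net.1` submodule names where the checkpoint uses `convolution1d`/`batch_normalization`). A minimal sketch of how I compared the two key sets — the literal sets below are hypothetical placeholders mirroring the log above; in practice they come from `cp['model'].keys()` and `model.state_dict().keys()`:

```python
# Hypothetical key sets (a few names copied from the log above) standing in
# for cp['model'].keys() and model.state_dict().keys().
ckpt_keys = {
    "encoder.convolutions.0.convolution1d.weight",
    "encoder.convolutions.0.batch_normalization.weight",
    "decoder.prenet.linear_layers.0.linear_layer.weight",
}
model_keys = {
    "encoder.convolutions.0.net.0.weight",
    "encoder.convolutions.0.net.1.weight",
    "decoder.prenet.layers.0.linear_layer.weight",
}

# Diff the two sets the same way load_state_dict reports them:
missing = sorted(model_keys - ckpt_keys)      # model expects, checkpoint lacks
unexpected = sorted(ckpt_keys - model_keys)   # checkpoint has, model lacks
print("missing:", missing)
print("unexpected:", unexpected)
```

Every key falls in one bucket or the other, which is why I suspect a code-version mismatch rather than a corrupted file — the size mismatches (129 vs 62 embedding rows, 560 vs 80 projection outputs) additionally point at a different character set and a different `r` (frames per decoder step) in the config.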