I am having trouble running inference on AWS. When I try to start the server, here is what happens:
python -m TTS.server.server --tts_checkpoint checkpoint_670000.pth.tar --tts_config config.json
Loading TTS model ...
| > model config: config.json
| > checkpoint file: checkpoint_670000.pth.tar
Setting up Audio Processor...
| > sample_rate:22050
| > num_mels:80
| > min_level_db:-100
| > frame_shift_ms:None
| > frame_length_ms:None
| > ref_level_db:20
| > num_freq:513
| > power:1.5
| > preemphasis:0.0
| > griffin_lim_iters:60
| > signal_norm:True
| > symmetric_norm:True
| > mel_fmin:0
| > mel_fmax:8000.0
| > max_norm:4.0
| > clip_norm:True
| > do_trim_silence:True
| > trim_db:60
| > do_sound_norm:False
| > stats_path:None
| > hop_length:256
| > win_length:1024
| > n_fft:1024
Using model: Tacotron2
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/python3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ubuntu/anaconda3/envs/python3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/TTS-0.0.3+3366328-py3.6.egg/TTS/server/server.py", line 62, in <module>
    synthesizer = Synthesizer(args)
  File "/home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/TTS-0.0.3+3366328-py3.6.egg/TTS/server/synthesizer.py", line 36, in __init__
    self.config.use_cuda)
  File "/home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/TTS-0.0.3+3366328-py3.6.egg/TTS/server/synthesizer.py", line 73, in load_tts
    self.tts_model.load_state_dict(cp['model'])
  File "/home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 830, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Tacotron2:
Missing key(s) in state_dict: "encoder.convolutions.0.convolution1d.weight", "encoder.convolutions.0.convolution1d.bias", "encoder.convolutions.0.batch_normalization.weight", "encoder.convolutions.0.batch_normalization.bias", "encoder.convolutions.0.batch_normalization.running_mean", "encoder.convolutions.0.batch_normalization.running_var", "encoder.convolutions.1.convolution1d.weight", "encoder.convolutions.1.convolution1d.bias", "encoder.convolutions.1.batch_normalization.weight", "encoder.convolutions.1.batch_normalization.bias", "encoder.convolutions.1.batch_normalization.running_mean", "encoder.convolutions.1.batch_normalization.running_var", "encoder.convolutions.2.convolution1d.weight", "encoder.convolutions.2.convolution1d.bias", "encoder.convolutions.2.batch_normalization.weight", "encoder.convolutions.2.batch_normalization.bias", "encoder.convolutions.2.batch_normalization.running_mean", "encoder.convolutions.2.batch_normalization.running_var", "decoder.prenet.linear_layers.0.linear_layer.weight", "decoder.prenet.linear_layers.1.linear_layer.weight", "decoder.attention.location_layer.location_conv1d.weight", "postnet.convolutions.0.convolution1d.weight", "postnet.convolutions.0.convolution1d.bias", "postnet.convolutions.0.batch_normalization.weight", "postnet.convolutions.0.batch_normalization.bias", "postnet.convolutions.0.batch_normalization.running_mean", "postnet.convolutions.0.batch_normalization.running_var", "postnet.convolutions.1.convolution1d.weight", "postnet.convolutions.1.convolution1d.bias", "postnet.convolutions.1.batch_normalization.weight", "postnet.convolutions.1.batch_normalization.bias", "postnet.convolutions.1.batch_normalization.running_mean", "postnet.convolutions.1.batch_normalization.running_var", "postnet.convolutions.2.convolution1d.weight", "postnet.convolutions.2.convolution1d.bias", "postnet.convolutions.2.batch_normalization.weight", "postnet.convolutions.2.batch_normalization.bias", "postnet.convolutions.2.batch_normalization.running_mean", "postnet.convolutions.2.batch_normalization.running_var", "postnet.convolutions.3.convolution1d.weight", "postnet.convolutions.3.convolution1d.bias", "postnet.convolutions.3.batch_normalization.weight", "postnet.convolutions.3.batch_normalization.bias", "postnet.convolutions.3.batch_normalization.running_mean", "postnet.convolutions.3.batch_normalization.running_var", "postnet.convolutions.4.convolution1d.weight", "postnet.convolutions.4.convolution1d.bias", "postnet.convolutions.4.batch_normalization.weight", "postnet.convolutions.4.batch_normalization.bias", "postnet.convolutions.4.batch_normalization.running_mean", "postnet.convolutions.4.batch_normalization.running_var".
Unexpected key(s) in state_dict: "encoder.convolutions.0.net.0.weight", "encoder.convolutions.0.net.0.bias", "encoder.convolutions.0.net.1.weight", "encoder.convolutions.0.net.1.bias", "encoder.convolutions.0.net.1.running_mean", "encoder.convolutions.0.net.1.running_var", "encoder.convolutions.0.net.1.num_batches_tracked", "encoder.convolutions.1.net.0.weight", "encoder.convolutions.1.net.0.bias", "encoder.convolutions.1.net.1.weight", "encoder.convolutions.1.net.1.bias", "encoder.convolutions.1.net.1.running_mean", "encoder.convolutions.1.net.1.running_var", "encoder.convolutions.1.net.1.num_batches_tracked", "encoder.convolutions.2.net.0.weight", "encoder.convolutions.2.net.0.bias", "encoder.convolutions.2.net.1.weight", "encoder.convolutions.2.net.1.bias", "encoder.convolutions.2.net.1.running_mean", "encoder.convolutions.2.net.1.running_var", "encoder.convolutions.2.net.1.num_batches_tracked", "decoder.prenet.layers.0.linear_layer.weight", "decoder.prenet.layers.0.bn.weight", "decoder.prenet.layers.0.bn.bias", "decoder.prenet.layers.0.bn.running_mean", "decoder.prenet.layers.0.bn.running_var", "decoder.prenet.layers.0.bn.num_batches_tracked", "decoder.prenet.layers.1.linear_layer.weight", "decoder.prenet.layers.1.bn.weight", "decoder.prenet.layers.1.bn.bias", "decoder.prenet.layers.1.bn.running_mean", "decoder.prenet.layers.1.bn.running_var", "decoder.prenet.layers.1.bn.num_batches_tracked", "decoder.attention.location_layer.location_conv.weight", "postnet.convolutions.0.net.0.weight", "postnet.convolutions.0.net.0.bias", "postnet.convolutions.0.net.1.weight", "postnet.convolutions.0.net.1.bias", "postnet.convolutions.0.net.1.running_mean", "postnet.convolutions.0.net.1.running_var", "postnet.convolutions.0.net.1.num_batches_tracked", "postnet.convolutions.1.net.0.weight", "postnet.convolutions.1.net.0.bias", "postnet.convolutions.1.net.1.weight", "postnet.convolutions.1.net.1.bias", "postnet.convolutions.1.net.1.running_mean", "postnet.convolutions.1.net.1.running_var", "postnet.convolutions.1.net.1.num_batches_tracked", "postnet.convolutions.2.net.0.weight", "postnet.convolutions.2.net.0.bias", "postnet.convolutions.2.net.1.weight", "postnet.convolutions.2.net.1.bias", "postnet.convolutions.2.net.1.running_mean", "postnet.convolutions.2.net.1.running_var", "postnet.convolutions.2.net.1.num_batches_tracked", "postnet.convolutions.3.net.0.weight", "postnet.convolutions.3.net.0.bias", "postnet.convolutions.3.net.1.weight", "postnet.convolutions.3.net.1.bias", "postnet.convolutions.3.net.1.running_mean", "postnet.convolutions.3.net.1.running_var", "postnet.convolutions.3.net.1.num_batches_tracked", "postnet.convolutions.4.net.0.weight", "postnet.convolutions.4.net.0.bias", "postnet.convolutions.4.net.1.weight", "postnet.convolutions.4.net.1.bias", "postnet.convolutions.4.net.1.running_mean", "postnet.convolutions.4.net.1.running_var", "postnet.convolutions.4.net.1.num_batches_tracked".
size mismatch for embedding.weight: copying a param with shape torch.Size([129, 512]) from checkpoint, the shape in current model is torch.Size([181, 512]).
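The checkpoint file itself opens without errors (torch.load succeeds and the traceback only fails inside load_state_dict), so this looks like a pure naming mismatch between the layers saved in the checkpoint and the layers the installed TTS package builds. For reference, here is a minimal sketch of how I list the layer names stored in the checkpoint, assuming it loads with plain torch.load as the traceback suggests:

import torch

# Load the checkpoint on CPU and print the layer names it contains,
# to compare against the names in the error message above.
cp = torch.load("checkpoint_670000.pth.tar", map_location="cpu")
for key in sorted(cp["model"].keys()):
    print(key)

The checkpoint uses names like encoder.convolutions.0.net.0.weight while the installed package expects encoder.convolutions.0.convolution1d.weight, and the embedding table has 129 rows in the checkpoint versus 181 in the current model, so I suspect the installed TTS version does not match the commit the model was trained with. Do I need to install the exact commit used for training, or is there a supported way to convert the checkpoint?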