How do you fine-tune a model so that it works with the tts / synthesize.py command?

Quite a few things have happened.
I upgraded my server to an 8-core CPU and 16 GB of RAM.
I could finally run coqui-ai/TTS inference on my server, and it was quite fast.
Now my goal is to train my own voices.
I've had good experiences fine-tuning models that are already good.
So I downloaded one of the models that coqui-ai/TTS uses (tts_models/en/ljspeech/glow-tts) and tried to train it with coqui-ai/TTS, using a modified config.json pointing to my dataset.
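For reference, this is roughly the command I used (paths are placeholders; as far as I understand, --restore_path is how the training scripts load a pre-trained checkpoint):

```bash
python3 TTS/bin/train_tacotron.py \
    --config_path /path/to/my/config.json \
    --restore_path /path/to/downloaded/checkpoint.pth.tar
```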

Alas, it complained that separate_stopnet, stopnet and other parameters were missing.
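These look like Tacotron-specific keys, so I copied them over from another config.json, roughly like this (the values came from a different model, so they may not match what this checkpoint expects):

```json
"stopnet": true,
"separate_stopnet": true
```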

With those added, I got this traceback:

Traceback (most recent call last):
  File "/content/TTS/TTS/bin/train_tacotron.py", line 664, in <module>
    main(args)
  File "/content/TTS/TTS/bin/train_tacotron.py", line 548, in main
    optimizer.load_state_dict(checkpoint['optimizer'])
  File "/usr/local/lib/python3.7/dist-packages/torch/optim/optimizer.py", line 141, in load_state_dict
    raise ValueError("loaded state dict has a different number of "
ValueError: loaded state dict has a different number of parameter groups

which I take to mean that the checkpoint and my installed version are incompatible: the saved optimizer state has a different number of parameter groups than the optimizer the script builds.
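One workaround I'm considering (an untested sketch; the "model" key is my guess, based on the "optimizer" key visible in the traceback) is to restore only the network weights and let the script build a fresh optimizer:

```python
import torch


def restore_weights_only(model: torch.nn.Module, checkpoint_path: str) -> None:
    """Load only the network weights from a checkpoint, skipping the
    optimizer state that causes the parameter-group mismatch."""
    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    # strict=False tolerates layers that were added or renamed
    # between the checkpoint's commit and the current code.
    model.load_state_dict(checkpoint["model"], strict=False)
    # checkpoint["optimizer"] is deliberately ignored; the training
    # script can build a fresh optimizer from model.parameters().
```

That would avoid the parameter-group error, though it obviously doesn't guarantee the architectures actually match.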

How do you fine-tune a model so that it works with the tts / synthesize.py command?
Are there models that just work for fine-tuning and reuse with the current version?

I also tried it the other way round: I used a model from the GitHub page, and it wasn't accepted by the tts command on my server.


You should encode the commit a model needs into the model file itself.
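Something along these lines at save time, for example (a sketch only; the key names and checkpoint layout here are my assumption, not the project's actual format):

```python
import subprocess

import torch


def save_checkpoint_with_commit(model: torch.nn.Module, path: str) -> None:
    """Save a checkpoint that records the git commit of the training code."""
    # Assumes the script runs inside the repository's working tree.
    commit = subprocess.check_output(
        ["git", "rev-parse", "HEAD"], text=True
    ).strip()
    # Store the commit alongside the weights so a loader can verify it.
    torch.save({"model": model.state_dict(), "commit": commit}, path)
```

A loader could then compare the stored commit against the running code and fail with a clear message instead of a cryptic parameter-group error.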