Quite a few things have happened.
I upgraded my server to an 8-core CPU and 16 GB of RAM.
Finally I could run coqui-ai / TTS inference on my server, and it was quite fast.
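For reference, this kind of synthesis can be scripted; a minimal sketch assuming the `tts` console command that the package installs (the synthesize.py wrapper mentioned further down) and its --text, --model_name and --out_path flags, with an illustrative output file name:

```python
import subprocess

# Synthesize a test sentence with a stock model via the bundled
# "tts" CLI (the synthesize.py wrapper). Flags assumed from the
# coqui-ai/TTS README; the output file name is illustrative.
subprocess.run(
    [
        "tts",
        "--text", "Hello from my upgraded server.",
        "--model_name", "tts_models/en/ljspeech/glow-tts",
        "--out_path", "hello.wav",
    ],
    check=True,  # raise if synthesis exits with an error
)
```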
Now my goal is to train my own voices.
I have had good experiences with fine-tuning models that were already good.
So I downloaded a model that coqui-ai / TTS uses (tts_models/en/ljspeech/glow-tts) and tried to train it on my dataset with coqui-ai / TTS, using a modified version of its config.json.
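Concretely, the attempt looked roughly like this (a sketch reconstructed from the traceback further down; the --config_path and --restore_path flags and the file names are my assumptions):

```python
import subprocess

# Fine-tune: restore the downloaded checkpoint and train on my dataset.
# "model_file.pth.tar" stands in for the downloaded checkpoint file.
subprocess.run(
    [
        "python", "TTS/bin/train_tacotron.py",
        "--config_path", "config.json",          # modified for my dataset
        "--restore_path", "model_file.pth.tar",  # downloaded checkpoint
    ],
    check=True,
)
```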
Alas, it complained that separate_stopnet, stopnet and other parameters were missing.
I copied them over from another config.json.
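In practice that was a small merge script; a sketch assuming the missing keys sit at the top level of both files, with donor_config.json as a placeholder name:

```python
import json

# My training config, plus a donor config that still has the keys
# the trainer complained about (donor_config.json is a placeholder).
with open("config.json") as f:
    config = json.load(f)
with open("donor_config.json") as f:
    donor = json.load(f)

# Copy over the missing parameters without touching anything else.
for key in ("separate_stopnet", "stopnet"):
    config.setdefault(key, donor[key])

with open("config.json", "w") as f:
    json.dump(config, f, indent=4)
```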
Then I got this traceback:
```
Traceback (most recent call last):
  File "/content/TTS/TTS/bin/train_tacotron.py", line 664, in <module>
    main(args)
  File "/content/TTS/TTS/bin/train_tacotron.py", line 548, in main
    optimizer.load_state_dict(checkpoint['optimizer'])
  File "/usr/local/lib/python3.7/dist-packages/torch/optim/optimizer.py", line 141, in load_state_dict
    raise ValueError("loaded state dict has a different number of "
ValueError: loaded state dict has a different number of parameter groups
```
This suggests that the downloaded checkpoint and the current training setup are incompatible.
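The error itself comes from PyTorch: an optimizer state_dict stores a list of param_groups, and load_state_dict refuses the saved state when that list has a different length than the one in the optimizer the script just built. You can inspect the checkpoint to see this; a sketch assuming the checkpoint is a plain dict, as the traceback's checkpoint['optimizer'] lookup implies (file name illustrative):

```python
import torch

# Peek inside the downloaded checkpoint.
ckpt = torch.load("model_file.pth.tar", map_location="cpu")
print(list(ckpt.keys()))

# A torch optimizer state_dict holds "state" and "param_groups";
# the ValueError fires when the saved number of parameter groups
# differs from what the freshly built optimizer expects.
if "optimizer" in ckpt:
    print(len(ckpt["optimizer"]["param_groups"]), "saved parameter groups")
```

One could pop the "optimizer" entry and save the stripped checkpoint, but I have not checked whether train_tacotron.py then restores it cleanly.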
How do you fine-tune a model so that it works with the tts / synthesize.py command?
Are there models that just work for fine-tuning and reuse with the current version?
I also tried it the other way round: I used a model from the GitHub page, and it wasn't accepted by the tts installation on my server.