What is the right way to take advantage of a German model with train_tts.py?

I created a German corpus with CSV and WAV files.
I found the Thorsten model.
I found the train_tts.py --restore_path option.
Yet using --restore_path with that model and pointing the options at my setup produces this error message: size mismatch for embedding.weight: copying a param with shape torch.Size([181, 512]) from checkpoint, the shape in current model is torch.Size([129, 512]).
What is the right way to take advantage of an existing model with train_tts.py?
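
For reference, the shapes can be read straight out of the checkpoint file. A minimal sketch, assuming the Mozilla TTS checkpoint layout (state dict under a "model" key) and an illustrative path:

import torch

# Load the checkpoint on the CPU and inspect the stored text-embedding shape.
# The path is illustrative; "model" is where Mozilla TTS checkpoints keep the state dict.
ckpt = torch.load("checkpoint_180000.pth.tar", map_location="cpu")
print(ckpt["model"]["embedding.weight"].shape)  # torch.Size([181, 512]) here

The first dimension is the number of input symbols the model was built with, so the mismatch means the new run is constructing a different symbol set than the one the checkpoint was trained on.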

So you are using the thorsten-de model (which one?) and want to continue training with your dataset, right?
Did you verify that you are using the same branch/commit?
Is your Taco2 configuration compatible with the thorsten-de model?

I am using commit 3424181, as given at https://github.com/erogol/TTS_recipes.
How do I make my Taco2 configuration compatible with the thorsten-de model?

Don’t everybody speak up at once! :grinning:

It’s a Saturday. This is an open source project where people are volunteers.

Anyway, best of luck getting it working :slightly_smiling_face:

Exactly. It’s a Saturday. Where volunteers have a lot of free time to help newbies :slight_smile:

Ha! Very good!

You could try fine-tuning with the existing Thorsten dataset just to make sure your setup is okay. I appreciate the desire to jump into something new with your own dataset, but these things are fairly complex and difficult even when you’ve used them for a while, so a beginner trying with new data can easily put a foot wrong.

Once you know that’s working, it would then be a matter of turning to ensuring your config is compatible and that there aren’t issues with your dataset.

For the config you’d want the values to closely follow those used in the recipe (except of course updating local directories and that kind of thing).
For the dataset, it seems like you have it in a consistent structure or you wouldn’t even have got this far. Double-check things like the sampling rate to make sure it matches (22,050 Hz is typically used with these TTS models, but that’s not strictly required; in this case that value is what you’ll see in the config). A quick way to check is sketched below.
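
A minimal sketch of that sampling-rate check with Python’s standard wave module (the file path is just an example):

import wave

# The WAV header's frame rate must match "sample_rate" in the training config
# (22050 Hz in this recipe).
with wave.open("myDataset/wavs/utt_0001.wav", "rb") as wav:
    print(wav.getframerate())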

I am off to do other things but I’ll check in maybe tomorrow or Monday.

Looks like you are mixing something up here regarding volunteer support. Instead of pushing people, you could start by answering the questions that have been asked and telling us something more about your project and configuration. So far it is only guesswork what you want to do…

Some weeks ago there was another impatient guy here, so people are a bit sensitive now about being pushed…

Notice the smiley afterwards. No hard feelings.
Now I did answer your question here.

Does the error message tell you nothing about the likely cause?

Nope, you answered one and a half of my questions. You may post your Taco2 config for clarification. But no rush, I am out of here for today…

This is the diff between the original file from the GitHub repo and my config:

+ "github_branch":"* generic_vocoder",
+ "restore_path":"/home/erogol/Models/trohsten-de/thorsten_de-ddc-August-11-2020_11+38PM-3f34829/checkpoint_180000.pth.tar",
+ "github_branch":"* dev",
+ "restore_path":"/home/erogol/Models/trohsten-de/thorsten_de-ddc-August-11-2020_11+38PM-3f34829/checkpoint_60000.pth.tar",
+ "github_branch":"* dev",
+ "restore_path":"/home/erogol/Models/trohsten-de/thorsten_de-ddc-August-11-2020_11+38PM-3f34829/checkpoint_60000.pth.tar",
+ "github_branch":"* dev",
-     "run_name": "myDataset",
+     "run_name": "thorsten_de-ddc",
-         "stats_path": "/content/drive/MyDrive/TTS_recipes/scale_stats.npy"    // DO NOT USE WITH MULTI_SPEAKER MODEL. scaler stats file computed by 'compute_statistics.py'. If it is defined, mean-std based notmalization is used and other normalization params are ignored
+         "stats_path": "./scale_stats.npy"    // DO NOT USE WITH MULTI_SPEAKER MODEL. scaler stats file computed by 'compute_statistics.py'. If it is defined, mean-std based notmalization is used and other normalization params are ignored
-
+     // VOCABULARY PARAMETERS
+     // if custom character set is not defined,
+     // default set in symbols.py is used
+     // "characters":{
+     //     "pad": "_",
+     //     "eos": "~",
+     //     "bos": "^",
+     //     "characters": "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!'(),-.:;? ",
+     //     "punctuations":"!'(),-.:;? ",
+     //     "phonemes":"iyɨʉɯuɪʏʊeøɘəɵɤoɛœɜɞʌɔæɐaɶɑɒᵻʘɓǀɗǃʄǂɠǁʛpbtdʈɖcɟkɡqɢʔɴŋɲɳnɱmʙrʀⱱɾɽɸβfvθðszʃʒʂʐçʝxɣχʁħʕhɦɬɮʋɹɻjɰlɭʎʟˈˌːˑʍwɥʜʢʡɕʑɺɧɚ˞ɫ"
+     // },
+     // DISTRIBUTED TRAINING
+     "distributed":{
+         "backend": "nccl",
+         "url": "tcp:\/\/localhost:54321"
+     },
-     "text_cleaner": "german_phoneme_cleaners",
+     "text_cleaner": "phoneme_cleaners",
-     "output_path": "/content/drive/MyDrive/TTS_recipes/Datensatz/myDatasetAusgabe",
+     "output_path": "/home/erogol/Models/trohsten-de/",
-     "phoneme_cache_path": "/content/drive/MyDrive/TTS_recipes/Datensatz/myDatasetAusgabe/phoneme/",  // phoneme computation is slow, therefore, it caches results in the given folder.
+     "phoneme_cache_path": "/home/erogol/Models/trohsten-de/phoneme_cache/",  // phoneme computation is slow, therefore, it caches results in the given folder.
-     "use_phonemes": true,           // use phonemes instead of raw characters. It is suggested for better pronounciation.
+     "use_phonemes": false,           // use phonemes instead of raw characters. It is suggested for better pronounciation.
-                 "path": "/content/drive/MyDrive/TTS_recipes/Datensatz/myDataset/",
+                 "path": "/home/erogol/Data/thorsten-german/",
-                 "meta_file_train": "metadata_train.csv", // for vtck if list, ignore speakers id in list for train, its useful for test cloning with new speakers
+                 "meta_file_train": "metadata.csv", // for vtck if list, ignore speakers id in list for train, its useful for test cloning with new speakers
-                 "meta_file_val": "metadata_val.csv"
+                 "meta_file_val": null
-
-
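
One sanity check after editing: make sure the modified config still parses. Mozilla TTS configs contain // comments, so plain json.load will reject them; a minimal sketch using the project’s own loader (import path as on the dev branch around this time, so verify it against your checkout):

# Load the edited config with the project's loader, which strips the // comments.
from TTS.utils.io import load_config

c = load_config("config.json")
print(c.run_name, c.use_phonemes, c.audio["sample_rate"])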

129 and 181, these numbers should tell us something.

Most likely it is because use_phonemes is set to false; try true.
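
That fits the error: the embedding’s first dimension is the size of the input symbol set, and use_phonemes switches between the character set and the phoneme set. A minimal sketch for checking both sizes (import path as on the Mozilla TTS dev branch around this commit; it may differ in your checkout):

# The embedding rows equal the number of input symbols, which depends on
# use_phonemes (and on any custom "characters" block in the config).
from TTS.tts.utils.text.symbols import symbols, phonemes

print(len(symbols))   # size of the default character set
print(len(phonemes))  # size of the default phoneme set
# Compare these against the 129 and 181 in the size-mismatch error.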

Vice versa: it was true, and I have now set it to false. The error message disappeared.
I am psyched.

I’ve uploaded some checkpoint and config files I’m using, based on a training by @othiele.

https://drive.google.com/drive/folders/1GqT_6miOf3lW2QvQ1Y3rHf75h6PgQDCT?usp=sharing

Just in case you want to take a look at the config I’m using.
We use German phoneme cleaning by @repodiac (usage sketched below, after the install commands).

Some additional params:

Mozilla TTS commit: d4319fe

!git clone https://github.com/repodiac/german_transliterate
%cd german_transliterate
!pip install -e .
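
In case it helps, a sketch of what german_transliterate does before phonemization: it normalizes German text (numbers, abbreviations, special symbols) into plain words. The class and method names below are taken from that repo’s README, so treat them as assumptions and verify against the version you install:

# Hypothetical usage per the german_transliterate README: expand numbers,
# units and abbreviations into plain German words.
from german_transliterate.core import GermanTransliterate

print(GermanTransliterate().transliterate("Der Preis beträgt 10 €"))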

Wouldn’t it be a great idea to have one generic German voice, trained for 1 million steps or more on high-speed machines at some university, made really good, and then released as open source for voice adaptation?

Do you use the scale_stats.npy from your upload, or one computed for your dataset?

From my upload. But it has been computed by @othiele on my dataset, so it should be the same as one you compute yourself.
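
If you ever want to recompute it for your own dataset, the stats file comes from the compute_statistics.py script in the repo. A minimal sketch, assuming the script location and flags of the Mozilla TTS dev branch (verify against your commit):

# Regenerate scale_stats.npy for a custom dataset; script path and flags
# are assumed from the Mozilla TTS dev branch.
import subprocess

subprocess.run([
    "python", "TTS/bin/compute_statistics.py",
    "--config_path", "config.json",
    "--out_path", "scale_stats.npy",
], check=True)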