How to train a model?

Hi shad,
I haven’t had much time to work on it. I’ll go through it as soon as I get some time.

I have a question concerning my dataset: is 20 kHz a MUST as the sampling frequency to train a model with the test and validation datasets you put in TTS/tests? I am asking because mine are at 16 kHz.
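One quick way to verify what sampling rate your files actually have, before touching the config, is to read the WAV header with Python's standard-library `wave` module. This is a minimal sketch; `check_sample_rate` is a hypothetical helper name, not part of the TTS repo:

```python
import wave

def check_sample_rate(path, expected_hz=20000):
    """Return the WAV file's sample rate and whether it matches the expected rate."""
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
    return rate, rate == expected_hz
```

If the rates disagree, you can either resample the audio offline or change the `sample_rate` in the training config to match the data.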

friendly reminder :slight_smile:

why does the transcription have a pipe and then, I think, the same transcription again? and where is the config file you want me to use?

I followed the format of the original LJSpeech dataset: the transcript appears twice because numbers and abbreviations sometimes occur, and they are expanded in the second part, after the pipe.
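For reference, an LJSpeech-style metadata line has the form `id|raw transcript|normalized transcript`, and it is the last field (with numbers and abbreviations spelled out) that is usually fed to training. A minimal sketch of extracting it; `normalized_transcript` is a hypothetical helper name:

```python
def normalized_transcript(metadata_line):
    """Split an LJSpeech-style metadata line (id|raw|normalized) and
    return the normalized transcript (the part after the last pipe)."""
    fields = metadata_line.strip().split("|")
    # Fall back to the raw transcript if no normalized column exists.
    return fields[2] if len(fields) > 2 else fields[-1]
```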

config.json
In the file best_model_config.json I changed only the sample rate to 16000.
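Since the config is plain JSON, a change like this can also be scripted with the standard `json` module. This sketch assumes the sample rate lives under an `"audio"` section, which is how these TTS configs are typically laid out; `set_sample_rate` is a hypothetical helper:

```python
import json

def set_sample_rate(config_path, out_path, sample_rate=16000):
    """Load a TTS config, change only audio.sample_rate, and save it back."""
    with open(config_path) as f:
        config = json.load(f)
    # Assumes the usual "audio" section of these configs.
    config["audio"]["sample_rate"] = sample_rate
    with open(out_path, "w") as f:
        json.dump(config, f, indent=4)
```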

ok. I’ll use the latter half of the transcription then. So you want me to train the model using this config?

Yes, this is the config with a batch size of 32.

@shad94 I’ve trained a model with your config and data, but made slight changes to the config, using Tacotron2 and training for 150 epochs. I’ve turned off eval and test so that you can have the model faster. I cannot vouch for the quality of the result; I hope you have tuned the config for the best fit. I’ll share the link as soon as it’s uploaded.

Good luck with it. Your dataset was very small, so you should probably test it on the training set itself.

Also, rename it to *.pth.tar. I had to rename it to check whether I could upload it onto Discourse directly.
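A small sketch of restoring the extension, in case you prefer to script it; `restore_checkpoint_name` is a hypothetical helper, and the only real assumption is that the repo's loading code expects checkpoints named `*.pth.tar`:

```python
from pathlib import Path

def restore_checkpoint_name(path):
    """Rename a downloaded checkpoint so it ends in .pth.tar,
    which is the extension the repo's checkpoint loaders expect."""
    src = Path(path)
    if src.name.endswith(".pth.tar"):
        return src  # already named correctly
    dst = src.with_name(src.stem + ".pth.tar")
    src.rename(dst)
    return dst
```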

Thank you! :wink:
However, how do I use the .pth file? I would like to see the results after that training, but I don’t know where exactly to load it. I know I could use it for further training, but at this moment maybe it would be a good idea to change some parameter(s).

Use the notebooks in the repo to evaluate it.

e.g., in ExtractTTSpectrogram.ipynb I change that line, I believe (line 76):

ok, I am using Benchmark.ipynb; however, I wonder what the paths for VOCODER_MODEL_PATH and VOCODER_CONFIG_PATH should be, because I got no checkpoint file nor a config for those.

Just saw this. Use Griffin-Lim, not a neural vocoder.
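Griffin-Lim needs no trained vocoder checkpoint: it reconstructs the phase iteratively from the magnitude spectrogram alone, which is why no VOCODER_MODEL_PATH is required. Below is a minimal sketch of the algorithm using `scipy.signal` — not the repo's implementation, and `griffin_lim` here is a hypothetical stand-alone function for illustration:

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(magnitude, n_iter=30, nperseg=512, noverlap=384):
    """Reconstruct a waveform from a magnitude spectrogram by
    iteratively re-estimating the phase (Griffin-Lim)."""
    # Start from zero phase.
    phase = np.zeros(magnitude.shape)
    complex_spec = magnitude * np.exp(1j * phase)
    for _ in range(n_iter):
        # Invert to the time domain, then re-analyze to get an
        # updated phase estimate consistent with the magnitudes.
        _, signal = istft(complex_spec, nperseg=nperseg, noverlap=noverlap)
        _, _, reprojected = stft(signal, nperseg=nperseg, noverlap=noverlap)
        # Guard against off-by-one frame counts from the round trip.
        frames = min(reprojected.shape[1], magnitude.shape[1])
        phase = np.zeros(magnitude.shape)
        phase[:, :frames] = np.angle(reprojected[:, :frames])
        complex_spec = magnitude * np.exp(1j * phase)
    _, signal = istft(complex_spec, nperseg=nperseg, noverlap=noverlap)
    return signal
```

The audible quality is below that of a neural vocoder, but it is good enough to sanity-check whether the TTS model itself has learned anything.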

So where should I apply the changes, and how? I don’t think I got it…

Check the code, shad. It’s been a long time since I’ve used it. There should be a use_gl parameter.

Yes, there is such a parameter, although I was asking about the ones in the frame…

Please read the code! It’s as straightforward as it gets. I can see what the issue is right here. You can’t be asking for help with these trivial issues.

Ok, nevermind; yep, it was easy :slight_smile:

After loading the TTS model I received this error:


I am using this released model; however, I wonder where I should make such an adjustment to the shape of the current model?

You probably used another config, with Tacotron as the model maybe? Check and let me know.
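A "size mismatch" when loading a checkpoint usually means the config used at inference builds layers with different shapes than the ones that were saved (e.g., a different character set or model type). A pure-Python sketch of how one could pinpoint the offending layers — with PyTorch you would build these dicts as `{k: tuple(v.shape) for k, v in state_dict.items()}`; `find_shape_mismatches` and the parameter names in the example are hypothetical:

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Compare {parameter_name: shape} mappings from a saved checkpoint
    and a freshly built model; return the parameters whose shapes disagree."""
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and model_shape != ckpt_shape:
            mismatches[name] = (ckpt_shape, model_shape)
    return mismatches
```

If the mismatches cluster in the embedding or decoder layers, the inference config almost certainly differs from the training config in the characters list or the model architecture.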