How to train a model?

OK, so, to be sure I understood correctly: I use e.g. a batch size of 8 for 32,000 iterations (I don't know; I have 1,271 samples, about 1.5 h of audio, so I wonder how many iterations such a small set needs), doing aggregation (i.e., gradient accumulation; I assume that must be a small script run somewhere?), and then, if I want to reach an effective batch size of 32, I accumulate 4 times.

Edit: I've found a nice paper presenting aggregation, though:
https://www.ijcai.org/proceedings/2019/0363.pdf (pp. 4-5)
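For reference, gradient accumulation usually amounts to only a few lines of the training loop. A minimal PyTorch sketch with a stand-in model and data, not Mozilla TTS's actual loop:

```python
import torch
from torch import nn

# Stand-in model, optimizer, and data just to make the sketch runnable.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()
loader = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(32)]  # micro-batches of 8

ACCUM_STEPS = 4  # 4 micro-batches of 8 -> effective batch size of 32

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(loader):
    loss = criterion(model(inputs), targets)
    (loss / ACCUM_STEPS).backward()   # scale so the summed gradient averages over 32 samples
    if (step + 1) % ACCUM_STEPS == 0:
        optimizer.step()              # one weight update per 4 micro-batches
        optimizer.zero_grad()
```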

And? What do you think about the dataset I prepared?

Hi shad,
I haven’t had much time to work on it. I’ll go through it as soon as I get some time.

I have a question concerning my dataset: is a 20 kHz sampling frequency a MUST to train a model with the test and validation datasets you put in TTS/tests? I am asking because mine is at 16 kHz.
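(For what it's worth, if the rates ever do have to match, resampling is cheap. A minimal librosa sketch, with placeholder file names:)

```python
import librosa
import soundfile as sf

# Load at the file's native rate, then resample to the target rate.
y, sr = librosa.load("clip_16k.wav", sr=None)
y_20k = librosa.resample(y, orig_sr=sr, target_sr=20000)
sf.write("clip_20k.wav", y_20k, 20000)
```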

friendly reminder :slight_smile:

Why does the transcription have a pipe and then, I think, the same transcription again? And where is the config file you want me to use?

I followed the convention of the original LJSpeech dataset: there is the transcript, and then the transcript again, because numbers and abbreviations sometimes appear, and they are expanded in the second part, after the pipe.
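For illustration, each line of LJSpeech's metadata.csv has the form `id|raw transcript|normalized transcript`; the sample text below is invented, but the three-field layout is the real one:

```
LJ001-0001|He paid $15 for it in 1850.|He paid fifteen dollars for it in eighteen fifty.
```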

config.json
In the file best_model_config.json I changed only the sample rate, to 16000.
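For anyone following along, that change is a single key under the audio section of a Mozilla TTS config; the surrounding keys here are illustrative, not the full file:

```json
{
  "audio": {
    "sample_rate": 16000,
    "num_mels": 80
  }
}
```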

OK, I'll use the latter half of the transcription then. So you want me to train the model using this config?

Yes, this is the config with a batch size of 32.

@shad94 I've trained a model with your config and data, but made slight changes to the config: using Tacotron2 and training for 150 epochs. I've turned off eval and test so that you can have the model faster. I cannot vouch for the quality of the result; I hope you have tuned the config for the best fit. I'll share the link as soon as it's uploaded.
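(In the config that is typically a flag along these lines; the exact key name is my assumption, so check it against your version of the repo:)

```json
{
  "run_eval": false
}
```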

Good luck with it. Your dataset was very small, so you should probably test it on the training set itself.

Also, rename it to *.pth.tar. I had to rename it to check whether I could upload it onto Discourse directly.

Thank you! :wink:
However, how do I use the .pth.tar file? I would like to see the results of that training, but I don't know where exactly to load it. I know I could use it for further training, but at this point maybe it would be a good idea to change some parameter(s).

Use the notebooks in the repo to evaluate it.
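Concretely, a *.pth.tar checkpoint is opened with torch.load; a minimal sketch for inspecting it (the "model" key is an assumption about the checkpoint layout, not verified against the repo):

```python
import torch

# Load the renamed checkpoint on CPU and see what it contains.
checkpoint = torch.load("best_model.pth.tar", map_location="cpu")
print(checkpoint.keys())

# The weights usually sit under a key such as "model":
# model.load_state_dict(checkpoint["model"])
```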

E.g., in ExtractTTSpectrogram.ipynb I change that line, I believe (line 76):

OK, I am using Benchmark.ipynb; however, I wonder what the paths for VOCODER_MODEL_PATH and VOCODER_CONFIG_PATH should be, because I have no checkpoint file or config for them.

Just saw this. Use Griffin-Lim, not a neural vocoder.
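For context, Griffin-Lim reconstructs a waveform from a magnitude spectrogram by iteratively estimating the phase, so no vocoder checkpoint or config is needed. A standalone librosa sketch (the STFT parameters are placeholders, not the values from your config):

```python
import numpy as np
import librosa
import soundfile as sf

# Magnitude spectrogram of a real clip, standing in for the model's output.
y, sr = librosa.load("sample.wav", sr=16000)
S = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))

# Griffin-Lim recovers the phase iteratively; more iterations -> cleaner audio.
y_hat = librosa.griffinlim(S, n_iter=60, hop_length=256)
sf.write("griffinlim.wav", y_hat, sr)
```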

So where should I apply the changes, and how? I think I didn't get it…

Check the code, shad. It's been a long time since I've used it. There should be a use_gl parameter.

Yes, there is such a parameter, although I was asking about the vocoder paths above…

Please read the code! It's as straightforward as it gets; I can see what the issue is right from here. You can't keep asking for help with such trivial issues.