How to train a model?

Oh! A batch_size of 6 isn’t going to get you anywhere. The config has a comment saying anything less than 32 has a hard time converging. Also, your dataset is key to training a TTS model. It needs to be clean and consistent.

As you might remember, I can’t use a batch that large, because it doesn’t fit on my computer. I can share my dataset with you, which (I believe) I prepared well. (The password is alchemist; the link will be ‘alive’ for 7 days.)
TTS dataset in Polish

Btw, I only have 4 GB of GPU memory on my computer.

I’ve tried downloading it but the download refuses to start. I’ve tried on different networks and browsers, no luck anywhere.

What you can also do for GPUs with little memory is gradient aggregation. It is not implemented in TTS, but it is quite easy to add, and it’d be a good PR as well.

To be clearer: you run your small batches and aggregate the gradients. Once you reach N batches, you apply the accumulated gradients to update the model.
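This is not from the TTS repo; here is a generic PyTorch sketch of the idea, with a toy model, optimizer, and random data standing in for the real ones:

```python
import torch

# Generic gradient-accumulation sketch (toy model and data, not TTS code).
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()
initial_weight = model.weight.detach().clone()

accum_steps = 4  # micro-batch of 8 * 4 steps ~ effective batch of 32
optimizer.zero_grad()
for step in range(8):
    x, y = torch.randn(8, 10), torch.randn(8, 1)
    loss = loss_fn(model(x), y) / accum_steps  # scale so the sum is an average
    loss.backward()                            # gradients accumulate in p.grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()                       # one update per accum_steps batches
        optimizer.zero_grad()
```

The key point is that `backward()` adds into `.grad` until you call `zero_grad()`, so delaying the optimizer step gives you the effect of a larger batch.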


Ok, so, to be sure I understood correctly: I use e.g. a batch size of 8 (I have 1271 samples, ~1.5 h of audio, so I wonder how many iterations I should run for such a small set; 32,000?), doing the aggregation (I assume it’s a small script run somewhere?), and then, to reach a batch of size 32, I do it 4 times.
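The arithmetic from the numbers above can be spelled out (nothing framework-specific, just the counts):

```python
# Worked arithmetic with the numbers from this thread.
samples = 1271
micro_batch = 8
accum_steps = 4                               # 8 * 4 -> effective batch of 32
effective_batch = micro_batch * accum_steps
iters_per_epoch = samples // micro_batch      # ~158 micro-batch iterations
epochs_for_32000 = 32000 // iters_per_epoch   # ~202 epochs at 32k iterations
```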

Edit: I’ve found a nice paper presenting aggregation, though:
https://www.ijcai.org/proceedings/2019/0363.pdf (pp. 4–5)

And? What do you think about the dataset I prepared?

Hi shad,
I haven’t had much time to work on it. I’ll go through it as soon as I get some time.

I have a question concerning my dataset: is a 20 kHz sampling frequency a MUST to train a model with the test and validation datasets you put in TTS/tests? I am asking because mine are at 16 kHz.
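Generally the audio just needs to match the `sample_rate` set in the config; if they disagree, you can either edit the config (as done later in this thread) or resample the audio. A SciPy sketch, where `resample_wav` is a made-up helper name:

```python
import numpy as np
from math import gcd
from scipy.signal import resample_poly

# Hypothetical helper: resample a waveform to the sample rate
# declared in the config, e.g. 20000 Hz -> 16000 Hz.
def resample_wav(wav, orig_sr, target_sr):
    g = gcd(orig_sr, target_sr)
    return resample_poly(wav, target_sr // g, orig_sr // g)

one_second = np.sin(2 * np.pi * 440 * np.arange(20000) / 20000)
resampled = resample_wav(one_second, 20000, 16000)  # now 16000 samples long
```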

friendly reminder :slight_smile:

why does the transcription have a pipe and then, I think, the same transcription again? and where is the config file you want me to use?

I followed the convention of the original LJSpeech dataset: there is the transcript and then the transcript again, because sometimes numbers and abbreviations appear, and they are expanded in the second part, after the pipe.
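For reference, in the original LJSpeech `metadata.csv` each line is pipe-separated, with the normalized transcript (numbers and abbreviations spelled out) as the last field. A minimal parsing sketch; the sample line is invented for illustration:

```python
# LJSpeech-style metadata line: fields separated by "|", with the
# expanded/normalized transcript in the last field.
line = "wav_0001|Dr. Smith has 2 cats.|Doctor Smith has two cats."
fields = line.split("|")
normalized = fields[-1]  # the trainer reads this expanded version
```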

config.json
In best_model_config.json I changed only the sample rate, to 16000.

ok. I’ll use the latter half of the transcription then. so you want me to train the model using this config?

Yes, this is the config with batch size 32.

@shad94 I’ve trained a model with your config and data, but made slight changes to the config: using Tacotron2, and training for 150 epochs. I’ve turned off eval and test so that you can have the model faster. I can’t vouch for the quality of the result; I hope you tuned the config for the best fit. I’ll share the link as soon as it’s uploaded.

Good luck with it. Your dataset was very small, so you should probably test the model on the training set itself.

Also, rename it to *.pth.tar. I had to rename it to check if I could upload it onto Discourse directly.

Thank you! :wink:
However, how do I use the .pth.tar file? I would like to see results from that training, but I don’t know where exactly to load it. I know I could use it for further training, but at this point maybe it would be a good idea to change some parameter(s).

Use the notebooks in the repo to evaluate it.
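Under the hood, the notebooks read the checkpoint with `torch.load`. A runnable sketch (we save a toy checkpoint first so the example is self-contained; with the real file you would just load its actual path, and the key names "model" and "step" follow the common pattern but may differ in your checkpoint):

```python
import os
import tempfile
import torch

# Save a toy checkpoint so the example runs; the real *.pth.tar already exists.
ckpt_path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth.tar")
torch.save({"model": torch.nn.Linear(2, 2).state_dict(), "step": 10000}, ckpt_path)

# Reading it back: a dict with the model weights plus training metadata.
checkpoint = torch.load(ckpt_path, map_location="cpu")
state_dict = checkpoint["model"]  # load this into the model before synthesis
```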

e.g., in ExtractTTSpectrogram I change that line, I believe (line 76):

ok, I am using Benchmark.ipynb; however, I wonder what the paths for VOCODER_MODEL_PATH and VOCODER_CONFIG_PATH should be, because I have no checkpoint file or config for those.