How to train a model?

erogol · December 20, 2019, 11:45am

What you can also do for GPUs with little memory is gradient aggregation (also called gradient accumulation). It is not implemented in TTS, but it is quite easy to add, and it'd make a good PR as well.

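To be more clear: you run small batches for N iterations, computing the loss and backpropagating on each one so that the gradients add up. After you reach N batches, you apply a single update to the model with the aggregated gradients and then reset them.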
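A minimal sketch of the idea in plain Python. It assumes a toy one-parameter linear model with a mean-squared-error loss; `grad_mse` and `train_step_accumulated` are hypothetical helpers, not part of TTS. It checks that aggregating the gradients of 4 micro-batches of 8, each scaled by 1/4, gives the same update as one batch of 32.

```python
def grad_mse(w, xs, ys):
    """Gradient w.r.t. w of the mean squared error of y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def train_step_accumulated(w, xs, ys, micro_batch, lr):
    """One optimizer step whose gradient is aggregated over several micro-batches."""
    if len(xs) % micro_batch != 0:
        raise ValueError("batch size must be a multiple of the micro-batch size")
    accum_steps = len(xs) // micro_batch  # N in the post above
    grad = 0.0
    for i in range(accum_steps):
        start = i * micro_batch
        mb_x, mb_y = xs[start:start + micro_batch], ys[start:start + micro_batch]
        # "Backward" on one small batch; scaling by 1/N turns the sum into a full-batch mean.
        grad += grad_mse(w, mb_x, mb_y) / accum_steps
    # The weight is updated only once, after all N micro-batches.
    return w - lr * grad


xs = [i / 32 for i in range(32)]
ys = [3.0 * x for x in xs]
w_accum = train_step_accumulated(0.0, xs, ys, micro_batch=8, lr=0.1)
w_full = 0.0 - 0.1 * grad_mse(0.0, xs, ys)
print(abs(w_accum - w_full) < 1e-12)  # True
```

In PyTorch the same pattern is usually written by calling `loss.backward()` on each micro-batch (gradients add up in each parameter's `.grad`), dividing the loss by the number of aggregation steps, and calling `optimizer.step()` followed by `optimizer.zero_grad()` only every N batches.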

shad94 · December 20, 2019, 12:22pm

shad94 · December 20, 2019, 12:27pm

OK, so, to be sure I understood it correctly: I use, e.g., a batch size of 8 with 32,000 iterations (I don't know how many iterations I should use for such a small set; I have 1,271 samples, about 1.5 h of audio), doing aggregation (I assume it must be a small script run somewhere?), and then, if I want to reach an effective batch size of 32, I aggregate over 4 batches.

Edit: I've found a paper with a nice model presenting aggregation, though:
https://www.ijcai.org/proceedings/2019/0363.pdf (pp. 4-5)
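The arithmetic behind the plan in the post above, as a small sketch. It assumes one "iteration" means one forward/backward pass on a micro-batch of 8 and that the last partial batch of each epoch is kept; `training_schedule` is a hypothetical helper.

```python
import math


def training_schedule(num_samples, micro_batch, target_batch, iterations):
    """Return (aggregation steps, micro-batches per epoch, epochs, optimizer steps)."""
    if target_batch % micro_batch != 0:
        raise ValueError("target batch must be a multiple of the micro-batch")
    accum_steps = target_batch // micro_batch         # 32 / 8 = 4
    per_epoch = math.ceil(num_samples / micro_batch)  # partial last batch is kept
    epochs = iterations / per_epoch
    optimizer_steps = iterations // accum_steps       # one update per accum_steps iterations
    return accum_steps, per_epoch, epochs, optimizer_steps


accum, per_epoch, epochs, steps = training_schedule(1271, 8, 32, 32_000)
print(accum, per_epoch, round(epochs, 1), steps)  # 4 159 201.3 8000
```

Under these assumptions, 32,000 iterations at batch size 8 amount to roughly 200 passes over the 1,271 samples, but only 8,000 weight updates at the effective batch size of 32.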

shad94 · December 23, 2019, 9:01am

And? What do you think about the dataset I prepared?

alchemi5t · December 23, 2019, 1:10pm

Hi shad,
I haven’t had much time to work on it. I’ll go through it as soon as I get some time.

shad94 · December 28, 2019, 7:39pm

I have a question regarding my dataset: is a 20kHz sampling rate a MUST for training a model with the test and validation datasets you put in TTS/tests? I am asking because mine is 16kHz.
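Whichever rate is used, it is usually worth checking before training that every audio file actually has the rate you set in the config. A minimal sketch using Python's standard `wave` module (PCM WAV files only); the flat directory layout and the `find_sample_rate_mismatches` helper are assumptions for illustration.

```python
import wave
from pathlib import Path


def find_sample_rate_mismatches(wav_dir, expected_rate):
    """Map file name -> sample rate for every WAV in wav_dir not at expected_rate."""
    mismatches = {}
    for path in sorted(Path(wav_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wf:
            rate = wf.getframerate()
        if rate != expected_rate:
            mismatches[path.name] = rate
    return mismatches

# Example (hypothetical path): find_sample_rate_mismatches("my_dataset/wavs", 16000)
# An empty dict means every file matches the expected rate.
```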

shad94 · December 30, 2019, 9:20pm

friendly reminder

alchemi5t · December 31, 2019, 5:18am

Why does the transcription have a pipe and then, I think, the same transcription again? And where is the config file you want me to use?

shad94 · December 31, 2019, 9:05am

I followed the format of the original LJSpeech dataset: there is a transcript and then the transcript again, because numbers and abbreviations sometimes appear, and they are expanded in the second part, after the pipe.

config.json
In the file best_model_config.json, I changed only the sample rate, to 16000.

alchemi5t · December 31, 2019, 9:12am

OK. I'll use the latter half of the transcription, then. So you want me to train the model using this config?
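Taking the part after the pipe, as described above, can be sketched as follows. It assumes each line starts with a clip ID, as in LJSpeech's `metadata.csv` (`id|transcription|normalized transcription`); `parse_metadata` and the sample line are hypothetical.

```python
def parse_metadata(lines):
    """Return (clip_id, normalized_text) pairs from LJSpeech-style 'id|raw|normalized' lines."""
    items = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("|")
        if len(fields) < 2:
            raise ValueError(f"line {line_no}: expected at least one '|' separator")
        # First field: clip ID; last field: text with numbers/abbreviations expanded.
        items.append((fields[0], fields[-1]))
    return items


sample = ["clip_0001|I have 2 cats.|I have two cats.\n"]  # hypothetical line
print(parse_metadata(sample))  # [('clip_0001', 'I have two cats.')]
```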

shad94 · December 31, 2019, 9:18am

Yes, this is the config with a batch size of 32.

alchemi5t · January 2, 2020, 5:05am

@shad94 I've trained a model with your config and data, but made slight changes to the config: I used Tacotron2 and trained for 150 epochs. I've turned off eval and test so that you can get the model faster. I cannot vouch for the quality of the result; I hope you have tuned the config for the best fit. I'll share the link as soon as it's uploaded.

alchemi5t · January 2, 2020, 9:36am

Good luck with it. Your dataset was very small; you should probably test it on the training set itself.

Also, rename it to *.pth.tar. I had to rename it to check whether I could upload it to Discourse directly.

shad94 · January 2, 2020, 3:17pm

Thank you!
However, how do I use the .pth file? I would like to see the results of that training, but I don't know where exactly to load it. I know I could use it for further training, but at this point it might be a good idea to change some parameter(s).

alchemi5t · January 2, 2020, 3:22pm

Use the notebooks in the repo to evaluate it.

shad94 · January 2, 2020, 3:27pm

E.g., in ExtractTTSpectrogram, I believe I should change this line (76):

shad94 · January 4, 2020, 1:44pm

OK, I am using Benchmark.ipynb; however, I wonder what the paths for these should be:

VOCODER_MODEL_PATH and VOCODER_CONFIG_PATH, because I have neither a checkpoint file nor a config for a vocoder.

alchemi5t · January 6, 2020, 3:40pm

Just saw this. Use Griffin-Lim, not a neural vocoder.
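For intuition, a minimal NumPy sketch of what Griffin-Lim does: start from random phases, then alternate between inverse STFT and STFT, keeping the known magnitudes and only the phases the re-analysis produces. This is not the TTS implementation; the `stft`, `istft`, and `griffin_lim` helpers, the Hann window, FFT size, and hop length are assumptions. It works on a linear-frequency magnitude spectrogram, so a mel spectrogram would first need an approximate inversion of the mel filterbank.

```python
import numpy as np


def _window(n_fft):
    # Periodic Hann window.
    return np.hanning(n_fft + 1)[:-1]


def stft(x, n_fft, hop):
    """Frame x (no padding); returns an array of shape (frames, n_fft // 2 + 1)."""
    n_frames = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    return np.fft.rfft(x[idx] * _window(n_fft), axis=1)


def istft(spec, n_fft, hop, length):
    """Least-squares inverse STFT: windowed overlap-add divided by the summed squared window."""
    win = _window(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * win
    out = np.zeros(length)
    norm = np.zeros(length)
    for t, frame in enumerate(frames):
        start = t * hop
        out[start:start + n_fft] += frame
        norm[start:start + n_fft] += win ** 2
    nonzero = norm > 1e-8
    out[nonzero] /= norm[nonzero]
    return out


def griffin_lim(mag, n_fft, hop, n_iter=50, seed=0):
    """Estimate a signal whose STFT magnitude approximates mag (shape: frames x bins)."""
    rng = np.random.default_rng(seed)
    length = hop * (mag.shape[0] - 1) + n_fft
    phase = np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    x = istft(mag * phase, n_fft, hop, length)
    for _ in range(n_iter):
        phase = np.exp(1j * np.angle(stft(x, n_fft, hop)))  # keep only the phase
        x = istft(mag * phase, n_fft, hop, length)
    return x
```

With `n_iter=0` the output is just the random-phase reconstruction; more iterations typically shrink the gap between `np.abs(stft(y, n_fft, hop))` and `mag`, which is why Griffin-Lim works without any trained vocoder checkpoint.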

shad94 · January 6, 2020, 4:30pm

So where should I make the changes, and how? I don't think I got it…

alchemi5t · January 6, 2020, 4:54pm

Check the code, shad. It's been a long time since I've used it. There should be a use_gl parameter.