How and for what should model checkpoints be used?

Hi, I’m a newbie to deep learning and found no better place to ask a few questions, in order to fully understand how to use checkpoints.

I am currently rebuilding models as I add data to my dataset, since the dataset size has been increasing over time. Every time I add more examples, I remove the checkpoints from the checkpoint directory and start training from scratch.
My goal is for the model to perform well on the whole dataset.
So my questions are:

  • What’s the proper use of checkpoints? Do I use them only when training is interrupted (manually or not), in order to resume from where it stopped?
  • Should I instead resume training from the checkpoints every time I add more data? Is that what transfer learning means?
  • If not, how can I perform transfer learning? Are there any specific steps to take?
  • If I wanted to transcribe data from a different distribution in the future, e.g. conversational speech scenarios, would it be better to do transfer learning or to train a model from scratch?

Thanks in advance :slight_smile:

Sorry, this isn’t really the place to teach you deep learning itself.

That’s in fact exactly the same use: restarting from a previous run. The reason why you restart is not important in itself.
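
For illustration, here is a minimal, generic sketch of what "restarting from a previous run" looks like in a PyTorch-style training loop. This is not the DeepSpeech training script; the paths and the tiny model are placeholders:

```python
import os

import torch
import torch.nn as nn

CKPT_DIR = "checkpoints"                     # placeholder directory
CKPT_PATH = os.path.join(CKPT_DIR, "latest.pt")

model = nn.Linear(10, 2)                     # stand-in for your real model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
start_epoch = 0

# Resume from the previous run if a checkpoint exists,
# regardless of why the earlier run stopped.
if os.path.exists(CKPT_PATH):
    ckpt = torch.load(CKPT_PATH)
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    start_epoch = ckpt["epoch"] + 1

os.makedirs(CKPT_DIR, exist_ok=True)
for epoch in range(start_epoch, 10):
    # ... training loop over your (possibly extended) dataset ...
    torch.save(
        {"model": model.state_dict(),
         "optimizer": optimizer.state_dict(),
         "epoch": epoch},
        CKPT_PATH,
    )
```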

That really depends on what you want to achieve, but if you are just adding more data, it would make sense.

It involves a bit more work to do correctly: resetting layers, etc.
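
To make "resetting layers" concrete, here is a minimal sketch of the idea, assuming a PyTorch-style model (the network and the checkpoint path are hypothetical stand-ins): load the previously trained weights, re-initialize the output layer, and optionally freeze the earlier layers at first:

```python
import torch
import torch.nn as nn

# Stand-in network: some "feature" layers plus an output layer.
model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),   # earlier layers, keep their weights
    nn.Linear(64, 2),               # output layer we want to reset
)

# Load the weights from the previously trained model
# ("checkpoints/pretrained.pt" is a placeholder for your earlier run).
state = torch.load("checkpoints/pretrained.pt")
model.load_state_dict(state)

# Reset (re-initialize) the output layer for the new task.
model[-1].reset_parameters()

# Optionally freeze the earlier layers so only the new layer trains at first.
for layer in model[:-1]:
    for p in layer.parameters():
        p.requires_grad = False

# Only pass the trainable parameters to the optimizer.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```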

There is long-pending work that should hopefully land soon.

I have tried both ways, and the results were the following:

  • When I added more material and retrained the model from scratch, the test epoch ended with:
    WER: 0.191577, loss: 69.710556

  • When I added that same material (same dataset) and resumed training from the latest checkpoint, I got:
    WER: 0.090943, loss: 47.997845

There is quite a difference in the numbers I get for the same amount of material, which is why I want to make sure which steps are the correct ones to follow.

As I said, what I want to achieve is a model that works well on the complete dataset. So does training from checkpoints still make sense for this, or will the model only work well on the last batch of material I added?

It’s impossible for us to answer this for you, as it depends on the data, the hyperparameters, how long you train for before and after adding new data, etc. The correct approach is to have a good validation set that contains representative data that you want the model to perform well on; then, whenever you try different approaches, you can use the validation set to check whether things are going in the right direction.
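
As a small illustration of that workflow, here is a sketch that scores two approaches on the same fixed validation set using a plain edit-distance WER. The (reference, hypothesis) pairs are hypothetical placeholders for your own transcripts:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level edit distance."""
    r, h = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(r)][len(h)] / max(len(r), 1)

# Hypothetical validation set: (reference, hypothesis) pairs produced by
# each approach on the *same* held-out audio. Swap in your own transcripts.
from_scratch = [("hello world", "hello word"), ("good morning", "good morning")]
from_checkpoint = [("hello world", "hello world"), ("good morning", "good mourning")]

for name, pairs in [("from scratch", from_scratch), ("from checkpoint", from_checkpoint)]:
    avg = sum(wer(ref, hyp) for ref, hyp in pairs) / len(pairs)
    print(f"{name}: validation WER = {avg:.4f}")
```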

Thank you for clarifying things for me :slight_smile: