How and for what should model checkpoints be used?

Hi, I’m a newbie to deep learning and found no better place to ask a few questions, in order to fully understand how to use checkpoints.

I am currently rebuilding models as I add data to my dataset, since the dataset size has been increasing over time. Every time I add more examples, I remove the checkpoints from the checkpoint directory and start training from scratch.
My goal is for the model to perform well on the whole dataset.
So my questions are:

  • What’s the proper use of checkpoints? Do I use them only when training is interrupted (manually or not), in order to resume from where it stopped?
  • Should I instead resume training from the checkpoints every time I add more data? Is that what transfer learning means?
  • If not, how can I perform transfer learning? Are there any specific steps to take?
  • If I wanted to transcribe data from a different distribution in the future, e.g. conversational speech scenarios, would it be better to do transfer learning or to train a model from scratch?

Thanks in advance :slight_smile:

Sorry, this isn’t really the place to teach you deep learning itself.

That’s in fact exactly the same use: restarting from a previous run. The reason why you restart is not important in itself.
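
For illustration, here is a minimal, generic sketch of what "restarting from a previous run" looks like in a PyTorch-style training loop. This is not the DeepSpeech training script; the paths and the tiny model are placeholders:

```python
import os

import torch
import torch.nn as nn

CKPT_DIR = "checkpoints"                     # placeholder directory
CKPT_PATH = os.path.join(CKPT_DIR, "latest.pt")

model = nn.Linear(10, 2)                     # stand-in for your real model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
start_epoch = 0

# Resume from the previous run if a checkpoint exists,
# regardless of why the earlier run stopped.
if os.path.exists(CKPT_PATH):
    ckpt = torch.load(CKPT_PATH)
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    start_epoch = ckpt["epoch"] + 1

os.makedirs(CKPT_DIR, exist_ok=True)
for epoch in range(start_epoch, 10):
    # ... training loop over your (possibly extended) dataset ...
    torch.save(
        {"model": model.state_dict(),
         "optimizer": optimizer.state_dict(),
         "epoch": epoch},
        CKPT_PATH,
    )
```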

That really depends on what you want to achieve, but if you are just adding more data, it would make sense.

It involves a bit more work to do correctly: resetting layers, etc.
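
To make "resetting layers" concrete, here is a minimal sketch of the idea, assuming a PyTorch-style model (the network and the checkpoint path are hypothetical stand-ins): load the previously trained weights, re-initialize the output layer, and optionally freeze the earlier layers at first:

```python
import torch
import torch.nn as nn

# Stand-in network: some "feature" layers plus an output layer.
model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),   # earlier layers, keep their weights
    nn.Linear(64, 2),               # output layer we want to reset
)

# Load the weights from the previously trained model
# ("checkpoints/pretrained.pt" is a placeholder for your earlier run).
state = torch.load("checkpoints/pretrained.pt")
model.load_state_dict(state)

# Reset (re-initialize) the output layer for the new task.
model[-1].reset_parameters()

# Optionally freeze the earlier layers so only the new layer trains at first.
for layer in model[:-1]:
    for p in layer.parameters():
        p.requires_grad = False

# Only pass the trainable parameters to the optimizer.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```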

There is long-pending work that should hopefully land soon.

I have tried both ways, and the results were the following:

  • When I added more material and retrained the model from scratch, the test epoch ended with:
    WER: 0.191577, loss: 69.710556

  • When I added that same material (same dataset) and resumed training from the latest checkpoint, I got:
    WER: 0.090943, loss: 47.997845

There is quite a difference in the numbers I get for the same amount of material, which is why I want to make sure which steps are the correct ones to follow.

As I said, what I want to achieve is a model that works well on the complete dataset. So does training from checkpoints still make sense for this, or will the model only work well on the last batch of material I added?

It’s impossible for us to answer this for you, as it depends on the data, the hyperparameters, how long you train for before and after adding new data, etc. The correct approach is to have a good validation set that contains representative data that you want the model to perform well on; then, whenever you try different approaches, you can use the validation set to check whether things are going in the right direction.
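
As a small illustration of that workflow, here is a sketch that scores two approaches on the same fixed validation set using a plain edit-distance WER. The (reference, hypothesis) pairs are hypothetical placeholders for your own transcripts:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level edit distance."""
    r, h = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(r)][len(h)] / max(len(r), 1)

# Hypothetical validation set: (reference, hypothesis) pairs produced by
# each approach on the *same* held-out audio. Swap in your own transcripts.
from_scratch = [("hello world", "hello word"), ("good morning", "good morning")]
from_checkpoint = [("hello world", "hello world"), ("good morning", "good mourning")]

for name, pairs in [("from scratch", from_scratch), ("from checkpoint", from_checkpoint)]:
    avg = sum(wer(ref, hyp) for ref, hyp in pairs) / len(pairs)
    print(f"{name}: validation WER = {avg:.4f}")
```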

Thank you for clarifying things for me :slight_smile: