Unable to load steps from checkpoint when training model

Yep, I’m interested in this, but I’m relatively new to this area. What would I need to do?

First, familiarize yourself with the checkpoint saving and loading logic in TensorFlow 1.x. Then, get somewhat familiar with TensorFlow’s tf.data.Dataset API. Then, read and understand our feeding code: start from the function create_dataset in feeding.py and go from there.

You’d have to add, as a start:

  • Code to the CSV and SDB loading classes to skip to an index in the input file when loading
  • Code to differentiate the first epoch from subsequent epochs, since you only want to skip on the first epoch when resuming
  • Code to save and load the last sample index that was loaded during training, so that it can be used for resuming
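To make those three pieces a bit more concrete, here is a minimal, framework-free sketch in plain Python. It is not DeepSpeech code. The names save_sample_index, load_sample_index, read_csv_samples, iterate_samples and the feeding_state.json sidecar file are all hypothetical, and only the CSV case is shown (no SDB). In the real feeding code, a generator like this could be wrapped with tf.data.Dataset.from_generator, and the index would be saved alongside the TensorFlow checkpoint rather than on every sample.

```python
import csv
import io
import json
import os
import tempfile
from typing import Iterator, List, Tuple

# Hypothetical sidecar file stored next to the checkpoint (not part of DeepSpeech).
STATE_FILE = "feeding_state.json"


def save_sample_index(checkpoint_dir: str, epoch: int, sample_index: int) -> None:
    """Persist the epoch and index of the last sample fed to training."""
    with open(os.path.join(checkpoint_dir, STATE_FILE), "w") as f:
        json.dump({"epoch": epoch, "sample_index": sample_index}, f)


def load_sample_index(checkpoint_dir: str) -> Tuple[int, int]:
    """Return (epoch, last_sample_index), or (0, -1) if nothing was saved yet."""
    path = os.path.join(checkpoint_dir, STATE_FILE)
    if not os.path.exists(path):
        return 0, -1
    with open(path) as f:
        state = json.load(f)
    return state["epoch"], state["sample_index"]


def read_csv_samples(csv_text: str, start_index: int = 0) -> Iterator[Tuple[int, List[str]]]:
    """Yield (index, row) pairs from CSV text, skipping rows before start_index."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    for index, row in enumerate(reader):
        if index < start_index:
            continue
        yield index, row


def iterate_samples(csv_text: str, checkpoint_dir: str,
                    num_epochs: int) -> Iterator[Tuple[int, int, List[str]]]:
    """Yield (epoch, index, row), resuming after the last saved sample."""
    start_epoch, last_index = load_sample_index(checkpoint_dir)
    for epoch in range(start_epoch, num_epochs):
        # Only the first (resumed) epoch skips ahead; later epochs start at 0.
        skip_to = last_index + 1 if epoch == start_epoch else 0
        for index, row in read_csv_samples(csv_text, skip_to):
            yield epoch, index, row


CSV = "wav_filename,wav_filesize,transcript\na.wav,100,hello\nb.wav,200,world\nc.wav,300,foo\n"

with tempfile.TemporaryDirectory() as ckpt:
    seen = []
    for step, (epoch, index, _row) in enumerate(iterate_samples(CSV, ckpt, num_epochs=2)):
        seen.append((epoch, index))
        save_sample_index(ckpt, epoch, index)
        if step == 1:
            break  # simulate training being interrupted after two samples
    # A fresh run picks up at sample 2 of epoch 0, then does epoch 1 in full.
    resumed = [(e, i) for e, i, _ in iterate_samples(CSV, ckpt, num_epochs=2)]
```

One caveat with this approach: a saved index only points at the right sample if the sample order is the same on every run. If the input is shuffled, the resume logic would also need to reproduce that order (for example by saving the shuffle seed).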

OK, thanks, I’ll check it out!