Model doesn't train second time from the checkpoint

@sanjay.pandey Yes, that'd be my guess too; you need to find the "sweet spot".

I'd treat the number of additional epochs as a hyperparameter and try to tune it so as to get the "optimal" performance on both data sets.

As to what "optimal" means, that's up to you. I'd guess you'd have to create a loss that weighs the 64k and 600 losses in a way that reflects your ultimate use case, then evaluate that loss on a dev set containing both 64k and 600 data to decide the optimal number of additional epochs.
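A minimal sketch of that selection procedure, with entirely hypothetical names and toy numbers standing in for dev losses you'd actually measure after fine-tuning from the checkpoint for each candidate epoch count:

```python
# Treat the number of additional fine-tuning epochs as a hyperparameter
# and pick the value minimizing a weighted dev loss over both data sets.
# All names and numbers are illustrative placeholders, not a real API.

def combined_dev_loss(loss_64k, loss_600, w_600=0.5):
    """Weighted loss; w_600 reflects how much you care about the 600 set."""
    return (1 - w_600) * loss_64k + w_600 * loss_600

def select_num_epochs(candidates, dev_losses):
    """dev_losses maps epochs -> (loss on 64k dev data, loss on 600 dev data)."""
    return min(candidates, key=lambda e: combined_dev_loss(*dev_losses[e]))

# Toy dev losses measured after fine-tuning for that many extra epochs:
dev_losses = {
    1:  (0.30, 0.90),   # barely adapted to the 600 set
    3:  (0.35, 0.50),
    5:  (0.45, 0.42),
    10: (0.80, 0.35),   # overfit to the 600 set, forgetting the 64k set
}
best = select_num_epochs(dev_losses.keys(), dev_losses)
```

With equal weights the toy numbers favour a middle value, which is the "sweet spot" idea: enough extra epochs to learn the 600 examples without forgetting the 64k.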