If my understanding is correct, early_stop is used to prevent over-fitting the model. This seems like something you’d want to always have switched on. However, I noticed in the release notes that the pre-trained model has early_stop switched off. What was the reasoning behind this and in what situations is early_stop a bad thing?
When I'm fine-tuning, or can otherwise monitor training progress myself, I keep early stopping switched off. If it's a fresh model that might train unattended for a long period, then stopping early makes more sense.
Because when you're fine-tuning, there are times when you want a higher learning rate to drive the loss down quickly, then a substantially lower LR to sustain that drop; it acts as a kind of pseudo-momentum. Also, once you get used to training models on similar datasets (size, partitioning, and comprehensiveness), you develop a knack for when to raise the LR and when not to. The catch is that you can't keep the LR high for long, because the loss starts oscillating; so at that point, while you're monitoring, you reduce the LR and maintain the drop.
This is not an exact science, AFAIK; it's just what I follow.
P.S. This has yielded good results for me regardless of architecture/problem, from CV to NLP.