Thanks @reuben. Sorry if I missed any details that were already covered.
I had one more question on your initial training: was the entire 3817 hours fed into the model at once for the 3-phase training, or was it broken down by dataset or in some other fashion?
Does the final WER of the model differ depending on whether it was trained on all 3817 hours in one shot, or trained first on the 960 hours of LS, then on the 1700 WAMU recordings, and so on (where we compute the WER using the final cumulative model)? I hope I am making sense.
I am also asking this from the standpoint of my own custom training. If I have a 500-hour dataset, will the WER vary between these two approaches:
train on the entire 500 hours starting from the released 0.7.0 checkpoint, then test
or
train on 100 hours starting from the released 0.7.0 checkpoint, then add the next 100 hours, and so on until reaching the same 500 hours as in the prior case, then test
Also, would you recommend using early stopping when fine-tuning?
Thank you very much for taking the time to answer.