Is it possible to know how much of Fisher was used? It seems like that corpus is conversational and most closely related to the kind of data I am looking at. Was the entire Fisher dataset used?
Is it possible to get the training CSV that you used?
Thanks @reuben. Sorry if any of these details were already covered and I missed them.
I had one more question about your initial training: were the entire 3817 hours fed into the model for the 3-phase training, or was the data broken down by dataset or in some other fashion?
Does the final outcome of the model (WER) differ depending on whether it was trained on all 3817 hours in one shot, or trained first on the 960 hours of LibriSpeech, then the 1700 WAMU recordings, and so on (where the WER is computed using the final cumulative model)? Hope I am making sense.
I am asking this also from my own custom-training standpoint: if I have a 500-hour dataset, will the WER vary between the following two approaches (sketched below)?
train the entire 500 hours over the released 0.7.0 checkpoint and test
or
train 100 hours over the released 0.7.0 checkpoint, then add the next 100 hours, and so on until I reach the same 500 hours as in the first case, and then test
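To make the two orderings concrete, here is a rough sketch of what I mean, in plain Python with hypothetical fine_tune/evaluate_wer helpers and made-up CSV names (not the actual DeepSpeech.py interface):

```python
# Hypothetical sketch of the two fine-tuning orderings; fine_tune() and
# evaluate_wer() are placeholders, and the CSV paths are made up.

def fine_tune(start_checkpoint, dataset_csv):
    """Fine-tune from start_checkpoint on dataset_csv; return the new checkpoint."""
    ...

def evaluate_wer(checkpoint, test_csv):
    """Compute WER of the given checkpoint on a held-out test set."""
    ...

# Option A: the entire 500 hours in one shot over the released 0.7.0 checkpoint.
ckpt_a = fine_tune("deepspeech-0.7.0-checkpoint", "my_500h.csv")
wer_a = evaluate_wer(ckpt_a, "my_test.csv")

# Option B: 100-hour increments, each stage resuming from the previous stage's checkpoint.
ckpt_b = "deepspeech-0.7.0-checkpoint"
for chunk in ["my_100h_part1.csv", "my_100h_part2.csv", "my_100h_part3.csv",
              "my_100h_part4.csv", "my_100h_part5.csv"]:
    ckpt_b = fine_tune(ckpt_b, chunk)
wer_b = evaluate_wer(ckpt_b, "my_test.csv")

# The question is whether wer_a and wer_b should be expected to differ.
```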
Also, would you recommend using early stopping when fine-tuning?
Thank you very much for the response. So if I have 500 hours, it would be a good idea to train on it in one shot and save the model and checkpoints. If I later get a new set of 1000 hours of similar data, should I train on the 500+1000 from the released 0.7.0 checkpoint, or just train on the new 1000 from my saved checkpoint? Is there a recommended approach for this?
Also, should the 3-phase training process be followed for fine-tuning as well?
The three phase training is simply a consequence of training until validation loss stops improving, then lowering the learning rate and continuing from the best validation loss checkpoint. It’s not necessarily something you’ll have to do for fine tuning, as the behavior of the training process changes with the dataset. You’ll have to figure out for yourself what works best on your dataset.
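Concretely, that schedule amounts to something like the following minimal sketch; all the helpers here are hypothetical stand-ins, not DeepSpeech.py's actual training loop or APIs:

```python
import random

# Minimal sketch of the "3-phase" schedule described above: train until the
# validation loss stops improving, restore the best checkpoint, lower the
# learning rate, and continue. Everything below is a hypothetical stand-in.

def train_one_epoch(model, learning_rate):
    pass  # stand-in for one epoch of training on the training set

def validation_loss(model):
    return random.random()  # stand-in for computing loss on the dev set

def save_checkpoint(model, name):
    pass  # stand-in for writing model weights to disk

def load_checkpoint(name):
    return {}  # stand-in for restoring the best weights

def run_phase(model, learning_rate, patience=5):
    """Train until validation loss has not improved for `patience` epochs."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    while epochs_without_improvement < patience:
        train_one_epoch(model, learning_rate)
        loss = validation_loss(model)
        if loss < best_loss:
            best_loss = loss
            save_checkpoint(model, "best")
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1

model = {}              # stand-in for the acoustic model
learning_rate = 1e-4    # assumed starting value, not the actual release setting
for phase in range(3):
    run_phase(model, learning_rate)
    model = load_checkpoint("best")  # continue from the best validation-loss checkpoint
    learning_rate /= 10              # lower the learning rate for the next phase
```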
Thanks for the clarification.
Would you also be able to comment on this:
So if I have 500 hours, it would be a good idea to train on it in one shot and save the model and checkpoints. If I later get a new set of 1000 hours of similar data, should I train on the 500+1000 from the released 0.7.0 checkpoint, or just train on the new 1000 from my saved checkpoint? Is there a recommended approach for this?