I trained multiple models, adding the data sequentially, and WER and loss kept dropping. But when I added more data and trained the model with 230 hours (162,638 files) and the same configuration, the loss and WER started increasing. Is that data related? Or should I change the configuration? Or what type of tests should I do?
The new data is not from the same distribution.
If I train on the new data from scratch, it takes about 34 epochs.
If I train starting from the last best model with less data (starting from the frozen model), it takes about 15 epochs. In both cases early stopping is triggered.
When the new data is not from the same distribution, "all bets are off": the WER can go down, go up, or stay the same.
For example, say the first 140 hours were recorded in a recording studio with basically no noise, but the additional 90 hours were recorded on a cheap microphone in a train station. One would then expect the WER to increase when the new data is added.
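As a concrete test, you could score the old and new portions of your dev set separately. If WER on the old subset is still as low as before but the new subset's WER is high, the regression is coming from the data shift, not from your training configuration. Below is a minimal sketch of that idea; the manifest file, its column names, and the `transcribe()` function are placeholders for whatever your pipeline actually uses (only `jiwer.wer` is a real library call).

```python
# Sketch: compare WER on the original data vs. the newly added data.
# Assumes a manifest CSV with columns: audio_path, reference_text, source
# (source = "old" for the first 140 h, "new" for the added 90 h).
import csv
import jiwer

def transcribe(audio_path: str) -> str:
    """Placeholder: run your trained model on one file, return the hypothesis."""
    raise NotImplementedError

refs = {"old": [], "new": []}
hyps = {"old": [], "new": []}

with open("dev_manifest.csv", newline="") as f:
    for row in csv.DictReader(f):
        source = row["source"]  # "old" or "new"
        refs[source].append(row["reference_text"])
        hyps[source].append(transcribe(row["audio_path"]))

# Per-subset WER: a large gap between "old" and "new" points to a
# distribution mismatch rather than a bad training configuration.
for source in ("old", "new"):
    print(f"{source} subset WER: {jiwer.wer(refs[source], hyps[source]):.3f}")
```

If the gap is large, fixes on the data side (e.g. noise augmentation, or fine-tuning per domain) are more likely to help than changing hyperparameters.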