I thought the idea behind SortaGrad curriculum learning is to apply utterance-duration sorting for the first epoch only, then present utterances in random order for subsequent epochs. Mozilla DeepSpeech seems to train on the same sorted utterance order for every epoch. Is there a rationale for that?
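For reference, the SortaGrad schedule as I understand it can be sketched like this (a minimal illustration; the function name and per-epoch seeding are mine, not DeepSpeech's):

```python
import random

def sortagrad_order(durations, epoch, seed=0):
    """Return an ordering of utterance indices for the given epoch.

    SortaGrad: sort utterances by duration (shortest first) for the
    first epoch, then shuffle for every later epoch.
    """
    indices = list(range(len(durations)))
    if epoch == 0:
        # First epoch: shortest utterances first to stabilize early training.
        indices.sort(key=lambda i: durations[i])
    else:
        # Later epochs: random order, seeded per epoch for reproducibility.
        random.Random(seed + epoch).shuffle(indices)
    return indices

# Example: durations in seconds for five utterances.
durations = [3.2, 1.1, 5.0, 2.4, 0.9]
print(sortagrad_order(durations, epoch=0))  # [4, 1, 3, 0, 2]
```

Training on the epoch-0 order every epoch, as DeepSpeech appears to do, is equivalent to never taking the `else` branch above.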
We’ve experimented a bit with 0.4.1 and saw some improvement from disabling the sort while retraining the pretrained models on 300 hours of our own speech. Further experiments await 0.5, where the DataSet usage is clearer and should be easier to work with.