Isn't SortaGrad intended for the first epoch only?

I thought the idea behind SortaGrad curriculum learning is to apply utterance-duration sorting for the first epoch, then present utterances in random order for subsequent epochs. Mozilla DeepSpeech seems to train on the same sorted utterance order for every epoch. Is there a rationale for that?
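To make sure I'm describing the same thing, this is roughly what I mean by SortaGrad (just a sketch of the idea, not DeepSpeech code; the sample structure is made up):

```python
import random

def epoch_order(samples, epoch):
    """Return the utterance order for a given epoch under SortaGrad.

    `samples` is assumed to be a list of objects with a `duration`
    attribute; the name and shape are hypothetical.
    """
    if epoch == 0:
        # Warm-up epoch: shortest utterances first.
        return sorted(samples, key=lambda s: s.duration)
    # All later epochs: plain random order.
    shuffled = list(samples)
    random.shuffle(shuffled)
    return shuffled
```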

We’ve experimented a bit with 0.4.1 and see some improvement when disabling the sort while retraining the pretrained models on 300 hours of our own speech. Further experiments will wait for 0.5, where the Dataset usage is clearer and should be easier to work with.

Yes, we don’t have SortaGrad. The old rationale was mostly an engineering compromise: the feeding mechanism was completely decoupled from the main training loop, so there wasn’t a clean way to do conditional shuffling or any other per-epoch logic, and we were getting reasonable results without that change. With the new tf.data pipeline it should be much simpler to do conditional shuffling, although care is needed to see how it interacts with the caching mechanism.
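Very roughly, something like this is what I have in mind (names and structure are hypothetical, nothing here is committed code, and the cache interaction is exactly the part that needs checking):

```python
import tensorflow as tf

def make_training_set(sorted_dataset, epoch, batch_size=32, shuffle_buffer=1024):
    # Sketch only: assumes `sorted_dataset` is a tf.data.Dataset already
    # sorted by utterance duration, and that the training loop rebuilds or
    # re-initializes the pipeline each epoch and passes `epoch` in.
    ds = sorted_dataset.cache()  # the cache stores the sorted order
    if epoch > 0:
        # Skip the shuffle for the first (sorted) epoch only; later epochs
        # shuffle on top of the cached, sorted data.
        ds = ds.shuffle(shuffle_buffer)
    return ds.batch(batch_size).prefetch(1)
```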

OK. Looks like 0.5 will have the same behavior, with the unconditional sort in `create_dataset()`. We’ll experiment further (when you release it :slight_smile: ) and let you know what we find.