In the release notes of DeepSpeech 0.5.1, it is mentioned that the model was trained for 467356 steps, or 75 epochs. Going by the training batch size of 24, that works out to about 149544 audio samples per epoch.
Did the model achieve a word error rate of 8.22% by training on only 149544 audio samples? And what was the maximum audio length (in seconds) considered while processing the dataset?
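
The 149544 figure is consistent with the back-of-the-envelope calculation below. It is only a sketch: it assumes each training step consumes exactly one batch of 24 samples, which may not hold if training was spread across several GPUs or workers. The variable names are just for illustration.

```python
# Figures as quoted from the release notes in the question.
total_steps = 467356
epochs = 75
batch_size = 24  # assumed to be the number of samples consumed per step

# Integer division drops the small remainder (467356 is not a multiple of 75).
steps_per_epoch = total_steps // epochs
samples_per_epoch = steps_per_epoch * batch_size

print(steps_per_epoch, samples_per_epoch)  # 6231 149544
```

If the effective batch per step was larger than 24, the number of samples per epoch would scale up accordingly.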