There is research suggesting that a simple tweak like layer or instance normalization could improve model performance on accents not seen in the training data. Given the amount of concern about DeepSpeech’s American English model not working well for other accents, I think it would be worthwhile to run an experiment to see whether something as simple as a few normalization layers could help. I wouldn’t expect such a simple fix to bring other accents to parity with American English, but it might help close the gap while we wait for more data collection.
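For anyone unfamiliar with the idea: layer normalization just standardizes each time step’s hidden activations across the feature dimension, then applies a learned scale and shift. A minimal NumPy sketch (the shapes and variable names here are illustrative, not DeepSpeech’s actual graph):

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize each row (one time step's features) to zero mean and
    unit variance, then apply a learned scale (gamma) and shift (beta)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

# Toy example: 3 time steps, 5 hidden units, deliberately shifted/scaled
rng = np.random.default_rng(0)
h = rng.normal(loc=4.0, scale=3.0, size=(3, 5))
out = layer_norm(h, gamma=np.ones(5), beta=np.zeros(5))
print(out.mean(axis=-1))  # approximately 0 for every time step
print(out.std(axis=-1))   # approximately 1 for every time step
```

The intuition for accents is that this rescaling may absorb some speaker- and accent-dependent shifts in the activation statistics, so the downstream layers see a more uniform distribution.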
I am happy to help in any capacity with such an experiment. I don’t currently have access to GPUs or other suitable hardware, so if someone could point me toward inexpensive hardware or cloud services, I can run the experiments and share the results here. I’m also happy to mentor someone else through running the experiments if they’d like to give it a go.