Simple tweak to improve DeepSpeech on accents

There is research suggesting that a simple tweak like layer or instance normalization could improve model performance on unseen accents (accents not present in the training data). Given the amount of concern about DeepSpeech's American English model not working well for other accents, I think it would be worthwhile to run an experiment to see whether something as simple as a few normalization layers could help. A rough sketch of what I mean is below. I wouldn't expect this simple fix to bring other accents to parity with American English, but it might help close the gap while we wait for more data collection.
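
To make the idea concrete, here is a minimal sketch of inserting a layer normalization step between a dense layer and its activation, using `tf.keras`. This is not DeepSpeech's actual model code; the function name, layer sizes, and feature dimensions are made up for illustration, and the same idea would need to be ported into the real acoustic model.

```python
import tensorflow as tf

def make_acoustic_block(units=2048):
    """Hypothetical dense block with layer normalization on the pre-activations.

    LayerNormalization normalizes across the feature axis independently for
    each timestep and each utterance, so its statistics do not depend on the
    accent mix of the training batches -- the property that motivates trying
    it for unseen accents.
    """
    return tf.keras.Sequential([
        tf.keras.layers.Dense(units),
        tf.keras.layers.LayerNormalization(),  # normalize over features, per timestep
        tf.keras.layers.ReLU(),
    ])

# Illustrative input: (batch, time, n_features); sizes are arbitrary.
features = tf.random.normal([8, 100, 494])
outputs = make_acoustic_block()(features)
print(outputs.shape)  # (8, 100, 2048)
```

For instance normalization the experiment would be analogous, just normalizing over the time axis per utterance instead of per timestep.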

I am happy to help in any capacity with such an experiment. I do not have access to GPUs / hardware at the moment. If someone could point me in the direction of inexpensive hardware / cloud services, then I can run the experiments and share the results here. I’m also happy to mentor someone else in running experiments if they would like to give it a go.

This has indeed been a problem for some time. Great suggestion.

The code will be moved to TF 2 soonish, so you might want to wait a bit before making major changes. I know the GPU on Colab isn't much, but it should work for testing, and experimenting at a larger scale shouldn't be that much of a problem. This is something many people are really interested in. Me included :slight_smile:


Thanks for the tips @othiele! I’ll take a look at Colab.