I think it would be a good idea to ask users for their dialect or region of origin. In German, for example, pronunciation differs a lot between regions. Right now, however, I can only specify whether I come from Austria, Switzerland, etc.
For German specifically there is already a discussion here: https://github.com/mozilla/voice-web/issues/1000
The discussions on GitHub were closed with the suggestion to continue this brainstorming on Discourse.
As the author of this feature request, I would like to share some thoughts on a language-independent approach:
The differences can be split into spelling/pronunciation and custom words/terms used in local areas. If we look at the German language and its dialects, there are these categories:
- neighbouring countries: Swiss German, Austrian German, …
- within Germany:
  - Saxon German: e.g. “Leipzig” -> “Leiptsch”
  - Berlin German: e.g. “ich” -> “icke”
  - Cologne German: e.g. “Kölln” -> “Gölle”
- immigrant speech mixes: German-Turkish, German-Russian, German-Polish, …
IMHO it is an important feature to also understand these variants and the speech of minorities, to avoid trouble in HCI or NLP. For example, to avoid trouble when controlling a German GPS with a Russian dialect.
I suggest a two-step approach to improve the coverage of our voice samples:
- Add country-specific dialect options to profiles.
- Extend the text store with custom samples for these dialects and ask users with a matching profile to record them.
This will allow us to identify the communities ASAP and ask them for contributions later on.
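To make the two steps a bit more concrete, here is a rough sketch in Python of how it could look. Everything in it is an assumption for illustration only: the `Profile` and `Sentence` classes, the `dialect` field, the `sentences_for` helper and the dialect tags such as "de-saxon" are hypothetical and do not reflect the current data model of the project.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Profile:
    """A contributor profile; `dialect` is the hypothetical new field (step 1)."""
    user_id: str
    language: str                  # e.g. "de"
    dialect: Optional[str] = None  # hypothetical tag, e.g. "de-saxon"; None if not stated


@dataclass
class Sentence:
    """A text-store entry; `dialect` marks custom dialect samples (step 2)."""
    text: str
    language: str
    dialect: Optional[str] = None  # None means a general sentence for everyone


def sentences_for(profile: Profile, store: List[Sentence]) -> List[Sentence]:
    """Return sentences in the user's language, dialect-specific ones first.

    General sentences (dialect=None) are offered to everyone; dialect samples
    are offered only to users whose profile states the same dialect.
    """
    same_language = [s for s in store if s.language == profile.language]
    matching = [s for s in same_language
                if s.dialect is not None and s.dialect == profile.dialect]
    general = [s for s in same_language if s.dialect is None]
    return matching + general
```

A user who has not stated a dialect would only get the general sentences, so nothing changes for existing profiles. Putting the matching dialect samples first is just one possible choice; they could equally be mixed in at random.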