My first language is New Zealand English. It shares a writing system with most other Englishes but also has some unique spelling/sound correspondences. These come from intermixing with NZ’s other common language, Maori.
There are many common words borrowed from Maori in New Zealand English. These words are written with the Maori spelling/sound correspondences. E.g. “Whakatane” is said fah-kah-TAH-nɛ, not wack-a-tain. Ngaruawahia = [ŋaːɾʉaːwaːhia], with the NG being a nasal sound like suNG…
Macrons are sometimes also used to mark long vowels in Māori. Macrons are kept when writing in English.
To complicate things further… There is a wide range of real-life ways that non-native and uneducated speakers of Maori or NZ English will pronounce such words. If you showed the prompt “navigate to Whakatane” to someone with no experience of New Zealand English, it would be hard for them to know how to pronounce it.
So both “correct” and incorrect pronunciations are in common use. If someone wanted to make a voice-controlled GPS app that could be used by international tourists and locals alike, then it would be important to capture all these data points.
Google and Vodafone NZ have made a presumably private dataset described here.
https://news.vodafone.co.nz/article/new-zealanders-highlight-te-reo-maori-names-be-updated-google-maps
If you are making an app for transcribing text and it is being used by someone outside of NZ, you probably don’t want fah-kah to correspond to the letters Whaka. So this dataset needs to be separable from other Englishes.
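One way to keep such data separable might be to tag every recorded pronunciation with the varieties it belongs to, and filter at lookup time. Below is a minimal Python sketch of that idea. The `Pronunciation` class, the `variants_for` helper, the variety tags (`en-NZ`, `en-US`) and the choice of which variant goes with which tag are all made up for illustration. None of it comes from a real dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pronunciation:
    word: str             # written form
    spoken: str           # rough respelling, in the style used in this post
    varieties: frozenset  # hypothetical variety tags this variant is heard in


# Toy entries only: the tags are assumptions, not measured usage.
LEXICON = [
    Pronunciation("Whakatane", "fah-kah-TAH-neh", frozenset({"en-NZ"})),
    Pronunciation("Whakatane", "wack-a-tain", frozenset({"en-NZ", "en-US"})),
]


def variants_for(word, variety):
    """Return the spoken variants of `word` tagged with `variety`."""
    return [
        p.spoken
        for p in LEXICON
        if p.word == word and variety in p.varieties
    ]


print(variants_for("Whakatane", "en-NZ"))
print(variants_for("Whakatane", "en-US"))
```

With these toy tags, an app restricted to `en-US` never learns the fah-kah reading, while an `en-NZ` GPS app gets both the “correct” and the incorrect variant.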
So there is a plurality of Englishes around the world which share many but not all words. If each language has just one dataset, will the unique features of each country be left out, or all mixed together? Neither seems desirable. Or will there be a great duplication of words where there is overlap? Also not the best.
Say New Zealand English and Australian English are 95% similar in terms of words and grammar. New Zealand and British are 90% alike, and New Zealand and American are also 90% alike but in different ways. Do we need to collect completely different sentence sets for NZ, AUS, UK, US?
If not, what does an American do with the prompt “Rangatoto”, or a NZer with “Arkansas” for that matter?
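A possible middle road between one mixed-up dataset and four duplicated ones would be a single sentence pool where each sentence lists the varieties it is valid for. The Python sketch below is only an illustration of that idea: the `SENTENCES` table, the tags, and the `prompts_for` and `overlap` helpers are assumptions of mine, and the three sentences are toy data.

```python
# Each sentence is stored once, with the varieties it may be shown to.
SENTENCES = {
    "The dog chased the cat.": {"en-NZ", "en-AU", "en-GB", "en-US"},
    "Navigate to Whakatane.": {"en-NZ"},
    "Navigate to Arkansas.": {"en-US"},
}


def prompts_for(variety):
    """Sentences a speaker of `variety` would be asked to read."""
    return sorted(s for s, tags in SENTENCES.items() if variety in tags)


def overlap(a, b):
    """Share of sentences common to both varieties (Jaccard index)."""
    sa, sb = set(prompts_for(a)), set(prompts_for(b))
    return len(sa & sb) / len(sa | sb)


print(prompts_for("en-NZ"))
print(overlap("en-NZ", "en-US"))
```

Shared sentences are collected once and reused, and the place-name prompts only go to speakers tagged with the matching variety. The overlap number here says nothing about the real 95% or 90% guesses above. It only shows how such a figure could be computed once sentences are tagged.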
When I talk with English speakers from Kenya or India, they have their own unique set of words and grammars too. These cannot simply be accounted for as accents or informal language.
This study has some good examples of the differences:
http://archive.gameswithwords.org/WhichEnglish … It might need the Wayback Machine to read now. “the dog was chased the cat.” etc.
(Also, does grammar even matter to an agent trained on sound files / text chunks?)
My current second language is Japanese. I am forever embarrassed by Amazon Alexa’s refusal to understand a word I say even when humans have no problem. I wouldn’t consider myself near native, but I definitely think my accent is influenced by the region I live in. This is the language community I participate in to become a Japanese speaker, so of course I pick up its habits.
As for accents, I don’t think defining them by cities is a good idea. Firstly, because about half the world doesn’t live in one (yet). Rural people are already underserved by technology, so I would hesitate to choose categories that by design make something less useful to them.
Secondly, because accent is more about language communities, maybe… There are also differences based on age, class, education level and ethnicity. Common descriptions of English accents usually include a Cultivated variant, because people like to show how educated they are by changing up their vowels.
Accent is of course related to how different speakers move from graphemes (signs) to morphemes (mental) to phonemes (sounds); this is segmental. There are also suprasegmental elements to accents: stress, intonation, prosody, pitch. These seem to be missing from this definition.
Sorry if there are mistakes here, I’m no expert. Also it’s hard to write on a phone.