I notice that my profile on the site has “Native language” and “Additional languages” fields, but I can’t seem to find this information in the datasets. The dataset .tsv files just have “accent”. Am I missing something, or would it be possible to include this data in a future release?
It would be useful, for example, to be able to download the French dataset and determine which speakers are non-native and what their native languages are. This information would be critical for automatic accent identification, and might also be useful for speech model adaptation and for testing robustness across accents. I see that there are plans to add a “native” field in the languages and accents strategy, but if the information is already in the database, it would be useful to have it in the datasets as well.
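For anyone who wants to check a given release, here is a minimal sketch of how one could test which speaker metadata columns a dataset .tsv header contains. Only “accent” is known to be there; `native_language` and `additional_languages` are hypothetical column names used for illustration, and the sample rows are made up rather than taken from a real release.

```python
import csv
import io

# Hypothetical column names: only "accent" is confirmed to exist in the .tsv files.
WANTED = ("accent", "native_language", "additional_languages")


def speaker_metadata_columns(tsv_file, wanted=WANTED):
    """Return a dict mapping each wanted column name to whether the header has it."""
    reader = csv.reader(tsv_file, delimiter="\t")
    header = next(reader, [])  # empty file -> empty header
    return {name: name in header for name in wanted}


# Tiny in-memory stand-in for a real dataset .tsv (made-up header and row).
sample = io.StringIO(
    "client_id\tpath\tsentence\taccent\n"
    "abc\tclip1.mp3\tBonjour\tfrance\n"
)
print(speaker_metadata_columns(sample))
```

With a real file, the same function can be called on `open(path, newline="", encoding="utf-8")`, where `path` points at one of the downloaded .tsv files.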