Missing locale info in the tsv files

minstrangeland · February 21, 2023, 10:58am

Hi all!

First of all, thank you very much for your wonderful work. I’ve been using the CV datasets for years, and they’ve been really helpful for many languages and tasks.

Perhaps someone already replied to this question, but I couldn’t find the answer. I’ve seen that in the 12.0 release the “locale” information is missing from the tsv files. Is this something intentional, or just a mistake? For me it was very useful to be able to differentiate the origin of the speakers.

Thank you very much in advance, and congratulations on the good work again!

Fran.

bozden · February 21, 2023, 6:17pm

If you mean the “accent” data, here is the related bug report:

Otherwise, the “locale” field does exist in the tsv files of several languages I just checked.
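In case it helps anyone checking their own download, here is a minimal sketch of how one could confirm which metadata columns a tsv file has and how many rows actually carry a value. The column names `locale` and `accents` are assumptions on my part (the exact header can differ between releases), and the sample rows are made up purely for illustration.

```python
import csv
import io


def column_report(tsv_file, columns=("locale", "accents")):
    """Report, for each column name, whether it appears in the header
    and how many data rows have a non-empty value for it."""
    # QUOTE_NONE so that quote characters inside sentences are not
    # interpreted as field delimiters.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = reader.fieldnames or []
    report = {name: {"present": name in header, "filled": 0} for name in columns}
    total = 0
    for row in reader:
        total += 1
        for name in columns:
            # Missing columns come back as None; treat them as empty.
            if (row.get(name) or "").strip():
                report[name]["filled"] += 1
    return total, report


# Made-up sample: the accent column exists but is empty in every row.
sample = (
    "client_id\tpath\tsentence\taccents\tlocale\n"
    "a1\tclip1.mp3\tHello there\t\ten\n"
    "b2\tclip2.mp3\tGood morning\t\ten\n"
)

total, report = column_report(io.StringIO(sample))
print(total, report)
```

To run it on a real file, pass an open file handle instead of the `StringIO` object, for example `open(path, newline="", encoding="utf-8")`, where `path` points at one of the tsv files from the release.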

minstrangeland · February 22, 2023, 6:06am

Hi Bülent!

Thank you very much for the quick response, and for pointing me to the right place. I did indeed mean the “accent” data. My bad.

Have a nice day,

Fran.

kathyreid · February 22, 2023, 10:08pm

Thanks @bozden for sharing the GitHub Issue link. @jesslynnrose is very kindly looking into this for me as well, and @minstrangeland, it’s so good to know others are interested in this data too.

@minstrangeland I have a GitHub repo you may be interested in - it’s a Jupyter notebook of heuristics for working with English accent data - it helps to group and relate the accents. It may be useful for your work, too.

minstrangeland · February 23, 2023, 6:01am

@kathyreid, thanks for the link. I’ll take a look at it.

jesslynnrose · February 27, 2023, 9:48am

Hello! Sorry about the late reply.

This is indeed a bug impacting multiple languages, and our engineers are working to have it fixed in upcoming releases.

My apologies for the inconvenience, but I massively appreciate you raising the issue and letting us know!

minstrangeland · February 27, 2023, 10:00am

Hi @jesslynnrose!

Thanks for letting me know, and no need to apologize for the delay. You were pretty fast.

Have a nice day!

Fran.