Creating Open and Accessible Resources for Uzbek: The Uzbekvoice.ai Project

gina · March 27, 2023, 9:24am

Language technology is advancing rapidly, and the need for diverse, accessible datasets is growing. However, data resources remain scarce, particularly for languages with smaller speaker populations. Uzbek is one such language, spoken by approximately 33 million people. Recognizing this gap, the Uzbekvoice.ai team set out to develop an open-source text and voice dataset for Uzbek, with the aim of creating a more equitable and inclusive future for voice technology. To date, they have collected about 1,400 hours of high-quality audio with accompanying texts, all of which are publicly available and hosted on Google Drive for easy access.

Creating such a dataset requires a significant investment of time, resources, and expertise, and the Uzbekvoice.ai team has put considerable work and dedication into building this resource for their community. The dataset gives the NLP community, researchers, and developers the data they need to build speech recognition and natural language processing applications for Uzbek. The work of Uzbekvoice.ai is a testament to the power of collaboration and community: by creating an open, accessible resource, the team is helping to make voice technology more inclusive for everyone. When we work together and share our resources, we can overcome the barriers of geography, language, and culture to build a better world.

Common Voice recognizes and commends the outstanding efforts of the Uzbekvoice.ai team in collecting and sharing a large dataset of text and audio recordings for the Uzbek language.

To connect with the team, contact Mukhammad Amin Kodirov at kodirov1002@gmail.com.