Dataset Release Day V.8

Dear Common Voice Community,

Thank you so much for your contributions and support in the creation of Common Voice Dataset V.8. Your creativity and ingenuity have made this dataset possible :hearts:.

On behalf of the Product Team, THANK YOU!

Dataset Stats

The dataset has grown by 30% and now covers 87 languages!

New languages in Common Voice 8 include Igbo, Marathi, Danish, Norwegian Nynorsk, Central Kurdish, Malayalam, Swahili, Erzya, Moksha, Macedonian and Santali (Ol Chiki).

You can download the Common Voice dataset here for free. The dataset metadata is now published.

Are you developing with the Common Voice Dataset?

Community Resources Update!

We recently created new graphics and content to support communities, including a draft onboarding slide deck. You can access the resources via the Google Drive. If you have any access issues, please let me know!

Here are a few examples of the new graphics!


The Belarusian dataset has decreased by more than 100 hours compared to its state on January 10. What is the reason for this?

Hey Andre,

Thanks for raising your questions.

Our team is currently investigating the issue, and we hope to respond as soon as possible. Please note that the leaderboard is an estimate of hours contributed. I will follow up with you with a more comprehensive report regarding your query.

Sorry for any inconvenience caused.