Hello! It’s a new(ish) year, weekly updates are back and we’re so excited to be bringing you all the Common Voice news:
From the community:
The Frisian Common Voice community has been doing incredible work, running a Ynsprek Maraton contribution over the past few months. They’ve contributed 80 hours of new speech and validated 13 hours! They ran a livestream to celebrate and you can catch the recording here: https://www.youtube.com/watch?v=tazIIYrKWH4
Open Data week is coming up in March, run by the Open Knowledge Foundation. There will be a series of conference days and events around the world, including Open Data Day Karlsruhe on March 4th https://ok-lab-karlsruhe.de/en/projekte/odd/, where community member Stefan Grotz will be speaking about the Common Voice project.
Date and Time: 03-21, 15:00–16:00 (Europe/Amsterdam)
Join a panel of Common Voice community members as they tackle a call for greater diversity, inclusivity, and representation on the Internet to create a more equitable global digital community. Be part of the conversation as they share their lived experiences of the Anglocentric digital world. In addition to their ongoing efforts to dismantle Anglocentrism, we hope that the panel discussion will provide feasible solutions that will contribute to the creation of an inclusive digital community that accurately reflects the diverse linguistic and cultural communities across the globe.
Technologists and curious practitioners are welcome to hear a talk about how governance considerations can be baked into every stage of building a speech recognition algorithm - from data collection through to model testing. There will be several interactive scenarios for workshopping together.
That’s it for this week.
Did this update miss something important? Are you doing something cool we can help you show off? Reply to this thread, message Gina or myself, chat to us on Matrix or email commonvoice@mozilla.com to let us know about it, or just say hello!
It’s time for your weekly update from the Common Voice team! As always, if you’re doing something cool that we missed (or have something coming up we can show off for you) just reply and let us know!
Call for presenters: Are you an academic working in or with under-resourced languages? RESOURCEFUL 2023 is seeking submissions from speakers working in this space for their conference on May 22-24th in Tórshavn, Faroe Islands. They’re looking for submissions of 4-8 pages, and you can learn more about the CFP and the conference itself here.
IRL in South Africa: Launch of the “Voices of Mzansi” project workshop on behalf of GIZ’s Fair Forward project and Stellenbosch University. This project aims to localise the South African languages on Mozilla’s Common Voice platform and collect voice data that is open and accessible for everyone. The workshop will be at the CSIR International Convention Centre on 16 March 2023 at 08:30. We’ll add more details as we find them!
Over at the Mozilla Corporation: The Mozilla community call on March 9th will be looking at how Firefox handles machine translations with privacy in mind. This might be of interest to those of you (most of you?) excited about languages and translation technologies. And the presenter, André Natal, was involved in the early days of the Common Voice project. Catch it when it goes live: https://www.youtube.com/watch?v=J06koBcfm5w
That’s it for this week.
Did this update miss something important? Are you doing something cool we can help you show off? Reply to this thread, message Gina or myself, chat to us on Matrix or email commonvoice@mozilla.com to let us know about it, or just say hello!
The Funder’s Track for MozFest is also doing some exciting things. I would encourage you to check it out and share it with your contacts if it’s of interest!
Speaking of presentations, @stergro recently spoke at an Open Data Day in Karlsruhe and wanted to share his thoughts (and slides!)
The Sentence Collector is changing
We’re integrating the Sentence Collector more neatly into the Common Voice platform in this upcoming release. More details can be found in this post.
Dataset 13.0 is coming
We’re working on the release of the freshest Common Voice data in the new dataset. We’ll be updating you shortly (later today!) with more details
Did we miss something?
Reply to this thread or message Jess or Gina if we can brag about the cool things you’re doing with Common Voice!
It’s time for your weekly update from the Common Voice team!
MozFest Recap: Last week was MozFest 2023, an event that stresses the importance of creating a fair and inclusive internet accessible to all. This year, the festival drew over 6,000 volunteers, facilitators, wranglers and active participants from around the globe. It featured insightful discussions, interactive sessions on art, music and food, and meaningful exchanges of ideas among attendees. We extend our gratitude to the community members who played vital roles as attendees and panelists at MozFest. “The festival serves as a unique platform that brings together two seemingly different groups - tech experts and human rights activists,” one of the speakers highlighted. At its core, MozFest is about people and community. We hope that you had the opportunity to be part of this incredible experience and enjoyed it. If you were unable to attend or wish to revisit some of the sessions, you can visit the MozFest website to access recorded sessions.
Data Science for Health in Africa Virtual Networking Exchange on 3 May 2023: Attend the Virtual Networking Exchange on 3 May 2023 to meet and interact with organizations working on data science and health in Africa! During this completely online event, you will learn about exciting work happening across the continent, share information about your work, and identify potential collaborators. The Networking Exchange is completely free and open to any organization working on data science and/or health in Africa.
How to Participate: To participate, you may register as a participant or as a presenter.
Participants may join the event and move around between different Zoom rooms to learn about different data science and health organizations.
Presenters may represent their organization on the agenda for the event. They will have a Zoom breakout room for an allotted time during which they can share information about their organization and interact with participants.
More information on the Virtual Networking Exchange and registration is available on the DS-I Africa website
Note: The deadline to register as a presenter is 14 April 2023. There are a limited number of presenter slots available so please register as soon as possible.
Uzbek AI: The Uzbekvoice.ai team has collected about 1400 hours of high-quality audio recordings with accompanying texts to create a valuable resource for the Uzbek language. Common Voice recognizes, commends, and appreciates the team’s dedication to building open and accessible resources for language technology. Read more about it and get in contact with the team here: Uzbekvoice.ai Project.
Did this update miss something important? Are you doing something cool that we can help you show off? Reply to this thread, message Jess or myself or chat to us on Matrix:)
Hello and welcome back to another weekly(ish) update from the Common Voice team!
Our Catalan community is going to be running a contribution campaign April 14th-16th. They’ve been doing just a stellar job and I bet they’re going to meet their ambitious goal of 3000 contributed hours! Come help them out or cheer them on.
@chenaichair joined the BBC recently to talk about global access in AI and about Common Voice. While (I think) the whole segment is interesting, you can skip to 7:18 to listen to her shine. You can listen here (requires a BBC Sounds login).
Common Voice has been nominated for 2 Webby awards, for Accessible Technology and Responsible Innovation. While votes for us do help get more visibility for the project, we’re just excited to be nominated!
We’ve also updated the sentence corpus for Catalan, Abkhaz, and Esperanto! Release notes here.
Did we miss something? Do you have something cool you want us to talk (or brag!) about? Reply here, message @gina or myself or chat to us on Matrix!
Welcome back to another weekly update from the Common Voice team!
Our Catalan community’s contribution campaign is ongoing until April 16th. Read more about the campaign here.
The voting period for the Webby Awards remains open until April 20th, and Common Voice has been nominated in the categories of Accessible Technology and Responsible Innovation. If you could spare some time to cast your vote for Common Voice, it would be greatly appreciated.
Lacuna Fund announced two new calls for proposals to develop open and accessible machine learning (ML) datasets that will improve Sexual, Reproductive & Maternal Health and Rights (SRMHR) and illuminate the relationship between Climate & Forests to help identify interventions that could mitigate or adapt to climate impacts. Read more and access the full requirements and submission portal here. Round 1 closing date is April 19th, Round 2 June 20th.
Did this update miss something important? Are you doing something cool that we can help you show off? Reply to this thread, message @jesslynnrose or myself @gina or chat to us on Matrix:)
Welcome back to another weekly update from the Common Voice team!
Exciting News
We have new bulk sentences in last week’s update for Japanese and Swahili. Here’s the release.
Making the Latvian Language AI-Compatible
Latvian Open Technology Association (LOTA) is taking the lead in a joint initiative to make the Latvian language work seamlessly with AI tools worldwide. LOTA is collaborating with partners to achieve this goal, and several activities are planned for the coming months.
The first two activities will happen around the 4th of May and on the 11th of May. The 4th of May is the day Latvia regained its independence, and for this occasion LOTA is planning a social media influencer campaign asking Latvians to record their voices on the Common Voice platform. On the 11th of May, LOTA will host an annual conference focusing on Open Data, where voice donations on Common Voice will also have a dedicated spot.
At a later stage, activities planned for this summer and autumn will be part of a research project by the Artificial Intelligence Laboratory at IMCS, University of Latvia. The project aims to gather Latvian voice donations on Common Voice.
LOTA is open to collaborating with any other organizations or individuals from Latvia working on or interested in contributing to Common Voice. For further information or enquiries, contact Raivis Dejus at raivis.dejus@gmail.com.
Did this update miss something important? Are you doing something cool that we can help you show off? Reply to this thread, message jesslynnrose or myself Gina_Moape or chat to us on Matrix:)
Welcome back to another weekly update from the Common Voice team!
The New Common Voice Sentence Collector
New look, new features for the new Common Voice Sentence Collector. Read more about it here.
Latvian Open Technology Association (LOTA) Initiative Success
The Latvian Open Technology Association (LOTA) has taken the lead in a joint initiative to make the Latvian language work seamlessly with AI tools worldwide. LOTA collaborated with partners to achieve this goal. The initial activity, held on the 4th of May, was a great success. Their campaigns were featured in the top 2 news programs:
During the campaign, the team managed to attract almost 7,000 people, about 400 of whom donated their voices on the Common Voice platform. They managed to increase the recordings from ~18 hours to ~88 hours. On the 11th of May, LOTA will host an annual conference focusing on Open Data, where voice donations on Common Voice will also have a dedicated spot.
The French Geek Festival 2023
Geek Faëries 2023 will be held on the 03-04 June 2023 at Château de Selles-sur-Cher, 1 Le Château, 41130 Selles-sur-Cher, France. The event aims to bring together tech enthusiasts. RSVP for the event here.
Did this update miss something important? Are you doing something cool that we can help you show off? Reply to this thread, message @jesslynnrose or myself @Gina_Moape or chat to us on Matrix:)
Welcome back to the weekly update from the Common Voice Community team. We’ve had a busy week with the new Sentence Collector update. Now you can write your own original sentences right in the main Common Voice UI. Anyone can write or submit new sentences but you’ll need to be logged in with an account to review new sentences. If you spot a bug or problem with the new Sentence Collector, could you let us know by raising an issue?
Right now you can only submit one qualifying sentence at a time using the Sentence Collector. So we’ve also created some new documentation showing you how to create a bulk sentence submission to support the corpus of your favorite languages.
If you work with Voice and/or Speech data and have the time to take a short survey to help academics better understand your dataset documentation practices, @kath at ANU has a short survey open now that you can take.
If you’re a French speaker (or ever wanted to learn a bit more French!), Common Voice will be represented at the upcoming Geek Faëries (https://www.geekfaeries.fr/) festival June 2-4th in Loir-et-Cher. A rare chance to contribute to Common Voice in person, in cosplay!
That’s it for this week. Did we miss something? Got something cool coming up we can share with the community? Or you want us to brag about something amazing you’ve done? Just reply here, say hello on Matrix or email commonvoice@mozilla.com and we’ll joyously include you in the next update.
Newly Launched Locale
We are excited to announce the newly launched locale for Tamazight ‘zgh’. Common Voice continues to grow and we are grateful.
The French Geek Festival 2023
Geek Faëries 2023 will be held on the 03-04 June 2023 at Château de Selles-sur-Cher, 1 Le Château, 41130 Selles-sur-Cher, France. The event aims to bring together tech enthusiasts. RSVP for the event here.
Voice Data Collection
We are currently working on a “How to” guide for contributors who wish to gather voice data in their specific languages. Please email us or comment on this update if you have any recommendations, suggestions or input you would like us to include in the guide.
Did this update miss something important? Are you doing something cool that we can help you show off? Reply to this thread, message @Jess or myself @ginamoape or say hello on Matrix or email commonvoice@mozilla.com and we’ll joyously include you in the next update.
Hello and welcome to your weekly-ish update from the Common Voice team!
We’re still excited about the changes to the Common Voice Sentence Collector. If you see any weird bugs or unexpected behavior, could you let us know by raising an issue on GitHub?
The new Sentence Collector got put to work at a stunning event in DRC run by our Kiswahili community, focusing on writing, speaking and review contributions and enabling female contributors. So many excited thanks to fellow Rebecca Ryakitimbo Mwimbi for organizing this event in Lubumbashi.
Le Voice Lab hosted the MCV project to talk about innovation and inclusion in open source. Webinar video available in French, for speakers of French (or learners!)
Our wonderful @gina is going to be speaking at AfricaAI in Kigali next week, looking at the Common Voice project.
Are you doing something exciting and want us to help shout about it? Reply here, DM me (or @gina!) or email us and we’ll include you in next week’s update.
Thank you for the update and sharing the exciting news! It’s fantastic to hear about the newly launched locale for Tamazight ‘zgh’.
I’m definitely interested in attending the French Geek Festival 2023, Geek Faëries. The event sounds like a great opportunity to connect with fellow tech enthusiasts.
Lastly, I don’t have anything specific to share at the moment, but I’ll definitely keep it in mind if there’s something cool I’d like to showcase.