Thanks Jess, great to have weekly updates back
It’s time for your weekly update from the Common Voice team! As always, if you’re doing something cool that we missed (or have something coming up we can show off for you) just reply and let us know!
Tomorrow, online: Our East African language communities have been doing incredible work, with a focus on Kinyarwanda, Kiswahili and Luganda. You can come learn about their successes at 4pm EAT by signing up for Creating community-driven datasets: Insights from Mozilla Common Voice activities in East Africa.
Call for presenters: Are you an academic working in or with under-resourced languages? RESOURCEFUL 2023 is seeking submissions for speakers working in this space for their conference May 22-24th in Tórshavn, Faroe Islands. They’re looking for submissions of either 4-8 pages and you can learn more about the CFP and the conference itself here.
IRL in South Africa: Launch of “Voices of Mzansi” project workshop on behalf of GIZ’s Fair Forward project and Stellenbosch University. This project aims to localise the South African languages on Mozilla’s Common Voice platform and collect voice data that is open and accessible for everyone. The workshop will be at the CSIR International Convention Centre on 16 March 2023 @ 08.30. We’ll add more details as we find them!
Over at the Mozilla Corporation: The Mozilla community call on March 9th will be looking at how Firefox handles machine translations with privacy in mind. This might be of interest to those of you (most of you?) excited about languages and translation technologies. The presenter, André Natal, was involved in the early days of the Common Voice project. Catch it when it goes live: https://www.youtube.com/watch?v=J06koBcfm5w
That’s it for this week.
Did this update miss something important? Are you doing something cool we can help you show off? Reply to this thread, message Gina or myself, chat to us on Matrix or email commonvoice@mozilla.com to let us know about it, or just say hello!
It’s time for another weekly update!
Common Voice at Mozfest
Want a refresher on the basics of what Common Voice is? Here’s an overview of the project focused on our Tamil language community!
@gina is running a really exciting session on Breaking the anglocentricism of the internet - perspectives from the Common Voice community
And I’ll be putting together a holistic look at the governance and design principles behind each stage of the Speech AI lifecycle: Governance & Algorithmic Design in Speech AI - with Mozilla Common Voice and NVIDIA
The Funder’s Track for Mozfest is also doing some exciting things. I would encourage you to check it out and share it with your contacts if it’s of interest!
Speaking of presentations, @stergro recently spoke at an Open Data Day in Karlsruhe and wanted to share his thoughts (and slides!)
The Sentence Collector is changing
We’re integrating the Sentence Collector more neatly into the Common Voice platform in this upcoming release. More details can be found in this post.
Dataset 13.0 is coming
We’re working on the release of the freshest Common Voice data in the new dataset. We’ll be updating you shortly (later today!) with more details
Did we miss something?
Reply to this thread or message Jess or Gina if we can be bragging about the cool things you’re doing with Common Voice!
Thanks for the update and your mention of my slides! Maybe it’s worth pinning this thread on top of the forum like the old one?
Quite ironic how the talk about anglocentrism is presented haha:
You stole part of my speech
It’s time for your weekly update from the Common Voice team!
MozFest Recap: MozFest 2023 took place last week, an event that stresses the importance of creating a fair and inclusive internet accessible to all. This year, the festival drew over 6,000 volunteers, facilitators, wranglers and active participants from around the globe. It featured insightful discussions, interactive sessions on art, music and food, and meaningful exchanges of ideas among attendees. We extend our gratitude to the members of the community who played vital roles as attendees and panelists at MozFest. “The festival serves as a unique platform that brings together two seemingly different groups - tech experts and human rights activists,” one of the speakers highlighted. At its core, MozFest is about people and community. We hope that you had the opportunity to be part of this incredible experience and enjoyed it. If you were unable to attend or wish to revisit some of the sessions, you can visit the MozFest website to access recorded sessions.
Data Science for Health in Africa Virtual Networking Exchange on 3 May 2023: Attend the Virtual Networking Exchange on 3 May 2023 to meet and interact with organizations working on data science and health in Africa! During this completely online event, you will learn about exciting work happening across the continent, share information about your work, and identify potential collaborators. The Networking Exchange is completely free and open to any organization working on data science and/or health in Africa.
How to Participate: To participate, you may register as a participant or as a presenter.
- Participants may join the event and move around between different Zoom rooms to learn about different data science and health organizations.
- Presenters may represent their organization on the agenda for the event. They will have a Zoom breakout room for an allotted time during which they can share information about their organization and interact with participants.
More information on the Virtual Networking Exchange and registration is available on the DS-I Africa website
Note: The deadline to register as a presenter is 14 April 2023. There are a limited number of presenter slots available so please register as soon as possible.
Uzbek AI: The Uzbekvoice.ai team has collected about 1400 hours of high-quality audio recordings with accompanying texts to create a valuable resource for the Uzbek language. Common Voice recognizes, commends, and appreciates the team’s dedication to building open and accessible resources for language technology. Read more about it and get in contact with the team here: Uzbekvoice.ai Project.
Did this update miss something important? Are you doing something cool that we can help you show off? Reply to this thread, message Jess or myself or chat to us on Matrix:)
Hello and welcome back to another weekly(ish) update from the Common Voice team!
Our Catalan community is going to be running a contribution campaign April 14th-16th. They’ve been doing just a stellar job and I bet they’re going to meet their ambitious goal of 3000 contributed hours! Come help them out or cheer them on.
@chenaichair joined the BBC recently to talk about global access in AI and Common Voice. While (I think) the whole segment is interesting, you can skip to 7:18 to listen to her shine. You can listen here (requires a BBC Sounds login).
Common Voice has been nominated for 2 Webby awards, for Accessible Technology and Responsible Innovation. While votes for us do help get more visibility on the project, we’re just excited to be nominated!
We’ve also updated the sentence corpus for Catalan, Abkhaz, and Esperanto! Release notes here.
Did we miss something? Do you have something cool you want us to talk (or brag!) about? Reply here, message @gina or myself or chat to us on Matrix!
Voted… But it requires registration…
Hi everyone
Welcome back to another weekly update from the Common Voice team!
Our Catalan community contribution campaign is ongoing until April 16th. Read more about the campaign here.
The voting period for the Webby Awards remains open until April 20th, and Common Voice has been nominated in the categories of Accessible Technology and Responsible Innovation. If you could spare some time to cast your vote for Common Voice, it would be greatly appreciated.
Lacuna Fund announced two new calls for proposals to develop open and accessible machine learning (ML) datasets that will improve Sexual, Reproductive & Maternal Health and Rights (SRMHR) and illuminate the relationship between Climate & Forests to help identify interventions that could mitigate or adapt to climate impacts. Read more and access the full requirements and submission portal here. Round 1 closing date is April 19th, Round 2 June 20th.
Did this update miss something important? Are you doing something cool that we can help you show off? Reply to this thread, message @jesslynnrose or myself @gina or chat to us on Matrix:)
Hi everyone
Welcome back to another weekly update from the Common Voice team!
Exciting News
We have new bulk sentences in last week’s update for Japanese and Swahili. Here’s the release.
Making the Latvian Language AI-Compatible
Latvian Open Technology Association (LOTA) is taking the lead in a joint initiative to make the Latvian language work seamlessly with AI tools worldwide. LOTA is collaborating with partners to achieve this goal, and several activities are planned for the coming months.
The first two activities will happen around 4th of May and on 11th of May. The 4th of May is the day Latvia regained its independence and for this occasion, LOTA is planning a social media influencer campaign asking Latvians to record their voices on the Common Voice platform. On the 11th of May, LOTA will host an annual conference focusing on Open data where voice donations on Common Voice will also have a dedicated spot.
At a later stage, activities planned for this summer and autumn will be part of a research project by the Artificial Intelligence Laboratory at IMCS, University of Latvia. The project aims to gather Latvian voice donations on Common Voice.
LOTA is open to collaborating with any other organizations or individuals from Latvia working on or interested in contributing to Common Voice. For further information or enquiries, contact Raivis Dejus at raivis.dejus@gmail.com.
Did this update miss something important? Are you doing something cool that we can help you show off? Reply to this thread, message jesslynnrose or myself Gina_Moape or chat to us on Matrix:)
Hi everyone
Welcome back to another weekly update from the Common Voice team!
The New Common Voice Sentence Collector
New look, new features for the new Common Voice Sentence Collector. Read more about it here.
Latvian Open Technology Association (LOTA) Initiative Success
Latvian Open Technology Association (LOTA) has taken the lead in a joint initiative to make the Latvian language work seamlessly with AI tools worldwide, collaborating with partners to achieve this goal. The initial activity, held on the 4th of May, was a great success. Their campaigns were featured in the top 2 news programs:
During the campaign, the team managed to attract almost 7 000 people and had about 400 people donating their voices on the Common Voice platform. They managed to increase the recordings from ~18 hours to ~88 hours. On the 11th of May, LOTA will host an annual conference focusing on Open Data where voice donations on Common Voice will also have a dedicated spot.
The French Geek Festival 2023
Geek Faëries 2023 will be held on 03-04 June 2023 at Château de Selles-sur-Cher, 1 Le Château, 41130 Selles-sur-Cher, France. The event aims to bring together tech enthusiasts. RSVP for the event here.
Did this update miss something important? Are you doing something cool that we can help you show off? Reply to this thread, message @jesslynnrose or myself @Gina_Moape or chat to us on Matrix:)
Great news! When will the new sentence collector be online?
Going live right now!
Welcome back to the weekly update from the Common Voice Community team. We’ve had a busy week with the new Sentence Collector update. Now you can write your own original sentences right in the main Common Voice UI. Anyone can write or submit new sentences but you’ll need to be logged in with an account to review new sentences. If you spot a bug or problem with the new Sentence Collector, could you let us know by raising an issue?
Right now you can only submit one qualifying sentence at a time using the Sentence Collector, so we’ve also created some new documentation showing you how to create a bulk sentence submission to support the corpus of your favorite languages.
If you work with Voice and/or Speech data and have the time to take a short survey to help academics better understand your dataset documentation practices, @kath at ANU has a short survey open now that you can take.
If you’re a French speaker (or ever wanted to learn a bit more French!), Common Voice will be represented at the upcoming Geek Faëries (https://www.geekfaeries.fr/) festival, June 2-4th, in Loir-et-Cher. A rare chance to contribute to Common Voice in person, in cosplay!
That’s it for this week. Did we miss something? Got something cool coming up we can share with the community? Or you want us to brag about something amazing you’ve done? Just reply here, say hello on Matrix or email commonvoice@mozilla.com and we’ll joyously include you in the next update.
Best,
Jess
Hi Everyone
New Week New Update
Newly Launched Locale
We are excited to announce the newly launched locale for Tamazight ‘zgh’. Common Voice continues to grow and we are grateful.
The French Geek Festival 2023
Geek Faëries 2023 will be held on 03-04 June 2023 at Château de Selles-sur-Cher, 1 Le Château, 41130 Selles-sur-Cher, France. The event aims to bring together tech enthusiasts. RSVP for the event here.
Voice Data Collection
We are currently working on a “How to” guide for contributors who wish to gather voice data in their specific languages. Please email us or comment on this update if you have any recommendations, suggestions or input you would like us to include in the guide.
Did this update miss something important? Are you doing something cool that we can help you show off? Reply to this thread, message @Jess or myself @ginamoape or say hello on Matrix or email commonvoice@mozilla.com and we’ll joyously include you in the next update.
Hello and welcome to your weekly-ish update from the Common Voice team!
We’re still excited about the changes to the Common Voice Sentence Collector. If you see any weird bugs or unexpected behavior, could you let us know by raising an issue on GitHub?
The new Sentence Collector got put to work at a stunning event in DRC run by our Kiswahili community, focusing on writing, speaking and review contributions and enabling female contributors. So many excited thanks to fellow Rebecca Ryakitimbo Mwimbi for organizing this event in Lubumbashi.
Le Voice Lab hosted the MCV project to talk about innovation and inclusion in open source. Webinar video available in French, for speakers of French (or learners!)
Our wonderful @gina is going to be speaking at AfricaAI in Kigali next week, looking at the Common Voice project.
Are you doing something exciting and want us to help shout about it? Reply here, DM me (or @gina!) or email us and we’ll include you in next week’s update.
Thank you for the update and sharing the exciting news! It’s fantastic to hear about the newly launched locale for Tamazight ‘zgh’.
I’m definitely interested in attending the French Geek Festival 2023, Geek Faëries. The event sounds like a great opportunity to connect with fellow tech enthusiasts.
Lastly, I don’t have anything specific to share at the moment, but I’ll definitely keep it in mind if there’s something cool I’d like to showcase.
Hi everyone
Welcome back to the weekly update from the Common Voice Community team.
Common Voice at Africa AI Conference
Common Voice conducted a session at the Africa AI Conference titled: Africa Mradi - Using Mozilla Common Voice to collect training data for different use cases. The interactive session took participants through a practical guide, using live examples of use cases built on the Common Voice datasets and drawing on the Africa Mradi work in Kiswahili and Kinyarwanda. The session covered:
- Building AI models for under-resourced languages
- The Mozilla Common Voice platform
- How to build training data using the Mozilla Common Voice platform
SoGood 2023 – 8th Workshop on Data Science for Social Good
The possibilities of Data Science for contributing to social, common, or public good are often not sufficiently perceived by the public at large. Data Science applications are already helping in serving people at the bottom of the economic pyramid, aiding people with special needs, helping international cooperation, and dealing with environmental problems, disasters, and climate change. In regular conferences and journals, papers on these topics are often scattered among sessions with names that hide their common nature (such as “Social networks”, “Predictive models” or the catch-all term “Applications”). Additionally, such forums tend to have a strong bias for papers that are novel in the strictly technical sense (new algorithms, new kinds of data analysis, new technologies) rather than novel in terms of social impact of the application.
If you are interested in this workshop, check out the workshop page. It is affiliated with ECML-PKDD 2023, 18-22 September 2023, Torino, Italy.
Voice Data Collection
We are currently working on a “How to” guide for contributors who wish to gather voice data in their specific languages. Please email us or comment on this update if you have any recommendations, suggestions or input you would like us to include in the guide.
Did this update miss something important? Are you doing something cool that we can help you show off? Reply to this thread, message @ or @gina_moape or say hello on Matrix or email commonvoice@mozilla.com and we’ll joyously include you in the next update.