Hello everyone, A lot of people have been asking us through Discourse and other mediums like Slack about when Common Voice will be available in their language. Well, this update is for all of you! First, the big news: we are aiming to launch multi-language Common Voice by the week of May 7. Howev…

A couple of quick questions – can Wikipedia content be used? The reason I am asking this is because for many languages, unlike English and other widely spoken languages, finding Public Domain content is almost impossible for various reasons. My own language Odia, and many languages from India do not…

Hi Subha, Thanks for asking this question. I’m very sad to say that under the current license structure we cannot use Wikipedia content. See this comment on GitHub (from a Wikimedia lawyer) for context: https://github.com/mozilla/voice-web/issues/302#issuecomment-368561834 To answer your questio…

Could you please list which Indian languages will be available from May 7th, so that we can be prepared? Thanks, Ranjith Raj

Are manually sourced sentences still an option like they were for English? I have already gathered quite a few Dutch sentences (all written by myself) in GitHub issues ( https://github.com/mozilla/voice-web/issues/213 ) and have some more stored locally. Where can we best dump this information? Are we su…

Hi @jef.daniels, that’s awesome! Those sentences are very much an option, as long as they are public domain (CC-0). I’ll start working on multi-language contribution this week and it looks like it’ll start the same way we’re doing it for English. I.e. sentences stored in (possibly multiple) txt-file…

For German, the protocols of the parliament/Bundestag might be sufficient. They are provided here: https://offenesparlament.de/ under a CC-0 license.

If I scroll to the bottom of that page, it tells me that the content is under CC 3.0 Attribution and not CC-0 :face_with_raised_eyebrow: Am I missing something? :grin:

I think this refers to the website itself, not the raw parliamentary protocols data. The relevant text is „Alle Daten sind unter CC0 als Open Data frei zugänglich und können hier heruntergeladen werden“ (“All data is freely accessible as Open Data under CC0 and can be downloaded here”) on their data page https://offenesparlament.de/daten/

Wikipedia might not be a good choice, but we can definitely use many public domain books and other content from Wikisource for local languages. Example: CC0 license book in Telugu https://te.wikisource.org/wiki/కుటుంబ_నియంత్రణ_పద్ధతులు

r_TlfPuogHW2i9hdoQY1tGww: Wikipedia might not be a good choice but we can definitely use many public domain books and other content from Wikisource for local languages. Yes, maybe we need a script to scrape from various languages there. The only issue I found is the language is in gen…

Multi-language Update for Common Voice


mhenretty (Michael Henretty) June 28, 2018, 10:11am 17

Yes, we haven’t released any data for languages other than English. See this topic:
