Community Strategy: What does Community Health mean to you?

IMHO the second important aspect of Common Voice is recording dying languages. There are about 7000 languages, and about 2500 of them are in danger. And in the digital era, only 5% are represented on the WWW.

Most of these (very local, some tribal) languages are spoken only by a few elderly people, and they themselves are nearing their EOL.

I’d say (like we are doing in Oral History interviews for museums and language research), let’s forget about diversity and ML in these cases and record them…



Thank you for reminding me…

I do not think that this is the main goal, and thinking in such a way is very reductionist and problematic. We can think of the main goal (imo) as getting enough appropriate data for a robust speech recognition system. This cannot be defined only in terms of the number of hours or the number of clips.


Please keep in mind that some of the proposals here in this thread could lead to frustration. And frustrated contributors make frustrated things.

Hey everyone,

Thank you so much for your feedback on this discussion, it’s really helped me understand your needs. There have been various points of view, and I have tried to summarise them in the following bullet points:

Community health looks like…

  • Contributors understand how their data contributes to creating Speech-to-text Models and the socio-technical implications, e.g. dataset bias
  • Contributors can connect and organise with core contributors within their languages, e.g. localisation, networking, visibility of language community spaces
  • Community mobilisers can measure their impact on the platform through metrics
  • Community mobilisers can mitigate project friction, for example copyright issues in sentence collection

This isn’t an exhaustive list. If you have any further comments, please feel free to share. This is an ongoing conversation and point of reflection. (Also, if you want to give feedback via DM, feel free to.)


Also found this one (by accident):

Gerhard Jäger cc4.0