Weekly Update: Tips for Sentence Collection

Hey everyone!

This week’s update features tips for sentence collection from Jonathon (Common Voice Luganda Rep), plus grants and opportunities.


Language Rep Sessions and Sentence Collection Tips from Jonathon

First, language reps met this week to discuss the Model and Methods Competition, sentence collection strategies, engaging older age groups and people with limited digital access, and tips for sentence collection.

In one of our sessions, Jonathon shared some advice on how the Luganda community was able to grow its corpus. Check out the graphic to find out how.

Alt text is at the bottom of the document

Grants and opportunities

  1. Common Voice Model and Methods Competition.

Have you got an idea that can support your language’s diversity on the platform, or a proof-of-concept idea using a small Common Voice data corpus?

Sign up to take part for a chance to win $2,000.

  2. Have you got a Feminist Tech Idea and are based in the Global South?

Check out the Numun Fund


Alt Text: Stages: Pre-checking, verification, validation, submission

Pre-checking

  • Check that the source of your text is under Creative Commons Zero (CC0), or connect with the copyright owners to dedicate the text to CC0

Verification

  • Export the sentences into CSV format
  • Filter out sentences that are over 14 words and keep them aside
  • Make sure no other language is within the corpus
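
The length filter in the verification step can be sketched in a few lines of Python. This is only a rough illustration, not the Luganda team's actual tooling: it assumes the CSV holds one sentence per row in the first column, and it counts words by splitting on whitespace, which may not suit every language. The function name `split_by_length` is made up for this example.

```python
import csv
import io

MAX_WORDS = 14  # limit taken from the verification tip above


def split_by_length(rows, max_words=MAX_WORDS):
    """Return (kept, set_aside) lists of sentences.

    Assumes each row holds one sentence in its first column.
    """
    kept, set_aside = [], []
    for row in rows:
        if not row or not row[0].strip():
            continue  # skip empty rows
        sentence = row[0].strip()
        # Whitespace word count: an assumption, adjust for your language
        if len(sentence.split()) > max_words:
            set_aside.append(sentence)
        else:
            kept.append(sentence)
    return kept, set_aside


# Tiny in-memory CSV standing in for a real export
sample_csv = io.StringIO(
    "short sentence here\n" + " ".join(["word"] * 15) + "\n"
)
kept, set_aside = split_by_length(csv.reader(sample_csv))
```

Sentences in `set_aside` are not thrown away, so they can be shortened or split by hand later.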

Validation

  • Gather 2-3 people (ideally linguists) who have an understanding of the language
  • Review a sample of the sentences based on this formula
  • Is the sentence easy to understand and engaging, and does it include grammatical gender (if relevant)?
  • Are the spelling and grammar correct?

Submission

  • Report the error rate for the sample
  • Convert CSV to TXT file
  • Make a pull request including relevant information
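
The first two submission steps could look something like the sketch below. It assumes the error rate simply means the share of sampled sentences that reviewers flagged, and that the TXT file should contain one sentence per line taken from the first CSV column. Both are assumptions for illustration, and the function names `error_rate` and `csv_to_txt` are hypothetical.

```python
import csv
import io


def error_rate(flags):
    """Share of reviewed sentences flagged as having a problem.

    flags: list of bools, True when a reviewer flagged the sentence.
    """
    if not flags:
        raise ValueError("sample is empty")
    return sum(flags) / len(flags)


def csv_to_txt(csv_file, txt_file):
    """Write the first column of each CSV row as one line of text."""
    count = 0
    for row in csv.reader(csv_file):
        if row and row[0].strip():
            txt_file.write(row[0].strip() + "\n")
            count += 1
    return count


# Example: 1 of 4 sampled sentences was flagged
rate = error_rate([False, False, True, False])

# In-memory files standing in for real CSV and TXT files
source = io.StringIO('"Hello, world"\nSecond sentence\n')
target = io.StringIO()
written = csv_to_txt(source, target)
```

With real files, the same function works on objects from `open(path, newline="", encoding="utf-8")` for the CSV and `open(path, "w", encoding="utf-8")` for the TXT.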