Increase the word limit for Odia

Odia, an Indic language and one of the six classical languages of India, has many single- and double-letter words. The 14-word limit leaves sentences too short to carry a complete meaning.

Even when I split sentences into pieces of under 14 words, it's hard to get a meaningful sentence out of them. Please increase the word limit for Odia.

Word limits can be configured per language. You can find more information here: https://github.com/Common-Voice/sentence-collector/blob/master/shared/validation/VALIDATION.md
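To illustrate what a per-language limit means, here is a minimal sketch of a default word limit with per-language overrides. This is not the sentence-collector's actual code (which is JavaScript); the names `DEFAULT_MAX_WORDS`, `MAX_WORDS_BY_LANGUAGE`, `max_words` and `is_within_word_limit` are hypothetical, and counting words by splitting on whitespace is an assumption.

```python
# Hypothetical sketch of per-language word limits; not the real
# sentence-collector implementation, whose names and structure may differ.

DEFAULT_MAX_WORDS = 14  # the default limit discussed in this thread

# Per-language overrides keyed by language code ("or" is the ISO 639-1 code for Odia).
MAX_WORDS_BY_LANGUAGE = {
    "or": 20,  # illustrative value only
}


def max_words(language: str) -> int:
    """Return the word limit for a language, falling back to the default."""
    return MAX_WORDS_BY_LANGUAGE.get(language, DEFAULT_MAX_WORDS)


def is_within_word_limit(sentence: str, language: str) -> bool:
    """Count whitespace-separated words (an assumption) and compare with the limit."""
    word_count = len(sentence.split())
    return word_count <= max_words(language)
```

With this setup, a 15-word sentence would be rejected for a language using the default limit but accepted for Odia, because only the override changes.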

How long do you think a sentence should be for Odia while still staying within an acceptable clip length?

@nukeador I forgot, how long do we want the clips to be on average?

@mkohler, thanks for the guidance. Increasing the limit from 14 to 30 words would be enough. I have submitted a PR for that, following the instructions.

@soumendra, thanks for your feedback. Before moving in any direction, it would be good to understand how long it takes to read a sentence today, and how long (on average) it would take to read a 30-word sentence.

I don't know whether you are one, but if you can consult an Odia linguistics expert, that would help us define the exact number we need. For a good user experience, we want sentences that take less than 8 seconds to read.

Thanks.

@nukeador, responding to the discussions in this thread, here and here: I looked up the United Nations Universal Declaration of Human Rights, a public-domain document whose sentences are fairly typical in length. I read at a moderate speed and used a stopwatch to time each sentence.

  • 22 words, 89 characters: 11.60 s
  • 19 words, 81 characters: 9.91 s
  • 15 words, 69 characters: 8.41 s
  • 24 words, 93 characters: 13.41 s
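To connect these timings to a word limit, the sketch below pools the four measurements into an average reading rate and asks how many words fit in a given time budget. It assumes the timings are decimal seconds (for example, 11:60 read as 11.60 s) and that a single reader's pace on four sentences is representative, which is a rough approximation; `seconds_per_word` and `max_words_for` are hypothetical helper names.

```python
# Stopwatch measurements from the post above: (words, characters, seconds).
# Assumes the timings are decimal seconds (11:60 read as 11.60 s).
MEASUREMENTS = [
    (22, 89, 11.60),
    (19, 81, 9.91),
    (15, 69, 8.41),
    (24, 93, 13.41),
]


def seconds_per_word(measurements):
    """Pooled reading rate: total time divided by total words."""
    total_words = sum(words for words, _, _ in measurements)
    total_seconds = sum(seconds for _, _, seconds in measurements)
    return total_seconds / total_words


def max_words_for(budget_seconds, measurements):
    """Largest whole number of words that fits in the budget at the pooled rate."""
    return int(budget_seconds // seconds_per_word(measurements))


rate = seconds_per_word(MEASUREMENTS)
print(f"{rate:.3f} s/word")  # 43.33 s over 80 words
print(max_words_for(8, MEASUREMENTS), max_words_for(10, MEASUREMENTS))
```

At this pooled rate (about 0.54 s per word), an 8-second budget fits about 14 words and a 10-second budget about 18; a 20-word sentence comes to roughly 10.8 s. Pooling totals rather than averaging per-sentence rates weights longer sentences more heavily, which seems reasonable for a small sample.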

It's also worth noting that recording time depends not only on the number of words but also on their complexity. Odia has been influenced by several languages, and many words with fewer characters can still take longer to pronounce. I'll leave it to you to decide on the new limit based on the context above. (There isn't much research on this specific question, but I can reach out to a linguist and check. If I manage to do that, would you need a recording, an email, or some other documentation?)

@mkohler, can we change this to 20 words? That seems to come to more or less 10 s.

Yep, this is done now.
