Following impassioned and well-informed community discussions and work with our internal experts, we have begun the process of including clips of up to 15 seconds in the Common Voice datasets (a 50% increase over the previous clip length of 10 seconds).
We still have more work to complete to get these clips into future datasets, but you can find more information about our thinking and the process here.
Hey @jesslynnrose, except for the rule files I mentioned here, is it possible for us to post longer sentences (assuming we adapted the rule files)?
I’m in the process of bulk-posting sentences of 15+ words from books in the public domain (where the author died more than 70 years ago); I have already added the shorter sentences from the same books.
Can you please give some feedback on the status (I’m not sure “dmitrij” is here)?
But you need to create a validation file for your language, or modify the existing one, to increase the number of allowed words and/or the maximum sentence length, if the existing validation file uses those limits. Here is an example which sets max-words to 20 for this purpose:
To set the value correctly, you need to know the char-speed in your dataset. If your dataset is already released, I can help with that. What is your language code?
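To illustrate why char-speed matters here, below is a rough Python sketch of how a maximum sentence length could be derived from it. It assumes char-speed means characters spoken per second, estimated from (sentence, clip duration) pairs in a released dataset. The function names, the safety margin, and the sample data are hypothetical and are not part of any Common Voice tool or validation file.

```python
from statistics import median

MAX_CLIP_SECONDS = 15  # clip limit mentioned at the top of this thread


def chars_per_second(clips):
    """Estimate char-speed as the median characters per second.

    `clips` is an iterable of (sentence, duration_seconds) pairs,
    e.g. taken from validated clips of an already released dataset.
    """
    rates = [len(sentence) / duration for sentence, duration in clips if duration > 0]
    if not rates:
        raise ValueError("no clips with a positive duration")
    return median(rates)


def max_sentence_chars(char_speed, max_seconds=MAX_CLIP_SECONDS, safety=0.9):
    """Longest sentence (in characters) expected to fit into one clip.

    `safety` is an assumed margin for slower readers, not an official value.
    """
    return int(char_speed * max_seconds * safety)


def fits(sentence, max_chars, max_words=20):
    """Check a sentence against both the character and the word limit."""
    return len(sentence) <= max_chars and len(sentence.split()) <= max_words


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [("a" * 20, 2.0), ("b" * 30, 2.0), ("c" * 10, 1.0)]
    speed = chars_per_second(sample)
    limit = max_sentence_chars(speed)
    print(speed, limit, fits("a short test sentence", limit))
```

With a real dataset the pairs would come from the clip durations and transcripts, and the resulting limit would then be the value to put into the validation file alongside max-words.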