Restart of HSB collection

jan.budar · June 18, 2020, 7:37pm

Hi Common Voice team,

At the end of 2019 we started collecting Upper Sorbian (hsb) sentences just to reach the 5,000-sentence level needed to get started. We didn't have any strategy at that time, but now we want to build a domain-specific prototype of language recognition, so we need well-balanced content (limited to domain-specific sentences, with the recordings limited to those sentences). Since Common Voice is a very comfortable platform, I have a few questions about how we could make better use of it.

1. Is it possible to restart hsb, keeping only those sentences that have already been recorded? All sentences that have not been recorded yet could be deleted (I have a backup of the sentences file). Even if fewer than 5,000 are left, please keep the language active. We will fill it up again with more balanced content.
2. Is there a general way to mark or tag sentences so that they can be prioritized for recording and validation? We are thinking of domain-specific collections. Or is it possible to have subsets for hsb?
3. How can we get the current recordings? Some of them have not been validated yet, but we would still need them for some project tests.

Thanks a lot for any help!

Greetings to @sorbian-team @mkohler
