Hi Common Voice team,
At the end of 2019 we started collecting Upper Sorbian (hsb) sentences, simply to reach the 5,000-sentence level needed to get started. We didn’t have a strategy at the time, but we now want to build a domain-specific prototype for language recognition, so we need well-balanced content (only domain-specific sentences, with recordings limited to those sentences). Since Common Voice is a very convenient platform, I have a few questions about how we could make better use of it.
- Is it possible to restart hsb, keeping only the sentences that have already been recorded? All sentences that have not been recorded yet could be deleted (I have a backup of the sentences file). Even if fewer than 5,000 remain, please let us keep going. We will fill it up again with more balanced content.
- Is there a general way to mark or tag sentences so that they can be prioritized for recording and validation? We are thinking of domain-specific collections. Alternatively, is it possible to have subsets for hsb?
- How can we get the current recordings? Some of them are not validated yet, but we would still need them for some project tests.
Thanks a lot for any help!