Mass import sentences into Sentence Collector


(Michal Stanke) #1


@MekliCZ has been running a custom sentence collector tool to gather some data in Czech. Would it be possible to mass import these sentences into the new collector tool, so people can start reviewing them? What format would be most suitable? Or should we just dump them and upload via the form?


(Rubén Martín) #2

I know people have been able to copy and paste a few thousand sentences at once. Just note that it takes a while to upload all of them.

How many do you have? Pasting blocks of 2k should be safe.
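
For anyone preparing a larger dump, here is a minimal sketch of how the sentences could be split into blocks of 2,000 before pasting them into the form. It assumes a plain-text export with one sentence per line; the `chunk_sentences` and `write_blocks` helpers and the file names are hypothetical and not part of the Sentence Collector itself.

```python
from pathlib import Path
from typing import Iterator, List


def chunk_sentences(sentences: List[str], block_size: int = 2000) -> Iterator[List[str]]:
    """Yield consecutive blocks of at most block_size non-empty sentences."""
    # Drop blank lines and surrounding whitespace before splitting.
    cleaned = [s.strip() for s in sentences if s.strip()]
    for start in range(0, len(cleaned), block_size):
        yield cleaned[start:start + block_size]


def write_blocks(input_path: str, output_prefix: str, block_size: int = 2000) -> int:
    """Split a one-sentence-per-line file into numbered block files.

    Returns the number of block files written.
    """
    lines = Path(input_path).read_text(encoding="utf-8").splitlines()
    count = 0
    for count, block in enumerate(chunk_sentences(lines, block_size), start=1):
        # Hypothetical naming scheme: <prefix>_001.txt, <prefix>_002.txt, ...
        out_path = Path(f"{output_prefix}_{count:03d}.txt")
        out_path.write_text("\n".join(block) + "\n", encoding="utf-8")
    return count
```

Calling `write_blocks("czech_sentences.txt", "block")` would then produce files that can each be pasted into the form in one go.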

(Michal Vašíček) #3

Hey, is it possible to send you sentences we already approved (so we can bypass approval in this tool)? We have ~2.5k approved sentences and ~4k to be triaged.

(Rubén Martín) #4


We changed the process back in December to ensure quality, and we now want all sentences to go through the tool. I hope you understand.


(Michal Stanke) #6

Hi @nukeador.

@MekliCZ has uploaded all the sentences we collected and reviewed in the past using our own tool. What is the exact flow for a sentence to get ready for Common Voice? I have noticed two states mentioned in the collector UI, reviewed and validated. What is the difference between them?

(Rubén Martín) #7

Sentences added to the tool need to be reviewed by other users. Once they get enough positive votes, they are validated.

We export all new validated sentences to the main site every few days.