I cannot find any documentation on Mozilla Common Voice apart from Common Voice > How does it work?
So here are the things I would like to know about the database.
- Where can I find documentation and a revision history for the database (v1, v2, … v5.1, v6.1)?
- What does each *.tsv file represent, especially the following?
  - invalidated.tsv
  - other.tsv
  - reported.tsv
  - validated.tsv
- Which of the *.tsv files represents the Clip Graveyard mentioned in Common Voice > How does it work?
- What does each header represent in the *.tsv files, especially “segment” and “reason”?
  I presume “up_votes” and “down_votes” are related to the “>= 2 Yes votes” and “>= No votes” mentioned in Common Voice > How does it work?
- The header for reported.tsv is different from the rest, but I guess I will understand why once I know what each file represents.
HEADER (Y = exists, N = does not exist)
file | client_id | path | sentence | sentence_id | up_votes | down_votes | age | gender | accent | locale | segment | reason |
---|---|---|---|---|---|---|---|---|---|---|---|---|
reported.tsv | N | N | Y | Y | N | N | N | N | N | Y | N | Y |
the rest of *.tsv | Y | Y | Y | N | Y | Y | Y | Y | Y | Y | Y | N |
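
For anyone who wants to reproduce a table like the one above from their own download, here is a minimal Python sketch. It only reads the first row of every *.tsv file in a directory and reports which column names appear where. The function names (`read_header`, `header_matrix`, `format_matrix`) are made up for this illustration, and the directory you pass in is whatever folder your extracted archive uses; nothing here comes from official Common Voice tooling.

```python
import csv
from pathlib import Path


def read_header(tsv_path):
    """Return the column names from the first row of a tab-separated file."""
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle, delimiter="\t"), [])


def header_matrix(tsv_dir):
    """Map each *.tsv file name to its header and collect every column name seen."""
    headers = {p.name: read_header(p) for p in sorted(Path(tsv_dir).glob("*.tsv"))}
    columns = []
    for names in headers.values():
        for name in names:
            if name not in columns:
                columns.append(name)
    return headers, columns


def format_matrix(headers, columns):
    """Render a Y/N table: one row per file, one column per header name."""
    lines = ["file | " + " | ".join(columns)]
    for file_name, names in headers.items():
        flags = ["Y" if column in names else "N" for column in columns]
        lines.append(file_name + " | " + " | ".join(flags))
    return "\n".join(lines)
```

Calling `print(format_matrix(*header_matrix("path/to/language/folder")))` on the extracted language folder should print one row per file, so any mismatch between reported.tsv and the other files is visible at a glance.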
- Is each database download incremental or a full version? In other words:
  - a) Can I just download v6.1, and will it contain everything from v1 … v5.1?
  - b) Or do I need to download v1, v2, … v5.1, v6.1 to construct the full database?
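
Until someone can confirm the answer, one way to test the incremental-versus-full question empirically would be to compare the clip listings of two releases. The sketch below is only an idea, not an official method: it assumes both files have a `path` column (as in the header table above) and that clip file names stay the same from one release to the next, which I have not verified. The function names are hypothetical.

```python
import csv


def clip_paths(tsv_path):
    """Collect the values of the "path" column from one clip listing."""
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps quote characters inside sentences from confusing the parser.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return {row["path"] for row in reader}


def missing_from_newer(older_tsv, newer_tsv):
    """Return clip paths listed in the older release but absent from the newer one."""
    return clip_paths(older_tsv) - clip_paths(newer_tsv)
```

If `missing_from_newer` returns an empty set for, say, the validated.tsv of an older and a newer release, that would suggest the newer listing is a superset for that file. A non-empty result would not prove the download is incremental, since clips may also move between files (for example from other.tsv to validated.tsv) as they collect votes.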