Metadata File Only

Is it possible to get a file containing only the speaker metadata from the Common Voice data? I would like to use the age and gender distributions in a project on the inclusion of older persons' data in ML datasets.

Would you like aggregated metadata? And which language(s) are you interested in?

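Here is one command you could use. Run from the top of the extracted corpus, it counts the age/gender/accent combinations (columns 6-8) in each language's train.tsv and leaves out combinations with single-digit counts: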

itzpapalotl:~/CV/cv-corpus-6.1-2020-12-11$ for i in *; do grep -v '^client_id' "$i/train.tsv" | cut -f6-8 | sort -f | uniq -c | sort -gr | sed 's/^ *//' | sed "s/^/$i\t/"; done | grep -v '[[:space:]][0-9][[:space:]]'

The result would be something like this.
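
One caveat for speaker-level distributions: the one-liner counts rows of train.tsv, so its numbers are clips, not speakers. Below is a rough sketch of a variant that counts each client_id only once. The function name cv_age_gender_counts is made up for illustration, and the sketch assumes the 6.1 column layout implied by the command above (client_id in column 1, age in column 6, gender in column 7) and that a speaker's age and gender are the same on all of their rows.

```shell
# Hypothetical helper: count distinct speakers per (age, gender) in one TSV.
# Assumed layout: column 1 = client_id, column 6 = age, column 7 = gender.
cv_age_gender_counts() {
    # $1: path to a Common Voice TSV file whose first row is the header
    awk -F'\t' '
        NR == 1 { next }              # skip the header row
        seen[$1]++ { next }           # speaker already counted, skip the clip
        {
            age = ($6 == "") ? "unknown" : $6
            gender = ($7 == "") ? "unknown" : $7
            counts[age "\t" gender]++
        }
        END {
            for (key in counts) printf "%d\t%s\n", counts[key], key
        }
    ' "$1" | sort -t "$(printf '\t')" -k1,1nr -k2
}
```

It could be called as cv_age_gender_counts en/train.tsv, or inside the same for loop as above. Empty age or gender fields are reported as "unknown" rather than dropped, since the share of speakers who gave no metadata is itself useful to know.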

If you’d like more help in real time, feel free to join us on Matrix.