Question: All datasets without recordings (i.e. clips.tsv)

bozden · August 22, 2022, 7:09pm

I have several ideas for cross-language and/or time-axis (across versions) analysis of CV datasets. But because the datasets include the recordings, they become huge if you download many languages and many versions, both in bandwidth and in disk space. If you are not doing the actual training, you do not need the recordings…
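A partial workaround helps with disk space but not bandwidth. The sketch below is a minimal Python example. It assumes the release is a gzip-compressed tar archive and streams through it, writing only the .tsv members so the audio is never written to disk. The whole archive still has to be downloaded and decompressed once. The function name `extract_tsv_only` is hypothetical, not part of any Common Voice tooling.

```python
import shutil
import tarfile
from pathlib import Path


def extract_tsv_only(archive_path: str, dest_dir: str) -> list[str]:
    """Hypothetical helper: stream a .tar.gz and write only its .tsv members.

    Audio members are read past but never written to dest_dir.
    """
    dest = Path(dest_dir).resolve()
    written: list[str] = []
    # "r|gz" opens the archive as a forward-only stream, so members are
    # processed one at a time without seeking back through the file.
    with tarfile.open(archive_path, mode="r|gz") as tar:
        for member in tar:
            if not (member.isfile() and member.name.endswith(".tsv")):
                continue
            target = (dest / member.name).resolve()
            # Skip members that would land outside dest_dir (e.g. "../x.tsv").
            if dest not in target.parents:
                continue
            src = tar.extractfile(member)
            if src is None:
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            # Copy in chunks; some TSV files can be large.
            with src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            written.append(member.name)
    return written
```

Forward-only streaming (`r|gz` rather than `r:gz`) avoids random access, so the same loop also works on a pipe. It does not reduce the download size. That still needs server-side TSV-only packages like the ones requested here.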

If we could have the “clips.tsv” file (the DB dump before Corpora Creator), or downloadable packages containing only the .tsv files, that would be extremely helpful for such analysis.
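The sketch below illustrates why the .tsv files alone would be enough for this kind of analysis. It is minimal Python that compares one locale's TSV across several dataset versions, counting clips (rows) and distinct speakers. It assumes each TSV is tab-separated with a header row that includes a `client_id` column. The helper names `tsv_stats` and `compare_versions` are hypothetical.

```python
import csv
from dataclasses import dataclass


@dataclass
class TsvStats:
    clips: int
    speakers: int


def tsv_stats(tsv_path: str) -> TsvStats:
    """Count rows and distinct client_id values in one Common Voice-style TSV."""
    speakers: set[str] = set()
    clips = 0
    with open(tsv_path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE: sentences may contain stray '"' characters, so quotes are
        # kept as literal text instead of being parsed as field delimiters.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            clips += 1
            speakers.add(row["client_id"])  # assumed column name
    return TsvStats(clips=clips, speakers=len(speakers))


def compare_versions(tsv_by_version: dict[str, str]) -> dict[str, TsvStats]:
    """Compute stats for the same locale's TSV across several dataset versions."""
    return {version: tsv_stats(path) for version, path in tsv_by_version.items()}
```

A call would look like `compare_versions({"old": "path/to/old/validated.tsv", "new": "path/to/new/validated.tsv"})`. The version labels and paths here are placeholders. The same loop can run over locales instead of versions for cross-language comparisons.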

Is this available/possible?

Thanks…

