Audio Data Issue: Empty Audio File in Common Voice Dataset

Hello,

I would like to report an issue with one of the audio files in the English dataset of the Common Voice project. The file is completely silent and contains no audio.

  • client_id : b71e4c5ed06f1dcf24e26ad42cfb93e8c4907ce587651b6b4003b3242ca7c1227dddf27bbec00de87c006f44b802db93c5e687f2ee4f9d8d48caa94cc61b603a
  • path : common_voice_en_37663663.mp3
  • sentence_id : d0942841f92f613ad99a0a7511d3b8bf9e238f2867c1d496b579f10311876f81

Please let me know if you need any additional information or if there is anything else I can do to assist in resolving this issue.

Thank you for your attention to this matter.

Best regards,
Wangzhenghang

Further to this, I suspect there are many empty audio files in the CV English dataset - and indeed in other languages. I have a dataset, which I’ll be releasing shortly, that contains SileroVAD voice activity detection timings (there are no timings if no speech is detected), and I will be able to report them - but it will need a few weeks.


@kathyreid, I also started coding audio analysis last week (VAD-based actual speech duration, SNR, energy/power, etc.) and researching the algorithms. I saw SileroVAD, but being CPU-only and limited to 1 core is quite a limitation.

Are there any other algorithms you considered, especially torchaudio- or signal-processing-based ones?

Ah, no, I have access to a GPU cluster so that’s what I was running it on …

@kathyreid, I mean, I read that SileroVAD runs on only 1 core on the CPU, with no GPU support. Thus I eliminated it from my list - maybe a mistake. I did see a num-threads setting, though.

Does it work on a GPU, or is there a GPU version that I missed? If so, that would be my best option.

They say it analyzes a 100 ms chunk in about 1 ms (provided you can feed it - e.g. no disk/RAM/CPU-GPU bandwidth bottlenecks). At that ~100x real-time rate, analyzing ~32k hours of MCV recordings would mean a 320-hour non-stop run, i.e. >13 days - for VAD only, and excluding the mp3 => wav 16 kHz transcoding overhead.
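
As a back-of-the-envelope check of that estimate (the ~32k-hour figure and the ~100x real-time throughput are taken from above; this is just the arithmetic):

audio_hours = 32_000   # assumed total MCV audio to analyze
speedup = 100          # 100 ms of audio processed per 1 ms, i.e. ~100x real-time
vad_hours = audio_hours / speedup
print(f"{vad_hours:.0f} hours = {vad_hours / 24:.1f} days")  # 320 hours = 13.3 days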

Yes, it is multi-threaded

import torch
torch.set_num_threads(230)

That’s the setting I used - my understanding is that SileroVAD uses torchaudio under the hood.
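
For anyone following along, a minimal sketch of that setup, assuming the torch.hub entry point from the SileroVAD README (the clip name is the one reported above; the utils tuple ordering and mp3 support depend on your SileroVAD/torchaudio versions):

import torch

torch.set_num_threads(4)  # intra-op CPU threads for this process; tune for your machine

# Load the model and helper functions published by the SileroVAD project (cached after first download).
model, utils = torch.hub.load('snakers4/silero-vad', 'silero_vad')
get_speech_timestamps, save_audio, read_audio, VADIterator, collect_chunks = utils

wav = read_audio('common_voice_en_37663663.mp3', sampling_rate=16000)  # resampled to 16 kHz mono
timestamps = get_speech_timestamps(wav, model, sampling_rate=16000)

# An empty list means no speech was detected, i.e. a silent/empty clip.
print(timestamps)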

Yeah, but that multi-threading still uses only one core on the CPU. SileroVAD also seems to be incompatible with the GPU version of torchaudio. It’s perfect for some applications - interactive ones running on edge devices (e.g. mobile phones, which don’t have CUDA) - but IMO not ideal for bulk processing…

I think I can make it multi-processing + multi-threaded to boost performance - but I use the GPU version of torch, so I need a remedy for that.
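
Not my final pipeline, but one hedged way to get that multi-processing behaviour on CPU is one worker process per core, each loading its own SileroVAD instance (worker count and the file list below are placeholders):

from concurrent.futures import ProcessPoolExecutor
import torch

_model = None
_utils = None

def _init_worker():
    # Runs once per worker process: pin it to one thread and load its own model copy.
    global _model, _utils
    torch.set_num_threads(1)
    _model, _utils = torch.hub.load('snakers4/silero-vad', 'silero_vad')

def vad_duration(path):
    # Return the detected speech duration (in seconds) for one clip.
    get_speech_timestamps, _, read_audio, _, _ = _utils
    wav = read_audio(path, sampling_rate=16000)
    ts = get_speech_timestamps(wav, _model, sampling_rate=16000)
    return path, sum(t['end'] - t['start'] for t in ts) / 16000

if __name__ == '__main__':
    clips = ['clip_0001.wav', 'clip_0002.wav']  # placeholder file list
    with ProcessPoolExecutor(max_workers=12, initializer=_init_worker) as ex:
        for path, seconds in ex.map(vad_duration, clips):
            print(path, seconds)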

Thank you @kathyreid @bozden. @84305424, thank you for bringing this issue up; I will also raise it with the team.


@kathyreid, I implemented it by tweaking the threshold (because of low-energy/whispering voices) and some duration parameters. I also tested the Vad transform in torchaudio, but it was not as good and needed two passes - flipping the Tensor - to find the silence at the end.
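
The two-pass torchaudio approach mentioned there looks roughly like this (file name is a placeholder; torchaudio’s Vad transform only strips leading silence, hence the flip):

import torch
import torchaudio

waveform, sr = torchaudio.load('clip.wav')          # shape: (channels, samples)
vad = torchaudio.transforms.Vad(sample_rate=sr)

trimmed_front = vad(waveform)                        # pass 1: remove leading silence
flipped = torch.flip(trimmed_front, dims=[1])        # reverse along the time axis
trimmed_both = torch.flip(vad(flipped), dims=[1])    # pass 2: remove (former) trailing silence

speech_seconds = trimmed_both.shape[-1] / sr
print(speech_seconds)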

I have two questions you might answer:

I’m testing/tuning it on the Fleurs dataset, as it is more controlled/predictable, and I tuned down the threshold a lot, but even then there are some non-detections due to low-amplitude voices.

  • Do you normalize the audio amplitude-wise before detection (e.g. a simple peak normalization, as sketched below)?
  • How did you tune/test it? By analyzing the waveforms?
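
On the first question, the kind of peak normalization I have in mind would be something like this (purely illustrative; the function name and eps guard are mine):

import torch

def peak_normalize(wav: torch.Tensor, eps: float = 1e-9) -> torch.Tensor:
    # Scale so the loudest sample reaches +/-1.0, to help low-amplitude/whispered clips
    # clear the VAD threshold; an all-zero (empty) clip is returned unchanged.
    peak = wav.abs().max()
    return wav / (peak + eps) if peak > 0 else wav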

Second question: As you know, SileroVAD returns a list of detections, so any silence during speech (e.g. breathing) splits the result into multiple segments, even though I increased the minimum silence duration from 250 ms to 1000 ms. I now have two options:

  1. Only take the first segment’s start and the last segment’s end to calculate the duration (which only removes the silences at the start and at the end)
  2. Sum the segment durations (which gives only the actual speech)

I implemented the second option, but maybe the first option is more logical. What did you use? What do you propose?
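
For concreteness, the two options map to something like this (assuming the default sample-based timestamps that get_speech_timestamps returns):

def span_duration(timestamps, sr=16000):
    # Option 1: first start to last end - only removes leading/trailing silence.
    return (timestamps[-1]['end'] - timestamps[0]['start']) / sr if timestamps else 0.0

def speech_duration(timestamps, sr=16000):
    # Option 2: sum of the detected segments - actual speech only.
    return sum(t['end'] - t['start'] for t in timestamps) / sr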

@kathyreid, FYI:

I finished coding the audio analysis. I made it an optional part of my import process, where I also (optionally) transcode the audio into 16 kHz mono mp3 format. I use pytorch[audio] with GPU, pyarrow, dask distributed with futures, and parquet for processing, and I save only the deltas between versions. I started with v3 now and will work upwards.
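
The transcode step is essentially a resample-to-16 kHz-mono conversion; a hypothetical stand-alone version of it with ffmpeg would look like this (paths and bitrate are placeholders, not my actual pipeline settings):

import subprocess

def transcode_to_16k_mono_mp3(src: str, dst: str) -> None:
    # -ac 1: mono, -ar 16000: 16 kHz sample rate, -b:a 48k: placeholder bitrate
    subprocess.run(
        ['ffmpeg', '-y', '-i', src, '-ac', '1', '-ar', '16000', '-b:a', '48k', dst],
        check=True, capture_output=True,
    )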

On my 6-core / 12-thread notebook (I’m at my summer house now) I can import (with transcoding & analysis) 40-50 clips per second (depending on audio length). The whole process will take 140-160 hours of runtime on this machine (CPU at 100%). SileroVAD is the most costly part, as expected - about 60% of the time is spent on it - but at least I run 12 instances of it in parallel.

I calculate duration, VAD duration, speech power, silence power and an estimated SNR from log10 of speech power/silence power (not exact science but it is a relative value). I reverted to default settings in SileroVAD for the sake of SNR calculation.
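
Roughly, that SNR estimate is the power over the VAD speech regions against the power over the rest of the clip; a sketch of it (the 10·log10 dB scaling and the eps guard are my additions):

import torch

def estimate_snr_db(wav: torch.Tensor, timestamps, eps: float = 1e-12) -> float:
    # wav: 1-D float tensor at 16 kHz; timestamps: SileroVAD dicts with 'start'/'end' in samples.
    mask = torch.zeros_like(wav, dtype=torch.bool)
    for t in timestamps:
        mask[t['start']:t['end']] = True
    speech_power = (wav[mask] ** 2).mean() if mask.any() else torch.tensor(0.0)
    silence_power = (wav[~mask] ** 2).mean() if (~mask).any() else torch.tensor(eps)
    return float(10 * torch.log10((speech_power + eps) / (silence_power + eps)))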

When I checked the results, I found some recordings with 0 vad_duration - either completely empty or very low whispering - which are not suitable for model training, as well as recordings with negative SNR, i.e. too much background noise.

I also extract a full list of files which are corrupt in the datasets.

Thank you for mentioning SileroVAD.

PS: I’ll not release the clip-wise SNR values, as they might be used for TTS. I’ll release the bad-recording and clip lists, with some statistics & visualizations. I hope I can (i.e. my machine can) finish the analysis up to v19.0…


Thank you so much, @bozden for all your hard work on this.

I have a dataset with just the VAD durations in it; it’s not publicly released yet, but I should be able to confirm / validate your findings, at least on the empty / high-SNR files.


That would be great, thank you!

I hope your SileroVAD settings are not very different from the defaults, so that we can compare.


Thank you so much for this, I’ll be so excited to review this once it’s complete and we massively appreciate you taking the time to run this.


Hey @kathyreid, @jesslynnrose, I released the statistics and “bad-audio” (subjective) listings. You can read more on it here.

@kathyreid, I can send you detailed records to compare with your results if you PM me your dataset details (language/version). I would need to extract them; the .tsv file is 4 GB in size for all languages.