Load validated split Hugging Face data?

Rik_Raes · May 30, 2023, 9:36am

When loading the Common Voice data from Hugging Face, it does not seem possible to load the validated split for Dutch, even though that split is listed, as shown in the image below. I use the following code to load the data:

from datasets import load_dataset
load_dataset("mozilla-foundation/common_voice_13_0", "nl", streaming=False)

I would like to use load_dataset() to load all 86798 instances that can be downloaded from the Common Voice project itself, but this does not seem possible. Hugging Face states that the ‘nl’ dataset has this number of instances in its validated split, yet I cannot load it. Other languages do not offer a validated split through load_dataset() either.
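
As a possible workaround, here is a minimal sketch that reads the validated table directly from an archive downloaded from the Common Voice project. It assumes the extracted language folder contains a tab-separated file named validated.tsv with a header row (columns such as path and sentence). The function name load_validated_rows and the folder path are hypothetical, and this is not the Hugging Face loader itself.

```python
import csv
from pathlib import Path


def load_validated_rows(cv_root, tsv_name="validated.tsv"):
    """Read the validated-clips table from a locally extracted Common Voice folder.

    cv_root is assumed to be the language folder of the extracted archive,
    i.e. the directory that holds validated.tsv (an assumption, not verified
    against every release).
    """
    tsv_path = Path(cv_root) / tsv_name
    with tsv_path.open(newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps literal quote characters inside sentences intact.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        # Each row becomes a dict keyed by the column names in the header line.
        return list(reader)
```

Calling load_validated_rows("path/to/extracted/nl") would then return one dict per validated clip, so len() of the result can be compared against the 86798 instances mentioned above.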
