Hi everyone,
I was wondering whether the dataset splits (train, dev, test) differ between the Hugging Face version of the dataset and the one you can download from https://commonvoice.mozilla.org/. I am using the Frisian subset for some experiments for my Master's thesis and did some checks myself, but I am not sure whether the results I get are correct, because I am using data from both a downloaded copy of the dataset and the Hugging Face one.
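For anyone who wants to reproduce this kind of check, here is a minimal sketch of one way to do it: collect the clip file names per split from each source and compare them as sets. It assumes the downloaded release ships `train.tsv`, `dev.tsv` and `test.tsv` with a `path` column, and that the Hugging Face version also exposes a clip path per example (possibly under a split called `validation` instead of `dev`, which would need renaming first). The helper names are hypothetical, not part of any library.

```python
import csv
import os


def clip_ids_from_tsv(tsv_path):
    """Return the set of clip file names listed in a downloaded split TSV.

    Assumes a tab-separated file with a header row containing a `path` column.
    """
    with open(tsv_path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE: transcripts may contain quote characters that are not CSV quoting.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        return {os.path.basename(row["path"]) for row in reader}


def clip_ids_from_paths(paths):
    """Reduce clip paths (e.g. from the Hugging Face copy) to bare file names.

    The directory part differs between sources, so only the base name is compared.
    """
    return {os.path.basename(p) for p in paths}


def compare_splits(downloaded, hub):
    """Compare two {split_name: set_of_clip_ids} mappings split by split."""
    report = {}
    for split in sorted(set(downloaded) | set(hub)):
        a = downloaded.get(split, set())
        b = hub.get(split, set())
        report[split] = {
            "only_downloaded": sorted(a - b),
            "only_hub": sorted(b - a),
            "identical": a == b,
        }
    return report


if __name__ == "__main__":
    # Toy data with made-up file names, only to show the shape of the inputs.
    downloaded = {"train": clip_ids_from_paths(["clips/a.mp3", "clips/b.mp3"])}
    hub = {"train": clip_ids_from_paths(["/some/cache/a.mp3", "/some/cache/c.mp3"])}
    for split, result in compare_splits(downloaded, hub).items():
        print(split, result)
```

In a real run, the `downloaded` sets would come from `clip_ids_from_tsv` on each TSV file, and the `hub` sets from the path field of each Hugging Face split. Both sources also need to be the same Common Voice release, otherwise differences are expected.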
Can someone confirm whether the datasets are the same across the two sources? Thanks in advance.