Discrepancy in Hours Between Common Voice Datasets Page and Hugging Face Download

I’m not sure I understood the question correctly, but here you go: In general ML terminology, “dev split” and “validation split” are used interchangeably. E.g. Common Voice uses the data in dev.tsv to check, during training, how well the NN generalizes from what it learned on the train split. Other libraries might name the file “validation.tsv”, which serves the same purpose.

Do not confuse it with validated.tsv, which lists all validated recordings and is the source from which the other splits (i.e. train, dev, test) are drawn. CV ships validated.tsv because it also releases the invalidated and not-yet-validated recordings (invalidated.tsv and other.tsv).
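
If you want to convince yourself of the relationship between validated.tsv and the splits, something like the following rough sketch could help. It assumes the TSV files have a `path` column holding the clip file name, and it uses tiny made-up in-memory tables instead of the real files; the helper names `read_clip_paths` and `splits_within_validated` are just mine for illustration, not part of any Common Voice tooling.

```python
import csv
import io


def read_clip_paths(tsv_file, column="path"):
    """Return the set of clip names listed in one Common Voice-style TSV.

    Assumption: the file is tab-separated and has a `path` column.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row[column] for row in reader}


def splits_within_validated(validated_paths, split_paths):
    """Map each split name to True if all of its clips are also in validated."""
    return {name: paths <= validated_paths for name, paths in split_paths.items()}


# Made-up stand-ins for validated.tsv, train.tsv, dev.tsv and test.tsv.
validated_tsv = (
    "path\tsentence\n"
    "clip_1.mp3\tone\n"
    "clip_2.mp3\ttwo\n"
    "clip_3.mp3\tthree\n"
    "clip_4.mp3\tfour\n"
)
train_tsv = "path\tsentence\nclip_1.mp3\tone\nclip_2.mp3\ttwo\n"
dev_tsv = "path\tsentence\nclip_3.mp3\tthree\n"
test_tsv = "path\tsentence\nclip_4.mp3\tfour\n"

validated = read_clip_paths(io.StringIO(validated_tsv))
splits = {
    "train": read_clip_paths(io.StringIO(train_tsv)),
    "dev": read_clip_paths(io.StringIO(dev_tsv)),
    "test": read_clip_paths(io.StringIO(test_tsv)),
}
result = splits_within_validated(validated, splits)
print(result)
```

With the real dataset you would pass `open("validated.tsv", encoding="utf-8", newline="")` and so on instead of the `io.StringIO` objects.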