Hi,
To start: thanks for the really great project!
Unfortunately, the downloadable datasets don’t have a (visible) publication date.
At the moment I’m wondering how up to date the Dutch dataset is, because the download page gives the following stats:
Size 382 MB
Validated Hr. Total 12
Overall Hr. Total 13
Number of Voices 373
However, when I download this dataset I only end up with 366 MB.
Apart from that, the graph on https://voice.mozilla.org/en shows:
Dutch
Hours Recorded 23h
Hours Validated 18h
So that is a significant difference: 18 validated hours on the site versus 12 in the download, so about 1/3 of the currently validated speech is new!
So I’m wondering:
- if the current link for the downloadable dataset is actually correct (since there is a size difference: 382 MB versus 366 MB)
- when the downloadable dataset will be updated (is there a regular interval?)
- if it would be wise to add a publication date to the dataset stats.
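For reference, here is a quick sketch of the arithmetic behind the numbers above. The variable names are just mine, and the figures are the ones I copied from the download page and the front-page graph, so treat it as a rough check rather than anything official:

```python
# Figures copied from the download page and the front-page graph (Dutch).
listed_size_mb = 382        # size shown on the download page
downloaded_size_mb = 366    # size of the file I actually got
download_validated_h = 12   # validated hours listed for the download
site_validated_h = 18       # validated hours shown in the graph

# Gap between the listed and the downloaded size.
size_gap_mb = listed_size_mb - downloaded_size_mb
size_gap_pct = 100 * size_gap_mb / listed_size_mb

# Validated hours on the site that are not in the download yet.
missing_validated_h = site_validated_h - download_validated_h
missing_share = missing_validated_h / site_validated_h

print(f"Size gap: {size_gap_mb} MB ({size_gap_pct:.1f}% of the listed size)")
print(f"Validated hours not in the download: {missing_validated_h} h "
      f"({missing_share:.0%} of the current total)")
```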
Regards,
Sander