Hi, great project! Kudos to you all!
I am looking at the datasets, and for some languages, the number of validated hours is very low compared to the total number of recorded hours.
Is there a way to download only the validated recordings?
In fact, should non-validated recordings be included in the datasets in the first place? Downloading 30 GB of recordings when one can (hopefully) use only 7 GB with some confidence doesn’t seem very efficient or environmentally friendly.