Multi-Language-Dataset (Beta) is gone

Hi there,

I’m currently working on a university paper that assesses the quality of, and the results obtained with, the Common Voice German dataset (clips.tsv.zip, de.zip), but since yesterday these files are no longer on your S3 storage.

Instead, other files are now listed, such as “de.tar.gz”, which are really small…

Is there any chance that the old files will come back soon, or that another complete, updated dataset is on the way?

Hey,

Oh dang, thanks for the heads-up. I’ll get the files back up today.

Best,
Gregor


Hi @gregor,

I received an email from Lindsay with links to the new speech dataset (release date: 2019-02-13). However, it only contained the newly released audio data. Is there a way to also get the updated clips.tsv.zip?

Greetings,
Simon

Hey, we’re not gonna release an updated clips.tsv for now as we only have resources to scrub the existing one. Sorry!

Hmm… any chance I can get the old clips.tsv.zip file again?
It still seems to be unavailable at the moment :frowning:
I want to continue my work on CorporaCreator, but without any data, I cannot do so…

Oh yes, we fixed the link in the original post. Here it is: https://voice-prod-bundler-ee1969a6ce8178826482b88e843c335139bd3fb4.s3.amazonaws.com/cv-corpus-2019-02-13/clips.tsv.tar.gz
