V6.1 is masquerading as a tar file when it is actually a tar.gzip file

makoto_wada_jp · June 21, 2022, 11:01am

@bozden Thanks for the comment. I have not checked the old German databases, but I take your word for it. I understand and sympathise with your problem. Consistency is very important for any database, as is good, intuitive documentation, so it is really unfortunate. Maybe that is why I do not see many research papers that use CommonVoice. However, I guess beggars (those who desperately need data) cannot be choosers.

Lastly, I apologise for my original comment above, which contained an error. I cannot edit it now (perhaps because of your reply, for which I am grateful), but here is what I meant to say:

[Error] it really isn’t a tar file but a gzip file (as ht file extention suggest)
[Fix] it really isn’t a tar file (as the file extension suggests) but a gzip file
[Error] I have not checked all languages and versions permutations
[Fix] I have not checked all permutations (all language and version patterns)
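
For anyone who wants to check their own download, here is a minimal sketch of how one might tell whether a file named `.tar` is really gzip-compressed. It relies only on the fact that gzip streams start with the two bytes `1f 8b`, and on the `"r:*"` mode of Python's standard `tarfile` module, which detects compression regardless of the file extension. The function names and the file name `cv-corpus.tar` are hypothetical, chosen for illustration only; this is not an official Common Voice tool.

```python
import tarfile

# Every gzip stream begins with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def list_members(path):
    """List archive member names, whatever the extension claims.

    Mode "r:*" asks tarfile to detect the compression (none, gzip,
    bz2 or xz) by itself instead of trusting the file name.
    """
    with tarfile.open(path, mode="r:*") as archive:
        return archive.getnames()


if __name__ == "__main__":
    import sys

    # Hypothetical default file name, used only for illustration.
    path = sys.argv[1] if len(sys.argv) > 1 else "cv-corpus.tar"
    kind = "gzip-compressed" if is_gzip(path) else "not gzip-compressed"
    print(f"{path} is {kind}")
```

If `is_gzip` returns True for a file ending in `.tar`, opening it through `list_members` (or any tool that detects compression automatically) should still work, even though a plain uncompressed-tar reader would reject it.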
