Common Voice by Mozilla

When I download the Luganda dataset, the zipped .tar file shows up with an unknown file format.
According to the documentation at https://github.com/common-voice/cv-dataset, I should get a .tar.gz file.
Please help, I am trying to get the text corpus for the Luganda language from the dataset.

Dear @Abraham_Kakooza, the file is a tar.gz archive, so you can extract it with tar -xzf followed by the file name. If that doesn't work, perhaps it has already been decompressed; run the file command on it to see what file type it reports. I was able to extract the Luganda data without any problem. Feel free to join us on the Common Voice Matrix channel for real-time questions and answers.
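
If the tar command is not available (for example on a machine without a Unix shell), the same two steps, checking the file type and then extracting, can be sketched in Python with the standard tarfile module. This is only a minimal sketch: the helper names sniff_format and extract_archive and the archive name in the usage note are made up for illustration, and the check looks only at the gzip magic bytes rather than doing everything the file command does.

```python
import tarfile


def sniff_format(path):
    """Roughly guess the container format, similar in spirit to the file command."""
    with open(path, "rb") as fh:
        head = fh.read(2)
    # Every gzip stream starts with the two magic bytes 0x1f 0x8b.
    if head == b"\x1f\x8b":
        return "gzip"
    # Not gzip: check whether it is a tar archive that tarfile can read.
    if tarfile.is_tarfile(path):
        return "tar"
    return "unknown"


def extract_archive(path, dest):
    """Extract a .tar or .tar.gz archive into dest and return the member names."""
    # Mode "r:*" lets tarfile detect the compression (or lack of it) itself.
    with tarfile.open(path, mode="r:*") as archive:
        names = archive.getnames()
        archive.extractall(dest)
    return names
```

Assuming the download was saved as luganda.tar.gz (a hypothetical name), calling sniff_format("luganda.tar.gz") and then extract_archive("luganda.tar.gz", "luganda") would report the format and unpack the files into a luganda folder. The returned list of member names is a quick way to see which files the archive contains.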

Thanks a lot for this, @ftyers. I am going to try it out and let you know if I am successful. Glad to meet you.


Thanks @ftyers, it has worked. This was quite helpful.
