When I download the Luganda dataset and unzip it, I get a file in an unknown format instead of the expected .tar file.
According to the documentation on GitHub (common-voice/cv-dataset: Metadata and versioning details for the Common Voice dataset), I should get a .tar.gz file.
Please help. I am trying to get the text corpus for the Luganda language from the dataset.
Dear @Abraham_Kakooza, the file is a tar.gz file, and you can extract it using tar -xzf. If this doesn't work, it may already have been extracted; use file to find out what file type it reports. I was able to extract the Luganda data with no problem. Feel free to join us on the Common Voice Matrix channel for real-time questions and answers.
Thanks a lot @ftyers for this. I am going to try it out and let you know if I am successful. Glad to meet you.
Thanks @ftyers, it worked. This was quite helpful.