So, now that I have proved to you that the code works, can we get actionable items and share the content of that file?
No, I haven’t used a Docker environment.
@lissyx
What does head cv-corpus-6.1-2020-12-11/eo/clips/dev.csv output?
For me it returns:
wav_filename,wav_filesize,transcript
Thanks for clarifying. Are you on Windows, Linux, etc.? Which Python version?
These all have a bearing on the importer’s file paths.
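For what it's worth, those details can be collected with a few lines of Python, run with the same interpreter that runs the importer. This is only a sketch using the standard library; the function name describe_environment is made up for illustration and is not part of DeepSpeech.

```python
import platform
import sys


def describe_environment():
    """Return basic facts about the OS and the running Python interpreter."""
    return {
        "os": platform.system(),          # e.g. "Linux" or "Windows"
        "os_release": platform.release(),
        "machine": platform.machine(),    # e.g. "x86_64"
        "python_version": platform.python_version(),
        "python_executable": sys.executable,
        # True inside an environment created with "python -m venv"
        # (and with recent virtualenv releases, which also set base_prefix).
        "in_virtualenv": sys.prefix != getattr(sys, "base_prefix", sys.prefix),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into a reply answers the OS and Python version questions in one go.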
As you can see in the ls -hal output, the files are hundreds of KB or even MB.
That is why, since the beginning, I have been insisting that you share your complete setup. We can’t help you if we don’t know precisely what you did.
I ran the import again with your alphabet, and it still works:
# ls -hal cv-corpus-6.1-2020-12-11/eo/clips/*.csv
-rw-r--r-- 1 root root 719K Mar 30 08:37 cv-corpus-6.1-2020-12-11/eo/clips/dev.csv
-rw-r--r-- 1 root root 252K Mar 30 08:38 cv-corpus-6.1-2020-12-11/eo/clips/other.csv
-rw-r--r-- 1 root root 719K Mar 30 08:36 cv-corpus-6.1-2020-12-11/eo/clips/test.csv
-rw-r--r-- 1 root root 1.9M Mar 30 08:38 cv-corpus-6.1-2020-12-11/eo/clips/train-all.csv
-rw-r--r-- 1 root root 1.6M Mar 30 08:37 cv-corpus-6.1-2020-12-11/eo/clips/train.csv
-rw-r--r-- 1 root root 4.5M Mar 30 08:38 cv-corpus-6.1-2020-12-11/eo/clips/validated.csv
# head cv-corpus-6.1-2020-12-11/eo/clips/train.csv
wav_filename,wav_filesize,transcript
common_voice_eo_20690131.wav,205100,hiroŝimo estis la sepa urbo laŭ nombro da loĝantoj
common_voice_eo_20690133.wav,152876,ĝi estis iama regna burgo
common_voice_eo_20690129.wav,169004,kun la akvo ankaŭ venas la salo
common_voice_eo_20725920.wav,195884,temas pri malpliiĝanta birdospecio
common_voice_eo_20729065.wav,167468,la lasta speco estas propra al ĉinio
common_voice_eo_20690234.wav,221228,tio estas ankaŭ dank al aktiveco de entreprenistoj
common_voice_eo_20725924.wav,144428,stacioj aspektas relative simile
common_voice_eo_20711894.wav,207404,gravas ankaŭ la geometrio kaj konstruo de ĉirkaŭa medio
common_voice_eo_20690130.wav,318764,la unuiĝinta reĝlando estis la nura lando ankoraŭ milita kontraŭ francio dum alia jaro
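To tell a header-only CSV apart from a properly populated one without eyeballing head output, something like the following could be used. It is a minimal sketch built on the standard csv module; summarize_import_csv is a hypothetical helper name, and it only assumes the files are UTF-8 CSVs with a single header line, as in the listing above.

```python
import csv


def summarize_import_csv(csv_path):
    """Return (header, number_of_data_rows) for a CSV file.

    header is None when the file is completely empty.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)         # first line, if any
        row_count = sum(1 for _ in reader)  # every remaining line is a data row
    return header, row_count
```

For example, summarize_import_csv("cv-corpus-6.1-2020-12-11/eo/clips/dev.csv") returning a row count of 0 would mean that only the header line was written.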
OK, if it works for you it’s probably fine. For me it reproducibly fails to generate any file content (but I have a workaround, so it’s fine). I am on Python 3.7, in a virtual environment set up as described on readthedocs, on 64-bit Arch Linux with python3.7 from the AUR.
Once again, we need the exact steps you followed.
No, it’s not fine: this code works. You should not need a workaround.
Either there is a bug in our docs, or somewhere else.
Have you verified the checksum of the eo.tar.gz file to ensure it was downloaded properly?
What is the exact release of Common Voice you are using?
Can you share complete output (stdout, stderr) when running the importer?
Can you share exact setup steps from the beginning (no “I did as the docs” please)?
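If it helps with collecting the complete output, here is one way to save stdout, stderr and the exit code of any command to a single file. It is a sketch, not the way the importer has to be run: run_and_capture is a hypothetical helper, and the command list is whatever you actually typed to start the import.

```python
import subprocess


def run_and_capture(command, log_path):
    """Run command, write its stdout and stderr to log_path, return the exit code."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        encoding="utf-8",   # decode the output as text
        errors="replace",   # never fail on undecodable bytes
    )
    with open(log_path, "w", encoding="utf-8") as log:
        log.write("$ " + " ".join(command) + "\n")
        log.write("--- stdout ---\n" + result.stdout)
        log.write("\n--- stderr ---\n" + result.stderr)
        log.write(f"\n--- exit code: {result.returncode} ---\n")
    return result.returncode


# Usage: pass the exact importer command line as a list of arguments, e.g.
# run_and_capture(["python", "<your importer script>", "<your arguments>"], "import.log")
```

The resulting log file can then be attached to a post as-is.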
Have you verified the checksum of the eo.tar.gz file to ensure it was downloaded properly?
I downloaded it multiple times and curl didn’t report any issue while downloading. I didn’t verify it because I couldn’t find any checksums when searching Google for site:commonvoice.mozilla.org checksum or site:deepspeech.readthedocs.io checksum.
Can you share complete output (stdout, stderr) when running the importer?
Yes, OK, later (this evening).
Can you share exact setup steps from the beginning (no “I did as the docs” please)?
I’ll put my stuff in a git repository.
Just because curl does not complain does not mean the download is intact.
There’s a checksum on the bottom of the download page, after you click on the button: sha256 checksum: c19900010aee0f9eb39416406598509b1cdba136a16318e746b1a64f97d7809c
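For anyone who prefers to check this from Python rather than with a command-line tool, a minimal sketch with hashlib follows. It assumes the checksum quoted above belongs to the eo.tar.gz archive being discussed; the helper names and the local file name are illustrative only.

```python
import hashlib

# Value quoted above from the download page (assumed to be for eo.tar.gz).
EXPECTED_SHA256 = "c19900010aee0f9eb39416406598509b1cdba136a16318e746b1a64f97d7809c"


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True when the file's digest equals the expected hex string."""
    return sha256_of_file(path) == expected.strip().lower()


# Usage, assuming the archive was saved as eo.tar.gz in the current directory:
# print(matches_expected("eo.tar.gz"))
```

A False result would point to a truncated or corrupted download, even if curl exited without an error.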