bin/import_cv2.py seems broken

the import paths are invalid
from deepspeech_training.util.importers import

and even if I go and fix them the script only writes .csv files that only contain the header line

If you don’t get a response here on Discourse I recommend opening an issue on Github: https://github.com/mozilla/DeepSpeech/issues?q=is%3Aissue+is%3Aopen+

No, @54696d21 has already been spamming several repos with the same question, not even documenting what this person is doing. Chances are, reading the documentation is what is missing. The code works well, that’s sure.

What repos do you accuse me of spamming? How do I fix the empty CSV files with the version in the Deepspeech repo?

I read all the relevant docs on readthedocs and searched for the issue extensively. I fixed the bugs I had by changing code in the mentioned file. I assume other people might have the same problem when they follow the docs as I did, so I think it’s worth mentioning, because the code in the repo doesn’t work for me.

What repos are you accusing me of spamming?

And yet you said nothing, no STR, no context: we can’t help you.

You asked the exact same question, same wording here and on other repositories.

My problem is that calling the script this way lets it terminate successfully (no stack trace), but it only writes CSV files that contain just the header line, not the content the script is supposed to produce from the TSV files of the Common Voice dataset.

bin/import_cv2.py --filter_alphabet $HOME/code/deepspeech-esperanto/data/alphabet-eo.txt $HOME/code/deepspeech-esperanto/esperanto-speech-corpus/mozilla/eo/clips/

You asked the exact same question, same wording here and on other repositories.

I didn’t do this, except for one somewhat similar question here: https://github.com/coqui-ai/STT/discussions/1820
You seem to be wrongfully accusing me.

And yet you said nothing, no STR, no context: we can’t help you.

To be clearer: I have a workaround, but I assume others will have this problem too, as the documentation led me down this path. I’m open to improving the documentation. I’m not sure what you mean by STR; a stack trace seems hardly applicable here.

Did I? You filed the exact same question without any detail. This is pure noise to everybody.

You have a “workaround” for a problem you have not described anything about.

Steps to reproduce.

My problem is that you report a bug but:

  • you don’t describe what you did (I still don’t understand what you are doing)
  • you don’t explain where there is a bug
  • the only actionable information is that you mention “wrong path”, which 99.99% of the time comes from trying to run the script without reading the docs, without running pip install -e . etc.

So, once again, please explain what you are doing, completely.

TLDR: import_cv2.py writes “empty” (only the header line) CSV files. These shouldn’t be empty.
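
A quick way to confirm this symptom is to count the data rows in each generated CSV. The following is only a minimal sketch: the helper names `count_data_rows` and `header_only_csvs` are made up for illustration, and it assumes the CSVs are UTF-8 files whose first line is the header.

```python
import csv
from pathlib import Path


def count_data_rows(csv_path):
    """Return the number of rows that follow the header line of a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header line, if the file has one
        return sum(1 for _ in reader)


def header_only_csvs(directory):
    """Return the sorted names of CSV files in `directory` without data rows."""
    return sorted(
        path.name
        for path in Path(directory).glob("*.csv")
        if count_data_rows(path) == 0
    )
```

Calling `header_only_csvs()` on the directory that holds the generated CSVs returns an empty list when every file has at least one data row; any file name it does return is one of the “empty” files described above.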


Putting this aside (the relevant parts of the documentation probably should mention it, but don’t seem to):

pip install -e .

The bug is that I call the script like this (I can assure you the paths are right)

you don’t describe what you did (I still don’t understand what you are doing)

I called import_cv2.py like this (the paths it is called with are correct):

bin/import_cv2.py --filter_alphabet $HOME/code/deepspeech-esperanto/data/alphabet-eo.txt $HOME/code/deepspeech-esperanto/esperanto-speech-corpus/mozilla/eo/clips/

I tried both, of course (the second version is the one the docs mention):

bin/import_cv2.py --filter_alphabet $HOME/code/deepspeech-esperanto/data/alphabet-eo.txt $HOME/code/deepspeech-esperanto/esperanto-speech-corpus/mozilla/eo/
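
The two invocations differ only in whether the clips/ folder or its parent is passed. Below is a minimal sketch for checking which of the two directories actually holds the split TSVs. It assumes the importer looks for files named train.tsv, dev.tsv, test.tsv, validated.tsv and other.tsv; that is inferred from the CSV names in the directory listing further down this thread and is not confirmed against the script itself. The helper name `describe_layout` is hypothetical.

```python
from pathlib import Path

# Assumed split names, inferred from the CSV names in the directory listing
# in this thread; not verified against import_cv2.py.
ASSUMED_SPLITS = ("train", "dev", "test", "validated", "other")


def describe_layout(directory):
    """Report which assumed split TSVs and clips/ folder exist in `directory`."""
    root = Path(directory)
    return {
        "tsv_files": [
            split + ".tsv"
            for split in ASSUMED_SPLITS
            if (root / (split + ".tsv")).is_file()
        ],
        "has_clips_dir": (root / "clips").is_dir(),
    }
```

Running `describe_layout()` on both candidate paths shows at a glance which one contains the TSV files and which one is only the audio folder.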

Did I?

OK, moving the goalposts… You said I was spamming the exact same text in multiple repos, while the truth is that I didn’t post this question anywhere else.

I found that the documentation is missing critical steps and import_cv2.py fails to do its job when used according to the documentation. (here: https://mozilla.github.io/deepspeech-playbook/DATA_FORMATTING.html)

My problem is that you report a bug but you don’t describe what you did

This is the problem description; which part do I need to rephrase?
Calling the script lets it terminate successfully (no stack trace), but it only writes CSV files that contain just the header line, not the content the script is supposed to produce from the TSV files of the Common Voice dataset.

What do you mean? I don’t get your point here.

Once again: have you followed the docs and properly set up the Python virtualenv?

This is the first time you explain that clearly.

This documentation is maintained by @kathyreid and this specific part is only about the way to format.

Have you followed the docs? https://deepspeech.readthedocs.io/en/v0.9.3/search.html?q=import_cv2&check_keywords=yes&area=default

We still lack useful information:

  • DeepSpeech tree version
  • how you performed the install and setup
  • which release of the Common Voice data you use
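
Part of the information requested above can be collected from the same interpreter that runs the importer. This is a rough sketch, not an official diagnostic: the function name `environment_report` is made up here, the module name `deepspeech_training` is taken from the import line quoted at the top of this thread, and the virtualenv check assumes a Python 3 environment where `sys.base_prefix` is set. The DeepSpeech tree version and the Common Voice release still have to be stated by hand.

```python
import importlib.util
import platform
import sys


def environment_report(module_name="deepspeech_training"):
    """Summarise the interpreter and whether `module_name` can be imported."""
    spec = importlib.util.find_spec(module_name)
    return {
        "python": platform.python_version(),
        # True when running inside a venv/virtualenv (assumes Python 3).
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "module_importable": spec is not None,
        # Where the module would be loaded from, or None if it is not found.
        "module_location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(key + ": " + str(value))
```

If `module_importable` comes back as False, the training package is not installed in the active environment, which would be consistent with the invalid import paths mentioned in the first post.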

This is our tenth exchange. I shared the guidelines with you earlier: my time is limited, and having to ask the same thing again and again is not making this very efficient.

Everything is there: https://deepspeech.readthedocs.io/en/v0.9.3/TRAINING.html?highlight=import_cv2

I don’t get yoyr point here.

You’re being very unprofessional.

You’re saying that asking for clarification to ensure I understand your problem is unprofessional?

@54696d21 If you are not willing to share actionable items and proper steps to reproduce, how can you expect me to investigate and help you?

To give you the benefit of the doubt here: what do you think yoyr means?
(This seems to be a common interpretation: https://www.urbandictionary.com/define.php?term=YOYR)
I understand it as a rude, sexist slur (hence unprofessional).

What.

On my keyboard, Y is next to U: that’s called a typo.

Skipped 53 samples that were longer than 10 seconds.
Final amount of imported audio: 4:48:24 from 4:57:27.
Saving new DeepSpeech-formatted CSV file to:  /DeepSpeech/cv-corpus-6.1-2020-12-11/eo/clips/other.csv
Writing CSV file for DeepSpeech.py as:  /DeepSpeech/cv-corpus-6.1-2020-12-11/eo/clips/other.csv
$ docker run -it mozilla/deepspeech:v0.9.3
# apt update
# apt install sox libsox-fmt-mp3
# cd /DeepSpeech
# wget esperanto-release-URL
# tar xf eo.tar.gz
# python bin/import_cv2.py cv-corpus-6.1-2020-12-11/eo/
[...]
# ls -hal cv-corpus-6.1-2020-12-11/eo/clips/*.csv
-rw-r--r-- 1 root root 726K Mar 30 08:22 cv-corpus-6.1-2020-12-11/eo/clips/dev.csv
-rw-r--r-- 1 root root 253K Mar 30 08:24 cv-corpus-6.1-2020-12-11/eo/clips/other.csv
-rw-r--r-- 1 root root 727K Mar 30 08:22 cv-corpus-6.1-2020-12-11/eo/clips/test.csv
-rw-r--r-- 1 root root 1.9M Mar 30 08:24 cv-corpus-6.1-2020-12-11/eo/clips/train-all.csv
-rw-r--r-- 1 root root 1.6M Mar 30 08:23 cv-corpus-6.1-2020-12-11/eo/clips/train.csv
-rw-r--r-- 1 root root 4.5M Mar 30 08:24 cv-corpus-6.1-2020-12-11/eo/clips/validated.csv

You made me discover that. This was just a typo. And to be honest, my sentence with this yoyr would make no sense at all.

Hi Tim,

Are you able to clarify some information for me? I’m a colleague of @lissy’s and I wrote the recent PlayBook - if there’s an error in there I’d like to get to the bottom of it so that other DeepSpeech developers don’t experience the same frustration.

  • DeepSpeech version - I will assume 0.9.3
  • Setup - if you are using the PlayBook, I will assume Ubuntu Linux under Docker, but could you please confirm?
  • And are you using the import instructions in the data section of the PlayBook?
  • And if you use these instructions, the resulting csv file is headers only, no data? Are you able to provide terminal output or the error message that occurs? This will help us to resolve the error.