Problem preprocessing the Common Voice dataset

plusout · April 26, 2020, 1:55pm

I installed the DeepSpeech 0.7 release and tested recognition. Everything worked. After that I tried to prepare a dataset according to the instructions: https://deepspeech.readthedocs.io/en/v0.7.0/TRAINING.html
I ran
bin/import_cv2.py --filter_alphabet path/to/some/alphabet.txt /path/to/extracted/language/archive

but got an error:

(deepspeech-train-venv) (base) v@gpu:~/ASR/DeepSpeech$ bin/import_cv2.py --filter_alphabet data-cv/extracted/ru/alphabet.txt /data-cv/extracted/ru/archive

Traceback (most recent call last):
  File "bin/import_cv2.py", line 18, in <module>
    from deepspeech_training.util.downloader import SIMPLE_BAR
ModuleNotFoundError: No module named 'deepspeech_training'

All directories from the distribution are present:
(deepspeech-train-venv) (base) v@gpu:~/ASR/DeepSpeech$ dir
bazel.patch
BIBLIOGRAPHY.md
bin
build-python-wheel.yml-DISABLED_ENABLE_ME_TO_REBUILD_DURING_PR
CODE_OF_CONDUCT.md
CONTRIBUTING.rst
data
DeepSpeech.py
deepspeech_training
doc
Dockerfile
evaluate.py
evaluate_tflite.py
examples
GRAPH_VERSION
images
ISSUE_TEMPLATE.md
LICENSE
lm_optimizer.py
native_client
README.rst
RELEASE.rst
requirements_eval_tflite.txt
requirements_tests.txt
requirements_transcribe.txt
setup.py
stats.py
SUPPORT.rst
taskcluster
tests
training
transcribe.py
util
VERSION

Please help me solve this problem.

lissyx · April 26, 2020, 1:58pm

What part of the docs did you miss reading? https://deepspeech.readthedocs.io/en/v0.7.0/TRAINING.html#installing-deepspeech-training-code-and-its-dependencies
It’s explicitly documented there, and you linked that very same doc.

plusout · April 26, 2020, 2:40pm

I did these steps:
git clone https://github.com/mozilla/DeepSpeech

python3 -m venv $HOME/ASR/deepspeech-train-venv/

source $HOME/ASR/deepspeech-train-venv/bin/activate

cd DeepSpeech
pip3 install --upgrade pip==20.0.2 wheel==0.34.2 setuptools==46.1.3
pip3 install --upgrade --force-reinstall -e .

sudo apt-get install python3-dev

bin/import_cv2.py --filter_alphabet path/to/some/alphabet.txt /path/to/extracted/language/archive

I skipped the recommendations part (I want to try without a GPU to begin with).

lissyx · April 26, 2020, 3:07pm

I’m wondering if you are running the importer with the proper Python binary … Could you please verify / share logs to ensure your install was successful?

This explicitly states that you have not installed properly.

lissyx · April 26, 2020, 3:11pm

Please run pip list and pip3 list before running bin/import_cv2.py
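
In addition to pip list, a quick check from inside Python can show which interpreter is actually in use and whether it can see the training package. This is only a minimal sketch using the standard library; the helper name module_available is made up for illustration and is not part of DeepSpeech.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the top-level module `name` can be found by this interpreter.

    Hypothetical helper for illustration; not part of DeepSpeech.
    """
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Path of the Python binary running this script.
    print("Interpreter:", sys.executable)
    # True when running inside a venv created with `python3 -m venv`.
    print("In a virtual environment:", sys.prefix != sys.base_prefix)
    # The package the importer failed to find in the traceback above.
    print("deepspeech_training importable:", module_available("deepspeech_training"))
```

Save the script outside the DeepSpeech checkout and run it with the same python you use for the importer. The checkout contains a directory with the same name as the package, so running it from inside the checkout could give a misleading result.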

plusout · April 26, 2020, 3:32pm

Thanks for the help. I repeated the installation and everything works now. It converts the dataset.

lissyx · April 26, 2020, 11:12pm

Thanks. If you can identify what was unclear / error-prone in the docs and can send a PR to improve it, that would be welcome.

plusout · April 28, 2020, 11:16am

Some points in the documentation are unclear.
1. There is an installation for training (https://deepspeech.readthedocs.io/en/v0.7.0/TRAINING.html) and an installation for using (https://deepspeech.readthedocs.io/en/v0.7.0/USING.html). What is the difference?
When I installed for using, I could run recognition and the deepspeech command worked.
When I installed everything for training, dataset preparation worked fine, but

./DeepSpeech.py --train_files ~/ASR/data-cv/clips/train.csv --dev_files ~/ASR/data-cv/clips/dev.csv --test_files ~/ASR/data-cv/clips/test.csv
bash: ./DeepSpeech.py: Permission denied
doesn’t work.

Why should I install DeepSpeech for using with pip, but install it for training with git clone (not with pip)?
Can I use the first environment (the one for recognition) for training?

reuben · April 28, 2020, 11:26am

Training and inference are completely separate tasks. If you just want to do inference, you don’t have to install the training package. If you just want to do training, you don’t have to install the inference package. This is why the training package is called deepspeech_training, to make it extra clear what it’s for.

You should ideally use separate virtual environments for training and inference.

The error you got is because DeepSpeech.py is not executable. Use python DeepSpeech.py as we show in the docs. You should familiarize yourself with the Linux command-line, it’ll help you immensely with training tasks (and in general).
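
To illustrate the executable-bit point, here is a minimal sketch using only the Python standard library. It creates a throwaway script, checks its permission bits, and suggests how it would have to be launched. The file name demo.py and the helpers has_exec_bit and launch_command are made up for this example and are not part of DeepSpeech.

```python
import os
import stat
import tempfile


def has_exec_bit(path):
    """Return True if any execute permission bit is set on `path`."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))


def launch_command(path):
    """Suggest how to start a Python script, given its permission bits.

    The "./name" form assumes the shell is in the script's directory.
    """
    name = os.path.basename(path)
    if has_exec_bit(path):
        return "./" + name       # the shell may execute the file directly
    return "python " + name      # otherwise hand it to the interpreter


with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "demo.py")  # hypothetical throwaway script
    with open(script, "w") as fh:
        fh.write("#!/usr/bin/env python\nprint('hello')\n")

    os.chmod(script, 0o644)                # no execute bit, like DeepSpeech.py here
    without_bit = launch_command(script)

    os.chmod(script, 0o755)                # execute bit set, like bin/import_cv2.py
    with_bit = launch_command(script)

print(without_bit)
print(with_bit)
```

Calling the interpreter explicitly works in both cases, which is why python DeepSpeech.py is the safer form to use.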

plusout · April 28, 2020, 11:45am

Thanks for the explanation. I expected this difference. python DeepSpeech.py works (but gives some errors during execution).
I think it would be better to give the exact commands in the documentation, with the python command.
bin/import_cv2.py …
works without the python prefix (that is, without writing python bin/import_cv2.py),

but DeepSpeech.py does not.

reuben · April 29, 2020, 1:55pm

Ugh, I just realized there were some commands in docs using ./ instead of calling Python directly. I’ve made a PR fixing it. Thanks for the feedback.
