Hello. I’m starting my journey with DeepSpeech.
Language: Polish
DeepSpeech: 0.9.3
System: Ubuntu 20.04
Common Voice: pl_129h_2020-12-11
I managed to run the test training, then tried to train my model on Polish Common Voice.
This is the command I used:
python3 bin/import_cv2.py --validate_label_locale /home/validate_label_pl.py --filter_alphabet /home/alphabet.txt /home/utomek/Polskids/cv-corpus-6.1-2020-12-11/pl
And here is the output:
Loading TSV file: /home/utomek/Polskids/cv-corpus-6.1-2020-12-11/pl/test.tsv
Importing mp3 files…
ERROR: Inexistent --validate_label_locale specified. Please check.
Process ForkPoolWorker-1:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 103, in worker
    initializer(*initargs)
  File "bin/import_cv2.py", line 54, in init_worker
    alphabet = Alphabet(params.filter_alphabet) if params.filter_alphabet else None
  File "/home/utomek/tmp/deepspeech-train-venv/lib/python3.6/site-packages/ds_ctcdecoder/__init__.py", line 47, in __init__
    raise ValueError('Alphabet initialization failed with error code 0x{:X}'.format(err))
ValueError: Alphabet initialization failed with error code 0x1
ERROR: Inexistent --validate_label_locale specified. Please check.
Process ForkPoolWorker-2:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 103, in worker
    initializer(*initargs)
  File "bin/import_cv2.py", line 54, in init_worker
    alphabet = Alphabet(params.filter_alphabet) if params.filter_alphabet else None
  File "/home/utomek/tmp/deepspeech-train-venv/lib/python3.6/site-packages/ds_ctcdecoder/__init__.py", line 47, in __init__
    raise ValueError('Alphabet initialization failed with error code 0x{:X}'.format(err))
ValueError: Alphabet initialization failed with error code 0x1
ERROR: Inexistent --validate_label_locale specified. Please check.
Process ForkPoolWorker-3:
I have a Polish alphabet file filled with Polish letters, and this is my validate_label_pl.py:
def validate_label(label):
    if 'a' in label:  # disallow labels with 'a'
        return None
    return label.lower()  # lower case valid labels
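For reference, this is how that function behaves when called directly. It is only a small standalone sketch, and the two sample strings are made up for illustration rather than taken from the corpus:

```python
def validate_label(label):
    if 'a' in label:  # disallow labels with 'a'
        return None
    return label.lower()  # lower case valid labels


# Hypothetical sample transcripts, not from Common Voice.
print(validate_label("Dzień dobry"))  # no 'a' present, so the lower-cased text is returned
print(validate_label("tak"))          # contains 'a', so None is returned
```

So with this rule, every transcript containing the letter a is rejected.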
I’m not sure why it says my file is “Inexistent”. The alphabet.txt, validate_label_pl.py and Common Voice files are all located inside my home directory. I tried my best to follow the documentation and Discourse threads like this discussion.
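To narrow this down, a check like the one below could confirm whether the exact paths from the import command resolve inside the training environment. This is only a sketch under assumptions I have not confirmed in the DeepSpeech source: that the “Inexistent” message comes from a plain file-existence test on the --validate_label_locale path, and that alphabet error code 0x1 means the file could not be opened. The helper name check_paths is made up for this example.

```python
import os


def check_paths(paths):
    """Map each path to True if it is an existing, readable regular file."""
    results = {}
    for path in paths:
        full = os.path.abspath(path)
        results[path] = os.path.isfile(full) and os.access(full, os.R_OK)
    return results


if __name__ == "__main__":
    # Paths copied verbatim from the import command above.
    checked = check_paths([
        "/home/validate_label_pl.py",
        "/home/alphabet.txt",
    ])
    for path, ok in checked.items():
        print(("OK      " if ok else "MISSING ") + path)
```

If either line prints MISSING, the path given on the command line does not point at the file, for example because the file actually sits in a user subdirectory of /home rather than in /home itself.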