Training Deepspeech throws missing characters

EsakkiSundar_Varatharajan · November 20, 2020, 8:13am

I followed the steps given in the Training documentation for 0.9.1.

When running bin/import_cv2.py, I did not specify the optional parameter --filter_alphabet path/to/some/alphabet.txt. The importer produced the .wav files from the .mp3 files I downloaded from Common Voice.

I get the error below when I run the command python3 DeepSpeech.py --train_files ../data/CV/en/clips/train.csv --dev_files ../data/CV/en/clips/dev.csv --test_files ../data/CV/en/clips/test.csv

ValueError: Alphabet cannot encode transcript "it’s true then" while processing sample "{path to wav folder}/common_voice_en_22311413.wav", check that your alphabet contains all characters in the training corpus. Missing characters are: ['’'].

Please let me know how to resolve this issue.

lissyx · November 20, 2020, 8:36am

Is the reported error not explicit enough? You have characters in your transcripts that are not in your alphabet file.

EsakkiSundar_Varatharajan · November 20, 2020, 12:09pm

@lissyx, thanks for your quick reply. Yes, the error is explicit.

When I looked at the Deepspeech/data/alphabet.txt file, I could see a single quote (’) in it. Please let me know which other alphabet.txt file needs to be modified before running the script.

lissyx · November 20, 2020, 12:24pm

Please read others' reports; there is already a ton of information. What you see might not be what you have: UTF-8 data can be tricky, and two characters that look alike can be different code points. Make sure you generate the alphabet from your dataset.
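
For anyone landing here with the same error, a minimal sketch of "generate the alphabet from your dataset" could look like the following. It assumes the CSVs written by the importer have a transcript column and that the alphabet file holds one character per line, with # starting a comment line. The helper names (read_transcripts, collect_characters, describe, write_alphabet) are made up for this example and are not part of DeepSpeech.

```python
import csv
import unicodedata


def read_transcripts(csv_path, column="transcript"):
    """Yield the transcript of every row in a CSV file (column name is an assumption)."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            yield row[column]


def collect_characters(transcripts):
    """Return the sorted list of distinct characters used in the transcripts."""
    chars = set()
    for text in transcripts:
        chars.update(text)
    return sorted(chars)


def describe(ch):
    """Return the code point and Unicode name, to tell look-alike characters apart."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"


def write_alphabet(chars, out_path):
    """Write one character per line; '#' is escaped since it would start a comment."""
    with open(out_path, "w", encoding="utf-8") as handle:
        for ch in chars:
            handle.write(("\\#" if ch == "#" else ch) + "\n")
```

Calling describe on each character returned by collect_characters makes the mismatch visible: the typewriter apostrophe is U+0027, while the one in "it’s true then" is U+2019. To build the file, chain the transcripts from train.csv, dev.csv and test.csv into collect_characters, pass the result to write_alphabet, and point training at the output with --alphabet_config_path. Alternatively, normalize the transcripts to the characters you actually want before importing.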
