Training DeepSpeech throws missing characters

I followed the steps given in Training 0.9.1

While running the bin/ I did not specify the optional parameter --filter_alphabet path/to/some/alphabet.txt. I got the .wav files for the .mp3 files that I had downloaded from Common Voice.

I get the error below when I run the command python3 --train_files ../data/CV/en/clips/train.csv --dev_files ../data/CV/en/clips/dev.csv --test_files ../data/CV/en/clips/test.csv

ValueError: Alphabet cannot encode transcript “it’s true then” while processing sample “{path to wav folder}/common_voice_en_22311413.wav”, check that your alphabet contains all characters in the training corpus. Missing characters are: [’’’].

Please let me know how to resolve this issue.

Is the reported error not explicit enough? You have characters in your transcripts that are not in your alphabet file.
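One way to see exactly which characters are involved is to print their Unicode code points, since two characters can look alike on screen and still be different. The sketch below is only an illustration and not part of DeepSpeech: `describe_missing` is a hypothetical helper, and the alphabet used here is an assumed lowercase English set with a plain ASCII apostrophe.

```python
import unicodedata

def describe_missing(transcript, alphabet_chars):
    """Return (char, code point, Unicode name) for each transcript
    character that is not present in the alphabet."""
    missing = sorted(set(transcript) - set(alphabet_chars))
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in missing
    ]

# Assumed alphabet: space, a-z and the ASCII apostrophe (U+0027).
alphabet = set(" abcdefghijklmnopqrstuvwxyz'")

# The transcript from the error message, written with an explicit escape
# so the typographic apostrophe (U+2019) cannot be confused with U+0027.
report = describe_missing("it\u2019s true then", alphabet)
for ch, code, name in report:
    print(repr(ch), code, name)
```

If the report lists a character that seems to be in the alphabet file already, comparing the code points usually shows that the two are not the same character.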

@Lissyx, thanks for your quick reply. Yes, the error is explicit.

When I looked at the Deepspeech/data/alphabet.txt file, I could see the single quote (’) in the file. Please let me know which other alphabet.txt file needs to be modified before running the script.

Please read others’ reports; there is already a ton of information. What you see might not be what you have: UTF-8 data can be tricky. Make sure you generate the alphabet from your dataset.
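Generating the alphabet from the dataset could look roughly like the sketch below. It is not a DeepSpeech tool: the function names are hypothetical, it assumes the CSV files have a `transcript` column, and it assumes alphabet.txt holds one character per line. Lines that start with `#` appear to be treated as comments in that file, so a literal `#` in the transcripts would need separate handling that this sketch leaves out.

```python
import csv

def read_transcripts(csv_path, column="transcript"):
    """Yield the transcript text of every row in a CSV file.
    The column name is an assumption about the CSV layout."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            yield row[column]

def build_alphabet(transcripts):
    """Return the sorted list of distinct characters in the transcripts."""
    chars = set()
    for text in transcripts:
        chars.update(text)
    return sorted(chars)

def write_alphabet(chars, out_path):
    """Write one character per line (assumed alphabet.txt layout)."""
    with open(out_path, "w", encoding="utf-8") as f:
        for ch in chars:
            f.write(ch + "\n")

# In-memory demonstration with two spellings of the same phrase: one with
# the typographic apostrophe (U+2019), one with the ASCII one (U+0027).
demo = build_alphabet(["it\u2019s true then", "it's true"])
print([f"U+{ord(c):04X}" for c in demo])
```

With real data, the transcripts from train.csv, dev.csv and test.csv would all be fed to `build_alphabet` before writing the file, so that every character in any of the three sets ends up in the alphabet. Both apostrophes show up as separate entries in the demonstration, which is the situation described in the reply above.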