Hi,
I am currently trying to create my own language model with Mozilla DeepSpeech, but when I start the process, it quickly stops with the following error:
KeyError: "ERROR: Your transcripts contain characters (e.g. 'c') which do not occur in data/alphabet.txt! Use util/check_characters.py to see what characters are in your [train,dev,test].csv transcripts, and then add all these to data/alphabet.txt."
First, I have checked my alphabet.txt many times; it is located in a different path from the one displayed above. Both files contain the "c" character, along with the other characters present in my dataset transcriptions.
Second, why is it looking in data/alphabet.txt even though I explicitly tell it to read this file from another location?
Below is the content of my alphabet.txt file:
t
r
a
n
s
c
i
p
v
o
l
à
e
q
u
d
é
f
m
x
j
'
h
g
è
y
b
ù
ç
ê
ô
z
â
œ
î
k
û
w
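For anyone who wants to reproduce the character check independently, here is a minimal sketch. It is not DeepSpeech's own util/check_characters.py; the function names are hypothetical, and it assumes the CSV files have a `transcript` column and that all files are read as UTF-8. Printing the code points can help spot characters that look identical but are encoded differently.

```python
import csv
import io


def load_alphabet(lines):
    """Collect one label per line; '#' comment lines and empty lines are skipped (assumed format)."""
    chars = set()
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "" or line.startswith("#"):
            continue
        chars.add(line)  # a line holding a single space is kept as a label
    return chars


def transcripts_from_csv(handle):
    """Read the 'transcript' column from an open CSV handle (column name is an assumption)."""
    return [row["transcript"] for row in csv.DictReader(handle)]


def missing_characters(alphabet, transcripts):
    """Return the sorted characters used in the transcripts but absent from the alphabet."""
    seen = set()
    for text in transcripts:
        seen.update(text)
    return sorted(seen - alphabet)


def describe(chars):
    """Show each character with its code point, e.g. to tell look-alike characters apart."""
    return [f"{c!r} U+{ord(c):04X}" for c in chars]


# Small in-memory demonstration; real files would be opened with
# open(path, encoding="utf-8", newline="") instead of io.StringIO.
alphabet = load_alphabet(io.StringIO("a\nb\nc\n'\n"))
rows = transcripts_from_csv(
    io.StringIO("wav_filename,wav_filesize,transcript\nx.wav,10,abc d\n")
)
print(describe(missing_characters(alphabet, rows)))
```

If a character is reported as missing even though it appears in alphabet.txt, comparing the printed code points from both files is one way to see whether they really are the same character.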
And here is the command I execute to start the training.
sudo python -u DeepSpeech.py \
  --train_files /media/sf_\[VRT\]_Debian_STT_v1/Language\ Model/script_preparation_data/train/train.csv \
  --dev_files /media/sf_\[VRT\]_Debian_STT_v1/Language\ Model/script_preparation_data/dev/dev.csv \
  --test_files /media/sf_\[VRT\]_Debian_STT_v1/Language\ Model/script_preparation_data/test/test.csv \
  --train_batch_size 80 \
  --dev_batch_size 80 \
  --test_batch_size 40 \
  --n_hidden 375 \
  --epochs 33 \
  --early_stop True \
  --es_steps 6 \
  --es_mean_th 0.1 \
  --es_std_th 0.1 \
  --dropout_rate 0.22 \
  --learning_rate 0.00095 \
  --report_count 100 \
  --export_dir /media/sf_\[VRT\]_Debian_STT_v1/Language\ Model/script_preparation_data/results/model_export/ \
  --checkpoint_dir /media/sf_\[VRT\]_Debian_STT_v1/Language\ Model/script_preparation_data/results/checkout/ \
  --alphabet_config_path /media/sf_\[VRT\]_Debian_STT_v1/Language\ Model/script_preparation_data/alphabet.txt \
  --lm_binary_path /media/sf_\[VRT\]_Debian_STT_v1/Language\ Model/script_preparation_data/lm.binary \
  --lm_trie_path /media/sf_\[VRT\]_Debian_STT_v1/Language\ Model/script_preparation_data/trie
Do not hesitate to ask me for any other information, code snippets, etc.
Thanks in advance for your help!