I want to train a Mandarin model.
My steps are as follows:
-
I downloaded the dataset from Common Voice 6.1 (Chinese, simplified), decompressed it, and made a soft link to it at data/Chinese.
-
I run:
../bin/import_cv2.py ./Chinese/clips/
to create the CSV files.
-
I run:
python -m deepspeech_training.util.check_characters -csv dev.csv,train-all.csv,train.csv,test.csv,validated.csv,other.csv -unicode -alpha > alphabet.txt
to create an alphabet.txt.
-
I run:
python3 DeepSpeech.py --train_files ./data/Chinese/clips/train.csv --dev_files ./data/Chinese/clips/dev.csv --test_files ./data/Chinese/clips/test.csv -epochs 1 --use_allow_growth true --save_checkpoint_dir ./result --alphabet_config_path data/alphabet.txt
to train the model. I have replaced the data/alphabet.txt file with the generated one.
But I get an error like this:
ValueError: Cannot feed value of shape (29,) for Tensor 'layer_6/bias/Initializer/zeros:0', which has shape '(4884,)'
I read the tutorial tutorial-how-i-trained-a-specific-french-model-to-control-my-robot and thought that maybe I had not set the --lm_binary_path parameter, but I can't find this parameter in the output of ./DeepSpeech.py --helpfull.
I know this is an alphabet.txt error, but my alphabet.txt looks like this:
…
丞
纱
去
热
屈
迄
挠
闵
菠
锹
眼
晨
肤
樽
杂
牟
消
…
It is stored in UTF-8 encoding. Is this alphabet.txt wrong?
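One way to check the file itself is to count its labels and compare the result with the shapes in the error. The sketch below is only a rough check, not the DeepSpeech parser: it assumes one label per line, that lines starting with '#' are comments, and that the output layer has one unit per label plus one for the CTC blank. The function names are made up for illustration.

```python
import os
import tempfile


def count_alphabet_labels(path):
    """Count labels in an alphabet file (assumed format: one label per line)."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            label = line.rstrip("\r\n")
            if label.startswith("#"):
                # Assumed comment line; a literal '#' label would be written as '\#'.
                continue
            count += 1
    return count


def expected_output_size(path):
    """Assumption: output layer size = number of labels + 1 (CTC blank)."""
    return count_alphabet_labels(path) + 1


if __name__ == "__main__":
    # Tiny stand-in file; replace with the real data/alphabet.txt to check it.
    with tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", suffix=".txt", delete=False
    ) as tmp:
        tmp.write("# example alphabet\n丞\n纱\n去\n")
        demo_path = tmp.name
    try:
        print(count_alphabet_labels(demo_path))   # 3 labels
        print(expected_output_size(demo_path))    # 3 labels + 1 blank = 4
    finally:
        os.remove(demo_path)
```

If expected_output_size("data/alphabet.txt") gives 4884, the new alphabet would match the '(4884,)' side of the error, and the (29,) value would have to come from somewhere else. A 28-label alphabet plus the blank would give 29, so possibly a checkpoint that was created with a different alphabet is being loaded.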
I don't know how to solve this problem.
My DeepSpeech version is v0.9.3.
Could someone help me solve this problem?