I want to train a Mandarin model.
My steps are as follows:
I downloaded the Common Voice 6.1 data set (Simplified Chinese), decompressed it, and made a soft link to it at data/Chinese.
I ran: ../bin/import_cv2.py ./Chinese/clips/ to create the CSV files.
I ran: python -m deepspeech_training.util.check_characters -csv dev.csv,train-all.csv,train.csv,test.csv,validated.csv,other.csv -unicode -alpha > alphabet.txt to create alphabet.txt.
I ran: python3 DeepSpeech.py --train_files ./data/Chinese/clips/train.csv --dev_files ./data/Chinese/clips/dev.csv --test_files ./data/Chinese/clips/test.csv -epochs 1 --use_allow_growth true --save_checkpoint_dir ./result --alphabet_config_path data/alphabet.txt to train the model. Before that, I replaced the data/alphabet.txt file with the new one.
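For reference, the alphabet-building step amounts to collecting every distinct character that appears in the transcripts. Below is a minimal sketch of that idea, not the actual check_characters implementation. It assumes the CSV files have a header row with a transcript column (as the DeepSpeech importers write); collect_alphabet and the sample rows are hypothetical and only for illustration.

```python
import csv
import io


def collect_alphabet(csv_texts):
    """Return the sorted distinct characters found in the transcript column.

    csv_texts: iterable of CSV file contents as strings, each assumed to
    start with a header row that includes a 'transcript' column.
    """
    chars = set()
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            # A string is iterable, so update() adds each character.
            chars.update(row["transcript"])
    return sorted(chars)


# Hypothetical sample data standing in for train.csv and dev.csv.
train_csv = "wav_filename,wav_filesize,transcript\na.wav,1000,去消热\n"
dev_csv = "wav_filename,wav_filesize,transcript\nb.wav,1200,热眼\n"

alphabet = collect_alphabet([train_csv, dev_csv])
print(len(alphabet))  # number of distinct characters across both files
```

Writing each collected character on its own line gives a file in the one-label-per-line layout that alphabet.txt uses.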
However, training fails with this error:
ValueError: Cannot feed value of shape (29,) for Tensor 'layer_6/bias/Initializer/zeros:0', which has shape '(4884,)'
I read tutorial-how-i-trained-a-specific-french-model-to-control-my-robot and thought that maybe I had not set the --lm_binary_path parameter, but I can't find this parameter in the output of ./DeepSpeech.py --helpfull.
I know this error is related to alphabet.txt, but my alphabet.txt looks like this:
…
丞
纱
去
热
屈
迄
挠
闵
菠
锹
眼
晨
肤
樽
杂
牟
消
…
The file is stored in UTF-8 encoding. Is this alphabet.txt wrong?
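The two shapes in the error message can be related to alphabet sizes. DeepSpeech sizes its output layer as the number of alphabet labels plus one unit for the CTC blank, so (4884,) corresponds to an alphabet of 4883 labels, while (29,) corresponds to 28 labels, which is the size of the default English data/alphabet.txt (a to z, space, and apostrophe). A minimal sketch for counting the labels in a file follows. count_labels is a hypothetical helper, and it assumes that lines starting with # are comments and that blank lines are not labels.

```python
def count_labels(alphabet_lines):
    """Count alphabet labels, one per line, ignoring '#' comment lines."""
    count = 0
    for line in alphabet_lines:
        # Strip only the line ending: a line holding a single space is a label.
        label = line.rstrip("\r\n")
        if label == "" or label.startswith("#"):
            continue  # assumption: blank lines and comments are not labels
        count += 1
    return count


# Hypothetical excerpt; a real file would be read with
# open("alphabet.txt", encoding="utf-8").
sample = ["# comment line\n", "丞\n", "纱\n", "去\n", " \n"]
n_labels = count_labels(sample)
output_units = n_labels + 1  # one extra unit for the CTC blank
print(n_labels, output_units)
```

If the count for the new alphabet.txt plus one equals 4884, the file itself matches the model being built, and the 29-element value would have to come from something created with a 28-label alphabet.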
I don't know how to solve this problem.
My DeepSpeech version is v0.9.3.
Could someone help me solve this problem?
Thanks, but there is another error when I create the .scorer file.
The command I ran is: lm/generate_scorer_package --alphabet alphabet.txt --lm lm.binary --vocab vocab-4883.txt --package kenlm.scorer --default_alpha 0.931289039105002 --default_beta 1.1834137581510284
The error is as follows:
4882 unique words read from vocabulary file.
Looks like a character based (Bytes Are All You Need) model.
--force_bytes_output_mode was not specified, using value infered from vocabulary contents: true
Error: Can't parse scorer file, invalid header. Try updating your scorer file.
Error loading language model file: Invalid magic in trie header.
I couldn't build the bundled kenlm, because cmake reported many errors. So I downloaded the latest version of kenlm from the kenlm project and installed it. I used this version to create lm.binary with the command: build_binary -T -s lm_filtered.arpa lm.binary
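For comparison, the generate_lm.py script that accompanies the external scorer guide builds lm.binary in kenlm's trie format with quantization. The sketch below only assembles such an invocation as an argument list. The flag values (-a 255 -q 8 -v trie) are an assumption based on the defaults of the v0.9.3 script and should be checked against your own checkout; build_binary_args is a hypothetical helper name.

```python
import shlex


def build_binary_args(arpa_path, binary_path, a_bits=255, q_bits=8,
                      structure="trie"):
    """Assemble a kenlm build_binary command line.

    The default values are assumptions taken from DeepSpeech's
    generate_lm.py (v0.9.3); verify them before relying on them.
    """
    return [
        "build_binary",
        "-a", str(a_bits),  # maximum bits for pointer compression
        "-q", str(q_bits),  # quantization bits
        "-v",               # leave the vocabulary strings out of the binary
        structure,          # data structure type, here the trie format
        arpa_path,
        binary_path,
    ]


args = build_binary_args("lm_filtered.arpa", "lm.binary")
command = shlex.join(args)  # shell-safe string form, requires Python 3.8+
print(command)
```

The list form can be passed directly to subprocess.check_call, which avoids shell quoting issues with paths.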
What should I do next? How can I solve this problem?
All steps follow the guide External scorer scripts.