Can someone suggest a solution for this? I am running on Ubuntu 18.04 the command python3 DeepSpeech.py --train_files /home/stass/latvian1/clips/train.csv --test_files /home/stass/latvian1/clips/test.csv
and getting this error:
ValueError: Cannot feed value of shape (256,) for Tensor 'layer_6/bias/Initializer/zeros:0', which has shape '(29,)'
Mismatching alphabet. Ensure you pass
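The shape mismatch means the checkpoint's output layer was sized for one alphabet while the current run loaded a different one. A minimal sketch of pinning the alphabet explicitly with `--alphabet_config_path` (the alphabet path here is hypothetical, and the command only runs if the files actually exist):

```shell
# Sketch: pass the same alphabet.txt that the checkpoint was trained with.
# ALPHABET is a hypothetical path; adjust to your setup.
ALPHABET=/home/stass/latvian1/alphabet.txt
if [ -f DeepSpeech.py ] && [ -f "$ALPHABET" ]; then
  python3 DeepSpeech.py \
    --train_files /home/stass/latvian1/clips/train.csv \
    --test_files /home/stass/latvian1/clips/test.csv \
    --alphabet_config_path "$ALPHABET"
fi
```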
Hello @Stanislavs_Davidovics:
Have you resolved this error? I am facing the same one while trying to train a Korean model, and I am sure I passed --alphabet_config_path properly. Now I don't know what to do next. Do you have any ideas?
@Stanislavs_Davidovics Did you solve this issue? I am facing the same one. Please help.
ValueError: Cannot feed value of shape (29,) for Tensor 'layer_6/bias/Initializer/zeros:0', which has shape '(48,)'
Actually, I have updated my alphabet.txt and it now has 48 characters; the default English alphabet file previously had 29.
So, does anyone care about my answer? You all have mismatching alphabets.
Yes, I have passed the alphabet config path properly.
But you explicitly said you changed it. As documented, if you change the alphabet you either need to retrain from scratch or use transfer learning and drop some layers.
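For the transfer-learning route, a sketch of the invocation (the flag names below are from DeepSpeech 0.7+; check the TRAINING doc for your release, since 0.6.x handled transfer learning differently). The checkpoint directory and file names are illustrative, and the command only runs if DeepSpeech.py is present:

```shell
# Hypothetical paths: an English release checkpoint and a new Korean run.
CKPT="$HOME/deepspeech-checkpoints"
if [ -f DeepSpeech.py ] && [ -d "$CKPT" ]; then
  # --drop_source_layers 1 re-initializes the alphabet-sized output layer,
  # so a 29-output English checkpoint can be reused with a 48-char alphabet.
  python3 DeepSpeech.py \
    --drop_source_layers 1 \
    --load_checkpoint_dir "$CKPT" \
    --save_checkpoint_dir "$HOME/korean-checkpoints" \
    --alphabet_config_path korean_alphabet.txt \
    --train_files train.csv \
    --test_files test.csv
fi
```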
Hello @lissyx: I am training my own Korean DeepSpeech model. Do you mean I have to use my own dataset and the KenLM tool to generate kenlm.scorer first, and then update alphabet.txt? Is there any tutorial for training a non-English model? With a few examples I would understand how to train a model for another language much more quickly. Thank you!
We have extensive documentation, please read it and file issues if it’s not clear / covering enough.
Thanks for your reply; I will try again later!
@zhangpeng_K hello, I am still dealing with the alphabet.txt issue and import_cv2.py. As soon as I manage to overcome it, I will let you know.
Firstly, as @lissyx said, you must ensure your DeepSpeech version and your ds_ctcdecoder version are the same. Then, if you add new characters to alphabet.txt, you have to rebuild words.arpa and lm.binary with the KenLM tools, use generate_trie to generate the trie binary, and finally update the training shell script with the paths of the files you have updated. The following is my environment:
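The rebuild steps above can be sketched roughly as follows (KenLM's lmplz/build_binary plus the generate_trie binary from the DeepSpeech 0.6.x native client; vocabulary.txt stands in for your own text corpus, and each step is skipped if the tool or input is missing):

```shell
# Step 1-2: rebuild the raw n-gram LM and its compact binary form.
if command -v lmplz >/dev/null && command -v build_binary >/dev/null \
   && [ -f vocabulary.txt ]; then
  lmplz --order 5 --text vocabulary.txt --arpa words.arpa
  build_binary -a 255 -q 8 trie words.arpa lm.binary
fi
# Step 3: regenerate the trie against the NEW alphabet, so decoder
# character indices match the retrained acoustic model.
if [ -x ./generate_trie ] && [ -f alphabet.txt ] && [ -f lm.binary ]; then
  ./generate_trie alphabet.txt lm.binary trie
fi
```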
(deepspeech-venv) (base) zhangp@zhangp:~/tmp/deepspeech-venv$ pip list
Hello @Stanislavs_Davidovics: I don't use import_cv2.py to produce alphabet.txt. I just follow the tutorial and then write my own code to generate an alphabet.txt in the same format as the official one. Now I can train my own model. If you gain any experience with this, we can communicate any time. Thanks!
Finally solved, by rebuilding the language model (.arpa and .binary) and the trie with the modified alphabet file, and then using the newly built LM .binary and trie to train the model.
It worked. Thanks.
*I used DeepSpeech 0.6.1 and set up the KenLM build using Python.
Hi @lissyx, I need more detailed clarification about your reply above:
- Can you point me to the document you mention?
- When you say "need to retrain from scratch", what is it that needs to be retrained from scratch? And how do I go about this?
Can you explain the following:
- How do I rebuild words.arpa and lm.binary with the KenLM tool?
- Why do you need to rebuild words.arpa and lm.binary?
- Why do you still need to generate the trie binary when you have the kenlm.scorer package?
Thanks, and I appreciate your insights!
Same answer: this is all in the TRAINING doc.
I went through DeepSpeech's "Training your own model" and data/lm's README files many times, but they don't explain why you need to build the .arpa and .binary files. The DeepSpeech training section is also quiet about this part. How do you link those two pieces of the guide together to get a better understanding of what's going on when you have a new alphabet to add?
Decoding requires knowing the alphabet. Your vocabulary gets translated into the lm.binary file, so changing the alphabet will mean your lm.binary is invalid.
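The shapes in the errors above follow directly from the alphabet: the acoustic model's output layer has one unit per character in alphabet.txt, plus one extra unit for the CTC "blank" symbol. A throwaway sketch with a 28-character English-style alphabet (26 letters, apostrophe, space), reproducing the (29,) shape of the stock layer_6 bias:

```shell
# Build a temporary 28-line alphabet file: one character per line.
tmp=$(mktemp)
printf '%s\n' a b c d e f g h i j k l m n o p q r s t u v w x y z "'" ' ' > "$tmp"
chars=$(wc -l < "$tmp")
layer=$((chars + 1))   # +1 for the CTC blank -> matches the (29,) bias shape
echo "$layer"
rm -f "$tmp"
```

With a 47-character alphabet the same arithmetic gives 48 outputs, which is why the checkpoint and the new alphabet above disagree.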