ValueError

Thanks for your reply, and I will try again later!

zhangpeng_K hello, I am still dealing with the alphabet.txt issue and import_CV2.PY. As soon as I overcome this issue, I will let you know.

Firstly, as @lissyx said, you must ensure your ds version and ds_ctcdecoder version are the same. Then, if you add new characters to alphabet.txt, you have to rebuild words.arpa and lm.binary with the KenLM tools, then use generate_trie to generate the trie binary, and finally update the training shell script with the paths of the files you have updated. The following is my environment:
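The rebuild steps above can be sketched as a short shell session. The tool locations, file names, and flags below are assumptions based on the 0.6.x data/lm instructions, not a verified recipe; adjust the paths for your own checkout:

```shell
# 1. Rebuild the n-gram model from the updated vocabulary (assumed file: vocabulary.txt)
kenlm/build/bin/lmplz --order 5 --text vocabulary.txt --arpa words.arpa

# 2. Convert the ARPA file into the compact binary format the decoder loads
kenlm/build/bin/build_binary -a 255 -q 8 trie words.arpa lm.binary

# 3. Regenerate the trie against the *updated* alphabet
#    (generate_trie ships with the native_client for your DeepSpeech version)
./generate_trie alphabet.txt lm.binary trie
```

The key point is step 3: the trie is built against a specific alphabet.txt, so any alphabet change forces all three artifacts to be regenerated together.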
(deepspeech-venv) (base) zhangp@zhangp:~/tmp/deepspeech-venv$ pip list
Package Version


absl-py 0.9.0
astor 0.8.1
attrdict 2.0.1
audioread 2.1.8
beautifulsoup4 4.8.2
bs4 0.0.1
certifi 2019.11.28
cffi 1.14.0
chardet 3.0.4
decorator 4.4.2
deepspeech-gpu 0.6.1
ds-ctcdecoder 0.6.1
gast 0.2.2
google-pasta 0.1.8
grpcio 1.27.2
h5py 2.10.0
idna 2.9
joblib 0.14.1
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.0
librosa 0.7.2
llvmlite 0.31.0
Markdown 3.2.1
mock 4.0.2
numba 0.48.0
numpy 1.18.1
opt-einsum 3.1.0
pandas 1.0.1
pip 20.0.2
pkg-resources 0.0.0
progressbar2 3.47.0
protobuf 3.11.3
pycparser 2.19
python-dateutil 2.8.1
python-utils 2.3.0
pytz 2019.3
pyxdg 0.26
requests 2.23.0
resampy 0.2.2
scikit-learn 0.22.2
scipy 1.4.1
semver 2.9.1
setuptools 45.2.0
six 1.14.0
SoundFile 0.10.3.post1
soupsieve 2.0
sox 1.3.7
tensorboard 1.15.0
tensorflow-estimator 1.15.1
tensorflow-gpu 1.15.0
termcolor 1.1.0
urllib3 1.25.8
webrtcvad 2.0.10
Werkzeug 1.0.0
wheel 0.34.2
wrapt 1.12.0

Hello @Stanislavs_Davidovics: I don't use import_CV2.py to resolve alphabet.txt, I just follow the tutorial and then write my own code to generate an alphabet.txt in the same format as the official one. Now I can train my own model. If you have some experience, we can communicate any time, thanks!
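A minimal sketch of the kind of script meant here: derive alphabet.txt (one character per line) from the training transcripts. The CSV path and the `transcript` column name are assumptions; adjust them to your dataset layout:

```python
# Build an alphabet.txt (one character per line) from training transcripts.
# The CSV column name "transcript" is an assumption; adjust to your data.
import csv

def build_alphabet(csv_path, out_path, transcript_col="transcript"):
    """Collect every character used in the transcripts and write them sorted."""
    chars = set()
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            chars.update(row[transcript_col])
    with open(out_path, "w", encoding="utf-8") as f:
        for c in sorted(chars):
            f.write(c + "\n")
    return sorted(chars)
```

Sorting the characters keeps the file deterministic, which matters because the acoustic model's output indices are tied to line positions in alphabet.txt.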

Finally solved it, by rebuilding the language model .arpa and .binary files and the trie with the modified alphabet file, and then using the newly built lm.binary and trie to train the model.
It worked. Thanks.
*I used DeepSpeech 0.6.1 and set up the KenLM build using Python.

Hi lissyx, I need a more detailed clarification about your reply above:

  1. Can you point me to the document you mention above?
  2. When you say “need to retrain from scratch”, what is it that needs to be retrained from scratch? And how do I go about this?

Thanks

Hi Zhangpeng_K,

Can you explain the following:

  1. How do you rebuild words.arpa and lm.binary with the KenLM tools?
  2. Why do you need to rebuild words.arpa and lm.binary?
  3. Why do you still need to generate the trie binary when you have the kenlm.scorer package?

Thanks and appreciate your insights!

https://deepspeech.readthedocs.io/en/v0.6.1/TRAINING.html

Same, this is all in TRAINING doc and data/lm

I went through DeepSpeech’s “Training your own model” and data/lm’s README files many times, but they don’t give any explanation as to why you need to build the .arpa and .binary files. The DeepSpeech training section is also silent on this part. How do you link those two pieces of the guide together to get a better understanding of what’s going on when you have a new alphabet to add?

Decoding requires knowing the alphabet. Your vocabulary gets translated into the lm.binary file, so changing the alphabet means your lm.binary is invalid.
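One way to see the dependency: every character the decoder can emit must exist in the alphabet, so a vocabulary that uses characters outside alphabet.txt cannot be decoded against it. A small sanity check (a sketch, not part of DeepSpeech itself) might look like:

```python
# Sanity check: every character used in the LM vocabulary must appear in
# alphabet.txt, otherwise decoding against that LM cannot work.
def missing_chars(vocabulary_lines, alphabet_chars):
    """Return characters in the vocabulary that the alphabet does not cover."""
    allowed = set(alphabet_chars)
    used = set()
    for line in vocabulary_lines:
        used.update(line.strip())
    return sorted(used - allowed)
```

If this returns anything, the alphabet and the language model are out of sync and the LM artifacts need to be rebuilt.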

Where exactly is it mentioned in https://deepspeech.readthedocs.io/en/v0.6.1/TRAINING.html that you need to create new words.arpa and lm.binary files?

Can we avoid getting the topic in three different directions here? Rebuilding the LM mentions generating the trie file.

PLEASE PLEASE PLEASE, IF YOU FIND THE DOC INACCURATE, FILE ISSUES AND EXPLAIN WHAT YOU DON’T UNDERSTAND / MISS.

WE CAN’T GET INTO YOUR HEAD.

Agree. I edited my answer.

When reading https://deepspeech.readthedocs.io/en/v0.6.1/TRAINING.html I could not find any reference to words.arpa and lm.binary; maybe this should be added, because I only found out about them on different web pages.

Well, it is obvious to us, and to many other people who have not had any problem doing their training, that you need to build your own language model. So please file an issue on GitHub explaining exactly what you missed or how you would word it. Better still, send a patch adding the missing documentation.

Let me repeat: we cannot get into your head. We are so deep into the project that some things are so obvious to us that we can’t even know why they are complicated for others. This is not arrogance or pushback.

If people don’t tell us that they don’t understand the current doc or else, we can’t improve.

Hi lissyx,

Does lm.binary apply to any language?
The reason I’m asking is that I’m working on the zh-HK and id languages. How would running python generate_lm.py distinguish which language I’m working on?

Thanks

Yes. Please look at the doc explaining the acoustic / language model. You need an LM to perform decoding, whatever the language.

That question makes no sense to me. This code does not care about your language; it just builds a file that is used by the decoder to help decode the acoustic output.
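To illustrate why the tooling is language-agnostic: a language model is just statistics over token sequences, and nothing in that counting depends on which human language the tokens come from. A toy sketch of the idea (not generate_lm.py itself, which delegates the real work to KenLM):

```python
# A language model is built from counts over token sequences; the counting
# logic below is identical whatever language the tokens come from.
from collections import Counter

def bigram_counts(sentences):
    """Count adjacent token pairs across whitespace-tokenized sentences."""
    counts = Counter()
    for s in sentences:
        tokens = s.split()
        for a, b in zip(tokens, tokens[1:]):
            counts[(a, b)] += 1
    return counts
```

KenLM does the same kind of counting (with higher orders, smoothing, and a compact binary format), which is why the one pipeline serves English, zh-HK, id, or any other language whose text you feed it.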

Where is this document on acoustic/language model?

And thanks for the reply on

It’s everywhere… in the original paper, etc.

@CheahHeng_Tan It seems you are searching for a lot of links / references. Can you please explain what you are working on?