Hi. I am trying to create a scorer file, but it fails with this message:
9 unique words read from vocabulary file.
Doesn’t look like a character based model.
Error: Can’t parse scorer file, invalid header. Try updating your scorer file.
I couldn’t find any related links to get help. Please help me. Thank you.
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
February 27, 2020, 12:43pm
I could not find any steps to repro in your message. Hard to check what you did.
This is my first step:
%cd /content/kenlm/build/bin
!./lmplz --order 5 --memory 50% --temp_prefix 15 --text /content/deepspeech/uzbek/dictionary.txt --arpa /content/deepspeech/uzbek/dictionary.arpa --discount_fallback --prune 0 0 1
!./build_binary -a 255 -s -q 8 trie /content/deepspeech/uzbek/dictionary.arpa /content/deepspeech/uzbek/lm.binary
This is the second step. The error appears after running it:
%cd /content/deepspeech/
!python ./data/lm/generate_package.py --alphabet /content/deepspeech/uzbek/alphabet.txt --lm /content/deepspeech/uzbek/lm.binary --vocab /content/deepspeech/uzbek/dictionary.txt --default_alpha 0.75 --default_beta 1.85 --package /content/deepspeech/uzbek/uzbek.scorer
lissyx
February 27, 2020, 12:56pm
What’s the content of that file?
lissyx
February 27, 2020, 12:58pm
@Akmal_Nodirov Also, what exact commit are you on? What’s the output of pip list | grep ds_ctcdecoder?
The content of the file is:
asslomu aleykum do’stim bu men ismim Akmal Ozodbek Shahzod
What do you mean by exact commit? I have cloned the latest version of DeepSpeech. It’s 0.7 or higher, the latest one.
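The first two messages in the log can be reproduced from the vocabulary line quoted above. The following is a minimal sketch, not the actual generate_package.py code: it assumes the vocabulary is split on whitespace and that it counts as "character based" only when every token is a single character. The function name summarize_vocab is hypothetical.

```python
def summarize_vocab(text):
    """Return (unique_word_count, looks_char_based) for vocabulary text.

    Assumption: tokens are separated by whitespace, and a vocabulary is
    treated as character based only if no token is longer than one character.
    """
    words = set()
    looks_char_based = True
    for line in text.splitlines():
        for word in line.split():
            words.add(word)
            if len(word) > 1:
                looks_char_based = False
    return len(words), looks_char_based


# The vocabulary line quoted in this thread.
vocab = "asslomu aleykum do’stim bu men ismim Akmal Ozodbek Shahzod"
count, char_based = summarize_vocab(vocab)
print(f"{count} unique words read from vocabulary file.")
print("Looks" if char_based else "Doesn't look", "like a character based model.")
```

Under these assumptions, both messages are what one would expect for a word-based vocabulary like this one, so they are probably not the cause of the failure; the line to focus on is the invalid header error.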
lissyx
February 27, 2020, 1:05pm
I don’t see the -v we document in data/lm/generate_lm.py for that call and that generate_scorer.py --help advises you to ensure.
I think it’s this, because I cloned it approximately 9 hours ago.
I have added -v:
!./build_binary -a 255 -s -q 8 -v trie /content/deepspeech/uzbek/dictionary.arpa /content/deepspeech/uzbek/lm.binary
but this error appears:
./build_binary: invalid option -- 'v'
Usage: ./build_binary [-u log10_unknown_probability] [-s] [-i] [-w mmap|after] [-p probing_multiplier]
Or could you provide the correct way to generate the file, with all the steps if possible? Thank you.
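The usage line in the error above lists the options this build_binary accepts, and -v is not among them. Below is a small sketch of checking a usage string for a flag before running the whole pipeline. It assumes options are printed in square brackets, as in the line quoted above; usage_lists_flag is a hypothetical helper, not part of KenLM or DeepSpeech.

```python
import re


def usage_lists_flag(usage, flag):
    """Return True if `usage` lists `-<flag>` as a bracketed option.

    Assumption: options are printed as "[-x]" or "[-x argument]".
    """
    pattern = r"\[-" + re.escape(flag) + r"(?:\s[^\]]*)?\]"
    return re.search(pattern, usage) is not None


# Usage line copied from the error output in this thread (it is truncated there).
usage = (
    "Usage: ./build_binary [-u log10_unknown_probability] [-s] [-i] "
    "[-w mmap|after] [-p probing_multiplier]"
)
for flag in ("s", "v"):
    print(flag, usage_lists_flag(usage, flag))
```

For the line quoted here, the check reports that -s is listed and -v is not, which is consistent with the invalid option error.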
lissyx
February 27, 2020, 2:41pm
Looks like you are not using the proper version.
lissyx
February 27, 2020, 2:42pm
@Akmal_Nodirov Try and rebuild build_binary and others from KenLM master?
Jendker
February 27, 2020, 3:30pm
Or maybe try with released v0.6.1 (if it is enough) and build lm.binary and trie files. Maybe this workflow will work smoother for you.
The version of DeepSpeech, or of something else? Is it possible to add a dictionary to newer versions of DeepSpeech? This is my version: 0.7.0-alpha.2
lissyx
February 28, 2020, 7:59am
The version of KenLM. It seems this part of the project needs polishing.
Just delete kenlm and download kenlm from GitHub: https://github.com/kpu/kenlm
It’ll work.
Updating KenLM does not help. I had the same issue, and apparently what changed with respect to v0.6.1 is that you need to pass the -v argument to build_binary. The error message “Error: Can’t parse scorer file, invalid header. Try updating your scorer file.” is not very helpful here.
Now I only get “Doesn't look like a character based model”, but the package creation succeeds.