I am trying to get the mic_vad_streaming example running with my language model, but it throws: Error: Trie file version mismatch (4 instead of expected 3). Update your trie file.
followed by: Successfully loaded LM and TRIE
[The program does not stop and continues to the inference stage]
The streaming then behaves as if there were no language model post-processing (equivalent to lm_alpha, lm_beta = 0).
I had this issue while training, but we fixed it by installing the correct decoder version; I cannot seem to fix it now. Any help?
[INFO]
Python 3.6.8
tensorflow.__version__
'1.14.0'
ds-ctcdecoder==0.6.0a0
deepspeech-gpu==0.5.0
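For reference, a quick way to double-check which versions are actually installed (assuming a pip-based setup):
python3 -c "import tensorflow as tf; print(tf.__version__)"
pip3 list | grep -iE "deepspeech|ds-ctcdecoder"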
I do not get that error when I run it with the pretrained model folder I downloaded from the DeepSpeech repo.
Previously, I ran it with these requirements. I retried it just in case anyway; same results.
I generated a language model with KenLM, then generated the trie with the generate_trie binary I got after following 'Compiling libdeepspeech.so & generate_trie'.
The DeepSpeech version is from the mic_vad_streaming requirements; the ctcdecoder is from when you fixed the same issue I had while training.
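For context, this is roughly the sequence I followed (the file names and n-gram order are my own placeholders, not an official recipe; the generate_trie binary has to come from the same DeepSpeech version as the runtime):
lmplz --order 5 --text vocabulary.txt --arpa words.arpa
build_binary words.arpa lm.binary
./generate_trie alphabet.txt lm.binary trie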
I cannot seem to find the requirements.txt. Where would I find it?
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
So, can you justify why you rebuilt everything? If you do so, you need to rebuild from the matching tag.
I gave you the path earlier, under examples/mic_vad_streaming. You cannot mix the DeepSpeech 0.4.1 runtime with a trie file generated by generate_trie from 0.6.0, for example…
It just might not be clear to me. Let me try and figure out what's going on.
If I am not to rebuild everything, where and how am I to get the generate_trie executable? I cannot seem to find it. I might just be missing something very basic.
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
Well, if it's not clear to you, it'd be great to know what is unclear.
Now that you said this, I looked up the releases and found what I needed. [I think there is no mention of this in the READMEs, which is probably why a lot of people end up rebuilding the binaries (it might be common knowledge to most, though).] I regenerated the trie with the CUDA Linux native_client and the error is gone. (Testing this remotely right now, I cannot tunnel a microphone; I will update in 8 hours or so after thorough testing.) Also, I have to use DeepSpeech 0.5.0 instead of the 0.4.1 that was in the requirements, because the model doesn't load, but that's probably down to the version I used for training.
deepspeech==0.5.0
ds-ctcdecoder==0.6.0a0
and I picked up the native_client from 0.4.1.
I am not sure whether this mix should be working or not, but the error seems to have vanished. I'll keep you posted on the results!
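For anyone hitting the same thing, this is roughly what I did to get a matching generate_trie without rebuilding anything (the exact asset name is an assumption, pick the one for your platform and version from the release page):
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.4.1/native_client.amd64.cuda.linux.tar.xz
tar xvf native_client.amd64.cuda.linux.tar.xz
./generate_trie alphabet.txt lm.binary trie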
You're right. The problem was that in certain places, like the README in data/, generate_trie is not referenced back to where it actually lives (it just says it is generated from generate_trie.cpp, which is why I looked up how to build it). The main README didn't specifically mention 'trie' either.
This is where a mention might have been helpful:
"(which includes the deepspeech binary and associated libraries.)"
I'll send a PR with some updates to the docs; hope it's useful.
The issue is that the README says:
python util/taskcluster.py --target .
and this downloads the generate_trie built from master, not from the DeepSpeech tag you have checked out, so it won't work.
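If I read util/taskcluster.py --help correctly, you can point it at the matching tag explicitly instead of the default master artifacts (I am assuming the --branch flag here):
python3 util/taskcluster.py --target . --branch v0.5.0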
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
I tried to build my own language model from my own vocabulary.txt and use it for inference with the v0.5.0 acoustic model.
Here is what I did:
lm.binary generation using KenLM (so far nothing to do with DeepSpeech)
to generate the trie (and not have to compile everything) I used taskcluster.py
See: https://github.com/mozilla/DeepSpeech/blob/master/USING.rst#using-the-command-line-client which says: python util/taskcluster.py --target .
I had checked out the v0.5.0 branch.
then:
./generate_trie alphabet.txt /path-to-own/lm.binary ./output-path/trie
then inference with deepspeech (pip install deepspeech) results in the trie mismatch error, but it continues with inference, so the generated lm.binary is not used
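For completeness, the inference call that triggers the mismatch looks like this (v0.5.x command-line client; the model and audio paths are placeholders):
deepspeech --model output_graph.pbmm --alphabet alphabet.txt --lm lm.binary --trie trie --audio test.wav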
But if you use the generate_trie binary packed in native_client.tar.gz, as you have said in this thread, everything works fine.
So I think either I'm missing something here, or README.md needs to be updated, or taskcluster.py is to blame.
OK, I will try to change the README or taskcluster.py to make it more explicitly visible.
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
Well, I have spotted a few people recently struggling with that. I would have expected people to use --help and sort it out, but it looks like that is not the case. Maybe we should change the behavior and pull from the matching tag by default?
Yeah, I had not checked the help, my bad. But defaulting to master is also strange behavior; if I check out a branch, I'd expect all the other scripts to follow that tag.
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
Well, I think you are the one who filed #2418, and this is actually now fixed on master.