Trie file version mismatch (3 instead of expected 2). Update your trie file


(s0d) #1

Hello, I am trying to train deepspeech on my own small dataset.

I installed DeepSpeech this way:

~$ git clone https://github.com/mozilla/DeepSpeech
~$ cd DeepSpeech
~/DeepSpeech$ pip3 install --user -r requirements.txt
~/DeepSpeech$ pip3 uninstall tensorflow -y
~/DeepSpeech$ pip3 install --user 'tensorflow-gpu==1.12.0rc2'

And I am creating a new language model:

~/kenlm/build$ ./bin/lmplz --order 5 --text ~/lm/vocab.txt --arpa ~/lm/lm.arpa
~/kenlm/build$ ./bin/build_binary -a 255 -q 8 trie ~/lm/lm.arpa ~/lm/lm.binary

To create the trie file:

~/DeepSpeech$ cat VERSION
0.4.0-alpha.0

~/DeepSpeech$ python3 util/taskcluster.py --branch "v0.4.0-alpha.0" --arch gpu --target .
~/DeepSpeech$ ./generate_trie ~/lm/alphabet.txt ~/lm/lm.binary ~/lm/trie
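
The language-model steps above could be collected into one script, so that lm.binary and the trie are always rebuilt together. This is only a sketch: `build_lm_commands` and `run_all` are hypothetical helper names, the directory arguments are assumed to match the layout used in this post (`~/kenlm/build`, `~/DeepSpeech`, `~/lm`), and the flags are copied unchanged from the commands above.

```python
import os
import subprocess


def build_lm_commands(kenlm_build, deepspeech_dir, lm_dir):
    """Return the three commands from this post as argv lists (hypothetical helper)."""
    vocab = os.path.join(lm_dir, "vocab.txt")
    arpa = os.path.join(lm_dir, "lm.arpa")
    binary = os.path.join(lm_dir, "lm.binary")
    alphabet = os.path.join(lm_dir, "alphabet.txt")
    trie = os.path.join(lm_dir, "trie")
    return [
        # 1. Estimate a 5-gram model from the vocabulary text.
        [os.path.join(kenlm_build, "bin", "lmplz"),
         "--order", "5", "--text", vocab, "--arpa", arpa],
        # 2. Convert the ARPA file to KenLM's binary trie format.
        [os.path.join(kenlm_build, "bin", "build_binary"),
         "-a", "255", "-q", "8", "trie", arpa, binary],
        # 3. Build the DeepSpeech trie from the alphabet and the binary LM.
        [os.path.join(deepspeech_dir, "generate_trie"), alphabet, binary, trie],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling `run_all(build_lm_commands(...))` with the three directories would execute the steps in order. The script does not check that generate_trie comes from the same release as the training code; that still has to be ensured by hand.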

Finally, when I run this:

python3 -u DeepSpeech.py \
  --train_files "$data_dir"/test.csv \
  --dev_files "$data_dir"/test.csv \
  --test_files "$data_dir"/test.csv \
  --train_batch_size 1 \
  --dev_batch_size 1 \
  --test_batch_size 1 \
  --n_hidden 494 \
  --epoch 75 \
  --checkpoint_dir "$checkpoint_dir" \
  --decoder_library_path libctc_decoder_with_kenlm.so \
  --alphabet_config_path "$lm_dir"/alphabet.txt \
  --lm_binary_path "$lm_dir"/lm.binary \
  --lm_trie_path "$lm_dir"/trie \
  "$@"

I get this error:

I Training epoch 74…
100% (1 of 1) |##########################################################################################################| Elapsed Time: 0:00:00 ETA: 00:00:00
Error: Trie file version mismatch (3 instead of expected 2). Update your trie file.
I Training of Epoch 74 - loss: 322.916992
I FINISHED Optimization - training time: 0:00:25

*I also tried running util/taskcluster.py without the --branch option, but the result was the same.

Any ideas how to solve this problem?
If you need more information about something else, please let me know!


(Lissyx) #2

For training, for now, you cannot use what generate_trie from master produces. Try the one from v0.3.0.


(s0d) #3

OK, I just switched to v0.3.0 and successfully created the trie file.

But when I try to train, I get this error:

Traceback (most recent call last):
  File "DeepSpeech.py", line 1988, in <module>
    tf.app.run(main)
  File "/home/s0d/.local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "DeepSpeech.py", line 1939, in main
    initialize_globals()
  File "DeepSpeech.py", line 334, in initialize_globals
    custom_op_module = tf.load_op_library(FLAGS.decoder_library_path)
  File "/home/s0d/.local/lib/python3.5/site-packages/tensorflow/python/framework/load_library.py", line 60, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: native_client/libctc_decoder_with_kenlm.so: undefined symbol: _ZN10tensorflow12OpDefBuilder6OutputEN4absl11string_viewE

I am currently using CUDA 9.0 and tensorflow-gpu==1.12.0rc2.
Is this related to the TF version? Should I use an older version of TF?


(Lissyx) #4

Most likely your libctc_decoder_with_kenlm.so does not match the TensorFlow version. In your case you need the one from v0.4.0-alpha.0.
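
To see what the loader is missing, the undefined symbol from the traceback can be demangled. The sketch below is not a general demangler: it handles only the small subset of Itanium C++ ABI mangling that this one symbol uses (a nested name followed by nested-name parameter types), and `demangle_simple` and `parse_nested_name` are hypothetical helpers. A tool such as `c++filt` does this properly.

```python
def parse_nested_name(symbol, pos):
    """Parse 'N<len><id>...E' starting at pos; return (qualified_name, next_pos)."""
    if pos >= len(symbol) or symbol[pos] != "N":
        raise ValueError("only nested names are supported in this sketch")
    pos += 1
    parts = []
    while pos < len(symbol) and symbol[pos] != "E":
        start = pos
        # Each component is a decimal length followed by that many characters.
        while pos < len(symbol) and symbol[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError("unsupported mangling at offset {}".format(pos))
        length = int(symbol[start:pos])
        parts.append(symbol[pos:pos + length])
        pos += length
    if pos >= len(symbol):
        raise ValueError("truncated symbol")
    return "::".join(parts), pos + 1  # skip the closing 'E'


def demangle_simple(symbol):
    """Demangle '_ZN...E' followed by zero or more nested-name parameters."""
    if not symbol.startswith("_Z"):
        raise ValueError("not an Itanium-mangled symbol")
    name, pos = parse_nested_name(symbol, 2)
    params = []
    while pos < len(symbol):
        param, pos = parse_nested_name(symbol, pos)
        params.append(param)
    return "{}({})".format(name, ", ".join(params))


print(demangle_simple("_ZN10tensorflow12OpDefBuilder6OutputEN4absl11string_viewE"))
```

For the symbol in the traceback this prints `tensorflow::OpDefBuilder::Output(absl::string_view)`. The decoder library expects the installed TensorFlow to export that function with exactly that signature, and a missing export of this kind is what you would expect when the library was built against a different TensorFlow release.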


(s0d) #5

Ah, I see.

I removed the old files extracted from native_client.tar.xz, then ran python3 util/taskcluster.py --branch "v0.4.0-alpha.0" --arch gpu --target ./native_client to get the libctc_decoder_with_kenlm.so for v0.4.0-alpha.0. Now it works!
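
One way to keep the downloaded native client in step with the checkout is to derive the `--branch` value from the `VERSION` file instead of typing it. The sketch below assumes, as in this thread, that the release tag is simply `v` followed by the contents of `VERSION`. `branch_from_version`, `taskcluster_argv`, and `download_native_client` are hypothetical helper names, and only the `--branch`, `--arch`, and `--target` options already used in this thread appear.

```python
import os
import subprocess


def branch_from_version(version_text):
    """Map the VERSION file contents to a tag name (assumed to be 'v' + version)."""
    return "v" + version_text.strip()


def taskcluster_argv(version_text, arch="gpu", target="./native_client"):
    """Build the util/taskcluster.py command as an argv list."""
    return [
        "python3", "util/taskcluster.py",
        "--branch", branch_from_version(version_text),
        "--arch", arch,
        "--target", target,
    ]


def download_native_client(repo_dir="."):
    """Read VERSION from the checkout and fetch the matching native client."""
    with open(os.path.join(repo_dir, "VERSION")) as f:
        argv = taskcluster_argv(f.read())
    # An argv list is passed straight to the program, so no shell quoting is needed.
    subprocess.run(argv, cwd=repo_dir, check=True)
```

Running `download_native_client()` from the DeepSpeech checkout would then fetch the artifacts for whatever version is checked out.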

Many thanks.


(Lissyx) #6

Yeah, the situation is a bit complicated because we are in between two CTC decoders. The new ctcdecoder code should land in the remaining training parts soon, which should avoid a lot of headaches.


(Abby) #7

If we do have the new CTC decoder (pip3 install $(python3 util/taskcluster.py --decoder)), how do we point DeepSpeech to the new one?


(Abby) #8

(deepspeech-env) Abhay@deepspeech-node4-vm:/opt/deepspeech/Abhay/deepspeech-git/DeepSpeech$ cat VERSION
0.4.0-alpha.0

I removed the native_client folder as well as native_client.tar.xz.

Then I ran:
python3 util/taskcluster.py --branch “v0.4.0-alpha.0” --arch gpu --target ./native_client

But it gave an error:

(deepspeech-env) Abhay@deepspeech-node4-vm:/opt/deepspeech/Abhay/deepspeech-git/DeepSpeech$ python3 util/taskcluster.py --branch “v0.4.0-alpha.0” --arch gpu --target ./
Downloading https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.“v0.4.0-alpha.0”.gpu/artifacts/public/native_client.tar.xz
Traceback (most recent call last):
  File "util/taskcluster.py", line 141, in <module>
    maybe_download_tc(target_dir=args.target, tc_url=get_tc_url(args.arch, args.artifact, args.branch))
  File "util/taskcluster.py", line 55, in maybe_download_tc
    urllib.request.urlretrieve(tc_url, target_file, reporthook=(report_progress if progress else None))
  File "/home/Abhay/anaconda3/envs/deepspeech-env/lib/python3.6/urllib/request.py", line 248, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/home/Abhay/anaconda3/envs/deepspeech-env/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/home/Abhay/anaconda3/envs/deepspeech-env/lib/python3.6/urllib/request.py", line 526, in open
    response = self._open(req, data)
  File "/home/Abhay/anaconda3/envs/deepspeech-env/lib/python3.6/urllib/request.py", line 544, in _open
    '_open', req)
  File "/home/Abhay/anaconda3/envs/deepspeech-env/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/home/Abhay/anaconda3/envs/deepspeech-env/lib/python3.6/urllib/request.py", line 1361, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/home/Abhay/anaconda3/envs/deepspeech-env/lib/python3.6/urllib/request.py", line 1318, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/home/Abhay/anaconda3/envs/deepspeech-env/lib/python3.6/http/client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/home/Abhay/anaconda3/envs/deepspeech-env/lib/python3.6/http/client.py", line 1250, in _send_request
    self.putrequest(method, url, **skips)
  File "/home/Abhay/anaconda3/envs/deepspeech-env/lib/python3.6/http/client.py", line 1117, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\u201c' in position 57: ordinal not in range(128)


(Lissyx) #9

There is nothing specific to do; it will be used.


(Lissyx) #10

Please use proper code formatting when copying output like that, otherwise it’s hard to read …


(Lissyx) #11

Wait … You are copying the wrong characters … Please make sure you have plain ASCII double quotes, not their typographic (curly) Unicode equivalents: python3 util/taskcluster.py --branch "v0.4.0-alpha.0" --arch gpu --target ./native_client
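
A pasted command can be checked for such characters before it is run. The following is a minimal sketch, not part of DeepSpeech: `non_ascii_chars` and `to_ascii_command` are hypothetical helpers, and the replacement table assumes that curly quotes stood for straight quotes and that a typographic dash stood for `--`.

```python
# Assumed mapping from typographic characters back to what was typed.
TYPOGRAPHIC_TO_ASCII = {
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u2013": "--",  # en dash, assumed to have been "--"
    "\u2014": "--",  # em dash, same assumption
}


def non_ascii_chars(command):
    """Return (index, character) pairs for every non-ASCII character."""
    return [(i, ch) for i, ch in enumerate(command) if ord(ch) > 127]


def to_ascii_command(command):
    """Replace the typographic characters listed above with ASCII ones."""
    for bad, good in TYPOGRAPHIC_TO_ASCII.items():
        command = command.replace(bad, good)
    return command


pasted = "python3 util/taskcluster.py --branch \u201cv0.4.0-alpha.0\u201d --arch gpu"
print(non_ascii_chars(pasted))
print(to_ascii_command(pasted))
```

The automatic replacement is a guess at what was originally typed, so it is safer to read the cleaned command before running it.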


(Reuben Morais) #12

Yeah. Earlier they had a problem with -- being converted to a typographic dash. I think some text editor or terminal is converting characters automatically and messing up the command lines.


(Abby) #13

Thanks, it worked. I don't know where I went wrong.


(Lissyx) #14

The copy/paste included quotes that were not the ones the Python code expects.