Trie file version mismatch (3 instead of expected 2). Update your trie file

Hello, I am trying to train DeepSpeech on my own small dataset.

I installed DeepSpeech this way:

~$ git clone https://github.com/mozilla/DeepSpeech
~$ cd DeepSpeech
~/DeepSpeech$ pip3 install --user -r requirements.txt
~/DeepSpeech$ pip3 uninstall tensorflow -y
~/DeepSpeech$ pip3 install --user 'tensorflow-gpu==1.12.0rc2'

And I am creating a new language model:

~/kenlm/build$ ./bin/lmplz --order 5 --text ~/lm/vocab.txt --arpa ~/lm/lm.arpa
~/kenlm/build$ ./bin/build_binary -a 255 -q 8 trie ~/lm/lm.arpa ~/lm/lm.binary

To create the trie file:

~/DeepSpeech$ cat VERSION
0.4.0-alpha.0

~/DeepSpeech$ python3 util/taskcluster.py --branch "v0.4.0-alpha.0" --arch gpu --target .
~/DeepSpeech$ ./generate_trie ~/lm/alphabet.txt ~/lm/lm.binary ~/lm/trie

At last, when I run this:

python3 -u DeepSpeech.py \
  --train_files "$data_dir"/test.csv \
  --dev_files "$data_dir"/test.csv \
  --test_files "$data_dir"/test.csv \
  --train_batch_size 1 \
  --dev_batch_size 1 \
  --test_batch_size 1 \
  --n_hidden 494 \
  --epoch 75 \
  --checkpoint_dir "$checkpoint_dir" \
  --decoder_library_path libctc_decoder_with_kenlm.so \
  --alphabet_config_path "$lm_dir"/alphabet.txt \
  --lm_binary_path "$lm_dir"/lm.binary \
  --lm_trie_path "$lm_dir"/trie \
  "$@"

I get this error:

I Training epoch 74…
100% (1 of 1) |##########################################################################################################| Elapsed Time: 0:00:00 ETA: 00:00:00
Error: Trie file version mismatch (3 instead of expected 2). Update your trie file.
I Training of Epoch 74 - loss: 322.916992
I FINISHED Optimization - training time: 0:00:25

*I also tried running util/taskcluster.py without the --branch option, but the result was the same.

Any ideas how to solve this problem?
If you need more information about something else, please let me know!

For now, training cannot use a trie produced by the generate_trie from master. Try the one from v0.3.0.

OK, I just switched to v0.3.0 and successfully created the trie file.

But when I try to train, I get this error:

Traceback (most recent call last):
  File "DeepSpeech.py", line 1988, in <module>
    tf.app.run(main)
  File "/home/s0d/.local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "DeepSpeech.py", line 1939, in main
    initialize_globals()
  File "DeepSpeech.py", line 334, in initialize_globals
    custom_op_module = tf.load_op_library(FLAGS.decoder_library_path)
  File "/home/s0d/.local/lib/python3.5/site-packages/tensorflow/python/framework/load_library.py", line 60, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: native_client/libctc_decoder_with_kenlm.so: undefined symbol: _ZN10tensorflow12OpDefBuilder6OutputEN4absl11string_viewE

I am currently using CUDA 9.0 and tensorflow-gpu==1.12.0rc2.
Is this related to the TF version? Should I use an older version of TF?

Most likely your libctc_decoder_with_kenlm.so does not match your TensorFlow version. In your case you need the one from v0.4.0-alpha.0.

Ah, I see.

I removed the old files extracted from native_client.tar.xz, then ran python3 util/taskcluster.py --branch "v0.4.0-alpha.0" --arch gpu --target ./native_client to get the libctc_decoder_with_kenlm.so for v0.4.0-alpha.0. Now it works!
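To keep the downloaded native client in step with the checkout, the branch name can be derived from the VERSION file instead of being typed by hand. The following is a minimal sketch, not part of DeepSpeech: the helper name `native_client_command` is hypothetical, and it assumes release tags are the VERSION string prefixed with "v", as they are in this thread. It only builds the argument list and does not run anything.

```python
import shlex


def native_client_command(version_text, arch="gpu", target="./native_client"):
    """Build the util/taskcluster.py argument list for a given VERSION string.

    Assumption: the branch/tag name is "v" + the contents of the VERSION file.
    """
    branch = "v" + version_text.strip()
    return [
        "python3", "util/taskcluster.py",
        "--branch", branch,
        "--arch", arch,
        "--target", target,
    ]


# In a real checkout the text would come from open("VERSION").read().
cmd = native_client_command("0.4.0-alpha.0\n")

# Show the command in a form that is safe to paste into a shell.
print(" ".join(shlex.quote(part) for part in cmd))
```

Passing a list like `cmd` to `subprocess.run` would also sidestep shell quoting entirely, since no quote characters are involved at all.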

Many thanks.


Yeah, the situation is a bit complicated because we are currently in between two CTC decoders. The new ctcdecoder code should land in the remaining training parts soon, which should avoid most of these headaches.


If we do have the new CTC decoder installed (pip3 install $(python3 util/taskcluster.py --decoder)), how do we point DeepSpeech to the new one?

(deepspeech-env) Abhay@deepspeech-node4-vm:/opt/deepspeech/Abhay/deepspeech-git/DeepSpeech$ cat VERSION
0.4.0-alpha.0

I removed the native_client folder as well as native_client.tar.xz, then ran:
python3 util/taskcluster.py --branch “v0.4.0-alpha.0” --arch gpu --target ./native_client

But it gave an error:

(deepspeech-env) Abhay@deepspeech-node4-vm:/opt/deepspeech/Abhay/deepspeech-git/DeepSpeech$ python3 util/taskcluster.py --branch “v0.4.0-alpha.0” --arch gpu --target ./
Downloading https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.“v0.4.0-alpha.0”.gpu/artifacts/public/native_client.tar.xz
Traceback (most recent call last):
  File "util/taskcluster.py", line 141, in <module>
    maybe_download_tc(target_dir=args.target, tc_url=get_tc_url(args.arch, args.artifact, args.branch))
  File "util/taskcluster.py", line 55, in maybe_download_tc
    urllib.request.urlretrieve(tc_url, target_file, reporthook=(report_progress if progress else None))
  File "/home/Abhay/anaconda3/envs/deepspeech-env/lib/python3.6/urllib/request.py", line 248, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/home/Abhay/anaconda3/envs/deepspeech-env/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/home/Abhay/anaconda3/envs/deepspeech-env/lib/python3.6/urllib/request.py", line 526, in open
    response = self._open(req, data)
  File "/home/Abhay/anaconda3/envs/deepspeech-env/lib/python3.6/urllib/request.py", line 544, in _open
    '_open', req)
  File "/home/Abhay/anaconda3/envs/deepspeech-env/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/home/Abhay/anaconda3/envs/deepspeech-env/lib/python3.6/urllib/request.py", line 1361, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/home/Abhay/anaconda3/envs/deepspeech-env/lib/python3.6/urllib/request.py", line 1318, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/home/Abhay/anaconda3/envs/deepspeech-env/lib/python3.6/http/client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/home/Abhay/anaconda3/envs/deepspeech-env/lib/python3.6/http/client.py", line 1250, in _send_request
    self.putrequest(method, url, **skips)
  File "/home/Abhay/anaconda3/envs/deepspeech-env/lib/python3.6/http/client.py", line 1117, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\u201c' in position 57: ordinal not in range(128)

There is nothing specific to do; it will be used.

Please use proper code formatting when copying output like that, otherwise it’s hard to read …

Wait … You are copying the wrong characters … Please make sure you have plain ASCII double quotes, not their typographic (curly) Unicode equivalents: python3 util/taskcluster.py --branch "v0.4.0-alpha.0" --arch gpu --target ./native_client
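Why the curly quotes end in a UnicodeEncodeError can be reproduced without any network access. The sketch below assumes the HTTP request line has the form "METHOD path HTTP-version", which is consistent with the `request.encode('ascii')` call in the traceback; the helper name `first_non_ascii` is made up for illustration. With straight quotes the shell removes them before Python sees the argument, so the good URL contains no quote characters at all.

```python
def first_non_ascii(text):
    """Return (index, character) of the first non-ASCII character, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as err:
        return err.start, text[err.start]
    return None


# Request line rebuilt from the URL in the traceback (assumed layout).
# \u201c and \u201d are the left and right curly double quotes.
BAD_REQUEST = (
    "GET /v1/task/project.deepspeech.deepspeech.native_client."
    "\u201cv0.4.0-alpha.0\u201d.gpu/artifacts/public/native_client.tar.xz HTTP/1.1"
)

# What the request looks like when the shell has consumed straight quotes.
GOOD_REQUEST = BAD_REQUEST.replace("\u201c", "").replace("\u201d", "")

print(first_non_ascii(BAD_REQUEST))   # index and character of the offending quote
print(first_non_ascii(GOOD_REQUEST))  # None: the whole line is plain ASCII
```

Under that assumption, the reported index lines up with the "position 57" in the error message, which points at the opening curly quote in front of the version string.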


Yeah. Earlier they had a problem with -- being converted to – (an en dash). I think some text editor or terminal is converting characters automatically and breaking the command lines.
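A pasted command can be checked for this kind of silent substitution before it is run. This is a rough sketch, not an existing tool: the names `REPLACEMENTS`, `find_typographic` and `to_ascii` are invented here, and mapping an en dash back to "--" is an assumption based on what happened in this thread rather than a general rule.

```python
# Characters that editors and web pages commonly substitute for ASCII ones.
# Assumption: an en dash stands for an original "--", as seen in this thread.
REPLACEMENTS = {
    "\u201c": '"',   # left curly double quote
    "\u201d": '"',   # right curly double quote
    "\u2018": "'",   # left curly single quote
    "\u2019": "'",   # right curly single quote
    "\u2013": "--",  # en dash
}


def find_typographic(command):
    """List (index, character) for every substituted character found."""
    return [(i, ch) for i, ch in enumerate(command) if ch in REPLACEMENTS]


def to_ascii(command):
    """Return the command with the known substitutions reversed."""
    for fancy, plain in REPLACEMENTS.items():
        command = command.replace(fancy, plain)
    return command


pasted = "python3 util/taskcluster.py \u2013branch \u201cv0.4.0-alpha.0\u201d \u2013arch gpu"
print(find_typographic(pasted))
print(to_ascii(pasted))
```

Retyping the command by hand in the terminal is still the safer fix; a script like this mainly helps to spot which characters were altered.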

Thanks, it worked. I don't know where I went wrong.

The copy/paste included quotes that were not the ones the Python code expected.
