Terminate called after throwing an instance of 'lm::FormatLoadException'

I get the following error:

```
terminate called after throwing an instance of 'lm::FormatLoadException'
what():  native_client/kenlm/lm/binary_format.cc:131 in void lm::ngram::MatchCheck(lm::ngram::ModelType, unsigned int, const lm::ngram::Parameters&) threw FormatLoadException.
The binary file was built for probing hash tables but the inference code is trying to load trie with quantization and array-compressed pointers
```

The error appears while the model is being trained. I do not know what to do; I hope you can help me.

I attached an image of the error.

It looks like you are using a v0.1.0 or v0.1.1 language model with v0.2.0 code.

So you have to switch either to a v0.2.0 language model or to v0.1.0/v0.1.1 code.
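One way to check which release components are actually in use, a sketch only; the checkout location and the presence of a git clone are assumptions, not taken from this thread:

```shell
# Which DeepSpeech Python package is installed?
pip3 show deepspeech

# Which tag is the training code checked out at? (assuming a git clone)
cd DeepSpeech
git describe --tags
```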

I downloaded this version of DeepSpeech:

https://github.com/mozilla/DeepSpeech.git

and I am creating my own language model in Spanish with KenLM. The version of KenLM is:

https://github.com/kpu/kenlm.git

and I downloaded the native_client build from this link:

https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.v0.1.1.cpu/artifacts/public/native_client.tar.xz

I think the previous link is the problem, since it is from version 0.1.1. I had to download that archive because the generate_trie binary included in the DeepSpeech project produced the same error when I tried to generate the trie. At first I replaced the whole native_client folder in the DeepSpeech project, and I had no problem generating the trie, but after starting training, the file libctc_decoder_with_kenlm.so produced the following error: tensorflow.python.framework.errors_impl.NotFoundError: /home/manuel/Downloads/native_client/libctc_decoder_with_kenlm.so: undefined symbol: _ZN10tensorflow6StatusC1ENS_5error4CodeENS_11StringPieceE. I then used the libctc_decoder_with_kenlm.so from the native_client folder of the DeepSpeech project and everything worked, but then the error in the title of this topic appeared. With this, can you tell what my error is?

No, you are jumping from one error to another with incomplete STR (steps to reproduce). Please clearly document the issue you are facing: as already explained, the issue in your title is because you are mixing incompatible release components between 0.2.0 and 0.1.0/0.1.1.

Again, this is documented: it is because you are using libctc_decoder_with_kenlm.so with an incompatible version of TensorFlow. Please train using the documented requirements.txt.
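Installing the dependencies pinned by the release being trained might look like this, a sketch assuming a git checkout of the v0.2.0 tag (which, per this thread, pins TensorFlow r1.6):

```shell
# Check out the release tag you intend to train with, then install its
# pinned dependencies (v0.2.0 pins TensorFlow r1.6 in requirements.txt)
cd DeepSpeech
git checkout v0.2.0
pip3 install -r requirements.txt
```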

I performed another test. I downloaded again:

https://github.com/mozilla/DeepSpeech

but with this version of native_client:

https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.v0.2.0-alpha.9.cpu/artifacts/public/native_client.tar.xz

I would assume that I am already using version 0.2.0 of DeepSpeech, of native_client, and of the `pip install deepspeech` package.

Even so, the libctc_decoder_with_kenlm.so is not recognized by TensorFlow; it was the one I downloaded from here:

https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.v0.2.0-alpha.9.cpu/artifacts/public/native_client.tar.xz

Does the version of KenLM have something to do with it? Must I always download this version:

https://github.com/kpu/kenlm.git

in order to be able to compile it? The one that comes by default in DeepSpeech does not compile.

Could you give me links to the right versions or something, please? That would be very helpful.

With respect to this part:

Again, this is documented and this is because you are using libctc_decoder_with_kenlm.so with an incompatible version of TensorFlow. Please train using the documented requirements.txt.

I already installed all the requirements, including TensorFlow 1.11.0, which is what is supposedly needed.

But 0.2.0 is based on TensorFlow r1.6: https://github.com/mozilla/DeepSpeech/blob/v0.2.0/requirements.txt#L4

Don't assume, ensure. Also, your link is 0.2.0-alpha.9, not 0.2.0.

In this link:
https://github.com/mozilla/DeepSpeech

in the requirements.txt document:

https://github.com/mozilla/DeepSpeech/blob/master/requirements.txt

it says that the TensorFlow version is 1.11.0.

your link is 0.2.0-alpha.9, not 0.2.0

I will make sure to correct this, and I will tell you whether it solves the problem.

And this is for current master, not v0.2.0.

I used the link you gave me,

downloaded that version of DeepSpeech, and downloaded this version of native_client:

https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.v0.2.0.cpu/artifacts/public/native_client.tar.xz

I also executed the following command:
sudo pip3 install deepspeech

which produced the following result:

Requirement already satisfied: deepspeech in /home/manuel/.local/lib/python3.6/site-packages (0.2.0)

version 0.2.0 is installed, right?

I do not know if this is necessary

Now I am sure I am using version 0.2.0 of DeepSpeech and native_client, and I am also using TensorFlow 1.6, as required by this version. Even so, when generating the trie, the following error appears:

```
terminate called after throwing an instance of 'lm::FormatLoadException'
what():  native_client/kenlm/lm/binary_format.cc:131 in void lm::ngram::MatchCheck(lm::ngram::ModelType, unsigned int, const lm::ngram::Parameters&) threw FormatLoadException.
The binary file was built for probing hash tables but the inference code is trying to load trie with quantization and array-compressed pointers
Aborted (core dumped)
```

Ok, please show your STR completely, from downloading everything to running generate_trie.

Here I tried to compile the version of KenLM that DeepSpeech bundles by default, but an error came up.

Then I download the KenLM repository and compile it.

The KenLM compilation process finishes successfully.

Now, I create the .arpa file,

then I create the .binary file, and then I try to create the trie file, and that is when the error is generated.

Does KenLM's version have anything to do with it, or CMake's, or any of the other necessary tools?

Additionally, I checked that the file libctc_decoder_with_kenlm.so is recognized by TensorFlow, and I also verified the TensorFlow version.

files.zip (621,3 KB)

Additionally, I am adding the CSV files of the training set, the alphabet, and the vocabulary. I do not know if they are necessary, but I uploaded them anyway. :D

I hope you can help me. Thank you

I’m sorry, but screenshots are painful to debug from and they are hard to search to help others. Please share text content.

Or does it have to do with the fact that you have not read the documentation in data/lm/README.md?

First I download DeepSpeech from this link:

then I download the native_client from this:
https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.v0.2.0.cpu/artifacts/public/native_client.tar.xz

I unzip the two files, and I put the native_client folder inside the folder generated by DeepSpeech, choosing to merge the folders.

I enter the native_client folder inside DeepSpeech and try to compile the KenLM version within it, as follows:

mkdir build
cd build/
cmake ..

this appears:
```
-- The C compiler identification is GNU 7.3.0
-- The CXX compiler identification is GNU 7.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Boost version: 1.65.1
-- Found the following Boost libraries:
--   program_options
--   system
--   thread
--   unit_test_framework
--   chrono
--   date_time
--   atomic
-- Found ZLIB: /usr/lib/x86_64-linux-gnu/libz.so (found version "1.2.11")
-- Found BZip2: /usr/lib/x86_64-linux-gnu/libbz2.so (found version "1.0.6")
-- Looking for BZ2_bzCompressInit
-- Looking for BZ2_bzCompressInit - found
-- Looking for lzma_auto_decoder in /usr/lib/x86_64-linux-gnu/liblzma.so
-- Looking for lzma_auto_decoder in /usr/lib/x86_64-linux-gnu/liblzma.so - found
-- Looking for lzma_easy_encoder in /usr/lib/x86_64-linux-gnu/liblzma.so
-- Looking for lzma_easy_encoder in /usr/lib/x86_64-linux-gnu/liblzma.so - found
-- Looking for lzma_lzma_preset in /usr/lib/x86_64-linux-gnu/liblzma.so
-- Looking for lzma_lzma_preset in /usr/lib/x86_64-linux-gnu/liblzma.so - found
-- Found LibLZMA: /usr/include (found version "5.2.2")
CMake Error at util/CMakeLists.txt:58 (add_subdirectory):
  add_subdirectory given source "stream" which is not an existing directory.

CMake Error at lm/CMakeLists.txt:44 (add_subdirectory):
  add_subdirectory given source "builder" which is not an existing directory.

CMake Error at lm/CMakeLists.txt:45 (add_subdirectory):
  add_subdirectory given source "filter" which is not an existing directory.

-- Could NOT find Eigen3 (missing: EIGEN3_INCLUDE_DIR EIGEN3_VERSION_OK) (Required is at least version "2.91.0")
CMake Warning at lm/interpolate/CMakeLists.txt:65 (message):
  Not building interpolation. Eigen3 was not found.

-- To install Eigen3 in your home directory, copy paste this:
export EIGEN3_ROOT=$HOME/eigen-eigen-07105f7124f9
(cd $HOME; wget -O - https://bitbucket.org/eigen/eigen/get/3.2.8.tar.bz2 |tar xj)
rm CMakeCache.txt

-- Configuring incomplete, errors occurred!
See also "/home/manuel/Descargas/version2/DeepSpeech-0.2.0/native_client/kenlm/build/CMakeFiles/CMakeOutput.log".
See also "/home/manuel/Descargas/version2/DeepSpeech-0.2.0/native_client/kenlm/build/CMakeFiles/CMakeError.log".
```

Then I delete the kenlm folder that is contained by default in DeepSpeech's native_client, and I download the KenLM repository, which is https://github.com/kpu/kenlm.git. Then I do the same:

mkdir build
cd build/
cmake ..

now it appears:

```
-- The C compiler identification is GNU 7.3.0
-- The CXX compiler identification is GNU 7.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Boost version: 1.65.1
-- Found the following Boost libraries:
--   program_options
--   system
--   thread
--   unit_test_framework
--   chrono
--   date_time
--   atomic
-- Found ZLIB: /usr/lib/x86_64-linux-gnu/libz.so (found version "1.2.11")
-- Found BZip2: /usr/lib/x86_64-linux-gnu/libbz2.so (found version "1.0.6")
-- Looking for BZ2_bzCompressInit
-- Looking for BZ2_bzCompressInit - found
-- Looking for lzma_auto_decoder in /usr/lib/x86_64-linux-gnu/liblzma.so
-- Looking for lzma_auto_decoder in /usr/lib/x86_64-linux-gnu/liblzma.so - found
-- Looking for lzma_easy_encoder in /usr/lib/x86_64-linux-gnu/liblzma.so
-- Looking for lzma_easy_encoder in /usr/lib/x86_64-linux-gnu/liblzma.so - found
-- Looking for lzma_lzma_preset in /usr/lib/x86_64-linux-gnu/liblzma.so
-- Looking for lzma_lzma_preset in /usr/lib/x86_64-linux-gnu/liblzma.so - found
-- Found LibLZMA: /usr/include (found version "5.2.2")
-- Could NOT find Eigen3 (missing: EIGEN3_INCLUDE_DIR EIGEN3_VERSION_OK) (Required is at least version "2.91.0")
CMake Warning at lm/interpolate/CMakeLists.txt:65 (message):
  Not building interpolation. Eigen3 was not found.

-- To install Eigen3 in your home directory, copy paste this:
export EIGEN3_ROOT=$HOME/eigen-eigen-07105f7124f9
(cd $HOME; wget -O - https://bitbucket.org/eigen/eigen/get/3.2.8.tar.bz2 |tar xj)
rm CMakeCache.txt

-- Configuring done
-- Generating done
-- Build files have been written to: /home/manuel/Descargas/version2/DeepSpeech-0.2.0/native_client/kenlm/build
```

then:

make -j 4

Scanning dependencies of target kenlm_filter Scanning dependencies of target kenlm_util [ 0%] Building CXX object lm/filter/CMakeFiles/kenlm_filter.dir/arpa_io.cc.o [ 1%] Building CXX object lm/filter/CMakeFiles/kenlm_filter.dir/phrase.cc.o [ 2%] Building CXX object lm/filter/CMakeFiles/kenlm_filter.dir/vocab.cc.o [ 3%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/bignum-dtoa.cc.o [ 3%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/bignum.cc.o [ 4%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/cached-powers.cc.o [ 5%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/diy-fp.cc.o [ 6%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/double-conversion.cc.o [ 7%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/fast-dtoa.cc.o [ 8%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/fixed-dtoa.cc.o [ 8%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/strtod.cc.o [ 9%] Building CXX object util/CMakeFiles/kenlm_util.dir/stream/chain.cc.o [ 10%] Building CXX object util/CMakeFiles/kenlm_util.dir/stream/count_records.cc.o [ 11%] Building CXX object util/CMakeFiles/kenlm_util.dir/stream/io.cc.o [ 12%] Linking CXX static library ../../lib/libkenlm_filter.a [ 12%] Built target kenlm_filter [ 13%] Building CXX object util/CMakeFiles/kenlm_util.dir/stream/line_input.cc.o [ 13%] Building CXX object util/CMakeFiles/kenlm_util.dir/stream/multi_progress.cc.o [ 14%] Building CXX object util/CMakeFiles/kenlm_util.dir/stream/rewindable_stream.cc.o [ 15%] Building CXX object util/CMakeFiles/kenlm_util.dir/bit_packing.cc.o [ 16%] Building CXX object util/CMakeFiles/kenlm_util.dir/ersatz_progress.cc.o [ 17%] Building CXX object util/CMakeFiles/kenlm_util.dir/exception.cc.o [ 17%] Building CXX object util/CMakeFiles/kenlm_util.dir/file.cc.o [ 18%] Building CXX object util/CMakeFiles/kenlm_util.dir/file_piece.cc.o [ 19%] 
Building CXX object util/CMakeFiles/kenlm_util.dir/float_to_string.cc.o [ 20%] Building CXX object util/CMakeFiles/kenlm_util.dir/integer_to_string.cc.o [ 21%] Building CXX object util/CMakeFiles/kenlm_util.dir/mmap.cc.o [ 21%] Building CXX object util/CMakeFiles/kenlm_util.dir/murmur_hash.cc.o [ 22%] Building CXX object util/CMakeFiles/kenlm_util.dir/parallel_read.cc.o [ 23%] Building CXX object util/CMakeFiles/kenlm_util.dir/pool.cc.o [ 24%] Building CXX object util/CMakeFiles/kenlm_util.dir/read_compressed.cc.o [ 25%] Building CXX object util/CMakeFiles/kenlm_util.dir/scoped.cc.o [ 25%] Building CXX object util/CMakeFiles/kenlm_util.dir/spaces.cc.o [ 26%] Building CXX object util/CMakeFiles/kenlm_util.dir/string_piece.cc.o [ 27%] Building CXX object util/CMakeFiles/kenlm_util.dir/usage.cc.o [ 28%] Linking CXX static library ../lib/libkenlm_util.a [ 28%] Built target kenlm_util Scanning dependencies of target sized_iterator_test Scanning dependencies of target bit_packing_test Scanning dependencies of target string_stream_test Scanning dependencies of target joint_sort_test [ 29%] Building CXX object util/CMakeFiles/sized_iterator_test.dir/sized_iterator_test.cc.o [ 30%] Building CXX object util/CMakeFiles/joint_sort_test.dir/joint_sort_test.cc.o [ 30%] Building CXX object util/CMakeFiles/string_stream_test.dir/string_stream_test.cc.o [ 31%] Building CXX object util/CMakeFiles/bit_packing_test.dir/bit_packing_test.cc.o [ 31%] Linking CXX executable ../tests/sized_iterator_test [ 31%] Built target sized_iterator_test Scanning dependencies of target file_piece_test [ 31%] Building CXX object util/CMakeFiles/file_piece_test.dir/file_piece_test.cc.o [ 32%] Linking CXX executable ../tests/bit_packing_test [ 32%] Built target bit_packing_test Scanning dependencies of target sorted_uniform_test [ 33%] Building CXX object util/CMakeFiles/sorted_uniform_test.dir/sorted_uniform_test.cc.o [ 34%] Linking CXX executable ../tests/joint_sort_test [ 34%] Built target 
joint_sort_test Scanning dependencies of target probing_hash_table_benchmark [ 35%] Building CXX object util/CMakeFiles/probing_hash_table_benchmark.dir/probing_hash_table_benchmark_main.cc.o [ 36%] Linking CXX executable ../tests/string_stream_test [ 36%] Built target string_stream_test Scanning dependencies of target pcqueue_test [ 36%] Building CXX object util/CMakeFiles/pcqueue_test.dir/pcqueue_test.cc.o [ 37%] Linking CXX executable ../tests/file_piece_test [ 38%] Linking CXX executable ../tests/pcqueue_test [ 38%] Built target file_piece_test Scanning dependencies of target tokenize_piece_test [ 39%] Building CXX object util/CMakeFiles/tokenize_piece_test.dir/tokenize_piece_test.cc.o [ 39%] Built target pcqueue_test Scanning dependencies of target probing_hash_table_test [ 40%] Building CXX object util/CMakeFiles/probing_hash_table_test.dir/probing_hash_table_test.cc.o [ 41%] Linking CXX executable ../tests/sorted_uniform_test [ 41%] Built target sorted_uniform_test Scanning dependencies of target read_compressed_test [ 41%] Building CXX object util/CMakeFiles/read_compressed_test.dir/read_compressed_test.cc.o [ 41%] Linking CXX executable ../bin/probing_hash_table_benchmark [ 41%] Built target probing_hash_table_benchmark Scanning dependencies of target multi_intersection_test [ 42%] Building CXX object util/CMakeFiles/multi_intersection_test.dir/multi_intersection_test.cc.o [ 43%] Linking CXX executable ../tests/probing_hash_table_test [ 43%] Built target probing_hash_table_test Scanning dependencies of target integer_to_string_test [ 44%] Building CXX object util/CMakeFiles/integer_to_string_test.dir/integer_to_string_test.cc.o [ 45%] Linking CXX executable ../tests/tokenize_piece_test [ 45%] Built target tokenize_piece_test Scanning dependencies of target io_test [ 46%] Building CXX object util/stream/CMakeFiles/io_test.dir/io_test.cc.o [ 47%] Linking CXX executable ../tests/read_compressed_test [ 47%] Built target read_compressed_test Scanning 
dependencies of target sort_test [ 48%] Building CXX object util/stream/CMakeFiles/sort_test.dir/sort_test.cc.o [ 49%] Linking CXX executable ../tests/multi_intersection_test [ 49%] Built target multi_intersection_test Scanning dependencies of target stream_test [ 49%] Building CXX object util/stream/CMakeFiles/stream_test.dir/stream_test.cc.o [ 50%] Linking CXX executable ../../tests/io_test [ 50%] Built target io_test Scanning dependencies of target rewindable_stream_test [ 51%] Building CXX object util/stream/CMakeFiles/rewindable_stream_test.dir/rewindable_stream_test.cc.o [ 52%] Linking CXX executable ../tests/integer_to_string_test [ 52%] Built target integer_to_string_test Scanning dependencies of target kenlm [ 53%] Building CXX object lm/CMakeFiles/kenlm.dir/bhiksha.cc.o [ 54%] Building CXX object lm/CMakeFiles/kenlm.dir/binary_format.cc.o [ 55%] Building CXX object lm/CMakeFiles/kenlm.dir/config.cc.o [ 55%] Building CXX object lm/CMakeFiles/kenlm.dir/lm_exception.cc.o [ 56%] Building CXX object lm/CMakeFiles/kenlm.dir/model.cc.o [ 57%] Linking CXX executable ../../tests/stream_test [ 57%] Built target stream_test [ 58%] Building CXX object lm/CMakeFiles/kenlm.dir/quantize.cc.o [ 59%] Linking CXX executable ../../tests/rewindable_stream_test [ 59%] Built target rewindable_stream_test [ 60%] Building CXX object lm/CMakeFiles/kenlm.dir/read_arpa.cc.o [ 61%] Building CXX object lm/CMakeFiles/kenlm.dir/search_hashed.cc.o [ 62%] Linking CXX executable ../../tests/sort_test [ 62%] Built target sort_test [ 63%] Building CXX object lm/CMakeFiles/kenlm.dir/search_trie.cc.o [ 63%] Building CXX object lm/CMakeFiles/kenlm.dir/sizes.cc.o [ 64%] Building CXX object lm/CMakeFiles/kenlm.dir/trie.cc.o [ 65%] Building CXX object lm/CMakeFiles/kenlm.dir/trie_sort.cc.o [ 66%] Building CXX object lm/CMakeFiles/kenlm.dir/value_build.cc.o [ 67%] Building CXX object lm/CMakeFiles/kenlm.dir/virtual_interface.cc.o [ 67%] Building CXX object lm/CMakeFiles/kenlm.dir/vocab.cc.o [ 68%] 
Building CXX object lm/CMakeFiles/kenlm.dir/common/model_buffer.cc.o [ 69%] Building CXX object lm/CMakeFiles/kenlm.dir/common/print.cc.o [ 70%] Building CXX object lm/CMakeFiles/kenlm.dir/common/renumber.cc.o [ 71%] Building CXX object lm/CMakeFiles/kenlm.dir/common/size_option.cc.o [ 71%] Linking CXX static library ../lib/libkenlm.a [ 71%] Built target kenlm Scanning dependencies of target fragment Scanning dependencies of target query Scanning dependencies of target partial_test Scanning dependencies of target model_test [ 73%] Building CXX object lm/CMakeFiles/query.dir/query_main.cc.o [ 73%] Building CXX object lm/CMakeFiles/fragment.dir/fragment_main.cc.o [ 74%] Building CXX object lm/CMakeFiles/model_test.dir/model_test.cc.o [ 75%] Building CXX object lm/CMakeFiles/partial_test.dir/partial_test.cc.o [ 75%] Linking CXX executable ../bin/fragment [ 75%] Built target fragment Scanning dependencies of target left_test [ 76%] Building CXX object lm/CMakeFiles/left_test.dir/left_test.cc.o [ 77%] Linking CXX executable ../bin/query [ 77%] Built target query Scanning dependencies of target build_binary [ 78%] Building CXX object lm/CMakeFiles/build_binary.dir/build_binary_main.cc.o [ 78%] Linking CXX executable ../bin/build_binary [ 78%] Built target build_binary Scanning dependencies of target kenlm_benchmark [ 79%] Building CXX object lm/CMakeFiles/kenlm_benchmark.dir/kenlm_benchmark_main.cc.o [ 80%] Linking CXX executable ../tests/partial_test [ 80%] Built target partial_test Scanning dependencies of target model_buffer_test [ 81%] Building CXX object lm/common/CMakeFiles/model_buffer_test.dir/model_buffer_test.cc.o [ 82%] Linking CXX executable ../../tests/model_buffer_test [ 82%] Built target model_buffer_test Scanning dependencies of target kenlm_builder [ 83%] Building CXX object lm/builder/CMakeFiles/kenlm_builder.dir/adjust_counts.cc.o [ 83%] Linking CXX executable ../tests/left_test [ 83%] Built target left_test Scanning dependencies of target filter [ 
84%] Building CXX object lm/filter/CMakeFiles/filter.dir/filter_main.cc.o [ 85%] Building CXX object lm/builder/CMakeFiles/kenlm_builder.dir/corpus_count.cc.o [ 85%] Building CXX object lm/builder/CMakeFiles/kenlm_builder.dir/initial_probabilities.cc.o [ 86%] Linking CXX executable ../bin/kenlm_benchmark [ 86%] Built target kenlm_benchmark Scanning dependencies of target phrase_table_vocab [ 87%] Building CXX object lm/filter/CMakeFiles/phrase_table_vocab.dir/phrase_table_vocab_main.cc.o [ 88%] Building CXX object lm/builder/CMakeFiles/kenlm_builder.dir/interpolate.cc.o [ 88%] Linking CXX executable ../tests/model_test [ 88%] Built target model_test [ 89%] Building CXX object lm/builder/CMakeFiles/kenlm_builder.dir/output.cc.o [ 90%] Linking CXX executable ../../bin/phrase_table_vocab [ 90%] Built target phrase_table_vocab [ 91%] Building CXX object lm/builder/CMakeFiles/kenlm_builder.dir/pipeline.cc.o [ 92%] Linking CXX executable ../../bin/filter [ 92%] Built target filter [ 93%] Linking CXX static library ../../lib/libkenlm_builder.a [ 93%] Built target kenlm_builder Scanning dependencies of target lmplz Scanning dependencies of target corpus_count_test Scanning dependencies of target count_ngrams Scanning dependencies of target adjust_counts_test [ 93%] Building CXX object lm/builder/CMakeFiles/adjust_counts_test.dir/adjust_counts_test.cc.o [ 94%] Building CXX object lm/builder/CMakeFiles/count_ngrams.dir/count_ngrams_main.cc.o [ 95%] Building CXX object lm/builder/CMakeFiles/lmplz.dir/lmplz_main.cc.o [ 96%] Building CXX object lm/builder/CMakeFiles/corpus_count_test.dir/corpus_count_test.cc.o [ 97%] Linking CXX executable ../../tests/adjust_counts_test [ 97%] Built target adjust_counts_test [ 98%] Linking CXX executable ../../bin/lmplz [ 99%] Linking CXX executable ../../tests/corpus_count_test [ 99%] Built target corpus_count_test [ 99%] Built target lmplz [100%] Linking CXX executable ../../bin/count_ngrams [100%] Built target count_ngrams

It ends successfully, as far as I can tell.

I copy in my corpus (it is very small: it contains only 10 .wav audio files, with the CSV files attached to this post).

files.zip (621,3 KB)

Then, from the data folder of DeepSpeech, I do the following.

I create the .arpa file:

/home/manuel/Descargas/version2/DeepSpeech-0.2.0/native_client/kenlm/build/bin/./lmplz --text vocabulary.txt --arpa words.arpa --o 3

and I get this:

```
=== 1/5 Counting and sorting n-grams ===
Reading /home/manuel/Descargas/version2/DeepSpeech-0.2.0/data/vocabulary.txt
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Unigram tokens 309281 types 24737
=== 2/5 Calculating and sorting adjusted counts ===
Chain sizes: 1:296844 2:1710177792 3:3206583552
Statistics:
1 24737 D1=0.629506 D2=1.08051 D3+=1.51425
2 147409 D1=0.801094 D2=1.1342 D3+=1.40326
3 255902 D1=0.896885 D2=1.21438 D3+=1.40798
Memory estimate for binary LM:
type    kB
probing 8581 assuming -p 1.5
probing 9541 assuming -r models -p 1.5
trie    3744 without quantization
trie    2183 assuming -q 8 -b 8 quantization
trie    3559 assuming -a 22 array pointer compression
trie    1998 assuming -a 22 -q 8 -b 8 array pointer compression and quantization
=== 3/5 Calculating and sorting initial probabilities ===
Chain sizes: 1:296844 2:2358544 3:5118040
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
####################################################################################################
=== 4/5 Calculating and writing order-interpolated probabilities ===
Chain sizes: 1:296844 2:2358544 3:5118040
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
####################################################################################################
=== 5/5 Writing ARPA model ===
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Name:lmplz VmPeak:4976972 kB VmRSS:14504 kB RSSMax:1137172 kB user:0.567799 sys:0.508239 CPU:1.07618 real:1.22807
```

then the .binary file:

/home/manuel/Descargas/version2/DeepSpeech-0.2.0/native_client/kenlm/build/bin/build_binary -T -s words.arpa lm.binary

and I get this:

```
Reading words.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
SUCCESS
```

and finally I try to generate the trie file:

/home/manuel/Descargas/version2/DeepSpeech-0.2.0/native_client/generate_trie alphabet.txt lm.binary vocabulary.txt trie

and this appears:

```
terminate called after throwing an instance of 'lm::FormatLoadException'
what():  native_client/kenlm/lm/binary_format.cc:131 in void lm::ngram::MatchCheck(lm::ngram::ModelType, unsigned int, const lm::ngram::Parameters&) threw FormatLoadException.
The binary file was built for probing hash tables but the inference code is trying to load trie with quantization and array-compressed pointers
Aborted (core dumped)
```

You should not do that; just build KenLM somewhere else. The native_client/kenlm/ we ship is just a stripped-down version of what we need.

And I insist, please read the documentation data/lm/README.md
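Building upstream KenLM in its own directory, outside the DeepSpeech tree, might look like this (a sketch of the same steps already used in this thread):

```shell
# Clone and build KenLM separately from DeepSpeech's stripped-down copy
git clone https://github.com/kpu/kenlm.git
cd kenlm
mkdir build
cd build
cmake ..
make -j 4
# lmplz and build_binary end up in build/bin/
```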

OK, I will not do that anymore.

I read it, but I need to create my own model in Spanish; that is why I followed this tutorial.

I do not know what I’m doing wrong

You are not reading properly :). The tutorial by @elpimous_robot is good, except that it is old and the trie format has changed since. Please pay attention to the build_binary call in the documentation and carefully compare it to what you do; it's under your nose :)
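For reference, the mismatch in the error message is between build_binary's default "probing" format and the trie format that generate_trie expects in v0.2.0. A sketch of the documented invocation (flags as I recall them from data/lm/README.md; verify against your own checkout):

```shell
# Explicitly request the trie data structure, with 8-bit quantization (-q 8)
# and array pointer compression (-a 255), instead of the default probing format
build_binary -a 255 -q 8 trie words.arpa lm.binary
```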

Is this Python code made to run in conda?

The code in data/lm/README.md? I think it's for a notebook, but you don't really need that …