Hi @lissyx,
Below are the steps I used for generating the trie and lm.binary files.
Generating the language model:
- Clone the DeepSpeech git repo, branch v0.5.1, with git-lfs installed
- git clone --branch v0.5.1 https://github.com/mozilla/DeepSpeech.git
- Install the dependencies
- pip3 install -r requirements.txt
- Install the CTC decoder
- pip3 install $(python3 util/taskcluster.py --decoder)
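A quick way to confirm the decoder package installed correctly (I am assuming the module name ds_ctcdecoder here, which is what the 0.5.1 training code imports):

```
# should print the path of the installed ds_ctcdecoder package
python3 -c "import ds_ctcdecoder; print(ds_ctcdecoder.__file__)"
```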
- Clone TensorFlow into the same directory as DeepSpeech
- In the tensorflow directory, run
- git checkout origin/r1.13
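In case it helps anyone reproducing this, the clone/checkout step looks roughly like this (I am assuming the Mozilla TensorFlow fork here, which is what the native_client build docs reference; adjust the URL if you built against upstream TensorFlow):

```
# clone next to the DeepSpeech checkout and switch to the r1.13 branch
git clone https://github.com/mozilla/tensorflow.git
cd tensorflow
git checkout origin/r1.13
```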
- Since the TensorFlow version is 1.13, the matching Bazel version is 0.19.2
- I am using Ubuntu 16.04, so based on https://docs.bazel.build/versions/master/install-ubuntu.html I executed the commands below
- sudo apt-get install pkg-config zip g++ zlib1g-dev unzip python3
- Downloaded the Bazel installer - bazel-0.19.2-installer-linux-x86_64.sh
- chmod +x bazel-0.19.2-installer-linux-x86_64.sh
- ./bazel-0.19.2-installer-linux-x86_64.sh --user
- Used all the default/recommended options
- export PATH="$PATH:$HOME/bin"
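A quick sanity check that the expected Bazel ended up on PATH:

```
# should report "Build label: 0.19.2"
bazel version
```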
- Navigated to the tensorflow directory and executed
- ./configure
- Used all the default/recommended options
- ln -s ../DeepSpeech/native_client ./
- bazel build --config=monolithic -c opt --copt=-O3 --copt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-fvisibility=hidden //native_client:libdeepspeech.so //native_client:generate_trie
- So far, I was able to compile DeepSpeech and could see the binaries in the /tensorflow/bazel-bin/native_client directory.
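For reference, this is an easy way to confirm both build outputs are present (run from the tensorflow directory):

```
# both files should exist after the bazel build
ls -lh bazel-bin/native_client/libdeepspeech.so bazel-bin/native_client/generate_trie
```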
- Navigated to DeepSpeech/native_client
- Cloned the KenLM repo: git clone --depth 1 https://github.com/kpu/kenlm
- In the kenlm directory, create a build folder
- Navigate to the build folder and execute
- cmake ..
- make -j 4
- After the above step I could see lmplz and build_binary in the /DeepSpeech/native_client/kenlm/build/bin directory
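Put together, the KenLM part boils down to something like this (run from DeepSpeech/native_client; it assumes cmake and a C++ toolchain are already installed):

```
# build the KenLM tools used below (lmplz and build_binary)
git clone --depth 1 https://github.com/kpu/kenlm
mkdir kenlm/build
cd kenlm/build
cmake ..
make -j 4
```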
- From the /DeepSpeech/native_client/kenlm/build/bin directory, executed
- ./lmplz --order 5 --memory 50% --text /home/laxmikantm/proto_1/vocabulary.txt --arpa /tmp/lm.arpa --prune 0 0 0 1 --temp_prefix /tmp/
- ./build_binary -a 255 -q 8 trie /tmp/lm.arpa /tmp/lm.binary
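For anyone wondering about the input: lmplz expects plain text with one sentence per line. The lines below are only an illustration of the format, not my actual vocabulary.txt:

```
turn the volume up
what is the weather today
play the next song
```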
- From the /tensorflow/bazel-bin/native_client directory, executed
- ./generate_trie /home/xxxxxxx/proto_1/vocabulary.txt /tmp/lm.binary /tmp/trie
- After the above step I had trie and lm.binary in the /tmp folder; I copied these files to a new folder and used them from there.
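A quick way to check what came out:

```
# show the sizes of the generated language model and trie
ls -lh /tmp/lm.binary /tmp/trie
```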
For testing purposes I have only 10 files, and the generated file sizes are:
trie - 75 bytes
lm.binary - 9.4 KB
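For completeness, the generated files can then be passed to the 0.5.1 client along these lines (the model, alphabet and audio paths are placeholders for my own files):

```
# run inference with the custom lm.binary and trie
deepspeech --model output_graph.pbmm --alphabet alphabet.txt \
  --lm lm.binary --trie trie --audio test.wav
```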