Hi,
I’m trying to build my own LM by following the instructions in the link for DeepSpeech 0.9.3:
The environment I’m using is as described here:
https://mozilla.github.io/deepspeech-playbook/ENVIRONMENT.html
The Docker image runs fine. The problem occurs when I try to generate the lm.binary and vocab-500000.txt files.
Running the following command causes a segmentation fault.
python3 generate_lm.py \
--input_txt /<Location_to_my_sentences> \
--output_dir /DeepSpeech/deepspeech-data/ \
--top_k 500000 --kenlm_bins /DeepSpeech/native_client/kenlm/build/bin/ \
--arpa_order 5 --max_arpa_memory "85%" --arpa_prune "0|0|1" \
--binary_a_bits 255 --binary_q_bits 8 --binary_type trie
I have tried compiling a new kenlm binary in the container, but it results in the same error. Another suggestion I found was to upgrade Boost to version 1.67, but this did not fix the issue either.
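In case it helps narrow things down, here is a small sketch of a wrapper that reports whether a command was killed by a signal such as SIGSEGV. I'm assuming the crash happens in one of the external binaries that generate_lm.py calls rather than in Python itself. The name run_and_report is a hypothetical helper of my own, not part of DeepSpeech or KenLM.

```python
import signal
import subprocess


def run_and_report(cmd):
    """Run cmd (a list of strings) and describe how the process ended.

    Hypothetical helper for illustration only; not part of DeepSpeech or KenLM.
    """
    proc = subprocess.run(cmd)
    if proc.returncode < 0:
        # On POSIX, subprocess reports death-by-signal as a negative return code,
        # e.g. -11 for SIGSEGV.
        sig_name = signal.Signals(-proc.returncode).name
        return f"killed by {sig_name}"
    return f"exited with code {proc.returncode}"
```

Calling it once per step, with each KenLM binary from the --kenlm_bins directory (for example lmplz or build_binary) and whatever arguments the script passes to it, should show which one actually segfaults.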
Has anyone tried the Docker image and run into the same problem?