Deepspeech docker image language model lmplz segmentation fault

b2341 · April 22, 2021, 4:00am

Hi,

I’m trying to build my own language model (LM) by following the instructions at this link for DeepSpeech 0.9.3:

https://mozilla.github.io/deepspeech-playbook/SCORER.html#using–lm-optimizerpy–to-generate-values-for-the-parameters----default-alpha–and----default-beta–that-are-used-by-the–generate-scorer-package–script

The environment I’m using is as described here:
https://mozilla.github.io/deepspeech-playbook/ENVIRONMENT.html

The docker image runs fine. The problem is when I try to generate the lm.binary and vocab-500000.txt files.

Running the following command causes a segmentation fault.

python3 generate_lm.py \
 --input_txt /<Location_to_my_sentences> \
 --output_dir /DeepSpeech/deepspeech-data/ \
 --top_k 500000 --kenlm_bins /DeepSpeech/native_client/kenlm/build/bin/ \
 --arpa_order 5 --max_arpa_memory "85%" --arpa_prune "0|0|1" \
 --binary_a_bits 255 --binary_q_bits 8 --binary_type trie
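To narrow down which step crashes, it can help to call lmplz directly and look at how the process ended. The sketch below is an assumption-laden illustration, not part of the DeepSpeech scripts: the helper names (`build_lmplz_cmd`, `describe_exit`, `run_lmplz`) are hypothetical, and the flag list assumes generate_lm.py forwards `--arpa_order`, `--max_arpa_memory` and `--arpa_prune` to lmplz as `--order`, `--memory` and `--prune`.

```python
import os
import signal
import subprocess


def build_lmplz_cmd(kenlm_bins, text_path, arpa_path, tmp_dir,
                    order=5, memory="85%", prune="0|0|1"):
    """Return the lmplz argument list for the given settings (hypothetical helper)."""
    return [
        os.path.join(kenlm_bins, "lmplz"),
        "--order", str(order),
        "--temp_prefix", tmp_dir,
        "--memory", memory,
        "--text", text_path,
        "--arpa", arpa_path,
        # "0|0|1" becomes three separate arguments: 0 0 1
        "--prune", *prune.split("|"),
    ]


def describe_exit(returncode):
    """Translate a subprocess return code into a readable status."""
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


def run_lmplz(cmd):
    """Run lmplz without raising on failure and report how it ended."""
    result = subprocess.run(cmd)
    return describe_exit(result.returncode)
```

Calling `run_lmplz(build_lmplz_cmd(...))` with the same paths as above, and then varying one setting at a time (for example the `memory` value), shows whether the crash depends on a particular option.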

I have tried compiling a new kenlm binary inside the container, but it results in the same error. Another suggested fix I found was to upgrade Boost to version 1.67, but this did not fix the issue either.

Has anyone tried the Docker image and run into the same problem?

b2341 · April 23, 2021, 5:49am

One extra bit of information: I’m running Docker under Ubuntu in WSL2 on Windows.

I did some experiments, and kenlm seems to work fine under an Ubuntu VM, just not in WSL2.

Does anyone know why?

The memory allocated to the Docker container is 8GB. I’ve also tried a small test app that mallocs 1GB of memory and fills it with random data. This works fine in the Docker image under WSL2. By default kenlm seems to use about 1GB too, but it does not work in WSL2.
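For anyone who wants to repeat that memory check, here is a rough Python analogue. It is not the original test app (which used malloc), and the function name `fill_random` is made up for illustration; it simply allocates a buffer and overwrites every byte with random data so that all of the pages are actually touched.

```python
import os


def fill_random(total_bytes, chunk_bytes=1 << 20):
    """Allocate total_bytes and overwrite every byte with random data."""
    buf = bytearray(total_bytes)
    view = memoryview(buf)
    # Fill in 1 MiB chunks (by default) so the random source is never
    # asked for the whole buffer in a single call.
    for start in range(0, total_bytes, chunk_bytes):
        end = min(start + chunk_bytes, total_bytes)
        view[start:end] = os.urandom(end - start)
    return buf
```

Calling `fill_random(1 << 30)` inside the container would allocate and fill 1GB, which roughly matches the test described above.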
