ValueError: Scorer initialization failed with error code 8198 swig/python detected a memory leak of type 'Alphabet *', no destructor found

Hello. I am currently getting this error that I do not understand. I am pointing to the kenlm.scorer given after git cloning with git-lfs. I am not sure about this error. Any assistance would be very helpful. Thank you.

Below is my .sh file. I am running this on my CPU and have followed all training steps in the docs.

#!/bin/sh
set -xe
if [ ! -f DeepSpeech.py ]; then
    echo "Please make sure you run this from DeepSpeech's top level directory."
    exit 1
fi;

if [ -d "${COMPUTE_KEEP_DIR}" ]; then
    checkpoint_dir=$COMPUTE_KEEP_DIR
else
    checkpoint_dir=$(python -c 'from xdg import BaseDirectory as xdg; print(xdg.save_data_path("deepspeech/googlec2"))')
fi

export CUDA_VISIBLE_DEVICES=0
export TF_FORCE_GPU_ALLOW_GROWTH=true

python -u DeepSpeech.py --train_files //tmp/external/google_cmds_processed/train.csv \
    --test_files //tmp/external/google_cmds_processed/test.csv \
    --dev_files //tmp/external/google_cmds_processed/dev.csv \
    --scorer_path //DeepSpeech/data/lm/kenlm.scorer \
    --alphabet_config_path //DeepSpeech/data/alphabet.txt \
    --export_dir /tmp/external/DeepSpeech_models/googlecommands \
    --checkpoint_dir "$checkpoint_dir" \
    "$@"

Here is what the error looks like:

python -u DeepSpeech.py --train_files //tmp/external/google_cmds_processed/train.csv --test_files //tmp/external/google_cmds_processed/test.csv --dev_files //tmp/external/google_cmds_processed/dev.csv --scorer_path //DeepSpeech/data/lm/kenlm.scorer --alphabet_config_path //DeepSpeech/data/alphabet.txt
Traceback (most recent call last):
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/DeepSpeech/training/deepspeech_training/train.py", line 941, in run_script
    absl.app.run(main)
  File "/root/deepspeech-train-venv/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/root/deepspeech-train-venv/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/DeepSpeech/training/deepspeech_training/train.py", line 908, in main
    early_training_checks()
  File "/DeepSpeech/training/deepspeech_training/train.py", line 893, in early_training_checks
    FLAGS.scorer_path, Config.alphabet)
  File "/root/deepspeech-train-venv/lib/python3.6/site-packages/ds_ctcdecoder/__init__.py", line 42, in __init__
    raise ValueError('Scorer initialization failed with error code {}'.format(err))
ValueError: Scorer initialization failed with error code 8198
swig/python detected a memory leak of type 'Alphabet *', no destructor found.

UPDATE: Instead of pointing at the base kenlm.scorer provided in the git repo, I grabbed deepspeech-0.7.1-models.scorer and pointed to that instead.

I am still getting the memory leak message, but no longer the Scorer initialization error. Training started and is running now.

I have the same error, and it seems to be related to git-lfs. My kenlm.scorer file is tiny, and when I inspect its contents it looks like a git-lfs reference. I’m unfamiliar with git-lfs, so I just downloaded the scorer directly like you.
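
A quick way to tell a git-lfs pointer from the real scorer is to look at the first bytes of the file: pointer files are small text files whose first line is "version https://git-lfs.github.com/spec/v1". A minimal sketch of that check follows; is_lfs_pointer is a hypothetical helper name, not part of DeepSpeech or git-lfs, and the path in the usage lines is the one used in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed (0) if the file looks like a git-lfs pointer,
# fail (1) if it looks like real content, return 2 if the file is missing.
is_lfs_pointer() {
    [ -f "$1" ] || return 2
    # Only read the first few bytes so a large binary scorer is not scanned.
    head -c 64 "$1" | grep -q '^version https://git-lfs\.github\.com/spec/v1'
}

# Usage: warn before training if the scorer was never fetched.
if is_lfs_pointer data/lm/kenlm.scorer; then
    echo "kenlm.scorer is still a git-lfs pointer, not the real file" >&2
fi
```

If the check succeeds, the file is only the pointer and the decoder cannot load it as a scorer.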

Interesting! I am using DeepSpeech in a Docker container like below:

FROM tensorflow/tensorflow:1.15.2-py3

RUN apt-get update && apt-get install -y \
    apt-utils \
    vim \
    git \
    git-lfs

RUN git clone https://github.com/mozilla/DeepSpeech

WORKDIR //DeepSpeech

RUN pip3 install --upgrade pip==20.0.2 wheel==0.34.2 setuptools==46.1.3

RUN pip3 install --upgrade --no-cache-dir -e .

I can get training to run, but only by pointing at the deepspeech-0.7.x-models.scorer file downloaded directly, like you mentioned. The kenlm.scorer I get from the DeepSpeech repo after cloning with git-lfs installed first (as the docs say to do) doesn’t seem to be the proper one. I’m also not too keen on the memory leak error.

Please look through GitHub; this is already taken care of.

Again, this is not our experience. Are you sure the git-lfs setup is okay?

I would always set the branch for the git clone to v0.7.3, or whichever release you are downloading, as there is a lot of development going on in the master branch that can break compatibility with older releases.

git clone --branch v0.7.3 https://github.com/mozilla/DeepSpeech

git-lfs seems to be fine. This is what I get running it in terminal.

git-lfs
git-lfs/2.3.4 (GitHub; linux amd64; go 1.8.3)

I made this Docker container, as well as this post, about 8 days ago, but I only just noticed the memory leak fix on GitHub. Thanks for that.

As for the git-lfs issue, maybe it has something to do with how I have structured the Dockerfile I posted? I install git-lfs in the first batch of commands, then run the git clone in its own RUN command. I’m a bit new to Docker, but I’m not sure why this would be an issue unless I need to run git clone in the first batch of commands instead of a separate RUN statement.

Good to know. Thank you.

And what is the content of kenlm.scorer?

Show us the code instead of describing it. It’s working for other people…

The Dockerfile? I posted it above mate. Nonetheless, here it is:

FROM tensorflow/tensorflow:1.15.2-py3

RUN apt-get update && apt-get install -y \
    apt-utils \
    vim \
    git \
    git-lfs

RUN git clone https://github.com/mozilla/DeepSpeech

WORKDIR //DeepSpeech

RUN pip3 install --upgrade pip==20.0.2 wheel==0.34.2 setuptools==46.1.3

RUN pip3 install --upgrade --no-cache-dir -e .

The kenlm.scorer is about 4.0 KB, so yeah, definitely not the right one.

Right, it’s not formatted as one set of code so that’s why I missed it.

That’s weird. How about an extra RUN git lfs install?

Looks like I repro with that image: https://community-tc.services.mozilla.com/tasks/DyEzlVhXSPGXhdM3XSPXlQ/runs/0/logs/https%3A%2F%2Fcommunity-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2FDyEzlVhXSPGXhdM3XSPXlQ%2Fruns%2F0%2Fartifacts%2Fpublic%2Flogs%2Flive.log#L15254

Step 8 : RUN ls -hal data/lm/
 ---> Running in 1bbe16b9a7de
total 28K
drwxr-xr-x 2 root root 4.0K Jun 12 12:34 .
drwxr-xr-x 7 root root 4.0K Jun 12 12:34 ..
-rw-r--r-- 1 root root 6.4K Jun 12 12:34 generate_lm.py
-rw-r--r-- 1 root root 4.8K Jun 12 12:34 generate_package.py
-rw-r--r-- 1 root root  134 Jun 12 12:34 kenlm.scorer

Interesting. I wonder if I should just start with a blank Ubuntu image and go along with the general python3 installs. I only had the TensorFlow image from an old attempt at this; it’s obviously not necessary. I still wonder why that specific image would be affecting the git-lfs pull of the proper files.

Well I’m in the middle of working on that topic, so I’ll keep you posted, because I don’t see any good reason it would not work.

@Epoetin on my PR, after adding RUN git lfs install before the git clone, I see a 910MB kenlm.scorer file.
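
For a tree that was already cloned before the LFS filters were registered, re-cloning should not be the only option. Installing the git-lfs package provides the binary, while git lfs install is the step that registers the LFS filters in the git configuration. A hedged sketch, assuming git and git-lfs are installed; refetch_lfs_files is a hypothetical helper name and the /DeepSpeech path is the one used in this thread:

```shell
#!/bin/sh
# Hypothetical helper: register the LFS filters for an existing checkout and
# download the real content behind its pointer files.
refetch_lfs_files() {
    repo=$1
    # Refuse to run on something that is not a git checkout.
    if [ ! -e "$repo/.git" ]; then
        echo "not a git checkout: $repo" >&2
        return 2
    fi
    # --local writes the filter settings to this repository's config only.
    git -C "$repo" lfs install --local && git -C "$repo" lfs pull
}

# Usage (downloads the LFS objects, so it needs network access):
#   refetch_lfs_files /DeepSpeech
```

Afterwards, listing data/lm/ should show a kenlm.scorer of several hundred megabytes instead of a file of about a hundred bytes.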

Well, that solves it then! I suppose apt-get git-lfs isn’t as robust as git lfs install. Appreciate the quick responses and feedback! Would you mind sharing the entire Dockerfile you used? I should probably set up my Dockerfile for venv with DeepSpeech.

Appreciate it! I will try and run this on my machine sometime soon and see the results.

It’s not yet ready, so … use with caution.

I am not using your entire Dockerfile, but simply trying to set up DeepSpeech with a venv in Docker as the docs recommend. This may be more of a Docker issue, but for whatever reason I get an error with the Dockerfile below. Everything runs fine until the final RUN statement, and I am getting the proper kenlm.scorer now thanks to your assistance earlier. Yet when I add the virtual environment clause, similar to how you have yours, I get a setuptools error. Any idea?

FROM tensorflow/tensorflow:1.15.2-py3

RUN apt-get update && apt-get install -y \
    apt-utils \
    vim \
    bash-completion \
    build-essential \
    curl \
    git \
    git-lfs \
    unzip \
    wget \
    python3-venv

WORKDIR /
RUN git lfs install
RUN git clone https://github.com/mozilla/DeepSpeech

WORKDIR /DeepSpeech
RUN python3 -m venv venv/
ENV VIRTUAL_ENV $(pwd)/venv
ENV PATH $VIRTUAL_ENV/bin:$PATH
RUN which python

RUN pip3 install --upgrade pip==20.0.2 wheel==0.34.2 setuptools==46.1.3
RUN pip3 install --upgrade --force-reinstall -e .

Error here:

Collecting alembic
  Downloading alembic-1.4.2.tar.gz (1.1 MB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'error'
  ERROR: Command errored out with exit status 1:
   command: /usr/bin/python3 /usr/local/lib/python3.6/dist-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpvloxyboz
       cwd: /tmp/pip-install-nbmhi4w0/alembic
  Complete output (10 lines):
  Traceback (most recent call last):
    File "/usr/local/lib/python3.6/dist-packages/pip/_vendor/pep517/_in_process.py", line 257, in <module>
      main()
    File "/usr/local/lib/python3.6/dist-packages/pip/_vendor/pep517/_in_process.py", line 240, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/usr/local/lib/python3.6/dist-packages/pip/_vendor/pep517/_in_process.py", line 85, in get_requires_for_build_wheel
      backend = _build_backend()
    File "/usr/local/lib/python3.6/dist-packages/pip/_vendor/pep517/_in_process.py", line 76, in _build_backend
      obj = getattr(obj, path_part)
  AttributeError: module 'setuptools.build_meta' has no attribute '__legacy__'

ERROR: Command errored out with exit status 1: /usr/bin/python3 /usr/local/lib/python3.6/dist-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpvloxyboz Check the logs for full command output.
WARNING: You are using pip version 20.0.2; however, version 20.1.1 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
ERROR: Service 'deepspeech' failed to build: The command '/bin/sh -c pip3 install --upgrade --force-reinstall -e .' returned a non-zero code: 1
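
One detail in the log above may be worth checking: pip is running as /usr/bin/python3 from /usr/local/lib/python3.6/dist-packages, not from the venv. As far as I know, a Dockerfile ENV instruction does not run shell command substitution, so the $(pwd) in ENV VIRTUAL_ENV $(pwd)/venv would stay literal and the venv’s bin directory would never land on PATH. Writing the absolute path (here presumably /DeepSpeech/venv) would avoid that. Whether this also explains the setuptools error is an assumption, not something confirmed in this thread. A minimal shell sketch for checking which pip3 a build step resolves, where resolves_in_venv is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeed (0) if the command found on PATH lives in the
# bin directory of the given venv, fail (1) otherwise.
resolves_in_venv() {
    cmd=$1
    venv=$2
    # command -v prints the path the shell would execute for this command.
    resolved=$(command -v "$cmd") || return 1
    case "$resolved" in
        "$venv"/bin/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: /DeepSpeech/venv is the assumed venv location from this thread.
resolves_in_venv pip3 "${VIRTUAL_ENV:-/DeepSpeech/venv}" \
    || echo "pip3 does not come from the venv" >&2
```

Running a check like this in a RUN step just before the pip3 install lines would show whether the ENV PATH line took effect.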