Usage instructions for lm_optimizer

I was reading the Scorer page and built my own scorer! However, it does not recognize anything :frowning: So I decided to tune my alpha and beta values using the lm_optimizer script, but I cannot figure out how to use it. There are various FLAGS defined in the script, but it's not clear how to actually run it.

I tried this for example:

python ../../lm_optimizer.py --test_files=vocab-5000.txt --alphabet_config_path=../alphabet.txt --scorer=kenlm.scorer

(here kenlm.scorer and vocab-5000.txt are generated by me)

But it exits with error code 1 and message:

/home/gt/otherrepos/DeepSpeech/venv/lib/python3.7/site-packages/pandas/compat/__init__.py:117: UserWarning: Could not import the lzma module. Your installed Python is incomplete. Attempting to use lzma compression will result in a RuntimeError.
  warnings.warn(msg)
swig/python detected a memory leak of type 'Alphabet *', no destructor found.
swig/python detected a memory leak of type 'Alphabet *', no destructor found.
I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
E All initialization methods failed (['best', 'last']).

I'm pretty sure this is because I got the arguments wrong. Could anyone maybe show how a sample command is run?

The arguments are mostly the same as for evaluate.py: it expects a validation dataset CSV file for --test_files (not a vocabulary file), as well as a valid --checkpoint_dir pointing to a trained model checkpoint.
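For reference, a hypothetical invocation might look like the sketch below. All paths and file names are placeholders, and the flag names follow the ones already used in this thread; check the FLAGS defined in lm_optimizer.py and evaluate.py for the exact set your version expects:

```shell
# Sketch of an lm_optimizer.py run -- every path here is a placeholder.
# --test_files must point to a dataset CSV (wav_filename,wav_filesize,transcript),
# not a vocabulary file, and --checkpoint_dir must contain a trained checkpoint,
# otherwise initialization fails with "All initialization methods failed".
python lm_optimizer.py \
  --test_files dev.csv \
  --checkpoint_dir /path/to/checkpoints \
  --alphabet_config_path alphabet.txt \
  --scorer kenlm.scorer
```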

Alright, thanks @reuben! I had not seen evaluate.py earlier; hence the confusion. I will look into it now.