I have a clone (technically a git subtree) of the DeepSpeech repo in `my_project_dir/vendor/DeepSpeech`. I cloned the repo rather than pip-installing DeepSpeech because I want to retrain a model (see https://github.com/mozilla/DeepSpeech/issues/2219).
It’s unclear to me from the README and the code what the recommended way is to run inference on my test-set files. The latest thing I tried, and the one I was most confident in, was the command below. Note that I’m running the pretrained DeepSpeech model directly rather than one of my own retrained checkpoints (though I have produced a few successfully), to narrow down the possible causes of the error:
```
python evaluate.py \
  --model deepspeech-0.5.1-models/output_graph.pbmm \
  --alphabet deepspeech-0.5.1-models/alphabet.txt \
  --lm deepspeech-0.5.1-models/lm.binary \
  --trie deepspeech-0.5.1-models/trie \
  --test_files /home/mepstein/voice_to_text/data/ds_csvs/test.csv
```
I ran this from my DeepSpeech virtualenv, in `~/my_project_dir/vendor/DeepSpeech`.
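To rule out the test set itself as a cause, here is a minimal sketch that sanity-checks the CSV passed to `--test_files`. It assumes the three-column `wav_filename,wav_filesize,transcript` layout that DeepSpeech’s importer scripts produce; the function name `check_test_csv` is my own, not part of DeepSpeech.

```python
import csv
import os
import sys

# Assumed column layout: the three-column CSV format produced by DeepSpeech's
# importer scripts. Adjust this if your CSVs differ.
EXPECTED_COLUMNS = ["wav_filename", "wav_filesize", "transcript"]


def check_test_csv(csv_path):
    """Return a list of human-readable problems found in a DeepSpeech-style CSV."""
    problems = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != EXPECTED_COLUMNS:
            return [f"unexpected header: {reader.fieldnames}"]
        for row_num, row in enumerate(reader, start=2):  # row 1 is the header
            wav = row["wav_filename"]
            # In this sketch, relative paths resolve against the current directory.
            if not os.path.isfile(wav):
                problems.append(f"row {row_num}: missing file {wav}")
            elif str(os.path.getsize(wav)) != row["wav_filesize"].strip():
                problems.append(f"row {row_num}: size mismatch for {wav}")
            if not row["transcript"].strip():
                problems.append(f"row {row_num}: empty transcript")
    return problems


if __name__ == "__main__":
    # Usage (script name is hypothetical): python check_csv.py path/to/test.csv
    if len(sys.argv) > 1:
        for problem in check_test_csv(sys.argv[1]):
            print(problem)
```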
The error I’m getting is:
```
Loading the LM will be faster if you build a binary file.
Reading data/lm/lm.binary
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
terminate called after throwing an instance of 'lm::FormatLoadException'
  what(): ../kenlm/lm/read_arpa.cc:65 in void lm::ReadARPACounts(util::FilePiece&, std::vector<long unsigned int>&) threw FormatLoadException.
first non-empty line was "version https://git-lfs.github.com/spec/v1" not \data\. Byte: 43
Aborted (core dumped)
```
I think I’ve seen advice that this error is caused by not having git-lfs installed, but I do have it installed.
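For what it’s worth, the line quoted in the error, `version https://git-lfs.github.com/spec/v1`, is the first line of a Git LFS pointer file: a small text stub that stands in for the real binary until the LFS object is actually fetched. As far as I understand, having git-lfs installed doesn’t by itself guarantee that a given file in the working tree has been replaced by its real content, so checking the file on disk directly is more telling. A minimal sketch (the helper name `is_lfs_pointer` is hypothetical):

```python
import os
import sys

# First line of every Git LFS pointer file (Git LFS pointer spec v1); it is
# also the line KenLM quoted in the error.
LFS_POINTER_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` starts like a Git LFS pointer stub instead of real content."""
    with open(path, "rb") as f:
        # Read only the header plus its newline; real LM binaries can be gigabytes.
        head = f.read(len(LFS_POINTER_HEADER) + 1)
    return head == LFS_POINTER_HEADER + b"\n"


if __name__ == "__main__":
    # Usage (script name is hypothetical):
    #   python check_lfs.py data/lm/lm.binary deepspeech-0.5.1-models/lm.binary
    for p in sys.argv[1:]:
        kind = "Git LFS pointer stub" if is_lfs_pointer(p) else "not a pointer"
        print(f"{p}: {os.path.getsize(p)} bytes, {kind}")
```

A pointer stub is only a few hundred bytes, so the printed size alone is usually a strong hint as well.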
Also, I noticed that the log says it is reading the binary file from `data/lm/lm.binary`. That surprised me, since (following the README) I always pass the `lm.binary` path in explicitly, here via `--lm deepspeech-0.5.1-models/lm.binary`.
So I then checked whether the two `lm.binary` files are identical:
```
(voice_to_text) mepstein@pop-os:~/voice_to_text/vendor/DeepSpeech$ diff deepspeech-0.5.1-models/lm.binary data/lm/lm.binary
```
and got this output:
```
Binary files deepspeech-0.5.1-models/lm.binary and data/lm/lm.binary differ
```
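`diff` only reports that the files differ. Since a Git LFS pointer records the SHA-256 (`oid sha256:<hex>`) and byte size (`size <n>`) of the content it stands in for, one way to see whether `deepspeech-0.5.1-models/lm.binary` is the same object that `data/lm/lm.binary` points to is to hash it and compare. A sketch under that assumption (all helper names are hypothetical; hashing a large LM file takes a while):

```python
import hashlib
import os
import sys

LFS_HEADER_LINE = b"version https://git-lfs.github.com/spec/v1\n"


def parse_lfs_pointer(path):
    """Parse a Git LFS pointer file into a dict, or return None if it isn't one.

    Pointer files are small "key value" text lines, e.g.:
        version https://git-lfs.github.com/spec/v1
        oid sha256:<64 hex chars>
        size <bytes>
    """
    with open(path, "rb") as f:
        head = f.read(1024)  # real pointers are well under 1 KiB
    if not head.startswith(LFS_HEADER_LINE):
        return None
    fields = {}
    for line in head.decode("utf-8", errors="replace").splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file so a multi-GB language model never has to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_pointer(pointer_path, candidate_path):
    """True if candidate_path has the size and SHA-256 recorded in the LFS pointer."""
    ptr = parse_lfs_pointer(pointer_path)
    if ptr is None:
        raise ValueError(f"{pointer_path} is not a Git LFS pointer file")
    oid = ptr.get("oid", "")
    if oid.startswith("sha256:"):
        oid = oid[len("sha256:"):]
    # Cheap size check first, so we only hash when sizes agree.
    if os.path.getsize(candidate_path) != int(ptr.get("size", "-1")):
        return False
    return sha256_of(candidate_path) == oid


if __name__ == "__main__":
    # Usage (script name is hypothetical):
    #   python compare_lm.py data/lm/lm.binary deepspeech-0.5.1-models/lm.binary
    if len(sys.argv) == 3:
        same = matches_pointer(sys.argv[1], sys.argv[2])
        print("same content" if same else "different content")
```

If `data/lm/lm.binary` turns out not to be a pointer at all, `matches_pointer` raises instead of comparing, which is itself informative.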
So, two questions:
1) What is the recommended way to run speech-to-text inference when using a vendored/cloned DeepSpeech repo instead of a pip install?
2) Which of those two (different) `lm.binary` files should I be using? Both come from DeepSpeech; I did not produce either of them.