Availability of lm.binary ARPA version

I was wondering whether the original ARPA version of lm.binary (deepspeech-0.4.1-models.tar.gz) is available for download.

I’m looking into integrating a task-specific language model using SRILM, but that requires the original ARPA version of the model. I understand from here that the text file from which the model was generated is not available due to licensing issues, but does that also preclude publishing the textual ARPA model? There also seemed to be some talk about using a publicly available training data set: was LibriSpeech eventually used to train the language model?

I tried converting the binary back into ARPA, but KenLM doesn’t seem to support that. I’ve also tried the instructions here, but I can no longer find “vocab.txt” in the repo.

I’ve also tried some other utilities (e.g., sphinx_lm_convert), but they didn’t recognize the binary format of either the trie or lm.binary.

This is no longer the case.

You can re-create the ARPA using the instructions here.
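
The linked instructions aren’t reproduced in this thread, so as a rough sketch only: assuming you have your own plain-text corpus (one sentence per line) and KenLM’s lmplz tool installed and on your PATH, an ARPA file can be built along these lines. The file names (corpus.txt, lm.arpa) and the n-gram order of 5 are placeholders I picked for illustration, not values taken from the DeepSpeech release.

```python
import shutil
import subprocess
from pathlib import Path


def lmplz_command(corpus, arpa, order=5):
    """Build the argument list for KenLM's lmplz.

    `corpus` and `arpa` are hypothetical file names; `order` is the
    n-gram order and is an assumption, not the DeepSpeech setting.
    """
    return [
        "lmplz",
        "--order", str(order),
        "--text", str(corpus),
        "--arpa", str(arpa),
    ]


def build_arpa(corpus, arpa, order=5):
    """Run lmplz to write an ARPA model; requires KenLM to be installed."""
    if shutil.which("lmplz") is None:
        raise FileNotFoundError("lmplz not found on PATH; install KenLM first")
    subprocess.run(lmplz_command(corpus, arpa, order), check=True)
    return Path(arpa)


if __name__ == "__main__":
    # Placeholder paths: replace with your own corpus and output location.
    build_arpa("corpus.txt", "lm.arpa")
```

The resulting lm.arpa is a plain-text file, so it should be usable with SRILM or other toolkits that read ARPA. Note this gives you a model trained on your own text, not a copy of the published lm.binary.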

I have the same problem; I am looking for the .arpa file of the DeepSpeech language model. The link to the instructions above no longer works. Could you please update it?

Thank you very much for your quick reply. I really appreciate it.