I was wondering whether the original ARPA version of lm.binary (from deepspeech-0.4.1-models.tar.gz) is available for download.
I’m looking into integrating a task-specific language model using SRILM, but that requires the original ARPA version of the model. I understand from here that the text file from which the model was generated is not available due to licensing issues, but does that also preclude publishing the textual ARPA model? There also seemed to be some talk about using a publicly available training data set: was LibriSpeech eventually used to train the language model?
I tried converting the binary back into ARPA, but it doesn’t seem like KenLM supports that. I’ve also tried the instructions here, but I can’t find “vocab.txt” in the repo (perhaps it is no longer there?).
I’ve tried some other utilities (e.g., sphinx_lm_convert), but none of them recognized the binary format of either the trie or lm.binary.
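In case it helps anyone reproduce this, here is a rough sketch of how one might check what format a language-model file is in before handing it to a converter. It rests on two assumptions of mine: that KenLM binaries start with the ASCII magic string shown below, and that ARPA files are plain text whose first non-blank line is `\data\`. The function name `sniff_lm_format` is made up for illustration and is not part of KenLM or DeepSpeech.

```python
# Assumption: KenLM binary files begin with this ASCII magic string,
# followed by a format version number.
KENLM_MAGIC = b"mmap lm http://kheafield.com/code format version"

# Assumption: ARPA files are plain text and open with a "\data\" line.
ARPA_HEADER = b"\\data\\"


def sniff_lm_format(head: bytes) -> str:
    """Guess a language-model file format from its first bytes.

    Hypothetical helper: returns "kenlm-binary", "arpa", or "unknown".
    """
    if head.startswith(KENLM_MAGIC):
        return "kenlm-binary"
    # Skip leading blank lines or whitespace before the ARPA header.
    if head.lstrip().startswith(ARPA_HEADER):
        return "arpa"
    return "unknown"
```

To use it, read the first few dozen bytes of the file in binary mode, for example `sniff_lm_format(open("lm.binary", "rb").read(64))`. If this reports `kenlm-binary`, that would at least explain why tools expecting a different binary layout reject the file.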