Letter-based language model


(Mikel Penagarikano) #1

Hi,

Is it possible to use a letter-based language model (letter 5-grams, for example) with DeepSpeech?

I tried to train a letter-based language model with the KenLM toolkit, but did not succeed. I then trained it with SRILM, which worked, but generate_trie fails with a segmentation fault whether or not I first build the binary LM:

root@b9ba16c8d6a1:/DeepSpeech# /DeepSpeech/native_client/kenlm/build/bin/build_binary lm.arpa lm.binary
Reading lm.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
The ARPA file is missing <unk>. Substituting log10 probability -100.


SUCCESS

/DeepSpeech/native_client/generate_trie alphabet.txt lm.binary trie

Segmentation fault (core dumped)

/DeepSpeech/native_client/generate_trie alphabet.txt lm.arpa trie
Loading the LM will be faster if you build a binary file.
Reading lm.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
The ARPA file is missing <unk>. Substituting log10 probability -100.


Segmentation fault (core dumped)
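
In case the data preparation matters: a letter-level n-gram model is normally trained on text in which every character is its own space-separated token. Below is a minimal sketch of that conversion, not the exact script used here. The function name `to_letter_tokens` and the `|` word-boundary symbol are illustrative assumptions; neither KenLM, SRILM, nor DeepSpeech prescribes them, and whether generate_trie accepts a model built this way is exactly the open question.

```python
def to_letter_tokens(line, boundary="|"):
    """Rewrite a sentence as space-separated letters.

    Words are lowercased and split into single characters; the
    (hypothetical) boundary symbol marks where the spaces were.
    """
    words = line.strip().lower().split()
    # " ".join(word) turns "hi" into "h i"; words are then joined
    # with the boundary token surrounded by spaces.
    return f" {boundary} ".join(" ".join(word) for word in words)


def convert_corpus(lines, boundary="|"):
    """Convert an iterable of sentences, skipping empty lines."""
    converted = (to_letter_tokens(line, boundary) for line in lines)
    return [text for text in converted if text]


if __name__ == "__main__":
    for text in convert_corpus(["Hi there", "", "deep speech"]):
        print(text)
```

The resulting lines can be written to a file and passed to the n-gram toolkit as ordinary training text, so that a 5-gram order covers five letters rather than five words.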