Letter based language model

mpenagar · December 4, 2018, 10:54am

Hi,

Is it possible to use a letter-based language model (e.g. letter 5-grams) with DeepSpeech?

I tried to train a letter-based language model with the KenLM toolkit, but did not succeed. I then trained it with SRILM, which worked, but creating the trie fails whether or not I build the binary LM first:

root@b9ba16c8d6a1:/DeepSpeech# /DeepSpeech/native_client/kenlm/build/bin/build_binary lm.arpa lm.binary
Reading lm.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
The ARPA file is missing <unk>. Substituting log10 probability -100.

SUCCESS

/DeepSpeech/native_client/generate_trie alphabet.txt lm.binary trie

Segmentation fault (core dumped)

/DeepSpeech/native_client/generate_trie alphabet.txt lm.arpa trie
Loading the LM will be faster if you build a binary file.
Reading lm.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
The ARPA file is missing <unk>. Substituting log10 probability -100.

Segmentation fault (core dumped)
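
For reference, a letter-level n-gram model is usually trained on a corpus where every character is its own space-separated token. The sketch below shows one way to prepare such a corpus in Python. The `|` word-boundary token and the function names are assumptions made for illustration, not something DeepSpeech or KenLM prescribes, and nothing in this thread establishes that generate_trie accepts a model trained this way.

```python
def to_letter_tokens(sentence, boundary="|"):
    """Turn 'hi there' into 'h i | t h e r e'.

    `boundary` is a hypothetical word-separator token; pick any symbol
    that does not occur in the corpus itself.
    """
    words = sentence.strip().lower().split()
    # Split each word into single characters, then join the words
    # with the boundary token so word breaks are not lost.
    return f" {boundary} ".join(" ".join(word) for word in words)


def convert_corpus(lines):
    """Convert an iterable of sentences, skipping blank lines."""
    return [to_letter_tokens(line) for line in lines if line.strip()]


if __name__ == "__main__":
    for line in convert_corpus(["Hello world", "", "deep speech"]):
        print(line)
```

The resulting lines can be written to a text file and passed to whichever toolkit estimates the n-gram counts.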

mathematiguy · December 19, 2018, 1:08am

Can we bump this question up the list?

I’m working on incorporating some orthographic rules into the language model (e.g. no consecutive consonant clusters) that are specific to my language use case. My understanding is that a character-based language model might make it possible to restrict the model in those specific ways.

Or if there’s another way to achieve this goal, I could consider that as well…
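
As an illustration of the kind of rule meant here, the following Python sketch rejects candidate words that contain two adjacent consonants. It is only a standalone filter, not a way of building the rule into the language model, and the vowel set and function names are assumptions chosen for the example.

```python
VOWELS = frozenset("aeiou")  # assumption: replace with the target language's vowels


def has_consonant_cluster(word, vowels=VOWELS):
    """Return True if any two adjacent letters are both consonants."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    return any(
        a not in vowels and b not in vowels
        for a, b in zip(letters, letters[1:])
    )


def filter_candidates(words):
    """Keep only the words that satisfy the no-cluster rule."""
    return [w for w in words if not has_consonant_cluster(w)]


if __name__ == "__main__":
    print(filter_candidates(["banana", "street", "tomato"]))
```

If the orthography writes a single sound with several letters, those letter sequences would need to be treated as one unit before applying the check.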
