Building LM, noticed vocab.txt and librispeech-lm-norm.txt have a lot of low-quality words

Thanks for all the work going into this amazing project! :slight_smile:

I’ve been looking at building a custom LM, following https://github.com/mozilla/DeepSpeech/tree/master/data/lm, and have managed fairly well so far using my own data (i.e. as the equivalent of the text input to lmplz and as the vocab.txt for generate_trie).
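For anyone following the same path, the preprocessing I mean can be sketched roughly like this. The file names, and the normalization rule (lowercase, letters and apostrophes only, to roughly match the style of librispeech-lm-norm.txt), are my own assumptions, not anything from the DeepSpeech docs:

```python
# Sketch: turn raw text into the two inputs mentioned above --
# a one-sentence-per-line corpus for lmplz, and a vocab.txt for generate_trie.
# File names and the normalization rule are illustrative assumptions.
import re

def prepare_lm_inputs(raw_text, corpus_path="corpus.txt", vocab_path="vocab.txt"):
    words = set()
    with open(corpus_path, "w") as corpus:
        for line in raw_text.splitlines():
            # lowercase, keep only letters/apostrophes
            tokens = re.findall(r"[a-z']+", line.lower())
            if not tokens:
                continue
            corpus.write(" ".join(tokens) + "\n")
            words.update(tokens)
    with open(vocab_path, "w") as vocab:
        vocab.write("\n".join(sorted(words)) + "\n")
    return sorted(words)
```

lmplz can then read the corpus (e.g. `lmplz --order 5 < corpus.txt > lm.arpa`), and vocab.txt goes to generate_trie as the docs describe.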

What I noticed is that there are a lot of low-quality words and sentences in librispeech-lm-norm.txt (which presumably end up in vocab.txt). I see in master that vocab.txt is no longer used, so that part won’t be a concern going forward, but the language model seems like it’ll contain a huge number of odd words, and presumably the sequences from some of the weirder sentences will throw it off too. In a few cases I see non-English sentences in there as well (they looked like Dutch and German).

In June, there was mention in the “LM + TRIE performance” thread of new material for the language model being worked on. Has that updated material been used in the lm.binary that’s being distributed?

I ask because, whilst the distributed language model often helps, it occasionally throws out very weird words, and I suspect that may be at least partially explained by the text quality.

I’d offer to help clean it, but the size makes that impractical (it’s about 40 million lines!). Maybe if the new material is from a clean source and simply hasn’t been released yet, this will be a problem that goes away, but it would be handy to know a bit about the status (if you can share any details yet). Thank you!

I’m currently experimenting with new language models with a limited vocabulary: the 10k, 20k, 30k, 40k, or 50k most common words from librispeech-lm-norm.txt.

Using this limited vocabulary should throw out the rare words in librispeech-lm-norm.txt that appear only once or twice, and thus address this problem, but we have to run the benchmarks to be sure.


@kdavis do you have any updates on creating an LM with a limited vocabulary that would improve recognition quality?

Not really. We have a company-wide meeting this week, so I’ve not had much time to actually work :slight_smile: I’ll know more next week.