Confirming data used for training 0.5.1 LM

Hopefully this is an easy / quick to answer question :slightly_smiling_face:

Regarding the language model included with the 0.5.1 release, could someone from the team confirm that it was trained with the data / process here: https://github.com/mozilla/DeepSpeech/tree/master/data/lm ?

I just wanted to be sure as I'm looking at extending the LM with some particular text for my application (e.g. names not present in LibriSpeech) and wanted to know I was starting from the correct base.

@kdavis There was also talk of using only the top 10k - 50k words here - has that been implemented yet or is it still a work in progress? Seemed like it had potential.

The LM was indeed trained as described here: https://github.com/mozilla/DeepSpeech/tree/master/data/lm
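
For anyone extending the LM along these lines, here is a rough sketch of the vocabulary side of the idea: count word frequencies in the base corpus, keep only the top k words, and then append application-specific words (such as names) that would otherwise be cut. This is only an illustration, not the process used for the released LM. The function name `top_k_vocab`, the lowercasing step, and the toy corpus are all assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, List


def top_k_vocab(lines: Iterable[str], extra_words: Iterable[str], k: int) -> List[str]:
    """Return the k most frequent words in `lines`, plus any `extra_words`.

    Hypothetical helper for illustration; not part of DeepSpeech.
    """
    counts = Counter()
    for line in lines:
        # Assumption: text is lowercased and whitespace-tokenized.
        counts.update(line.lower().split())

    # Keep the k most frequent words from the base corpus.
    vocab = [word for word, _ in counts.most_common(k)]

    # Append application-specific words that are not already present.
    seen = set(vocab)
    for word in extra_words:
        word = word.lower()
        if word not in seen:
            vocab.append(word)
            seen.add(word)
    return vocab


if __name__ == "__main__":
    corpus = ["the cat sat on the mat", "the dog sat"]  # toy stand-in corpus
    names = ["Siobhan", "the"]  # example application-specific words
    print(top_k_vocab(corpus, names, k=2))
```

In practice `lines` would be read from the base text corpus and `k` set somewhere in the 10k - 50k range mentioned above; the resulting word list would then be used to restrict the vocabulary when the LM is rebuilt.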

Great. Thanks for confirming.