DeepSpeech Language Model parameters

Hello,

After some first experiments with DeepSpeech, I now want to improve the recognition with my own language model and my own scorer. Can you help me understand what the parameters of "generate_lm.py" mean and what I can change with them?
I have a few ideas, but I didn't find an explanation for all of the parameters. It would be great if some of you could help me.

Here is the code, along with some of my ideas about the parameters:

--input_txt phrases.txt:
All words from "phrases.txt" are valid words to recognize.

--output_dir .
directory for saving the LM

--top_k 500000

--kenlm_bins …/…/…/kenlm/build/bin
maybe just the path to the "bin" folder from KenLM?

--arpa_order 5

--max_arpa_memory "85%"

--arpa_prune "0|0|1"

--binary_a_bits 255

--binary_q_bits 8

--binary_type trie

--discount_fallback
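
For reference, the flags listed above belong to a single call of the script. The following is only a sketch that assembles and prints that command line without running anything. The values are the ones from this post; the location of generate_lm.py and the KenLM path `path/to/kenlm/build/bin` are placeholders I made up and have to be replaced with the real ones.

```python
import shlex

# Flag values are copied from the post; the script location and the
# KenLM path are hypothetical placeholders, so nothing is executed here.
args = [
    "python3", "generate_lm.py",
    "--input_txt", "phrases.txt",
    "--output_dir", ".",
    "--top_k", "500000",
    "--kenlm_bins", "path/to/kenlm/build/bin",
    "--arpa_order", "5",
    "--max_arpa_memory", "85%",
    "--arpa_prune", "0|0|1",
    "--binary_a_bits", "255",
    "--binary_q_bits", "8",
    "--binary_type", "trie",
    "--discount_fallback",
]

# shlex.join quotes values such as 0|0|1 so the shell does not
# interpret the pipe characters.
command = shlex.join(args)
print(command)
```

Building the list first and quoting it afterwards avoids having to escape the `|` characters by hand.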

Thanks in advance!

Hello @Anfaenger
Did you look here yet?
https://deepspeech.readthedocs.io/en/latest/Scorer.html

Unless you dig deeper into the code and workings of KenLM, my quick answer would be to keep them largely as they are.
Depending on how much data you have in your phrases input, you might want to adjust top_k (or maybe turn that part off, as you indicate all phrases are valid).

Hello @nmstoker,

thank you very much for your answer.

Yes, I followed the instructions on that page to generate a language model and a scorer, but I didn't find an explanation for the parameters.

Okay, thanks for that tip.
I have a special environment in which I want to use the speech recognition. For example, I have maybe just 100 words (and 1000 phrase combinations of those words) that the program should recognize. So I was wondering what the other parameters can affect; maybe I can get better recognition that way.

Does that mean that top_k is the number of valid phrases in the input text file? Maybe the question is very stupid, but why should I put phrases in the text file that are not valid?

Thanks for your help.

Have you read the help message of the script as well? It contains a few hints already.

top_k is there to help filter the dataset by removing the less frequent words. It's useful when you have huge data for a generic language model, but in your case you can set it to a value higher than the number of words in your vocabulary file so that nothing is filtered.

It could help your model with differentiation when your speaker says something that is not in the set of commands you want to recognize.

Apart from the order, I think in your case it's mostly going to depend on the quality of your vocab.txt.
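
The top_k filtering described above can be sketched in a few lines. This is a simplified illustration and not the script's actual code: the function name `keep_top_k` and the three example phrases are made up, and it assumes the filter counts whitespace-separated, lowercased words and keeps the most frequent ones.

```python
from collections import Counter


def keep_top_k(lines, top_k):
    """Return the top_k most frequent lowercased words across all lines."""
    counter = Counter()
    for line in lines:
        counter.update(line.lower().split())
    return [word for word, _ in counter.most_common(top_k)]


# Hypothetical command phrases, standing in for a small phrases.txt.
phrases = ["turn the light on", "turn the light off", "open the door"]

# A small top_k drops the rarer words ...
print(keep_top_k(phrases, 3))
# ... while a top_k larger than the vocabulary keeps every word.
print(len(keep_top_k(phrases, 500000)))
```

With only 7 distinct words in the example, any top_k of 7 or more filters nothing, which is the situation for a small command vocabulary.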

Hello @lissyx,
thanks for your hints and explanations.

Yes, I did. But I didn't find a real explanation for some things, for example what exactly the binary quantization values a and q, the ARPA file, or the ARPA pruning parameters are.

Okay, thanks for your assessment.
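
On the pruning parameter, a sketch of how a value like "0|0|1" can be read. This rests on my understanding of KenLM's `lmplz --prune` option and is not confirmed anywhere in this thread: one threshold per n-gram order, n-grams whose count is less than or equal to the threshold are dropped, the values must be non-decreasing, and the last value is reused for all higher orders. The helper `prune_thresholds` is hypothetical and only illustrates that reading.

```python
def prune_thresholds(arpa_prune, arpa_order):
    """Expand an 'a|b|c' prune string to one threshold per n-gram order.

    Assumption: semantics follow KenLM's lmplz --prune option.
    """
    values = [int(v) for v in arpa_prune.split("|")]
    if values != sorted(values):
        raise ValueError("prune thresholds must be non-decreasing")
    # The last value is reused for every remaining higher order.
    values += [values[-1]] * (arpa_order - len(values))
    return dict(enumerate(values[:arpa_order], start=1))


# Values from the original post: --arpa_prune "0|0|1" with --arpa_order 5.
print(prune_thresholds("0|0|1", 5))
```

Under that reading, a threshold of 0 means no pruning for unigrams and bigrams, and a threshold of 1 removes n-grams of order 3 and above that occur only once. With only about 1000 phrases, that could remove a large share of the higher-order n-grams, so it may be worth checking whether pruning is wanted at all.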