Hello,
I am trying to develop an STT app that transcribes simple phrases of the form identifier direction number, e.g. “alpha up 2”.
To do so I created a custom language model with KenLM (I followed "TUTORIAL : How I trained a specific french model to control my robot"). I managed to create an lm.binary and a trie file, and the words.arpa file generated by KenLM seems to make sense (it follows the same syntax as https://github.com/kpu/kenlm/blob/master/lm/common/test_data/toy0.arpa).
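For reference, the build steps were roughly the following (the 3-gram order matches the counts in my ARPA header; the alphabet.txt path is just a placeholder for the alphabet file shipped with the 0.6.1 model, and generate_trie comes from the 0.6.1 native_client):
# build a 3-gram ARPA model from the text corpus with KenLM
lmplz -o 3 < vocabulary.txt > words.arpa
# convert the ARPA file to KenLM's binary format
build_binary words.arpa lm.binary
# build the trie used by the DeepSpeech decoder
generate_trie alphabet.txt lm.binary trie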
Unfortunately, when I try to transcribe a wav file using the CLI
deepspeech --model deepspeech/deepspeech-0.6.1-models/output_graph.pbmm --lm deepspeech/custom_lm/lm.binary --trie deepspeech/custom_lm/trie --audio audio/alpha_up_2.wav
I get a single character (“a”) as the prediction instead of “alpha up two”.
Doing the same without an LM gives “o fort tooed to”, and with the default LM I get “i felt too to”, which, although incorrect, are plausible interpretations of “alpha up two”.
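The default-LM run was the same command pointed at the lm.binary and trie that ship with the 0.6.1 models package (the paths below are assumptions based on where I extracted the archive):
deepspeech --model deepspeech/deepspeech-0.6.1-models/output_graph.pbmm --lm deepspeech/deepspeech-0.6.1-models/lm.binary --trie deepspeech/deepspeech-0.6.1-models/trie --audio audio/alpha_up_2.wav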
I am using version 0.6.1. My ARPA file looks like this:
\data\
ngram 1=26
ngram 2=99
ngram 3=174

\1-grams:
-1.7331841	<unk>	0
0	<s>	-1.90309
-1.1942315	</s>	0
-1.1942315	a	-0.34242266
-1.6775719	l	-0.30103
-1.6775719	p	-0.30103
-1.6282793	h	-0.30103
and my vocabulary.txt looks like this:
a l p h a u p o n e
a l p h a u p t w o
a l p h a u p t h r e e
a l p h a u p f o u r
a l p h a u p f i v e
Initially I did not put spaces between the letters (see the word-level version below) and I got a single word as the prediction. I’m not sure putting spaces between each letter in vocabulary.txt is a good idea; I only tried it to get around the single-word prediction problem.
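For comparison, the word-level vocabulary.txt I started from looked like this (one phrase per line):
alpha up one
alpha up two
alpha up three
alpha up four
alpha up five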
Could you please help me with the “single letter prediction problem” and confirm or deny whether I should put spaces in vocabulary.txt?
Thank you very much for your help,
Nathan