Out of vocabulary rejection

Hello all,

I would like to ask for your help in clarifying how DeepSpeech handles out-of-vocabulary words. I’m trying to build a simple command-and-control application (46 commands, each an English sentence of 1-6 words, 60 unique words in total). I’m using Python and DeepSpeech v0.7.4 (both the PyPI package and the full sources from GitHub), which is also the tag of all the files I’m referring to in my experiments below.

First, I would like to double check that what I want is possible to do with DeepSpeech: With the default acoustic model and my own language model (i.e. a scorer built according to https://deepspeech.readthedocs.io/en/v0.7.4/Scorer.html#building-your-own-scorer), can DeepSpeech reject out-of-vocabulary words? If so, how is it indicated during the inference? Is there <unk> in the output transcription, is there an error/exception thrown, or… ?

If the answer to the previous is “yes”, can somebody point me in the right direction to achieve that?

Here’s what I tried so far:
I started with generate_lm.py and generate_package.py as found in data/lm. I only modified generate_lm.py slightly, so that:
- the temporary ARPA files are not deleted, so I can inspect them
- --interpolate_unigrams for KenLM’s lmplz is set to 0, which should lead to p(<unk>) being higher than the p() of other words

I generated the language model with:
python3 data/lm/generate_lm.py --input_txt my_corpus.txt --output_dir ./generated --top_k $word_count --kenlm_bins $bin_path --arpa_order 3 --max_arpa_memory 80% --arpa_prune 0 --binary_a_bits 22 --binary_q_bits 8 --binary_type trie --discount_fallback
($word_count = 60, so none of my words are filtered; binary_q_bits and binary_a_bits are the defaults for KenLM’s build_binary)
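As a sanity check on the value passed to --top_k, the unique words in the corpus can be counted with a few lines of Python. This is only a sketch: count_words is a hypothetical helper name, and the sample lines are taken from my_corpus.txt.

```python
from collections import Counter


def count_words(lines):
    """Count whitespace-separated, lowercased tokens across all lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


# A few lines from my_corpus.txt; in practice read the whole file, e.g.
#   with open("my_corpus.txt", encoding="utf-8") as f: counts = count_words(f)
sample = ["start conveyor belt", "stop conveyor belt", "start"]
counts = count_words(sample)
print(len(counts))  # number of unique words, usable as --top_k
```

For the full corpus this should give 60, which is consistent with the 63 unigrams in the generated ARPA file once <unk>, <s> and </s> are added.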

The generated ARPA file has only one entry that contains <unk>, and that is the following unigram: -0.37345523 <unk> 0
The my_corpus.txt and the resulting ARPA (kenlm_generated.arpa) files are attached.
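To check where <unk> occurs without reading the ARPA file by eye, a small parser is enough. The sketch below assumes the usual ARPA layout (a log10 probability, the n-gram words, then an optional backoff weight, grouped under per-order section headers); find_unk_entries is a hypothetical name.

```python
def find_unk_entries(arpa_lines, token="<unk>"):
    """Return (order, log10_prob, words, backoff) for each n-gram containing token."""
    order = None  # current n-gram order, None outside the n-gram sections
    results = []
    for raw in arpa_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("\\"):
            # "\2-grams:" -> 2; "\data\" and "\end\" are not n-gram sections
            is_section = line.endswith("-grams:")
            order = int(line[1:line.index("-")]) if is_section else None
            continue
        if order is None:
            continue  # still inside the header with the n-gram counts
        fields = line.split()
        words = fields[1:1 + order]
        has_backoff = len(fields) > 1 + order
        backoff = float(fields[1 + order]) if has_backoff else None
        if token in words:
            results.append((order, float(fields[0]), words, backoff))
    return results


# A shortened excerpt of kenlm_generated.arpa
sample = [
    "\\data\\",
    "ngram 1=4",
    "ngram 2=1",
    "",
    "\\1-grams:",
    "-2.1658468\t<unk>\t0",
    "0\t<s>\t-0.32572055",
    "-0.598042\t</s>\t0",
    "-2.011489\tmachine\t-0.1040886",
    "",
    "\\2-grams:",
    "-0.38545653\tmachine </s>\t0",
    "",
    "\\end\\",
]
entries = find_unk_entries(sample)
print(entries)
```

An empty result for the 2-gram and 3-gram sections confirms that <unk> only appears as a unigram.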

After that I generated the package with:
python3 data/lm/generate_package.py --alphabet data/alphabet.txt --lm generated/lm.binary --vocab generated/vocab-60.txt --package generated/my_scorer.scorer --default_alpha 1.0 --default_beta 1.0
(Please note that the values of default_alpha and default_beta were just arbitrary first picks. Could these default values have any impact on out-of-vocabulary words? I’m planning to optimize them further with lm_optimizer.py as mentioned here: https://deepspeech.readthedocs.io/en/v0.7.4/Scorer.html#building-your-own-scorer)

After that I started the inference with:
python3 native_client/python/client.py --model models/deepspeech-0.7.4-models.tflite --audio ../audio_records/arctic_a0005.wav --scorer generated/my_scorer.scorer

The resulting inference (tested with different audio files) was a combination of unigrams from the generated ARPA file (i.e. inference with backoffs, no higher order N-grams occurring in the transcription).

What seemed weird to me was that <unk> is in the ARPA file only once, as a unigram, and without a backoff weight specified. This got me thinking that maybe I could get the ARPA file into a form where <unk> also appears in higher-order N-grams and has backoff weights specified, as can be seen for example here (https://cmusphinx.github.io/wiki/arpaformat/). And that maybe this would improve my situation…

I did study the options of the KenLM binaries, especially lmplz, but I haven’t found much information about how to alter the <unk> occurrence in ARPA files (except the --interpolate_unigrams option mentioned above). Therefore I decided to experiment by generating an ARPA file by different means (http://www.speech.cs.cmu.edu/tools/lmtool-new.html) and feeding it a corpus that, in addition to my commands, contains 100 sentences of different lengths composed only of <unk>s (<unk>s in the corpus file aren’t accepted by KenLM’s lmplz). In the resulting ARPA file, <unk> as a unigram had a backoff defined and it also occurred in different combinations in the higher-order N-grams, which was what I wanted to see. I used this ARPA file to build lm.binary in generate_lm.py (i.e. bypassing the calls to lmplz and filter in the script), generated the package and tested the inference again. The result of inference may have contained different words than before, but again, it was just a combination of different unigrams from my vocabulary.

To push it even further, I decided to raise the <unk> weights far too high by ‘brute force’: the weights and backoffs on all the <unk> lines in the ARPA file (i.e. the log values) were divided by 256 (see lot_of_unks.arpa). I generated the package again and tested inference, with no improvement of the situation…
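The brute-force step can be sketched in Python as follows. scale_unk_lines is a hypothetical helper, and the input values in the example are back-computed from the posted file (posted value times 256), so they are an assumption rather than the actual lmtool output.

```python
def scale_unk_lines(arpa_lines, divisor=256.0, token="<unk>"):
    """Divide the log10 probability and backoff of every n-gram containing token."""
    order = None  # current n-gram order, None outside the n-gram sections
    out = []
    for line in arpa_lines:
        stripped = line.strip()
        if stripped.startswith("\\"):
            # "\2-grams:" -> 2; "\data\" and "\end\" are not n-gram sections
            is_section = stripped.endswith("-grams:")
            order = int(stripped[1:stripped.index("-")]) if is_section else None
            out.append(line)
            continue
        fields = stripped.split()
        if order is None or token not in fields[1:1 + order]:
            out.append(line)  # header, blank line, or n-gram without token
            continue
        parts = [str(float(fields[0]) / divisor), " ".join(fields[1:1 + order])]
        if len(fields) > 1 + order:  # a backoff weight is present
            parts.append(str(float(fields[1 + order]) / divisor))
        out.append("\t".join(parts))
    return out


original = [
    "\\1-grams:",
    "-0.5559\t<unk>\t-0.1127",  # assumed pre-scaling values
    "-3.2962\taccept\t-0.2677",
]
scaled = scale_unk_lines(original)
print(scaled[1])
print(10 ** float(scaled[1].split("\t")[0]))  # probability implied for <unk>
```

A log10 value of about -0.00217 corresponds to a probability of roughly 0.995 for <unk> alone, so the unigram probabilities no longer sum to 1 after this edit.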

I do realize that my experiments were not particularly clever, but I wanted to see what the impact would be of a language model in which unknown words have such high weights compared to the in-vocabulary words. Could somebody tell me what the expected results of DeepSpeech inference are with such a configuration?

Thank you all in advance
Jindra

Content of files (why can’t we upload simple text files as attachment to the topic?)
my_corpus.txt:

hello machine
start
stop
reset
start conveyor belt
stop conveyor belt
decrease the conveyor belt speed
increase the conveyor belt speed
set speed to ten percent
set speed to twenty percent
set speed to fifty percent
set speed to one hundred percent
get the conveyor belt speed
switch to maintenance mode
switch to fail safe mode
switch to production mode
start calibration
get diagnostic information
get failure log
clear failure log
move actuator up
move actuator down
get actuator temperature
get switch state
switch screen to main menu
switch screen to menu one
switch screen to menu two
switch screen to menu three
switch screen to maintenance menu
menu up
undo
redo
cancel
accept
load preset one
load preset two
load preset three
reset to default settings
get production rate
get quality information
get failure rate
get running time
operator log in
operator log off
enable
disable

kenlm_generated.arpa:

\data\
ngram 1=63
ngram 2=118
ngram 3=116
\1-grams:

-2.1658468	<unk>	0
0	<s>	-0.32572055
-0.598042	</s>	0
-2.011489	hello	-0.1040886
-2.011489	machine	-0.1040886
-2.011489	start	-0.1040886
-2.011489	stop	-0.1040886
-2.011489	reset	-0.1040886
-1.6287453	conveyor	-0.23632658
-2.011489	belt	-0.1040886
-2.011489	decrease	-0.1040886
-1.6287453	the	-0.23632658
-1.8896521	speed	-0.1040886
-2.011489	increase	-0.1040886
-2.011489	set	-0.1040886
-1.4950578	to	-0.12895685
-2.011489	ten	-0.1040886
-1.4950578	percent	-0.3612653
-2.011489	twenty	-0.1040886
-2.011489	fifty	-0.1040886
-1.6287453	one	-0.20320818
-2.011489	hundred	-0.1040886
-2.011489	get	-0.1040886
-1.8896521	switch	-0.1040886
-2.011489	maintenance	-0.1040886
-1.6287453	mode	-0.23632658
-2.011489	fail	-0.1040886
-2.011489	safe	-0.1040886
-1.8896521	production	-0.1040886
-2.011489	calibration	-0.1040886
-2.011489	diagnostic	-0.1040886
-1.8896521	information	-0.26278532
-1.8896521	failure	-0.20320818
-1.8896521	log	-0.1040886
-2.011489	clear	-0.1040886
-2.011489	move	-0.1040886
-1.8896521	actuator	-0.1040886
-1.8896521	up	-0.26278532
-2.011489	down	-0.1040886
-2.011489	temperature	-0.1040886
-2.011489	state	-0.1040886
-2.011489	screen	-0.1040886
-2.011489	main	-0.1040886
-1.4950578	menu	-0.15082674
-1.8896521	two	-0.26278532
-1.8896521	three	-0.26278532
-2.011489	undo	-0.1040886
-2.011489	redo	-0.1040886
-2.011489	cancel	-0.1040886
-2.011489	accept	-0.1040886
-2.011489	load	-0.1040886
-2.011489	preset	-0.1040886
-2.011489	default	-0.1040886
-2.011489	settings	-0.1040886
-1.8896521	rate	-0.26278532
-2.011489	quality	-0.1040886
-2.011489	running	-0.1040886
-2.011489	time	-0.1040886
-2.011489	operator	-0.1040886
-2.011489	in	-0.1040886
-2.011489	off	-0.1040886
-2.011489	enable	-0.1040886
-2.011489	disable	-0.1040886

\2-grams:
-0.38545653	machine </s>	0
-0.56929934	start </s>	0
-0.5155476	stop </s>	0
-0.5155476	reset </s>	0
-0.5155476	belt </s>	0
-0.5155476	speed </s>	0
-0.17096847	percent </s>	0
-0.33659884	one </s>	0
-0.24710484	mode </s>	0
-0.38545653	calibration </s>	0
-0.22786321	information </s>	0
-0.56929934	log </s>	0
-0.22786321	up </s>	0
-0.38545653	down </s>	0
-0.38545653	temperature </s>	0
-0.38545653	state </s>	0
-0.4819919	menu </s>	0
-0.22786321	two </s>	0
-0.22786321	three </s>	0
-0.38545653	undo </s>	0
-0.38545653	redo </s>	0
-0.38545653	cancel </s>	0
-0.38545653	accept </s>	0
-0.38545653	settings </s>	0
-0.22786321	rate </s>	0
-0.38545653	time </s>	0
-0.38545653	in </s>	0
-0.38545653	off </s>	0
-0.38545653	enable </s>	0
-0.38545653	disable </s>	0
-2.0346441	<s> hello	-0.30103
-0.6560439	hello machine	-0.30103
-1.4952537	<s> start	-0.30103
-1.6137103	<s> stop	-0.30103
-1.6137103	<s> reset	-0.30103
-1.0479926	start conveyor	-0.30103
-0.9028915	stop conveyor	-0.30103
-0.3631956	the conveyor	-0.30103
-0.3712802	conveyor belt	-0.30103
-2.0346441	<s> decrease	-0.30103
-0.63523424	decrease the	-0.30103
-0.63523424	increase the	-0.30103
-1.3454471	get the	-0.30103
-0.9329197	belt speed	-0.30103
-0.65118927	set speed	-0.42596874
-2.0346441	<s> increase	-0.30103
-1.2699497	<s> set	-0.42596874
-0.8803296	reset to	-0.30103
-0.8803296	speed to	-0.30103
-1.0167954	switch to	-0.30103
-0.6229069	screen to	-0.30103
-1.5749389	to ten	-0.30103
-0.6229069	ten percent	-0.30103
-0.6229069	twenty percent	-0.30103
-0.6229069	fifty percent	-0.30103
-0.6229069	hundred percent	-0.30103
-1.5749389	to twenty	-0.30103
-1.5749389	to fifty	-0.30103
-1.43363	to one	-0.30103
-1.282901	menu one	-0.30103
-1.0479926	preset one	-0.30103
-1.1127324	one hundred	-0.30103
-0.78940046	<s> get	-0.30103
-0.8472357	<s> switch	-0.42596874
-1.434335	get switch	-0.30103
-1.0468333	to maintenance	-0.30103
-0.9028915	maintenance mode	-0.30103
-0.63523424	safe mode	-0.30103
-0.9028915	production mode	-0.30103
-1.5749389	to fail	-0.30103
-0.6560439	fail safe	-0.30103
-1.538282	to production	-0.30103
-1.434335	get production	-0.30103
-1.1040161	start calibration	-0.30103
-1.4646709	get diagnostic	-0.30103
-0.65118927	diagnostic information	-0.30103
-0.65118927	quality information	-0.30103
-1.434335	get failure	-0.30103
-0.65118927	clear failure	-0.30103
-0.5076264	failure log	-0.30103
-0.65118927	operator log	-0.30103
-2.0346441	<s> clear	-0.30103
-1.6137103	<s> move	-0.30103
-1.434335	get actuator	-0.30103
-0.65118927	move actuator	-0.30103
-1.0905327	actuator up	-0.30103
-1.3503811	menu up	-0.30103
-1.1040161	actuator down	-0.30103
-1.1040161	actuator temperature	-0.30103
-1.1040161	switch state	-0.30103
-1.1040161	switch screen	-0.5228787
-1.5749389	to main	-0.30103
-1.7046212	<s> menu	-0.30103
-1.3651031	to menu	-0.30103
-0.8803296	maintenance menu	-0.30103
-0.6229069	main menu	-0.30103
-1.3503811	menu two	-0.30103
-1.0905327	preset two	-0.30103
-1.3503811	menu three	-0.30103
-1.0905327	preset three	-0.30103
-2.0346441	<s> undo	-0.30103
-2.0346441	<s> redo	-0.30103
-2.0346441	<s> cancel	-0.30103
-2.0346441	<s> accept	-0.30103
-1.4952537	<s> load	-0.30103
-0.6560439	load preset	-0.30103
-1.5749389	to default	-0.30103
-0.6560439	default settings	-0.30103
-0.9329197	production rate	-0.30103
-1.1017511	failure rate	-0.30103
-1.4646709	get quality	-0.30103
-1.4646709	get running	-0.30103
-0.6560439	running time	-0.30103
-1.6137103	<s> operator	-0.30103
-1.1040161	log in	-0.30103
-1.1040161	log off	-0.30103
-2.0346441	<s> enable	-0.30103
-2.0346441	<s> disable	-0.30103

\3-grams:
-0.15129851	hello machine </s>
-0.52076936	<s> start </s>
-0.3951763	<s> stop </s>
-0.3951763	<s> reset </s>
-0.45277482	conveyor belt </s>
-0.18538384	belt speed </s>
-0.07712488	ten percent </s>
-0.07712488	twenty percent </s>
-0.07712488	fifty percent </s>
-0.07712488	hundred percent </s>
-0.13647436	menu one </s>
-0.13647436	preset one </s>
-0.106209785	maintenance mode </s>
-0.106209785	safe mode </s>
-0.106209785	production mode </s>
-0.15129851	start calibration </s>
-0.09915569	diagnostic information </s>
-0.09915569	quality information </s>
-0.19736719	failure log </s>
-0.09915569	actuator up </s>
-0.09915569	menu up </s>
-0.15129851	actuator down </s>
-0.15129851	actuator temperature </s>
-0.15129851	switch state </s>
-0.17730382	maintenance menu </s>
-0.17730382	main menu </s>
-0.09915569	menu two </s>
-0.09915569	preset two </s>
-0.09915569	menu three </s>
-0.09915569	preset three </s>
-0.15129851	<s> undo </s>
-0.15129851	<s> redo </s>
-0.15129851	<s> cancel </s>
-0.15129851	<s> accept </s>
-0.15129851	default settings </s>
-0.09915569	production rate </s>
-0.09915569	failure rate </s>
-0.15129851	running time </s>
-0.15129851	log in </s>
-0.15129851	log off </s>
-0.15129851	<s> enable </s>
-0.15129851	<s> disable </s>
-0.21439326	<s> hello machine
-0.67482173	<s> start conveyor
-0.5051103	<s> stop conveyor
-0.14468811	decrease the conveyor
-0.14468811	increase the conveyor
-0.14468811	get the conveyor
-0.14711641	start conveyor belt
-0.14711641	stop conveyor belt
-0.14711641	the conveyor belt
-0.21055521	<s> decrease the
-0.21055521	<s> increase the
-1.1072094	<s> get the
-0.44569105	conveyor belt speed
-0.14952381	<s> set speed
-0.50050145	<s> reset to
-0.1710843	set speed to
-0.650572	<s> switch to
-0.11267256	switch screen to
-0.85916054	speed to ten
-0.20821008	to ten percent
-0.20821008	to twenty percent
-0.20821008	to fifty percent
-0.20821008	one hundred percent
-0.85916054	speed to twenty
-0.85916054	speed to fifty
-0.84338385	speed to one
-0.7150454	to menu one
-0.67482173	load preset one
-0.2687587	to one hundred
-1.1310747	<s> get switch
-0.674576	switch to maintenance
-0.8389656	screen to maintenance
-0.5051103	to maintenance mode
-0.21055521	fail safe mode
-0.24985543	to production mode
-0.7447946	switch to fail
-0.21439326	to fail safe
-0.7419761	switch to production
-1.1310747	<s> get production
-0.6860959	<s> start calibration
-1.1384242	<s> get diagnostic
-0.21351126	get diagnostic information
-0.21351126	get quality information
-0.8877189	<s> get failure
-0.21351126	<s> clear failure
-0.39215744	get failure log
-0.18351905	clear failure log
-0.21351126	<s> operator log
-1.1310747	<s> get actuator
-0.21351126	<s> move actuator
-0.53671676	move actuator up
-0.28206784	<s> menu up
-0.5385753	move actuator down
-0.26812866	get actuator temperature
-0.26812866	get switch state
-0.33067092	<s> switch screen
-0.945749	screen to main
-0.49272335	screen to menu
-0.50050145	to maintenance menu
-0.20821008	to main menu
-0.72358125	to menu two
-0.6834879	load preset two
-0.72358125	to menu three
-0.6834879	load preset three
-0.21439326	<s> load preset
-0.2896241	reset to default
-0.21439326	to default settings
-0.2530925	get production rate
-0.53826654	get failure rate
-1.1384242	<s> get quality
-1.1384242	<s> get running
-0.21439326	get running time
-0.5385753	operator log in
-0.5385753	operator log off

\end\

lot_of_unks.arpa:

\data\
ngram 1=63
ngram 2=121
ngram 3=120

\1-grams:
-1.1319	</s>	-0.3010
-1.1319	<s>	-0.1434
-0.002171484375	<unk>	-0.000440234375
-3.2962	accept	-0.2677
-2.8191	actuator	-0.3002
-2.5973	belt	-0.2661
-3.2962	calibration	-0.2677
-3.2962	cancel	-0.2677
-3.2962	clear	-0.3004
-2.5973	conveyor	-0.2999
-3.2962	decrease	-0.3004
-3.2962	default	-0.3008
-3.2962	diagnostic	-0.3006
-3.2962	disable	-0.2677
-3.2962	down	-0.2677
-3.2962	enable	-0.2677
-3.2962	fail	-0.3008
-2.8191	failure	-0.2997
-3.2962	fifty	-0.3002
-2.3420	get	-0.2960
-3.2962	hello	-0.3008
-3.2962	hundred	-0.3002
-3.2962	in	-0.2677
-3.2962	increase	-0.3004
-2.9952	information	-0.2677
-2.8191	load	-0.3004
-2.6942	log	-0.2673
-3.2962	machine	-0.2677
-3.2962	main	-0.2997
-2.9952	maintenance	-0.2990
-2.5181	menu	-0.2656
-2.8191	mode	-0.2677
-2.9952	move	-0.3004
-3.2962	off	-0.2677
-2.8191	one	-0.2675
-2.9952	operator	-0.3002
-2.6942	percent	-0.2677
-2.8191	preset	-0.2995
-2.9952	production	-0.2999
-3.2962	quality	-0.3006
-2.9952	rate	-0.2677
-3.2962	redo	-0.2677
-2.9952	reset	-0.2646
-3.2962	running	-0.3008
-3.2962	safe	-0.3004
-2.5973	screen	-0.2982
-2.6942	set	-0.2995
-3.2962	settings	-0.2677
-2.4511	speed	-0.2646
-2.8191	start	-0.2663
-3.2962	state	-0.2677
-2.9952	stop	-0.2665
-2.3420	switch	-0.2968
-3.2962	temperature	-0.2677
-3.2962	ten	-0.3002
-2.8191	the	-0.2999
-2.9952	three	-0.2677
-3.2962	time	-0.2677
-2.1823	to	-0.2968
-3.2962	twenty	-0.3002
-2.9952	two	-0.2677
-3.2962	undo	-0.2677
-2.9952	up	-0.2677

\2-grams:
-0.00181796875	<s> <unk>	0.0
-2.4654	<s> accept	0.0000
-2.4654	<s> cancel	0.0000
-2.4654	<s> clear	0.0000
-2.4654	<s> decrease	0.0000
-2.4654	<s> disable	0.0000
-2.4654	<s> enable	0.0000
-1.5111	<s> get	0.0000
-2.4654	<s> hello	0.0000
-2.4654	<s> increase	0.0000
-1.9883	<s> load	0.0000
-2.4654	<s> menu	-0.2632
-2.1644	<s> move	0.0000
-2.1644	<s> operator	0.0000
-2.4654	<s> redo	0.0000
-2.1644	<s> reset	0.0000
-1.8633	<s> set	0.0000
-1.9883	<s> start	0.0000
-2.1644	<s> stop	0.0000
-1.5623	<s> switch	-0.0458
-2.4654	<s> undo	0.0000
-0.00406796875	<unk> </s>	-0.00117578125
-0.00151640625	<unk> <unk>	0.0
-0.3010	accept </s>	-0.3010
-0.7782	actuator down	0.0000
-0.7782	actuator temperature	0.0000
-0.7782	actuator up	0.0000
-0.6990	belt </s>	-0.3010
-0.5229	belt speed	-0.1963
-0.3010	calibration </s>	-0.3010
-0.3010	cancel </s>	-0.3010
-0.3010	clear failure	-0.1249
-0.3010	conveyor belt	0.0000
-0.3010	decrease the	0.0000
-0.3010	default settings	0.0000
-0.3010	diagnostic information	0.0000
-0.3010	disable </s>	-0.3010
-0.3010	down </s>	-0.3010
-0.3010	enable </s>	-0.3010
-0.3010	fail safe	0.0000
-0.4771	failure log	-0.1761
-0.7782	failure rate	0.0000
-0.3010	fifty percent	0.0000
-1.2553	get actuator	-0.2218
-1.2553	get diagnostic	0.0000
-0.9542	get failure	0.0000
-1.2553	get production	-0.1761
-1.2553	get quality	0.0000
-1.2553	get running	0.0000
-1.2553	get switch	-0.2762
-1.2553	get the	0.0000
-0.3010	hello machine	0.0000
-0.3010	hundred percent	0.0000
-0.3010	in </s>	-0.3010
-0.3010	increase the	0.0000
-0.3010	information </s>	-0.3010
-0.3010	load preset	0.0000
-0.6021	log </s>	-0.3010
-0.9031	log in	0.0000
-0.9031	log off	0.0000
-0.3010	machine </s>	-0.3010
-0.3010	main menu	-0.2218
-0.6021	maintenance menu	-0.2218
-0.6021	maintenance mode	0.0000
-0.7782	menu </s>	-0.3010
-1.0792	menu one	-0.1249
-1.0792	menu three	0.0000
-1.0792	menu two	0.0000
-1.0792	menu up	0.0000
-0.3010	mode </s>	-0.3010
-0.3010	move actuator	-0.1249
-0.3010	off </s>	-0.3010
-0.4771	one </s>	-0.3010
-0.7782	one hundred	0.0000
-0.3010	operator log	-0.1761
-0.3010	percent </s>	-0.3010
-0.7782	preset one	-0.1249
-0.7782	preset three	0.0000
-0.7782	preset two	0.0000
-0.6021	production mode	0.0000
-0.6021	production rate	0.0000
-0.3010	quality information	0.0000
-0.3010	rate </s>	-0.3010
-0.3010	redo </s>	-0.3010
-0.6021	reset </s>	-0.3010
-0.6021	reset to	-0.2840
-0.3010	running time	0.0000
-0.3010	safe mode	0.0000
-0.3010	screen to	-0.1871
-0.3010	set speed	-0.1549
-0.3010	settings </s>	-0.3010
-0.6690	speed </s>	-0.3010
-0.5441	speed to	-0.2285
-0.7782	start </s>	-0.3010
-0.7782	start calibration	0.0000
-0.7782	start conveyor	0.0000
-0.3010	state </s>	-0.3010
-0.6021	stop </s>	-0.3010
-0.6021	stop conveyor	0.0000
-0.5563	switch screen	0.0000
-1.2553	switch state	0.0000
-0.7782	switch to	-0.2285
-0.3010	temperature </s>	-0.3010
-0.3010	ten percent	0.0000
-0.3010	the conveyor	0.0000
-0.3010	three </s>	-0.3010
-0.3010	time </s>	-0.3010
-1.4150	to default	0.0000
-1.4150	to fail	0.0000
-1.4150	to fifty	0.0000
-1.4150	to main	0.0000
-1.1139	to maintenance	0.0000
-0.9379	to menu	-0.1761
-1.4150	to one	-0.2218
-1.4150	to production	-0.1761
-1.4150	to ten	0.0000
-1.4150	to twenty	0.0000
-0.3010	twenty percent	0.0000
-0.3010	two </s>	-0.3010
-0.3010	undo </s>	-0.3010
-0.3010	up </s>	-0.3010

\3-grams:
-0.00508203125	<s> <unk> </s>
-0.0013546875	<s> <unk> <unk>
-0.3010	<s> accept </s>
-0.3010	<s> cancel </s>
-0.3010	<s> clear failure
-0.3010	<s> decrease the
-0.3010	<s> disable </s>
-0.3010	<s> enable </s>
-1.2553	<s> get actuator
-1.2553	<s> get diagnostic
-0.9542	<s> get failure
-1.2553	<s> get production
-1.2553	<s> get quality
-1.2553	<s> get running
-1.2553	<s> get switch
-1.2553	<s> get the
-0.3010	<s> hello machine
-0.3010	<s> increase the
-0.3010	<s> load preset
-0.3010	<s> menu up
-0.3010	<s> move actuator
-0.3010	<s> operator log
-0.3010	<s> redo </s>
-0.6021	<s> reset </s>
-0.6021	<s> reset to
-0.3010	<s> set speed
-0.7782	<s> start </s>
-0.7782	<s> start calibration
-0.7782	<s> start conveyor
-0.6021	<s> stop </s>
-0.6021	<s> stop conveyor
-0.5051	<s> switch screen
-0.7270	<s> switch to
-0.3010	<s> undo </s>
-0.00390625	<unk> <unk> </s>
-0.001554296875	<unk> <unk> <unk>
-0.3010	actuator down </s>
-0.3010	actuator temperature </s>
-0.3010	actuator up </s>
-0.3010	belt speed </s>
-0.3010	clear failure log
-0.6990	conveyor belt </s>
-0.5229	conveyor belt speed
-0.3010	decrease the conveyor
-0.3010	default settings </s>
-0.3010	diagnostic information </s>
-0.3010	fail safe mode
-0.3010	failure log </s>
-0.3010	failure rate </s>
-0.3010	fifty percent </s>
-0.3010	get actuator temperature
-0.3010	get diagnostic information
-0.6021	get failure log
-0.6021	get failure rate
-0.3010	get production rate
-0.3010	get quality information
-0.3010	get running time
-0.3010	get switch state
-0.3010	get the conveyor
-0.3010	hello machine </s>
-0.3010	hundred percent </s>
-0.3010	increase the conveyor
-0.7782	load preset one
-0.7782	load preset three
-0.7782	load preset two
-0.3010	log in </s>
-0.3010	log off </s>
-0.3010	main menu </s>
-0.3010	maintenance menu </s>
-0.3010	maintenance mode </s>
-0.3010	menu one </s>
-0.3010	menu three </s>
-0.3010	menu two </s>
-0.3010	menu up </s>
-0.6021	move actuator down
-0.6021	move actuator up
-0.3010	one hundred percent
-0.6021	operator log in
-0.6021	operator log off
-0.3010	preset one </s>
-0.3010	preset three </s>
-0.3010	preset two </s>
-0.3010	production mode </s>
-0.3010	production rate </s>
-0.3010	quality information </s>
-0.3010	reset to default
-0.3010	running time </s>
-0.3010	safe mode </s>
-1.0000	screen to main
-1.0000	screen to maintenance
-0.5229	screen to menu
-0.3010	set speed to
-0.9031	speed to fifty
-0.9031	speed to one
-0.9031	speed to ten
-0.9031	speed to twenty
-0.3010	start calibration </s>
-0.3010	start conveyor belt
-0.3010	stop conveyor belt
-0.3010	switch screen to
-0.3010	switch state </s>
-0.7782	switch to fail
-0.7782	switch to maintenance
-0.7782	switch to production
-0.3010	ten percent </s>
-0.3010	the conveyor belt
-0.3010	to default settings
-0.3010	to fail safe
-0.3010	to fifty percent
-0.3010	to main menu
-0.6021	to maintenance menu
-0.6021	to maintenance mode
-0.7782	to menu one
-0.7782	to menu three
-0.7782	to menu two
-0.3010	to one hundred
-0.3010	to production mode
-0.3010	to ten percent
-0.3010	to twenty percent
-0.3010	twenty percent </s>

\end\

Thanks for this well-written post. I am not the expert; maybe @reuben has a better answer than me :slight_smile:

The algorithm is not really meant to find OOV or UNK words, it rather tries to always match something. Therefore, you are trying to hack the system. You could:

  1. Search the forum for posts of people who want to detect just certain words/phrases.

  2. Play with the language model as you are already doing.

  3. Set values for alpha/beta strategically. Basically, alpha determines how hard you want to find something in the custom language model (not much) and beta how much to value words found in the custom model (high). More on alpha/beta here. Try values 0 and 2/3/4?
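
To make the roles of the two weights concrete, here is a toy version of how a decoder of this kind typically combines scores: the language model log-probability of each word is multiplied by alpha, and beta is added once per word. As far as I understand, this matches what DeepSpeech does at word boundaries, but the function name and all numbers below are purely illustrative and not DeepSpeech code.

```python
def hypothesis_score(acoustic_logp, word_lm_logps, alpha, beta):
    """Acoustic log-probability plus, per word, alpha * LM log-probability + beta."""
    return acoustic_logp + sum(alpha * lp + beta for lp in word_lm_logps)


# Hypothetical numbers: a three-word hypothesis that the LM likes, and a
# one-word hypothesis with a slightly better acoustic score.
three_words = (-6.0, [-0.5, -0.4, -0.3])
one_word = (-5.0, [-2.0])

for alpha, beta in [(1.0, 1.0), (0.0, 3.0)]:
    a = hypothesis_score(*three_words, alpha, beta)
    b = hypothesis_score(*one_word, alpha, beta)
    print(alpha, beta, round(a, 2), round(b, 2))
```

With alpha = 0 the language model probabilities no longer influence the ranking at all, and a larger beta favours hypotheses that contain more words.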

All the best and let us know how you progress

DeepSpeech never predicts <unk>, it simply does not explore beams that lead to out of vocabulary words.
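
A toy illustration of what "not exploring beams" means: while a word is being spelled out character by character, an extension is only kept if the partial word is still a prefix of some vocabulary word. The sketch below only illustrates the idea and is not the actual decoder code; build_prefixes and allowed_extensions are hypothetical names.

```python
import string


def build_prefixes(vocab):
    """Collect every prefix of every vocabulary word (including the empty one)."""
    prefixes = {""}
    for word in vocab:
        for end in range(1, len(word) + 1):
            prefixes.add(word[:end])
    return prefixes


def allowed_extensions(partial_word, prefixes, alphabet=string.ascii_lowercase):
    """Characters that keep partial_word on a path to some vocabulary word."""
    return [ch for ch in alphabet if partial_word + ch in prefixes]


prefixes = build_prefixes({"start", "stop", "set", "speed"})
print(allowed_extensions("st", prefixes))  # only continuations of start/stop survive
print(allowed_extensions("q", prefixes))   # nothing survives, the beam is dropped
```

Because pruned paths are never scored, there is no point in the search at which an <unk> token could be emitted.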

FWIW, we don’t use this option (--interpolate_unigrams 0), and KenLM’s documentation explicitly warns against using it; see the Estimation page of Kenneth Heafield’s KenLM documentation.

Thank you very much for your replies. So, as I understand it, DeepSpeech is in general not suitable for small-vocabulary command-and-control systems? (I believe OOV rejection should be an important part of such a system.)
@othiele: I have already read a lot of different posts, but none of them was explicitly focused on OOV rejection. I’ll experiment with alpha/beta.
@reuben: I’ve read the KenLM estimation page, but the only warning I see there says “The --interpolate_unigrams 0 option emulates SRILM’s behavior but gives large <unk> probability so it should probably not be used.” But as having high weights for <unk> seemed right for my application, I ignored that warning. Besides this, is there any other reason not to use --interpolate_unigrams 0?

If you are into coding, you could change the code to only take beams (paths in the lm) that have a certain percentage or match some other criteria and return UNK otherwise. It is just not how beam search currently works.

That sounds like a good approach, I’ll see what I can do. Once again, thank you for your help :slight_smile:

Several people have already built small vocabulary command and control systems on top of DeepSpeech (and talked about it on this forum), so I don’t know what leads you to this impression.

Could you give more details about the problems you ran into with the simplest approach? That is basically your first approach, but without --interpolate_unigrams, and maybe with lm_optimizer.py as well, to rule out particularly bad hyperparameters picked by bad luck.

Sorry Reuben, it’s just that my expectations and understanding of things were wrong; I didn’t mean to be impolite.
I read the related topics, but I would expect an engine that’s suitable for a command-and-control application to have a built-in way to indicate that it can’t recognize a valid command. I.e. if the raw transcription (before being processed and corrected by the language model) can’t be fitted to the words in the vocabulary with a certain confidence, it should be discarded as out of vocabulary.
I think I will just use the information I have already described - if the recognized output is just a bag of unigrams instead of a full phrase from the training corpus, I will label it as out of vocabulary.
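
A minimal sketch of that check, assuming the transcript has to match one of the corpus lines exactly after case and whitespace normalization (load_commands and match_command are hypothetical helper names):

```python
def normalize(text):
    """Lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def load_commands(lines):
    """Build the set of valid commands from corpus lines, skipping blank ones."""
    return {normalize(line) for line in lines if line.strip()}


def match_command(transcript, commands):
    """Return the matching command, or None to label the input as out of vocabulary."""
    candidate = normalize(transcript)
    return candidate if candidate in commands else None


# A few lines from my_corpus.txt
commands = load_commands(["start conveyor belt", "get failure log", "undo", ""])
print(match_command("start conveyor belt", commands))
print(match_command("belt failure undo get", commands))  # bag of unigrams -> None
```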

Also, I wasn’t aware of the sttWithMetadata() Python method, which returns the confidence scores for transcriptions, so it’s helpful for my purposes. Things look better to me now; I do apologize for not noticing earlier.
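
A sketch of how the confidence could be combined with the phrase check. It assumes the v0.7 Python API, where sttWithMetadata() returns an object with a transcripts list and each candidate has confidence and tokens attributes. The threshold is a placeholder that would need tuning on real recordings, accept_command and top_transcript are hypothetical names, and the SimpleNamespace objects only stand in for real metadata so that the example runs without a model.

```python
from types import SimpleNamespace


def top_transcript(metadata):
    """Return (text, confidence) of the first candidate transcript."""
    candidate = metadata.transcripts[0]
    text = "".join(token.text for token in candidate.tokens)
    return text, candidate.confidence


def accept_command(metadata, commands, min_confidence):
    """Return the command text, or None to signal a rejection."""
    text, confidence = top_transcript(metadata)
    if text in commands and confidence >= min_confidence:
        return text
    return None


# With the real package the metadata would come from something like:
#   from deepspeech import Model
#   model = Model("models/deepspeech-0.7.4-models.tflite")
#   model.enableExternalScorer("generated/my_scorer.scorer")
#   metadata = model.sttWithMetadata(audio, 1)  # audio: 16-bit samples as a NumPy array
# Stand-in object with the same attribute names, so the sketch runs on its own:
fake = SimpleNamespace(transcripts=[SimpleNamespace(
    confidence=-3.2,
    tokens=[SimpleNamespace(text=ch) for ch in "start calibration"],
)])
THRESHOLD = -20.0  # placeholder value, not a recommendation
print(accept_command(fake, {"start calibration"}, THRESHOLD))
```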

@reuben I thought the confidence value was only a measure of the acoustic model, so it shouldn’t change with different language models. Or how do they relate?

Better late than never :slight_smile: For future reference, the confidence score in the metadata reflects the result of the beam search and therefore measures both the acoustic model and the language model.