How can I know what boost value to give for a particular hot word?

Chagari_Vamsi_Reddy · January 15, 2021, 8:57am

Hi,

How can I know what boost value to give for a particular hot word?
For example, say I want to add “willson” as a hot-word. What boost value should I use for it? Any help would be greatly appreciated.

More info:
I used client.py (native_client/python/client.py) to test DeepSpeech, passing an audio file together with the hot-word and a boost value. DeepSpeech could not decode the audio correctly. The audio file is 3 seconds long and its transcription is “Fiona walker”. I tried various boost values (e.g. 0, 1, 2, -0.5, -1, 0.2) and none of them had any impact: the output is the same with or without the hot-word.

Question:
How do I know what boost value to give a particular word? How can I calculate it at run time, so that I pass a boost value that helps the decoder produce accurate results?

How I ran DeepSpeech on my setup:
python3 client.py --model deepspeech_data/deepspeech-0.9.3-models.pbmm --scorer deepspeech_data/deepspeech-0.9.3-models.scorer --audio test.wav --hot_words "fiona walker:1.5"
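For reference, here is a minimal sketch of how a --hot_words value of the form word:boost,word:boost can be split into (word, boost) pairs. It is assumed to roughly mirror what client.py does before calling addHotWord(word, boost); check your copy of client.py rather than relying on this. The point it shows: with "fiona walker:1.5" the hot-word key is the whole phrase, space included.

```python
from typing import Dict


def parse_hot_words(spec: str) -> Dict[str, float]:
    """Split a 'word:boost,word:boost' string into {word: boost}.

    Illustration only: assumed to roughly mirror how client.py
    parses --hot_words before calling addHotWord(word, boost).
    """
    hot_words: Dict[str, float] = {}
    for entry in spec.split(","):
        # Only the last ':' separates the word from its boost.
        word, boost = entry.rsplit(":", 1)
        hot_words[word.strip()] = float(boost)
    return hot_words


if __name__ == "__main__":
    # The whole phrase, space included, becomes a single key.
    print(parse_hot_words("fiona walker:1.5"))  # {'fiona walker': 1.5}
```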

output:
DeepSpeech: v0.9.3-0-gf2e9c85
2021-01-14 21:21:51.170266: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-14 21:21:51.171547: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2021-01-14 21:21:51.194164: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-14 21:21:51.194496: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2080 SUPER computeCapability: 7.5
coreClock: 1.815GHz coreCount: 48 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 462.00GiB/s
2021-01-14 21:21:51.194507: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-01-14 21:21:51.195499: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-01-14 21:21:51.196457: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-01-14 21:21:51.196619: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-01-14 21:21:51.197642: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-01-14 21:21:51.198195: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-01-14 21:21:51.200299: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-01-14 21:21:51.200360: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-14 21:21:51.200660: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-14 21:21:51.200908: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-01-14 21:21:51.397930: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-14 21:21:51.397966: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2021-01-14 21:21:51.397970: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2021-01-14 21:21:51.398103: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-14 21:21:51.398386: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-14 21:21:51.398640: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-14 21:21:51.398889: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7033 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)
Loaded model in 0.291s.
Loading scorer from files deepspeech-0.9.3-models.scorer
Loaded scorer in 0.000138s.
Adding hot-words
Running inference.
2021-01-14 21:21:51.481202: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
frontier
Inference took 0.443s for 2.950s audio file.

Code-related info:
I see that the boost value is used in the “void DecoderState::next(const double *probs, int time_dim, int class_dim)” function to raise the score of the corresponding word.
Relevant code: score = ( ext_scorer_->get_log_cond_prob(ngram, bos) + hot_boost ) * ext_scorer_->alpha;
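To make that line easier to reason about, here is a Python transcription of the LM term as I read it from the decoder (an assumption, not the actual C++): every word of the n-gram that is found in the hot-word map adds its boost, and the sum is added to the LM log-probability before scaling by alpha. The function name, lm_logprob and the numbers are hypothetical. Two consequences follow if this reading is right: a key containing a space never matches a single word, and the boost keeps applying while the hot word is still inside the n-gram context, i.e. also to the words after it, which may explain why very large boosts spoil the next word.

```python
from typing import Dict, List


def lm_term(ngram: List[str], lm_logprob: float,
            hot_words: Dict[str, float], alpha: float) -> float:
    """Hypothetical transcription of
        score = (get_log_cond_prob(ngram, bos) + hot_boost) * alpha
    where lm_logprob stands in for get_log_cond_prob(ngram, bos)."""
    hot_boost = 0.0
    for word in ngram:
        # Every hot word in the n-gram adds its boost, including
        # context words, not only the newest word.
        hot_boost += hot_words.get(word, 0.0)
    return (lm_logprob + hot_boost) * alpha


if __name__ == "__main__":
    alpha = 0.9  # hypothetical LM weight
    hot = {"fiona": 15.0}
    # Newest word is the hot word: boosted.
    print(lm_term(["<s>", "fiona"], -7.0, hot, alpha))
    # Next word: "fiona" is still in the context, so this is boosted too.
    print(lm_term(["<s>", "fiona", "walker"], -3.0, hot, alpha))
    # A multi-word key never matches an individual word: no boost.
    print(lm_term(["<s>", "fiona"], -7.0, {"fiona walker": 15.0}, alpha))
```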

Thank you

othiele · January 15, 2021, 8:30am

Why don’t you just search a bit in the forum? The post before yours has a nice overview.

Chagari_Vamsi_Reddy · January 15, 2021, 8:52am

Hello,

I did see this post and read the docs as well, but it wasn’t clear to me what boost value to give a particular word. If this is clear to you, could you please enlighten me?

FYI, I also did a fair amount of testing before posting this question here.

Thanks

othiele · January 15, 2021, 9:38am

Thanks for adding all the information you didn’t give at first. This feature is quite new and not used that much. If I remember correctly, you would rather use values like 10 or 20. Why don’t you ask the people from the other post what worked for them?

I am still unsure what you want to do. If your language model strongly prefers something totally different, you would need a higher boost value. So it is hard to say what is best without knowing what you are up to.

Clockworker · January 15, 2021, 1:05pm

It depends on how uncertain the model is about a word. You need a boost much larger than 2.0. Somewhere around 15.0 would be okay, but it really depends on your audio file. If you boost too much, the word following a detected hot-word (even if it’s the last word in the sequence) becomes a total mess (split into single letters).
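One way to make “it depends on how uncertain the model is” concrete is a back-of-the-envelope bound. Assume (a simplification, not the full beam search) that each hypothesis scores acoustic + alpha * (lm + boost), following the decoder line quoted in the first post, and ignore beta, beam pruning and later words. Then the hot word beats a single competing word once the boost exceeds the LM gap plus the acoustic gap divided by alpha. All inputs below are hypothetical log-probabilities on the decoder’s scale.

```python
def min_boost(lm_hot: float, lm_competitor: float,
              ac_hot: float, ac_competitor: float, alpha: float) -> float:
    """Boost above which the hot-word hypothesis outscores one competitor.

    Simplified model (assumption): each hypothesis scores
        acoustic + alpha * (lm + boost)
    with beta, beam pruning and later words ignored.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    # Hot word wins when
    #   ac_hot + alpha * (lm_hot + b) > ac_competitor + alpha * lm_competitor
    return (lm_competitor - lm_hot) + (ac_competitor - ac_hot) / alpha


if __name__ == "__main__":
    # LM prefers the competitor by 8, acoustics prefer it by 1 (hypothetical).
    print(min_boost(lm_hot=-12.0, lm_competitor=-4.0,
                    ac_hot=-3.0, ac_competitor=-2.0, alpha=0.9))
```

The further the LM and the acoustics lean away from the hot word, the larger the required boost, which fits the observation that small values like 1 or 2 change nothing. Because the real decoder also prunes beams, treat the result as a rough lower bound rather than a recipe.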

Don’t use “{word1}{space}{word2}” as a single hot-word; it doesn’t influence anything. Some proper nouns may not work correctly (if they are rare), so be careful with them too.
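Following that advice, a phrase like “Fiona Walker” can be expanded into one hot-word per word before building the --hot_words string. The helper names below are hypothetical. Lower-casing assumes a lower-case vocabulary, as in the English release model, and keeping the largest boost when a word occurs in several phrases is a design choice of this sketch, not DeepSpeech behaviour.

```python
from typing import Dict, Iterable, Tuple


def expand_phrases(phrases: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Turn [("Fiona Walker", 15.0)] into {"fiona": 15.0, "walker": 15.0}.

    Assumes a lower-case vocabulary; if a word appears in several
    phrases, the largest boost is kept.
    """
    hot_words: Dict[str, float] = {}
    for phrase, boost in phrases:
        for word in phrase.lower().split():
            hot_words[word] = max(boost, hot_words.get(word, boost))
    return hot_words


def to_cli_arg(hot_words: Dict[str, float]) -> str:
    """Format {word: boost} as the word:boost,word:boost string for --hot_words."""
    return ",".join(f"{word}:{boost}" for word, boost in hot_words.items())


if __name__ == "__main__":
    print(to_cli_arg(expand_phrases([("Fiona Walker", 15.0)])))
    # fiona:15.0,walker:15.0
```

Note that boosting a common word such as “walker” also boosts it everywhere else in the audio, so per-word boosts may need to differ.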

Our report has a script for testing hot-words; you can use it to check how your transcription behaves with hot-words and their priorities. (You can download it near the bottom of the doc, under the “Verification” paragraph.)
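If you want to calibrate offline on a few recordings with known transcripts, a boost sweep like the one below is one option. This is not the script from the report. transcribe is a hypothetical callable you supply; the sweep counts word errors for each boost and, on ties, prefers the smaller boost, since larger ones seem to disturb the following word.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,              # deletion
                         cur[j - 1] + 1,           # insertion
                         prev[j - 1] + (r != h))   # substitution or match
        prev = cur
    return prev[-1]


def sweep_boosts(transcribe: Callable[[Dict[str, float]], str],
                 words: List[str], boosts: Iterable[float],
                 reference: str) -> List[Tuple[float, str, int]]:
    """Transcribe once per boost value and count word errors against reference."""
    results = []
    for boost in boosts:
        hypothesis = transcribe({w: boost for w in words})
        results.append((boost, hypothesis, word_errors(reference, hypothesis)))
    return results


def best_boost(results: List[Tuple[float, str, int]]) -> Optional[float]:
    """Fewest errors wins; ties go to the smaller boost."""
    if not results:
        return None
    return min(results, key=lambda r: (r[2], r[0]))[0]
```

With the deepspeech 0.9 Python package, transcribe could call ds.clearHotWords(), then ds.addHotWord(word, boost) for each entry, then return ds.stt(audio), where audio is a 16-bit mono numpy array; I have not run that exact wiring.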

Chagari_Vamsi_Reddy · January 15, 2021, 5:59pm

You are welcome, and thank you for the response. I understand that this feature is new.

Here is what I am planning to do, but I need the experts’ help with a few things. At run time, how would I know the probability the language model assigns to a particular n-gram (word)? Is there any API (C++/Python) that gives this information? Once I know the probability, it might be easy to decide what boost value to give the corresponding word in the hot-word list.
Let’s say I have a list of words (e.g. 10 nouns). Is there a way to get the likelihood of these 10 nouns from a language model?
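As far as I know, the DeepSpeech 0.9 API has no call that returns LM probabilities, so treat the following as a workaround sketch under several assumptions. It assumes you have the KenLM model the scorer was built from (e.g. the lm.binary produced by data/lm/generate_lm.py) and the kenlm Python module, whose Model.score(text, bos=False, eos=False) returns a log10 probability. It also assumes, from my reading of the scorer source, that the decoder adds the boost to a natural-log LM score, hence the log10 to ln conversion. The rule “boost = target log-probability minus the word’s log-probability, clipped to [0, max_boost]” is a hypothetical heuristic; the default cap of 15 is only taken from the replies above.

```python
import math
from typing import Callable, Dict, Iterable, Optional

LN10 = math.log(10.0)


def suggest_boosts(words: Iterable[str],
                   logprob10: Callable[[str], float],
                   in_vocab: Callable[[str], bool],
                   target_ln: float = math.log(1e-3),
                   max_boost: float = 15.0) -> Dict[str, Optional[float]]:
    """Hypothetical heuristic: boost = target_ln - ln P(word), clipped to [0, max_boost].

    logprob10 must return a log10 probability (e.g. kenlm's score()).
    Assumption: the decoder's LM score is in natural log, so convert first.
    Out-of-vocabulary words get None: a boost in this range is unlikely
    to rescue them; they probably need to be added to the LM instead.
    """
    boosts: Dict[str, Optional[float]] = {}
    for word in words:
        if not in_vocab(word):
            boosts[word] = None
            continue
        ln_p = logprob10(word) * LN10  # log10 -> natural log
        boosts[word] = min(max(target_ln - ln_p, 0.0), max_boost)
    return boosts
```

Hypothetical wiring with kenlm: lm = kenlm.Model("lm.binary"), then suggest_boosts(nouns, lambda w: lm.score(w, bos=False, eos=False), lambda w: w in lm). The result is only a starting point: the acoustic evidence for a given recording still matters.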

Chagari_Vamsi_Reddy · January 15, 2021, 6:18pm

Thank you for the response. I also ran a few tests to understand how the boost value impacts the decoder’s result, and I noticed similar behaviour: for some audio files, with hot-word boosts around 15, the decoder could decode a word (noun) fine, but if I increased the boost to 20 or so, it spoiled the following word in the audio (as you said in your report).
Thank you, I will look at the script, but I am not planning to test this any further without knowing what boost value to give a particular word. At run time I don’t want to try 10-20 different boost values for each hot-word and wait for the most accurate result, so I am looking for a way to calculate the likelihood of a particular word (usually given by the language model).