How to indicate if sentence has trained word

eugene.endorse.gg · July 12, 2019, 1:07am

Hi,

I’m wondering if there’s a way to use DeepSpeech to analyze an audio file and tell whether a word of interest is in it. This is similar to the problem in “How to classify unknown words, how to ignore words”. However, in this case I only want to know whether a word occurs in a sentence. For example:

file1.wav - “this wav file does not have the word of interest”
file2.wav - “this wav file has the word of interest which is gfuel”

The caveat is that I plan to train only on the word of interest (“gfuel”) and not on any other words in the audio file. The language model and the audio model will only include the trained word. I may be misunderstanding this, but I believe there is some kind of “threshold” below which DeepSpeech does not recognize a word and outputs nothing at all:

(desired outcome)
file1.wav - outputs: “”
file2.wav - outputs: “gfuel”

Another way is to obtain a confidence score from the metadata field, using the --json flag introduced in 0.5.1, and filter out sentences below a given threshold.

However, neither of these methods works. The first produces incorrect inferences because of the limited language model, so it prints “gfuel” because this is the only word I have trained on. The training data set looks like this:

wav_filename,wav_filesize,transcript
/root/speech/data/gfuel_custom/1.wav,241708,gfuel
/root/speech/data/gfuel_custom/2.wav,1139352,gfuel gfuel gfuel gfuel gfuel
/root/speech/data/gfuel_custom/3.wav,258092,gfuel
/root/speech/data/gfuel_custom/4.wav,462892,gfuel gfuel
/root/speech/data/gfuel_custom/5.wav,688172,gfuel gfuel gfuel
/root/speech/data/gfuel_custom/6.wav,227372,gfuel
/root/speech/data/gfuel_custom/7.wav,274476,gfuel
/root/speech/data/gfuel_custom/8.wav,376876,gfuel
/root/speech/data/gfuel_custom/9.wav,745516,gfuel gfuel gfuel gfuel
/root/speech/data/gfuel_custom/10.wav,274476,gfuel
/root/speech/data/gfuel_custom/11.wav,167980,gfuel

When running this, I get the following:

root@speech:~/speech/data/gfuel_custom# deepspeech --model output_graph.pb --alphabet alphabet.txt --lm lm.binary --trie trie --audio test1.wav
…
gfuel
Inference took 0.419s for 15.440s audio file.

I have generated the language model by following these steps:

…/…/kenlm/build/bin/lmplz --text transcript.txt --arpa words.arpa --o 3 --discount_fallback
…/…/kenlm/build/bin/build_binary -T -s words.arpa lm.binary
…/…/native_client/generate_trie alphabet.txt lm.binary trie
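
As a rough illustration of why this language model can only ever propose “gfuel”, the sketch below extracts the transcript column from a few rows of the training CSV and lists the resulting vocabulary. It assumes transcript.txt is simply the transcript column written one sentence per line; the helper name transcripts_from_csv is hypothetical.

```python
import csv
import io


def transcripts_from_csv(csv_file):
    """Yield the transcript column of a DeepSpeech-style training CSV."""
    reader = csv.DictReader(csv_file)
    for row in reader:
        yield row["transcript"].strip()


# Three rows copied from the training CSV above, inlined so the sketch runs as-is.
sample = io.StringIO(
    "wav_filename,wav_filesize,transcript\n"
    "/root/speech/data/gfuel_custom/1.wav,241708,gfuel\n"
    "/root/speech/data/gfuel_custom/2.wav,1139352,gfuel gfuel gfuel gfuel gfuel\n"
    "/root/speech/data/gfuel_custom/4.wav,462892,gfuel gfuel\n"
)

lines = list(transcripts_from_csv(sample))
vocabulary = {word for line in lines for word in line.split()}

# Assumption: one transcript per line is the plain-text corpus given to lmplz.
corpus_text = "\n".join(lines) + "\n"
print(corpus_text, end="")
print("vocabulary:", sorted(vocabulary))
```

With a single-word vocabulary, every hypothesis the decoder scores against the language model is some repetition of that word.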

Here are the parameters that I have used to train the audio dataset:

python3 -u DeepSpeech.py --noshow_progressbar \
  --train_files …/data/gfuel_custom/gfuel_custom.csv \
  --test_files …/data/gfuel_custom/gfuel_custom.csv \
  --train_batch_size 10 \
  --dev_batch_size 10 \
  --test_batch_size 5 \
  --n_hidden 375 \
  --epochs 33 \
  --validation_step 1 \
  --early_stop True \
  --earlystop_nsteps 6 \
  --estop_mean_thresh 0.1 \
  --estop_std_thresh 0.1 \
  --dropout_rate 0.22 \
  --learning_rate 0.00095 \
  --report_count 100 \
  --use_seq_length False \
  --checkpoint_dir …/data/gfuel_custom/checkpoint \
  --alphabet_config_path …/data/gfuel_custom/alphabet.txt \
  --lm_binary_path …/data/gfuel_custom/lm.binary \
  --lm_trie_path …/data/gfuel_custom/trie \
  --export_dir …/data/gfuel_custom/

I have played around with the n_hidden and epochs parameters, since I read somewhere that this behavior can be attributed to overfitting, but it has had no effect.

I have also tried grabbing the confidence metadata (I had to download and install native_client), but I’m getting inconsistent scores that bear no relation to what is in the audio.

file1.wav
{"metadata":{"confidence":27.3069},"words":[{"word":"gfuel","time":0.02,"duration":5.56}]}

file2.wav
{"metadata":{"confidence":27.6635},"words":[{"word":"gfuel","time":0.02,"duration":7.48}]}
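
For reference, the threshold filter described earlier can be sketched as below, assuming the JSON layout shown in the two outputs. The function name contains_word and the threshold of 27.5 are hypothetical, chosen only to show how little separates the two scores.

```python
import json


def contains_word(raw_json, word, min_confidence):
    """Return True if `word` is in the output and the confidence clears the threshold."""
    result = json.loads(raw_json)
    confidence = result["metadata"]["confidence"]
    words = [entry["word"] for entry in result["words"]]
    return word in words and confidence >= min_confidence


# The two JSON outputs from above, with plain ASCII quotes.
file1 = '{"metadata":{"confidence":27.3069},"words":[{"word":"gfuel","time":0.02,"duration":5.56}]}'
file2 = '{"metadata":{"confidence":27.6635},"words":[{"word":"gfuel","time":0.02,"duration":7.48}]}'

# Hypothetical threshold; it was picked after seeing both scores, not derived.
THRESHOLD = 27.5

for name, raw in (("file1.wav", file1), ("file2.wav", file2)):
    print(name, contains_word(raw, "gfuel", THRESHOLD))

gap = json.loads(file2)["metadata"]["confidence"] - json.loads(file1)["metadata"]["confidence"]
print("score gap:", round(gap, 4))
```

The two files differ by well under half a point, so any threshold that separates them would be fitted to these two samples and is unlikely to hold for other audio.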

I have pretty much followed every step in “Tune MoziilaDeepSpeech to recognize specific sentences” and “TUTORIAL : How I trained a specific french model to control my robot”, but I am coming to the conclusion that DeepSpeech cannot indicate whether a word is present in an audio sentence; it can only transcribe the best match from its available language model.

I would greatly appreciate it if someone could confirm this or point me in the right direction.

Coding session: https://www.twitch.tv/videos/451440201?t=01h01m43s

elpimous_robot · July 12, 2019, 7:15am

Hello. An easy way, if you have the CPU power, is to compare each word in the transcribed sentence against the content of the whole vocabulary (if you have it…).
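
A minimal sketch of that comparison, assuming the transcript comes from a model that knows the whole vocabulary. The function check_sentence and the vocabulary set are hypothetical, and the sample sentences are the ones from the first post.

```python
def check_sentence(transcript, keyword, vocabulary):
    """Compare every transcribed word against the vocabulary and look for the keyword."""
    words = transcript.lower().split()
    out_of_vocabulary = [w for w in words if w not in vocabulary]
    return keyword in words, out_of_vocabulary


# Hypothetical vocabulary: ordinary words plus the word of interest.
vocabulary = {
    "this", "wav", "file", "has", "does", "not", "have", "the",
    "word", "of", "interest", "which", "is", "gfuel",
}

for transcript in (
    "this wav file does not have the word of interest",
    "this wav file has the word of interest which is gfuel",
):
    found, unknown = check_sentence(transcript, "gfuel", vocabulary)
    print(found, unknown)
```

This only helps when the recognizer can transcribe the surrounding words; with a one-word model every transcript collapses to the keyword.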

eugene.endorse.gg · July 12, 2019, 9:39pm

I don’t have any training data for the other words. Since I’m training on only one word, the model will not be able to interpret the other words in the sentence.
