Quality of the CTC decoder / limit inference words to known words only


(Vincent Foucault) #1

Hello all,

I’m using a homemade model for the French language; the files are OK.

My mic array records continuously, and inference runs whenever a vader (VAD) function cuts a segment.

  • When I talk without background noise, inference is very good, but…

  • with TV or other noise, inference produces gibberish like this:

    le a eaefanke eethe
    It doesn’t correspond to any word in my vocabulary.txt.

My question:

How can I restrict inference to known words (and discard the others)?

Thanks all.
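
One workaround for the question above is to post-filter the transcript against the vocabulary. This is only a minimal sketch, not a decoder feature: it does not change what the decoder produces, it just drops unknown tokens afterwards. The helper names and the sample vocabulary lines are hypothetical; in practice the lines would come from vocabulary.txt.

```python
def load_vocabulary(lines):
    """Build a set of known words from an iterable of text lines."""
    vocab = set()
    for line in lines:
        vocab.update(line.lower().split())
    return vocab


def keep_known_words(transcript, vocab):
    """Drop every token of the transcript that is not in the vocabulary."""
    return " ".join(w for w in transcript.split() if w.lower() in vocab)


# Hypothetical vocabulary content, for illustration only.
vocab = load_vocabulary(["le chat est sur la table", "allume la lumière"])
filtered = keep_known_words("le a eaefanke eethe", vocab)
print(filtered)
```

A segment that comes back empty or nearly empty after filtering can then be treated as noise and ignored.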


(Francob) #2

Hi, if you look at the hyperparameters and increase LM_WEIGHT and VALID_WORD_COUNT_WEIGHT, you should get better results.
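
To see why raising these weights helps, here is a rough sketch of how a beam-search decoder might rank hypotheses. The scoring formula, the function names, and all the numbers are assumptions made for illustration, not the actual decoder code: the idea is only that the language-model score and the count of in-vocabulary words are added to the acoustic score with tunable weights.

```python
def hypothesis_score(acoustic_logprob, lm_logprob, words, vocab,
                     lm_weight, word_count_weight, valid_word_count_weight):
    """Combine acoustic and language-model evidence (assumed formula)."""
    valid = sum(1 for w in words if w in vocab)
    return (acoustic_logprob
            + lm_weight * lm_logprob
            + word_count_weight * len(words)
            + valid_word_count_weight * valid)


# Made-up vocabulary and hypotheses: (words, acoustic log-prob, LM log-prob).
vocab = {"le", "chat", "est", "la"}
garbage = ("le a eaefanke eethe".split(), -4.0, -30.0)
clean = ("le chat est la".split(), -6.0, -8.0)


def best(lm_weight, word_count_weight, valid_word_count_weight):
    """Return the higher-scoring of the two hypotheses for the given weights."""
    return max(
        (garbage, clean),
        key=lambda h: hypothesis_score(h[1], h[2], h[0], vocab, lm_weight,
                                       word_count_weight,
                                       valid_word_count_weight),
    )


# With all weights at zero only the acoustic score counts, so noise can win.
print(" ".join(best(0.0, 0.0, 0.0)[0]))
# With larger weights the in-vocabulary hypothesis is preferred.
print(" ".join(best(5.0, 1.0, 3.0)[0]))
```

Under these assumptions, higher weights make the decoder favour word sequences the language model knows, even when noisy audio makes the acoustic score prefer something else.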


(Vincent Foucault) #3

Hi Francob.
Thanks, I’ll try it…
PS: what hyperparameters do you use?


(Francob) #4

5 for LM_WEIGHT and 3 for VALID_WORD_COUNT_WEIGHT; I haven’t optimized them yet, though.


(Deepak Banka) #5

Hey Francob, how can we optimize LM_WEIGHT, WORD_COUNT_WEIGHT and VALID_WORD_COUNT_WEIGHT?
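
One common way to tune such weights is a grid search that minimizes word error rate (WER) on a held-out set. The sketch below assumes a hypothetical `decode(audio, **weights)` callable that returns a transcript; it stands in for whatever runs inference in your setup and is not a real API.

```python
import itertools


def word_errors(reference, hypothesis):
    """Word-level Levenshtein distance between two sentences."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]


def wer(pairs):
    """WER over (reference, hypothesis) pairs."""
    errors = sum(word_errors(ref, hyp) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    return errors / words


def grid_search(decode, dev_set, grid):
    """Try every weight combination; return (best WER, best weights).

    dev_set is a list of (audio, reference) pairs and grid maps each
    weight name to the list of values to try. decode is hypothetical.
    """
    best = None
    names = sorted(grid)
    for values in itertools.product(*(grid[n] for n in names)):
        weights = dict(zip(names, values))
        score = wer([(ref, decode(audio, **weights))
                     for audio, ref in dev_set])
        if best is None or score < best[0]:
            best = (score, weights)
    return best
```

A call could look like `grid_search(decode, dev_set, {"lm_weight": [1, 3, 5], "valid_word_count_weight": [1, 3]})`, where the key names must match the keyword arguments your own `decode` accepts. The dev set should include noisy recordings if noise is the case you care about.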


(Vincent Foucault) #6

Hi. They are parameters!
Set them like this:

-- WORD_COUNT_WEIGHT = 5 \
...