Good test.csv results but zero output with inference?

Hello! I am training a model to detect 9 commands in Finnish with a dataset of 2200 words (train 70%, dev 20%, test 10%). Each sample is an audio clip of about 1 second containing the given word.
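For reference, a 70/20/10 split like the one described can be sketched as below. This is only an illustration: the clip names are hypothetical placeholders and `split_dataset` is not part of DeepSpeech.

```python
import random

def split_dataset(items, train_frac=0.7, dev_frac=0.2, seed=0):
    """Shuffle items reproducibly and split them into train/dev/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_dev = round(len(shuffled) * dev_frac)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # whatever remains (about 10%)
    return train, dev, test

# Hypothetical file names standing in for the 2200 recorded words.
clips = [f"clip_{i:04d}.wav" for i in range(2200)]
train, dev, test = split_dataset(clips)
print(len(train), len(dev), len(test))
```

Shuffling before splitting keeps each command spread across all three sets, which matters when the recordings were made word by word.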

Training config:

Snippet from train.csv run at the end of training:

So it is detecting the test files with good accuracy. But when I run inference on a short audio clip with a command in it, the output is empty.

The same happens with mic_vad_streaming: it detects voice, but the output is empty.

What’s wrong?
Thanks for helping!

  1. Use dropout, maybe 0.3
  2. What about the scorer, you don’t need it for training, but lm_binary is outdated?
  3. What’s in your alphabet, words?
  4. How did you come up with n_hidden?
  5. What’s the difference between your test audios and the ones without output? Check the audio.
  1. Ok, I’ll add that
  2. generate lm:

[quote]python3 generate_lm.py --input_txt ./commands.txt --output_dir . \
--top_k 26 --kenlm_bins /home/tuomas/Desktop/DeepSpeech-0.9.1/kenlm/bin \
--arpa_order 3 --max_arpa_memory "85%" --arpa_prune "0|0|1" \
--binary_a_bits 255 --binary_q_bits 8 --binary_type trie \
--discount_fallback[/quote]

generate scorer:

[quote]./generate_scorer_package --alphabet commands.txt --lm lm.binary --vocab vocab-26.txt \
--package kenlm.scorer --default_alpha 0.931289039105002 --default_beta 1.1834137581510284 --force_bytes_output_mode true[/quote]

Tried with “native_client.amd64.cuda.linux” and “native_client.amd64.cpu.linux” for scorer

  3. commands.txt:

[quote]######

aika
apua
kello
sää
viesti
soita
musiikki
kyllä
ei
a
i
k
p
u
k
e
l
o
s
ä
v
t
m
y
l
########[/quote]

Includes every word and every letter in those words

  4. Trial and error

  5. I used the same file for both

If your model doesn’t work for new material you might have overfitting or not enough training. How many epochs did you train? You don’t have much material.

Your scorer should contain all the word combinations you want to detect. Leave out every parameter you can, as they are meant for GBs of data. You are pruning too much.

Your alphabet should contain letters to recognize. I don’t know Finnish but it looks like you have words in there too. Might be problematic.
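To illustrate the point about the alphabet, here is a minimal sketch that derives a letters-only alphabet from the nine command words in this thread. It assumes the alphabet file lists one character per line and that a space entry is wanted as well; `alphabet_from_words` is a hypothetical helper, not a DeepSpeech function.

```python
def alphabet_from_words(words):
    """Collect the unique characters used in the words, sorted, with a space first."""
    chars = sorted({ch for word in words for ch in word})
    return [" "] + chars

# The nine commands from commands.txt above.
commands = ["aika", "apua", "kello", "sää", "viesti", "soita", "musiikki", "kyllä", "ei"]
alphabet = alphabet_from_words(commands)

# One character per line, ready to be written to a separate alphabet file.
alphabet_text = "\n".join(alphabet) + "\n"
print(alphabet)
```

Deriving the alphabet from the word list avoids duplicates (the original file lists k and l twice) and keeps whole words out of the alphabet.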

Sorry, what do you mean by overfitting? In my latest attempt I trained for 66 epochs.

EDIT: I used the wrong parameters while creating the scorer. I'll retrain and post an update soon.

EDIT2: I created separate vocab.txt and alphabet.txt files but still no success. Still empty output.

Empty output with many epochs on a smaller set is a bit strange. I would guess you get results with 15-20 epochs, as you have little data. Check again with the new settings, chances are good.

Maybe use a smaller batch size, but that should only worsen the training by a bit.

Tried with batch size 24. Still no luck. Running out of ideas. :confused:

It has to be the scorer if train.csv gives results in training, right?

Train loss is 0.4 but dev loss is 20. Could that point to the issue? Should there still be some output? Then again, I think in an earlier, longer training run the dev loss got close to 2 and there was no output either.

Loss in itself is not that important for such a small amount of material. What matters is how train and dev loss relate to each other. Typically they get closer for a while, then train loss keeps getting better and dev loss doesn't. At that point you are overfitting.
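The relationship between the two losses can be checked with a simple heuristic like the one sketched here. `looks_overfit` is a hypothetical helper and the loss values are made-up illustrative numbers, not results from this thread.

```python
def looks_overfit(train_losses, dev_losses, patience=3):
    """Heuristic: dev loss has not improved over the last `patience` epochs
    while train loss kept decreasing over the same window."""
    if len(train_losses) != len(dev_losses):
        raise ValueError("need one train and one dev loss per epoch")
    if len(dev_losses) <= patience:
        return False  # not enough history to judge
    best_dev_before = min(dev_losses[:-patience])
    dev_stalled = min(dev_losses[-patience:]) >= best_dev_before
    train_improving = train_losses[-1] < train_losses[-patience - 1]
    return dev_stalled and train_improving

# Illustrative per-epoch losses only.
diverging_train = [30.0, 12.0, 5.0, 2.0, 1.0, 0.4]
diverging_dev = [35.0, 22.0, 20.0, 21.0, 22.0, 23.0]
healthy_dev = [32.0, 14.0, 7.0, 4.0, 3.0, 2.5]

print(looks_overfit(diverging_train, diverging_dev))  # dev stalls while train improves
print(looks_overfit(diverging_train, healthy_dev))    # dev still improving
```

A low train loss next to a much higher, flat dev loss is the pattern to watch for; the absolute numbers matter less.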

There is some material from Common Voice/Mozilla for just numbers. Maybe take that for English to get a feel for how much and how long you have to train. Maybe you just don’t have enough material.

UPDATE: Got some output. Not 100% accurate, but still. I did this by running the inference command without --scorer.

Don’t know what I did wrong building the scorer, but it’s good to know where the issue is.


Good point. Running without the scorer will give you the “raw” letter output from the audio model.

Again, get a scorer with the least amount of parameters possible and the text file should contain all the words you want to recognize.

UPDATE2: In my vocab.txt file the words were separated by line breaks. Separating them with spaces is the right way. IT WORKS! Thank you so much for your time, othiele.
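The fix reported here, turning a one-word-per-line vocabulary into a space-separated one, can be sketched as follows. This only mirrors what the poster describes; `lines_to_space_separated` is a hypothetical helper and the sample text uses three of the commands from this thread.

```python
def lines_to_space_separated(text):
    """Join words separated by line breaks (or any whitespace) into one space-separated line."""
    return " ".join(text.split())

# Vocabulary as it was originally written: one word per line.
one_per_line = "aika\napua\nkello\n"
fixed = lines_to_space_separated(one_per_line)
print(fixed)
```

Writing `fixed` back to vocab.txt gives the space-separated form that worked for the poster.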
