I’m trying to build a model for demo purposes with a very short vocabulary (eight different French words: a, b, c, d, e, suivant, retour, sauvegarde). To do that, I create my own .arpa and .binary files (with KenLM), then build my trie using the same alphabet as the English pre-trained model.
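For reference, the generation steps look roughly like this (a sketch, not my exact commands; the flags and paths are assumptions, and the exact `generate_trie` arguments depend on the DeepSpeech version):

```shell
# Build the ARPA LM from the vocabulary. With such a tiny corpus,
# --discount_fallback is usually needed so lmplz accepts the low-count stats.
lmplz --order 2 --discount_fallback --text vocabulary.txt --arpa lm.arpa

# Convert the ARPA file to KenLM's binary format.
build_binary lm.arpa lm.binary

# Build the trie with the same alphabet used for training
# (generate_trie ships with the DeepSpeech native client).
generate_trie data/alphabet.txt lm.binary trie
```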
My recordings are mono, 16-bit, 16 kHz.
The first time I trained, just to see if everything was OK, it worked, but since I didn’t have enough data the results weren’t good.
I recorded more data, added it, and now when I try to train again with a new .arpa, .binary and trie, I get blank inference at test time, resulting in WER = 100%.
I looked into some topics suggesting it might be a recording-format error (not my case) or a character missing from my alphabet. I used check_characters.py to see if any were missing, but none are.
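In case it helps, here is roughly the check I ran, as a minimal stand-in for check_characters.py (assuming the standard DeepSpeech CSV layout with a `transcript` column; the file names are just examples):

```python
import csv

def missing_chars(csv_path, alphabet_path):
    """Return the set of transcript characters absent from alphabet.txt."""
    with open(alphabet_path, encoding="utf-8") as f:
        # One label per line; lines starting with '#' are comments.
        alphabet = {line.rstrip("\n") for line in f if not line.startswith("#")}
    seen = set()
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            seen.update(row["transcript"])
    return seen - alphabet
```

An empty result means every character in the transcripts is covered by the alphabet.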
I tend to think it’s an alphabet problem, because this behaviour started after I added some French Common Voice data and modified the alphabet (and regenerated the .arpa, .binary and trie).
So once I got this error, I went back to my old alphabet and data (plus the old binary and trie), but the error is still there, which makes me wonder what causes this…
Here is my command:
python -u DeepSpeech.py --show_progressbar \
  --train_files data/train.csv \
  --test_files data/test.csv \
  --train_batch_size 1 \
  --test_batch_size 1 \
  --n_hidden 1024 \
  --epochs 1 \
  --checkpoint_dir .. \
  --export_dir .. \
  --summary_dir .. \
  --lm_binary_path ../lm.binary \
  --lm_trie_path ../trie \
  --alphabet_config_path data/alphabet.txt
and here is the kind of output I get:
WER: 1.000000, CER: 1.000000, loss: 4.078539 - src: "a" - res: ""
My question is: what could be the cause of this behaviour?
I redid all the steps multiple times to be sure I didn’t mess up somewhere along the way, but I still get the same issue… And it’s not an environment problem, because a similar project with other data (all the month names, in French) works well.
I’ve been working on this for two days; my brain is stuck and I can’t find the cause, so any help is greatly welcome!
Thank you very much
If you need more information, just ask.
My vocabulary.txt file :
a
b
c
d
e
suivant
retour
sauvegarde
My alphabet.txt file :
# Each line in this file represents the Unicode codepoint (UTF-8 encoded)
# associated with a numeric label.
# A line that starts with # is a comment. You can escape it with \# if you wish
# to use '#' as a label.
 
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
'
# The last (non-comment) line needs to end with a newline.
I checked my recordings’ format with Audacity (that’s the tool I use to create them); maybe there is a better way to check the format?
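If someone wants to double-check the format programmatically rather than in Audacity, something like this with the standard library should do (a sketch; the path is just an example):

```python
import wave

def check_wav(path):
    """True if the WAV file is mono, 16-bit, 16 kHz as DeepSpeech expects."""
    with wave.open(path, "rb") as w:
        return (w.getnchannels() == 1
                and w.getsampwidth() == 2      # 2 bytes per sample = 16-bit
                and w.getframerate() == 16000)
```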
Note that even when I remove the lm and trie parameters, it still returns a blank res.