TUTORIAL: How I trained a specific French model to control my robot

Shoot, I realized my command line was not correct. It was supposed to be, with “<” and “>” redirections:
jugs@jugs:~/PycharmProjects/DeepSpeech/native_client/kenlm/build$ bin/lmplz -o 5 < ~/Desktop/jugs_lm/vocabulary.txt > ~/Desktop/jugs_lm/out.arpa

However, I still have this error:
=== 1/5 Counting and sorting n-grams ===
Reading /home/jugs/Desktop/jugs_lm/vocabulary.txt
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Unigram tokens 56 types 38
=== 2/5 Calculating and sorting adjusted counts ===
Chain sizes: 1:456 2:2294630656 3:4302432768
/home/jugs/PycharmProjects/DeepSpeech/native_client/kenlm/lm/builder/adjust_counts.cc:60 in void lm::builder::{anonymous}::StatCollector::CalculateDiscounts(const lm::builder::DiscountConfig&) threw BadDiscountException because ‘discounts_[i].amount[j] < 0.0 || discounts_[i].amount[j] > j’.
ERROR: 1-gram discount out of range for adjusted count 3: -3.6285715
Aborted (core dumped)

Never mind, I figured out how to avoid the error:

bin/lmplz -o 3 < /data/vocabulary.txt > /data/words.arpa --discount_fallback 1

I don’t know exactly what “--discount_fallback” does, but one can check with --help.
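For what it’s worth, my understanding (worth double-checking against the --help output): lmplz estimates the Kneser-Ney discounts from the counts-of-counts in your corpus, and on a tiny corpus those estimates can fall outside the valid range, which is exactly the BadDiscountException above. “--discount_fallback” substitutes fixed discounts instead of estimating them. A minimal sketch of the small-corpus pipeline, assuming KenLM’s usual stdin/stdout convention (paths are placeholders):

# Build a 3-gram ARPA model; --discount_fallback uses fixed discounts
# instead of estimating them from the (too small) corpus.
bin/lmplz -o 3 --discount_fallback < /data/vocabulary.txt > /data/words.arpa

# Convert the ARPA file to KenLM's binary trie format (the type
# DeepSpeech's inference code expects; probing is the default otherwise).
bin/build_binary trie /data/words.arpa /data/words.binary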

Could you try with a few more sentences?
I had errors before with too few sentences.

It already worked with the “--discount_fallback” parameter on the KenLM command.
Thanks @elpimous_robot


How do I vary the parameter values if I have around 100k audio files? Which parameter values from here will change?

Hi @p.holetzky,

I am trying to create a German model. Each of my audio files is about 10 minutes long, with a transcript, but I am not sure I can use them as-is since they are that long. If not, I will need to split all my audio files into roughly 10-second chunks, but splitting the transcript is painful, I believe. I see you used some open-source corpora. It would be great if you could share some information about the training data.

Thanks :slight_smile:

Hi,
You should work with sentences of roughly 5 seconds.
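If your recordings are long, you can cut them into fixed-length chunks first. A sketch with sox (hypothetical filenames; assumes plain WAV input), though you will still have to align the transcripts yourself or with a forced aligner:

# Split long.wav into consecutive 5-second pieces: chunk001.wav, chunk002.wav, ...
sox long.wav chunk.wav trim 0 5 : newfile : restart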
Please help all of us by feeding “Common Voice” with your own voice and validating existing samples.
Best.
Vincent

Hi, great work!
@elpimous_robot during your tests, after you finished training your model, did it ever happen that DeepSpeech didn’t respect your language model (words from your vocabulary file)?

Hi, @BadrEL,
Thanks.
Could you be more explicit?
Examples…
Waiting for you,
Vincent

After training a French model, sometimes when I say “bonjour” it returns “bojoujouur”.
The word “bojoujouur” doesn’t belong to the vocabulary used (language model).
Did this kind of issue ever happen for you?

OK.
It only means that your model is too weak!
With better model accuracy, you’ll get much better inferences!

Examples from my model:


Source ----> axel adore jouer avec toi
Inference -> axel adore jouer avec toi
Inference took 1.240s for 2.700s audio and a 0.0 % WER.
---------------------------
Source ----> les robots ne jouent pas en général
Inference -> les robots de jouent pas en général
Inference took 1.254s for 2.520s audio and a 14.2857142857 % WER.
---------------------------
Source ----> nous aimons jouer avec toi
Inference -> nous aimons jouer avec toi
Inference took 1.079s for 2.160s audio and a 0.0 % WER.
---------------------------
Source ----> j’aime travailler avec toi
Inference -> j’aime trailer avec toi
Inference took 1.081s for 2.310s audio and a 25.0 % WER.
---------------------------
Source ----> j’aime discuter avec toi
Inference -> j’aime discuter avec toi
Inference took 1.113s for 2.280s audio and a 0.0 % WER.
---------------------------
Source ----> j’aime beaucoup bavarder avec toi
Inference -> j’aime beaucoup bavarder avec toi
Inference took 1.349s for 2.790s audio and a 0.0 % WER.
---------------------------
Source ----> aimes-tu parler avec moi
Inference -> aimes-tu parler avec moi
Inference took 1.023s for 2.130s audio and a 0.0 % WER.
---------------------------
Source ----> es-tu capable de chanter avec moi
Inference -> es-tu capable de chanter avec moi
Inference took 1.281s for 2.610s audio and a 0.0 % WER.
---------------------------
Source ----> peux-tu rechercher sur wikipédia une information
Inference -> peux-tu rechercher sur wikipédia une information
Inference took 1.914s for 3.780s audio and a 0.0 % WER.
---------------------------
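(For reference, the WER figures above are the word error rate: substitutions + deletions + insertions, divided by the number of reference words, i.e. WER = (S + D + I) / N. In the second example, “de” substituted for “ne” is 1 substitution in 7 reference words ≈ 14.29 %; “trailer” for “travailler” is 1 substitution in 4 words = 25 %.)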

Try to improve your hyperparameters, and/or find more WAV samples!
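For example, a hypothetical tuning run (flag names as in the 0.x DeepSpeech.py trainer; check ./DeepSpeech.py --help for your version, and treat all the values below as placeholders):

# Hypothetical values; adjust to your data size and GPU memory.
python3 DeepSpeech.py \
  --train_files data/train.csv \
  --dev_files data/dev.csv \
  --test_files data/test.csv \
  --learning_rate 0.0001 \
  --dropout_rate 0.15 \
  --epoch 30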

Good luck

OK, I see! Thanks.
I thought that DeepSpeech would try to find the closest word in the language model that matches the content of the WAV file.
I’ll try to use a richer model.
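Right. To be precise, the language model doesn’t snap the output to the closest vocabulary word; it only re-weights hypotheses during the CTC beam search. Roughly, the decoder scores each candidate transcription y for audio x as

Q(y) = log P_acoustic(y | x) + α · log P_lm(y) + β · word_count(y)

with α the language-model weight and β the word-insertion bonus, so an out-of-vocabulary string like “bojoujouur” can still win when the acoustic score dominates.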

Ready to help, if needed !

I have a problem with this error:
terminate called after throwing an instance of lm::FormatLoadException …
the binary file was built for probing hash tables but the inference …

Please help me.

DeepSpeech v0.3
TensorFlow 1.11
Python 3
latest KenLM

Same here,

deepspeech 0.4.0-alpha.0
TensorFlow 1.12
Python 3.6
latest KenLM

W Parameter --validation_step needs to be >0 for early stopping to work
terminate called after throwing an instance of ‘lm::FormatLoadException’
what(): native_client/kenlm/lm/binary_format.cc:131 in void lm::ngram::MatchCheck(lm::ngram::ModelType, unsigned int, const lm::ngram::Parameters&) threw FormatLoadException.
The binary file was built for probing hash tables but the inference code is trying to load trie with quantization and array-compressed pointers
Aborted (core dumped)

But when I run:

$ ../../kenlm/build/bin/build_binary -q 8 -a 255 trie lm.arpa lm.binary

the error changes to a version mismatch:

W Parameter --validation_step needs to be >0 for early stopping to work
Error: Trie file version mismatch (3 instead of expected 2). Update your trie file.
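That version-mismatch error usually means your trie file was generated by a different native_client version than the one running inference. As I understand it, the fix is to regenerate the trie with the generate_trie binary that ships with your installed DeepSpeech version (arguments as in the 0.4-era tool; paths are placeholders):

# Regenerate the decoder trie so its version matches the inference code.
./generate_trie alphabet.txt lm.binary trie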

How do I use mozilla/voice-corpus-tool?
I tried a command like:
python3 voice.py add '/home/billy/Downloads/voice*.wav' skip 2 take 3 compr 500 rate 8000 augment '/home/billy/Downloads/http_generated_18-11-26_012212.wav' play

It works, but when I add an option to augment, it doesn’t:

python3 voice.py add '/home/billy/Downloads/voice*.wav' skip 2 take 3 compr 500 rate 8000 augment '/home/billy/Downloads/http_generated_18-11-26_012212.wav' [-gain [-8]] play
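I suspect the square brackets in the tool’s help text just mark optional arguments and must not be typed literally. So, assuming augment does accept a -gain option as the help suggests (gain value here is hypothetical):

python3 voice.py add '/home/billy/Downloads/voice*.wav' skip 2 take 3 compr 500 rate 8000 augment '/home/billy/Downloads/http_generated_18-11-26_012212.wav' -gain -8 play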

Hi,
I am trying to fine-tune (train from the checkpoint) the DeepSpeech 0.3.0 model with English conversational speech data. My doubt is: do I need to regenerate the language model and trie file when fine-tuning the model, or will they auto-update with the new data I’m using to train (tune) the model?
Thank you.

The language model and trie are completely separate from the acoustic model; they’re only used for WER reports during test epochs and for inference with the clients. You’ll have to recreate them if you want to change them.
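To recreate them, the usual sequence is something like this (a sketch; filenames are placeholders, and generate_trie must come from the same native_client version you use for inference):

# 1. Build an ARPA language model from your text corpus with KenLM.
bin/lmplz -o 5 < corpus.txt > lm.arpa
# 2. Convert it to a binary trie-based model.
bin/build_binary -q 8 -a 255 trie lm.arpa lm.binary
# 3. Generate the trie used by the DeepSpeech decoder.
./generate_trie alphabet.txt lm.binary trie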

Thank you for your quick response, reuben.
Actually, I’m getting some inaccurate results when running inference on conversational audio and some other audio files recorded in a slightly noisy environment. Sometimes words merge together,
like “andhecametothepartyatheevening”, and sometimes there are spelling mistakes. So I decided to fine-tune the model with conversational data (with noise) so that it may perform better. I have two doubts:

  1. For the above problem, what exactly should I do:
    i. tune the model (acoustic model), or
    ii. work on improving the language model and trie file?
  2. If your suggestion is to tune the model, then I don’t have to touch the language model and trie file, right?
    (or)
    If your suggestion is to work on improving the language model and trie file, do I have to create them from scratch, or can I tune them too?
    (or)
    Do I have to create a new language model and trie file every time I tune/retrain the model? If yes, how can I pass the knowledge of the DeepSpeech language model to my newly created one?

Thank you…

Sorry, I’m very new to machine learning (just 2 months)… please don’t mind my foolish questions (if any) :slight_smile:


I will be very happy if someone can comment on the above question…
Thanks… :)