Tune Mozilla DeepSpeech to recognize specific sentences

Hi @nmstoker, I am not able to interpret the loss, e.g.:

Epoch 1 | Training | Elapsed Time: 1 day, 23:54:37 | Steps: 2528 | Loss: 100.6082

The starting loss was around 178. How is the loss calculated?

Another question: I am training a model but did not pass the --epochs parameter, so is there a default value for epochs?

Thank you

I followed these instructions, but I am noticing some strange behavior (it sort of makes sense, but I want to mitigate it). I am trying to use DeepSpeech with a very small number of commands. I created a corpus that uses these command phrases, which include sequences of numbers, and then created the custom trie and lm.binary files.

The LM works and increases the accuracy of the model for my use case. The strange behavior is that the model becomes very bad at ignoring OOV words. Instead of classifying them as unknown, it seems to force everything into an in-vocabulary bucket.

For example, I created an LM that focuses mostly on numbers, and then as a smoke test I passed it audio from LibriSpeech, which may include numbers sometimes but is mostly other words.

The output of that is:

two nine one two 
one one ten one
five ten four seven
three one

Is there a way to check confidences so I can manually ignore these results, or can I set this up differently so that the language model itself better ignores them? Thanks!

I am not sure if I am fully right here, but you can configure a fixed word for OOV cases and handle it when OOV words are recognized.

You can check the KenLM documentation for the same. After you set this, your words.arpa file will include it with an "unk" tag.

Hope it helps. Once I get access to a laptop I can Google, read more, and give you a better answer.
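For illustration, a minimal sketch of what I mean (the KenLM path and vocabulary.txt are placeholders; --discount_fallback is usually needed for tiny corpora):

```bash
# Build a small ARPA LM with KenLM's lmplz. lmplz adds an <unk> entry
# to the 1-grams automatically - that is the "unk" tag mentioned above.
kenlm/build/bin/lmplz -o 3 --text vocabulary.txt --arpa words.arpa --discount_fallback

# Confirm the unknown-word token appears in the unigram section.
grep "<unk>" words.arpa
```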

I have a similar but slightly different issue. I have a list of 100 short phrases (2-5 words). I want to restrict recognition to only one of these phrases, and nothing else, not even single words from these phrases.

I understand this might be tricky in the microphone-streaming case, as the phrase boundary is more fluid there, so I am trying it only by passing the full WAV file of the phrase.

I tried deleting the 1-grams manually from the LM, but I get a run-time error that all words need to be in the 1-grams. Is there a better way to restrict all responses strictly to one of the phrases, thereby improving the accuracy?

Regards,

PS: Please keep up the great work you are doing on this project.

Hello @nmstoker, and thank you for sharing this…

I have to create a French DeepSpeech model… Do you have any ideas to share with me?

I would like to start by following these steps, using this French database which contains MP3 audio files only (I guess I can convert them to WAV afterwards)…
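(A minimal conversion sketch, assuming ffmpeg is installed; DeepSpeech expects 16 kHz, mono, 16-bit PCM WAV input, and the filenames here are placeholders:)

```bash
# Convert every MP3 in the current directory to 16 kHz mono 16-bit PCM WAV.
for f in *.mp3; do
  ffmpeg -i "$f" -ar 16000 -ac 1 -acodec pcm_s16le "${f%.mp3}.wav"
done
```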

Could you help me, please?

Thanks :wink:

@kamil_BENTOUNES welcome to the forum! I hope you are well.

Are there any specific points you want help with?

The documentation in the repo itself is probably the best place to start - I appreciate that those French instructions may be easier for you to follow, but it looks like it's a relatively old post, so you may run into problems where they differ from what's in the latest repo (or version 0.6, which is the last one with a released model, although that's an English model).

I'd also suggest looking over the forum for others working on French language models (I'm pretty sure there are some people involved in that already).

Hi @nmstoker, I'm fine thank you, and I hope you are too.

Thank you for your quick response. Actually, I found your idea very interesting, in the sense that it's exactly what I want to do (I tested word-recognition models but I am not too convinced by the results, even though they are good enough)… I've found this French model, which I am downloading (I'm on a low-speed connection LOL)… If I find that it works quite well, I would like to personalize it with French words as you did. That brings up a question which may seem odd to you: how much audio data did you use for the re-training on the desired words? I looked at your description and you never mention that… Does it work differently?

Thanks a lot, Neil!

I'm working so I have to be brief, but the process I described above operates purely on the language model; it doesn't require changes to the acoustic model (i.e. the part that operates on audio data).

It works because the acoustic model is already able to recognise basic sounds, and the LM is then used to restrict the words it guesses to only the ones in the shortlist.
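As a rough sketch of that flow for v0.5.x (commands.txt stands in for your phrase list, paths are placeholders, and I may be misremembering the exact generate_trie signature, so check the v0.5 docs):

```bash
# Build an LM over only the allowed phrases...
kenlm/build/bin/lmplz -o 4 --text commands.txt --arpa words.arpa --discount_fallback

# ...then the v0.5.x decoder artifacts: a quantized binary LM and the trie.
kenlm/build/bin/build_binary -a 255 -q 8 trie words.arpa lm.binary
./generate_trie alphabet.txt lm.binary trie   # generate_trie ships in native_client
```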

If you don't get great results, then you would want to look at using audio (but you would need a large amount); that's called fine-tuning (see the repo docs for comments on how that's done).

Do take care to ensure you install the same version/checkpoint of DeepSpeech as the model was trained on and to refer to the matching docs too (both are a common source of misunderstanding and problems!)
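For instance, assuming the 0.7.1 release:

```bash
pip3 install deepspeech==0.7.1   # match the version the model was trained on
deepspeech --version             # confirm what is actually installed
```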

Ah, I understand perfectly now! Thank you very much, and sorry for disturbing you… Take care of yourself during this health crisis, and thank you again!

No problem at all.

I hope you're doing okay at this time too. Best of luck with the model. It would be interesting to hear how you get on with it in due course :slightly_smiling_face:

Sorry, I didn't see your answer. I've configured/generated the language model files so that the model can recognize a list of 9 French words. It worked very well (except with the word "banane"). I don't know if that's due to the model I used, which is not very accurate, or whether the word "banane" gets treated as noise because its phonemes are unvoiced sounds.

Hi, can you please tell me where to download the native_client tar file? I want to run inference on CPU with v0.5.0. I am looking for generate_trie. Please provide the link.

As far as I remember, old native clients are no longer available due to some technical changes. You will have to build it yourself.

Thank you for the reply. I will build it myself.

Hi, thanks for the wonderful tutorial. I trained a language model for only 12 words, but when I use it with the pretrained model, the output is empty for most of the words. I want to predict only those 12 words. How can I improve the accuracy of the language model? If I don't use the language model, the output is gibberish.

Hi @Ajay_Ganesan - this might be hard to diagnose.
I would start by confirming that it isn't a recording-quality or accent issue: try to find the words in your list of 12 in another source, ideally one where they're said by people with a US accent (as the majority of the acoustic model training has been done with US-accented speakers). That would at least give a sense of whether it is equally challenged by those samples or not.

Can I also check whether you've stuck with an older version of DeepSpeech - your comments above ask about 0.5.0, and I'm guessing perhaps you stuck with that to be able to follow the steps in the tutorial. As there have been some significant improvements, I'd generally suggest trying to use 0.7.1 - I realise you will have to make some changes to the steps as the handling of the LM has changed a bit, but the principles are pretty similar (if anything it's easier now, and it's well documented).
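For reference, in 0.7.x the separate lm.binary and trie files are replaced by a single .scorer package; a sketch, with paths and the alpha/beta values as placeholders (depending on your exact 0.7 point release, the packaging tool is either the generate_scorer_package binary from native_client or the generate_package.py script in data/lm):

```bash
# Bundle the LM and vocabulary into a scorer package for DeepSpeech 0.7.x.
./generate_scorer_package --alphabet alphabet.txt --lm lm.binary \
  --vocab vocab-500000.txt --package kenlm.scorer \
  --default_alpha 0.93 --default_beta 1.18
```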

Switching won't necessarily help if it's an accent thing, as I believe the model is still stronger with US accents - it does pretty well with my UK accent, but there are areas where it seems to struggle with my accent too. If you were in that situation, then the way forward would be to look at fine-tuning the model, but you'd need a decent amount of transcribed audio, and I'd try to narrow down what the issue is before going down that route.

Hope that helps? Good luck!

Hi @nmstoker, this is very helpful, but I am getting an error while generating lm.binary and the other output files. Please help me generate lm.binary:

usage: ipykernel_launcher.py [-h] --vocabulary.txt VOCABULARY.TXT --output_dir OUTPUT_DIR --top_k TOP_K --kenlm_bins KENLM_BINS --arpa_order ARPA_ORDER --max_arpa_memory MAX_ARPA_MEMORY --arpa_prune ARPA_PRUNE --binary_a_bits BINARY_A_BITS --binary_q_bits BINARY_Q_BITS --binary_type BINARY_TYPE [--discount_fallback]
ipykernel_launcher.py: error: the following arguments are required: --vocabulary.txt, --output_dir, --top_k, --kenlm_bins, --arpa_order, --max_arpa_memory, --arpa_prune, --binary_a_bits, --binary_q_bits, --binary_type

An exception has occurred, use %tb to see the full traceback.
SystemExit: 2

@Yugandhar_Gantala your post doesn't seem to have enough information to investigate further. Can you give a bit more detail on what you're actually doing - versions, environment, etc.? It looks like you've called some code but haven't passed the required parameters.

Imagine I can't see what you're doing (because I cannot :slightly_smiling_face:)

Please search the forum, and if you post, give us more to work on. There are several posts on building the scorer.
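For what it's worth, a sketch of a full invocation with every required argument supplied - the values are illustrative (a tiny vocabulary may want a lower --arpa_order and no pruning), and the flag names are copied from your error message. Note also that the log shows ipykernel_launcher.py, i.e. the script is being run inside a notebook, where argparse won't see your arguments - run it from a shell instead:

```bash
python3 generate_lm.py --vocabulary.txt vocabulary.txt --output_dir . \
  --top_k 500000 --kenlm_bins kenlm/build/bin/ \
  --arpa_order 5 --max_arpa_memory "85%" --arpa_prune "0|0|1" \
  --binary_a_bits 255 --binary_q_bits 8 --binary_type trie \
  --discount_fallback
```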

Hey @nmstoker,
I have created my own vocabulary.txt file and I want to train the DeepSpeech 0.7.3 pretrained model on my own vocabulary. I followed the steps you mentioned above. Now I am trying to generate the output files (lm.binary, words.arpa, trie), and while generating them I get an error that the following arguments are required: --vocabulary.txt, --output_dir, --top_k, --kenlm_bins, --arpa_order, --max_arpa_memory, --arpa_prune, --binary_a_bits, --binary_q_bits, --binary_type.
What is generate_trie for? We will be getting an output file trie, right? Isn't that enough to train the model on vocabulary.txt?