I followed these instructions, but I am noticing some strange behavior (it sort of makes sense, but I want to mitigate it). I am trying to use DeepSpeech with a very small number of commands. I created a corpus of these command phrases, which include sequences of numbers, and then generated the custom trie and lm.binary files.
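For reference, the generation steps I followed look roughly like this (a sketch of the 0.5.x-era flow; the corpus filename and KenLM paths are placeholders, and the generate_trie argument order has changed between releases, so check its usage output):

```bash
# Build an ARPA model from the command-phrase corpus with KenLM.
# --discount_fallback is usually needed for very small corpora, where
# Kneser-Ney discount estimation otherwise fails.
kenlm/build/bin/lmplz --order 5 --text commands_corpus.txt --arpa words.arpa --discount_fallback

# Convert the ARPA file into the binary format DeepSpeech loads.
kenlm/build/bin/build_binary trie words.arpa lm.binary

# Build the trie from the alphabet and the binary LM (native_client tool).
./generate_trie alphabet.txt lm.binary trie
```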
The LM works and increases the accuracy of the model for my use case. The strange behavior is that the model becomes very bad at ignoring OOV words. Instead of classifying them as unknown, it seems to be forcing everything into an in-vocabulary bucket.
For example, I created an LM that focuses mostly on numbers and then, as a smoke test, passed it audio from LibriSpeech, which may occasionally include numbers but is mostly other words.
The output of that is:
two nine one two
one one ten one
five ten four seven
three one
Is there a way to check confidences so I can manually ignore these, or can I set this up differently so that the language model itself ignores them better? Thanks!
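One thing I have been considering (a sketch, assuming a 0.7.x-style client where --json emits per-transcript metadata; the JSON layout differs between releases, so inspect the raw output first, and any threshold would have to be tuned empirically on known-OOV audio):

```bash
# Decode with metadata and capture the JSON output.
deepspeech --model deepspeech-0.7.1-models.pbmm --scorer kenlm.scorer \
    --audio sample.wav --json > result.json

# Pull out the best transcript's confidence score. It is a summed
# log-probability (more negative = less sure), so utterances scoring
# below a tuned threshold could be rejected as likely OOV.
jq '.transcripts[0].confidence' result.json
```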
I have a similar but slightly different issue. I have a list of 100 short phrases (2-5 words). I want to restrict recognition to only one of these phrases, and nothing else, not even single words from these phrases.
I understand this might be tricky in the microphone streaming case, as the phrase boundary is more fluid there, so I am trying it only by passing the full wav file of the phrase.
I tried manually deleting the 1-grams from the LM, but I get a run-time error saying that all words need to be present as 1-grams. Is there a better way to restrict all responses strictly to one of the phrases, thereby improving accuracy? As a stopgap I am considering post-filtering outside the decoder, as in the sketch below.
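The idea is to accept the decoded text only if it exactly matches one of the 100 allowed phrases (a sketch using 0.5.x-style client flags; phrases.txt is assumed to hold one allowed phrase per line):

```bash
# Decode the wav, then accept the result only on an exact whole-line
# match against the allowed phrase list (-F fixed strings, -x whole line).
RESULT=$(deepspeech --model output_graph.pbmm --alphabet alphabet.txt \
    --lm lm.binary --trie trie --audio phrase.wav)
if echo "$RESULT" | grep -qFxf phrases.txt; then
    echo "accepted: $RESULT"
else
    echo "rejected: $RESULT"
fi
```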
Regards,
PS: Please keep up the great work you are doing on this project.
The documentation in the repo itself is probably the best place to start - I appreciate that those French instructions may be easier for you to follow, but it looks like it's a relatively old post, so you may run into problems where they differ from what's in the latest repo (or version 0.6, which is the last one with a released model, although that's an English model).
I'd also suggest looking over the forum for others working on French language models (I'm pretty sure there are some people involved in that already).
Hi @nmstoker, I'm fine thank you, and hope you are too.
Thank you for your quick response. Actually I found your idea very interesting, in the sense that it's exactly what I want to do (I tested word recognition models but I'm not too convinced by the results, even though they are good enough)… I've found this French model, which I am downloading now (I'm on a slow connection, LOL)… If it works well enough, I would like to personalise it with French words as you did. That raised a question which may seem odd to you: how much audio data did you use for the re-training on the desired words? I looked at your description and you never mention it… Does it work differently?
I'm working so I have to be brief, but the process I described above operates purely on the language model; it doesn't require changes to the acoustic model (i.e. the part that operates on audio data).
It works because the acoustic model is already able to recognise basic sounds, and the LM is then used to restrict the words it guesses to only the ones in the shortlist.
If you don't get great results then you would want to look at using audio (but you would need a large amount); that's called fine-tuning (see the repo docs for comments on how it's done).
Do take care to ensure you install the same version/checkpoint of DeepSpeech as the model was trained on and to refer to the matching docs too (both are a common source of misunderstanding and problems!)
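For example, something along these lines (the version number is just an illustration; substitute whichever release your model came from, and note that older clients may not support --version):

```bash
# Pin the Python client to the same release as the downloaded model/checkpoint.
pip3 install deepspeech==0.6.1

# Confirm the installed client reports the version you expect.
deepspeech --version
```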
Ah, I understand perfectly now! Thank you very much, and sorry for disturbing you… Take care of yourself during this health crisis, and thank you again!
Sorry, I didn't see your answer. I've configured/generated the language model files so that the model can recognise a list of 9 French words. It worked very well (except with the word "banane"). I don't know if that's due to the model I used, which is not very accurate, or whether the word is being treated as noise because its phonemes are unvoiced sounds.
Hi, can you please tell me where to download the native_client tar file? I want to run inference on CPU with v0.5.0. I am looking for generate_trie. Please provide the link.
Hi, thanks for the wonderful tutorial. I trained a language model for only 12 words, but when I use it with the pretrained model, the output is empty for most of the words. I want to predict only those 12 words. How can I improve the accuracy of the language model? If I don't use the language model, the output is gibberish.
Hi @Ajay_Ganesan - this might be hard to diagnose.
I would start with confirming that it isn't a recording quality or accent issue by trying to find the words in your list of 12 in another source, ideally one where they're said by people with a US accent (as the majority of the acoustic model training has been done with US-accented speakers). That would at least give a sense of whether it is equally challenged by those samples or not.
Can I also check whether you've stuck with an older version of DeepSpeech - your comments above ask about 0.5.0 and I'm guessing you stuck with that to be able to follow the steps in the tutorial. Generally, as there have been some significant improvements, I'd suggest trying 0.7.1 - I realise you will have to make some changes to the steps as the handling of the LM has changed a bit, but the principles are pretty similar (if anything it's easier now and it's well documented).
Switching won't necessarily help if it's an accent thing, as I believe the model is still stronger with US accents - it does pretty well with my UK accent, but there are areas where it seems to struggle for me too. If you were in that situation then the way forward would be to look at fine-tuning the model, but you'd need a decent amount of transcribed audio, and I'd try to narrow down what the issue is before going down that route.
@Yugandhar_Gantala your post doesn't seem to have enough information to investigate further. Can you give a bit more detail on what you're actually doing - versions, environment, etc.? It looks like you've called some code without passing the parameters.
Imagine I can't see what you're doing (because I cannot).
Hey @nmstoker,
I have created my own vocabulary.txt file and I want to train the DeepSpeech 0.7.3 pretrained model on my own vocabulary. I followed the steps you mentioned above. Now I am trying to generate the output files (lm.binary, words.arpa, trie), but while generating I get an error that these arguments are required: "--vocabulary.txt, --output_dir, --top_k, --kenlm_bins, --arpa_order, --max_arpa_memory, --arpa_prune, --binary_a_bits, --binary_q_bits, --binary_type".
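For reference, here is the invocation I am now trying, with all of those arguments supplied explicitly (a sketch: the values are guesses based on the flag names and the repo docs, the paths are placeholders, and in the repo copy I checked the input flag is spelled --input_txt rather than --vocabulary.txt):

```bash
# generate_lm.py builds the ARPA file and the binary LM in one go (0.7.x).
# --discount_fallback helps with very small corpora; drop it if your
# version of the script does not have that flag.
python3 data/lm/generate_lm.py --input_txt vocabulary.txt --output_dir . \
    --top_k 500 --kenlm_bins /path/to/kenlm/build/bin/ \
    --arpa_order 5 --max_arpa_memory "85%" --arpa_prune "0|0|1" \
    --binary_a_bits 255 --binary_q_bits 8 --binary_type trie --discount_fallback
```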
What is generate_trie for? We will be getting a trie output file, right? Isn't that enough to train the model on vocabulary.txt?