Question regarding custom language model

Hi Guys,

deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio my_audio_file.wav

Can I use the default output_graph.pbmm with my custom language model trained on just a few words?

Yes, you will just have to pass them as arguments instead of the released language model and trie, of course.
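
For example, keeping the released acoustic model and only swapping in the custom LM and trie (the my_lm/ paths below are just placeholders for wherever your own files live):

```bash
# Same invocation as above, pointing --lm and --trie at the custom files.
deepspeech --model models/output_graph.pbmm \
           --alphabet models/alphabet.txt \
           --lm my_lm/lm.binary \
           --trie my_lm/trie \
           --audio my_audio_file.wav
```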

Hi,

Thank you very much!

  1. Can I use DeepSpeech for a dialog management system? To be specific, a dialog management system for a conference. I need the model to recognise author names. How do I go about it? Is adding the names to the language model sufficient, or do I need WAV files of the name pronunciations?

  2. My second use case is giving commands to a robot. I want to train my language model on a small set of data so that it recognises those commands accurately. Is there any documentation on trie creation? ‘python util/taskcluster.py --target .’ downloads and extracts native_client.tar.xz, but I cannot find generate_trie to run ‘./generate_trie alphabet.txt lm.binary vocab.txt trie’. Am I missing something?

  3. To tackle the problem mentioned above, I used native_client.tar.xz from v0.1.1 and was able to create a trie file for my vocabulary. My vocabulary had just 20 words, e.g. ‘javascript’, ‘move right’, etc. I used the default ‘output_graph.pbmm’ with the language model and trie which I created, but didn’t get the expected result. Is it because I used an older version of native_client.tar.xz? If so, how do I create a trie with the latest version of native_client?

The model is not able to provide anything that would allow you to differentiate between speakers.

Yes, it’s all extensively documented: read data/lm/README.md and also the tutorial made by @elpimous_robot on the forum. Just pay attention: trie creation has evolved a bit, so you might have to adapt / update some command lines. The content of data/lm/README.md is accurate and up to date.
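
For reference, the overall flow documented there looks roughly like this. Treat it as a sketch: the exact KenLM flags and the generate_trie argument list have changed between releases (older builds also took a vocab file), so double-check against the README shipped with your version. vocab.txt is assumed to be a plain-text file with one sentence or command per line.

```bash
# Build an ARPA language model from the text corpus with KenLM.
# Very small corpora (e.g. 20 commands) may need --discount_fallback and/or a lower --order.
lmplz --order 5 --text vocab.txt --arpa words.arpa

# Convert the ARPA file into the binary format the client loads.
build_binary words.arpa lm.binary

# Generate the trie with the generate_trie binary from the matching native_client release.
# Recent releases take only the alphabet, the binary LM and the output path.
./generate_trie alphabet.txt lm.binary trie
```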


Which problem are you referring to? What are the results you were expecting?

Please report issues against v0.3.0; v0.1.1 is very different now and too old.

I do not want it to recognise the speaker.

Let’s say the conversation goes like this:
E.g.: Could you fetch me papers by ‘Narayan’?

In my case, I want it to recognise a lot of names, say 500! Is it sufficient if I train the language model with those 500 names?

My test audio file had ‘javascript move right’

Result: javas cript more right

Thank you very much!

I can’t guarantee anything, but this should obviously help. You might also need to tweak some of the hyper-parameters used client-side, like the LM weights.
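
In case it helps, those weights are set where the decoder is enabled in the Python bindings. A rough sketch below, assuming the v0.3-era Python API; the Model constructor and enableDecoderWithLM argument lists (and the default weight values) have changed between releases, so check the client.py that ships with your version before copying anything:

```python
# Sketch only: v0.3-era deepspeech Python bindings; argument names/defaults vary by release.
import scipy.io.wavfile as wav
from deepspeech import Model

N_FEATURES = 26      # acoustic-model input settings used by the example client of that era
N_CONTEXT = 9
BEAM_WIDTH = 500
LM_WEIGHT = 1.50     # placeholder values: tune these on held-out recordings of your commands
WORD_COUNT_WEIGHT = 2.10

ds = Model('models/output_graph.pbmm', N_FEATURES, N_CONTEXT, 'models/alphabet.txt', BEAM_WIDTH)
ds.enableDecoderWithLM('models/alphabet.txt', 'my_lm/lm.binary', 'my_lm/trie',
                       LM_WEIGHT, WORD_COUNT_WEIGHT)

fs, audio = wav.read('my_audio_file.wav')  # expected: 16 kHz, 16-bit, mono
print(ds.stt(audio, fs))
```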

Thank you.

I read a few queries about DeepSpeech not processing longer files. The reason mentioned was that it expects ‘audio files that are sentence length, 4-5 seconds of audio’. Is that still the case with the latest version?

On the page I read that I can extend the pre-trained model from a checkpoint. I would like to use a 60-minute-long podcast to train the model. Is that possible? Should I cut the audio into sentence-length chunks? If so, what is the recommended length?
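
For the splitting itself, something like sox can do a first pass. A rough sketch, assuming the file names used here and a fixed 5-second cut taken from the ‘4-5 seconds’ figure above; splitting on real sentence boundaries that line up with your transcript will give better training data than fixed-length cuts:

```bash
# Convert the podcast to what the model expects: 16 kHz, 16-bit, mono WAV.
sox podcast.wav -r 16000 -b 16 -c 1 podcast_16k.wav

# Naive fixed-length split into ~5 s chunks (chunk001.wav, chunk002.wav, ...).
sox podcast_16k.wav chunk.wav trim 0 5 : newfile : restart
```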