Question regarding custom language model

Hi Guys,

deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio my_audio_file.wav

Can I use the default output_graph.pbmm with my custom language model trained on just a few words?

Yes, you will just have to pass them as arguments instead of the released language model and trie, of course.
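
For example, keeping the released acoustic model and only swapping in the custom LM and trie (the my_lm/ paths below are just placeholders for wherever your own files live):

```bash
# Same invocation as above, pointing --lm and --trie at the custom files.
deepspeech --model models/output_graph.pbmm \
           --alphabet models/alphabet.txt \
           --lm my_lm/lm.binary \
           --trie my_lm/trie \
           --audio my_audio_file.wav
```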

Hi,

Thank you very much!

  1. Can I use DeepSpeech for a dialog management system? To be specific, a dialog management system for a conference. I need the model to recognise author names. How do I go about it? Is adding the names to the language model sufficient, or do I need WAV files of the name pronunciations?

  2. My second use case is giving commands to a robot. I want to train my language model on a small set of data so that it recognises those commands accurately. Is there any documentation on trie creation? ‘python util/taskcluster.py --target .’ downloads and extracts native_client.tar.xz, but I cannot find generate_trie to run ‘./generate_trie alphabet.txt lm.binary vocab.txt trie’. Am I missing something?

  3. To tackle the problem mentioned above, I used native_client.tar.xz from v0.1.1 and was able to create a trie file for my vocabulary. My vocabulary had just 20 words, e.g. ‘javascript’, ‘move right’, etc. I used the default ‘output_graph.pbmm’ with the language model and trie which I created, but didn’t get the expected result. Is it because I used an older version of native_client.tar.xz? If so, how do I create a trie with the latest version of native_client?

The model is not able to provide anything that would allow you to differentiate between speakers.

Yes, it’s all extensively documented: read data/lm/README.md and also the tutorial made by @elpimous_robot on the forum. Just pay attention: trie creation has evolved a bit, so you might have to adapt / update some command lines. The content of data/lm/README.md is accurate and up to date.
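
For reference, the overall flow documented there looks roughly like this. Treat it as a sketch: the exact KenLM flags and the generate_trie argument list have changed between releases (older builds also took a vocab file), so double-check against the README shipped with your version. vocab.txt is assumed to be a plain-text file with one sentence or command per line.

```bash
# Build an ARPA language model from the text corpus with KenLM.
# Very small corpora (e.g. 20 commands) may need --discount_fallback and/or a lower --order.
lmplz --order 5 --text vocab.txt --arpa words.arpa

# Convert the ARPA file into the binary format the client loads.
build_binary words.arpa lm.binary

# Generate the trie with the generate_trie binary from the matching native_client release.
# Recent releases take only the alphabet, the binary LM and the output path.
./generate_trie alphabet.txt lm.binary trie
```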


Which problem are you referring to? What are the results you were expecting?

Please report issues against v0.3.0; v0.1.1 is very different now and too old.

I do not want it to recognise the speaker.

Let’s say the conversation goes like this:
E.g.: Could you fetch me papers by ‘Narayan’?

In my case, I want it to recognise a lot of names, say 500! Is it sufficient if I train the language model with those 500 names?

My test audio file had ‘javascript move right’

Result: javas cript more right

Thank you very much!

I can’t guarantee anything, but this should obviously help. You might also need to tweak some of the hyper-parameters used client-side, like the LM weights.
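
In case it helps, those weights are set where the decoder is enabled in the Python bindings. A rough sketch below, assuming the v0.3-era Python API; the Model constructor and enableDecoderWithLM argument lists (and the default weight values) have changed between releases, so check the client.py that ships with your version before copying anything:

```python
# Sketch only: v0.3-era deepspeech Python bindings; argument names/defaults vary by release.
import scipy.io.wavfile as wav
from deepspeech import Model

N_FEATURES = 26      # acoustic-model input settings used by the example client of that era
N_CONTEXT = 9
BEAM_WIDTH = 500
LM_WEIGHT = 1.50     # placeholder values: tune these on held-out recordings of your commands
WORD_COUNT_WEIGHT = 2.10

ds = Model('models/output_graph.pbmm', N_FEATURES, N_CONTEXT, 'models/alphabet.txt', BEAM_WIDTH)
ds.enableDecoderWithLM('models/alphabet.txt', 'my_lm/lm.binary', 'my_lm/trie',
                       LM_WEIGHT, WORD_COUNT_WEIGHT)

fs, audio = wav.read('my_audio_file.wav')  # expected: 16 kHz, 16-bit, mono
print(ds.stt(audio, fs))
```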

Thank you.

I read a few queries about DeepSpeech not processing longer files. The reason mentioned was that it expects ‘audio files that are sentence length, 4-5 seconds of audio’. Is that still the case with the latest version?

On the page I read that I can extend the pre-trained model from a checkpoint. I would like to use a 60-minute-long podcast to train the model. Is that possible? Should I cut the audio into sentence-length chunks? If so, what is the recommended length?
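
For the splitting itself, something like sox can do a first pass. A rough sketch, assuming the file names used here and a fixed 5-second cut taken from the ‘4-5 seconds’ figure above; splitting on real sentence boundaries that line up with your transcript will give better training data than fixed-length cuts:

```bash
# Convert the podcast to what the model expects: 16 kHz, 16-bit, mono WAV.
sox podcast.wav -r 16000 -b 16 -c 1 podcast_16k.wav

# Naive fixed-length split into ~5 s chunks (chunk001.wav, chunk002.wav, ...).
sox podcast_16k.wav chunk.wav trim 0 5 : newfile : restart
```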