Gramvaani importer file is not working

sanjay.pandey · April 16, 2019, 9:28am

When I use the same approach for importing that I used with the Common Voice importer (import_cv.py), it doesn't work. @lissyx, can you please help me with this?

lissyx · April 16, 2019, 10:37am

No, I can’t, @kdavis worked on that, as git history shows: https://github.com/mozilla/DeepSpeech/commits/master/bin/import_gram_vaani.py

lissyx · April 16, 2019, 10:37am

Also, please refrain from that; there’s nothing we can do if you don’t give more information. This is painful since we have to ask you for details you should already have given, thus forcing an extra round-trip.

kdavis · April 16, 2019, 10:54am

Do you have the Gramvaani data set?

sanjay.pandey · April 16, 2019, 10:57am

@kdavis No. I thought that, like the other importers, it would include a download link in the code, but I just saw that there is no link and it is only an importer.
Can you please tell me where to get the data from Gram Vaani? I have visited their site but could not find any voice dataset.

kdavis · April 16, 2019, 11:01am

@sanjay.pandey No, this data set is, for now, private between Mozilla and Gram Vaani, but it will eventually be released to everyone once all the details are worked out.

sanjay.pandey · April 16, 2019, 11:21am

Oh, that's sad! Can you give me an estimated time (days or months) for when it will be released?

Actually, I am training for “restaurant order taking”, so I need my model to understand every single item written on the menu. I trained on the Common Voice dataset (English validation-train) and thought that adding the menu items to the language model would be enough, but the model actually picks words from the vocabulary it was trained on. So I need to train on the vocabulary that I am including in the language model, right?

kdavis · April 16, 2019, 11:46am

Actually, the Gram Vaani data is in Hindi, so I don’t think it will be of much use for a non-Hindi engine. The release, I’d guess, will be sometime this year, but I’m not sure when.

Also, you don’t need to train on audio that contains all the vocabulary in the language model, assuming the language you are targeting has a more-or-less phonemic orthography, i.e. it sounds more-or-less like it is written.

PS: I’d like to hear more about your use case.

sanjay.pandey · April 16, 2019, 12:16pm

Okay @kdavis, I am actually in India myself, and I wanted to train for Hindi transcription as well, for a different use case.

Right now, using DeepSpeech 0.4.1, I have further trained it on the Common Voice dataset using import_cv.py and have reached a loss of around 0.8 on the training set. Now I want to use it to take customers' mobile numbers and their orders: they will say their phone number, e.g. 8990993231, which is 10 digits long, and then say the food items they want, such as pizza, pasta, and any Indian or global items.
So what I did was create a language model from text like this:
zero
one
two
three
four
…
nine
paneer butter masala
margherita pizza
dessert
tandoori roti
lasagne
russian salad
paneer masala tikka
paneer handi
burger
etc.

The problem I am facing is that sometimes the words get mixed up: if I say “paneer handi” it hears “handi handi”, and there are other problems of that sort. Can you please tell me what else I can do?

Also, when I built the language model from the same Common Voice text that I used for training and then ran inference, the results were noticeably better when I tested live with different people.

Right now I am using a Zoom H1n stereo mic recorder, and the audio is at 48 kHz.
I am using the audio transcriber GUI for live inference. Does it convert the audio to 16 kHz, or do I need to convert it to 16 kHz myself?

kdavis · April 17, 2019, 7:45am

Two things come to mind that might help:

1. Record at 16 kHz, 16-bit, mono, as transcoding can introduce artifacts.
2. Try adjusting the BEAM_WIDTH, LM_ALPHA, and LM_BETA parameters (For Python found here) when you use your custom “order taking language model”. These can be used to make the model more likely to give results from the text you trained your language model on.
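
As a minimal sketch of the first point, the snippet below checks whether a recording is already 16 kHz, 16-bit, mono before it is passed to the model. It uses only Python's standard `wave` module; the function names `wav_format` and `matches_target` are made up for illustration and are not part of DeepSpeech.

```python
import wave


def wav_format(path):
    """Return (sample_rate_hz, bits_per_sample, channels) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()


def matches_target(path, rate=16000, bits=16, channels=1):
    """True if the file is already 16 kHz, 16-bit, mono (the defaults)."""
    return wav_format(path) == (rate, bits, channels)
```

A 48 kHz stereo recording, such as the one described earlier in the thread, would make `matches_target` return `False`, which signals that the audio is being converted somewhere before inference (or needs to be).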

sanjay.pandey · April 17, 2019, 1:18pm

Can you tell me whether I need to increase or decrease the value of each, or is it trial and error? Is there any methodology for tweaking these three parameters?

kdavis · April 17, 2019, 2:38pm

I’d start by increasing LM_ALPHA, then experiment from there.
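
One way to make "experiment from there" systematic is a small grid sweep scored by word error rate. The sketch below is only an illustration under assumptions: `transcribe(audio, alpha, beta)` is a hypothetical callable you would write yourself around your inference setup, and `samples` is a list of `(audio, reference_text)` pairs recorded for your own use case. Neither comes from DeepSpeech.

```python
from itertools import product


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] is the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # word dropped from ref
                           cur[j - 1] + 1,           # extra word in hyp
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / max(len(ref), 1)


def sweep(transcribe, samples, alphas, betas):
    """Try every (alpha, beta) pair; return (mean_wer, alpha, beta) of the best."""
    results = []
    for alpha, beta in product(alphas, betas):
        wers = [word_error_rate(text, transcribe(audio, alpha, beta))
                for audio, text in samples]
        results.append((sum(wers) / len(wers), alpha, beta))
    return min(results)
```

For instance, `word_error_rate("paneer handi", "handi handi")` is 0.5, since one of the two reference words was substituted.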

sanjay.pandey · April 18, 2019, 6:30am

Okay, I will try that. Also, can you suggest a microphone with these specifications, or the mic that you have used for recording directly at 16 kHz? I am having a hard time finding one with a frequency response of 16 kHz and 16-bit.

kdavis · April 18, 2019, 7:02am

Truthfully, I don’t know a good mic to select. Maybe someone else can chime in?

sanjay.pandey · April 23, 2019, 11:48am

Okay, thank you for your support, @kdavis.
I just wanted to know: the language model will not be effective if there is a single word on every line instead of a sentence, right?

kdavis · April 23, 2019, 11:56am

Correct. An entire sentence needs to be on a line so the language model learns about dependencies between words in a sentence and not about words in isolation.

sanjay.pandey · April 23, 2019, 12:26pm

So if I want to recognize specific words, just including them in the language model won't work, right? Do I need to train the DeepSpeech model on those words so that the acoustic model works well?

kdavis · April 23, 2019, 12:37pm

Oh yeah. Sorry, I lost the larger context from earlier messages.

If you want to only recognize specific words, you can create a language model with only a single word per line. Such a language model will not be of use for recognizing longer sentences or phrases, however.

To only recognize specific words you don’t need to re-train the acoustic model with those words. Most of the time creating a new language model should be enough.

sanjay.pandey · April 24, 2019, 7:32am

What if I include a mix of one-word lines and multi-word lines in the same language model?
Or do I need to make two separate language models, one for single-word lines and one for multi-word lines?

kdavis · April 24, 2019, 7:42am

You can have this mix, but you have to be careful about the ratio of one-word lines to sentence lines; getting the right balance is an art.
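
A minimal sketch of how such a mixed text file could be generated with an adjustable ratio, so that different balances can be tried. The digits and menu items are taken from the list earlier in the thread; the carrier phrases in `TEMPLATES`, the function name `build_corpus`, and the `sentence_ratio` parameter are assumptions made up for this example.

```python
import random

DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
MENU = ["paneer butter masala", "margherita pizza",
        "tandoori roti", "paneer handi"]
# Hypothetical carrier phrases; replace with how customers really order.
TEMPLATES = ["i would like {item}", "one {item} please"]


def build_corpus(n_lines, sentence_ratio, seed=0):
    """Return n_lines of text for a language model.

    Each line is a full sentence with probability sentence_ratio,
    otherwise an isolated entry (a digit word or a menu item).
    """
    rng = random.Random(seed)
    lines = []
    for _ in range(n_lines):
        if rng.random() < sentence_ratio:
            item = rng.choice(MENU)
            lines.append(rng.choice(TEMPLATES).format(item=item))
        else:
            lines.append(rng.choice(DIGITS + MENU))
    return lines
```

Writing `"\n".join(build_corpus(1000, 0.5))` to a text file gives one candidate input for the language model build; rebuilding with a different `sentence_ratio` and comparing live results is one way to look for a workable balance.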
