Gramvaani importer file is not working

lissyx · April 16, 2019, 10:37am

Also, please refrain from that; there’s nothing we can do if you don’t give more information. This is painful since we have to ask you for details you should already have given, which forces an extra round trip.

kdavis · April 16, 2019, 10:54am

Do you have the Gramvaani data set?

sanjay.pandey · April 16, 2019, 10:57am

@kdavis No, I thought that, like the other importers, it would include a download link in the code, but I just saw that there is no link and it is only an importer.
Can you please tell me where to get the data from Gram Vaani? I have visited their site but could not find any voice dataset.

kdavis · April 16, 2019, 11:01am

@sanjay.pandey No, this data set is, for now, private between Mozilla and Gram Vaani, but it will eventually be released to everyone once all the details are worked out.

sanjay.pandey · April 16, 2019, 11:21am

Oh! Sad! Can you give me an estimated time (days or months) for when it will be released?

Actually, I am training for “restaurant order taking”, so I need my model to understand every single item on the menu. I trained on the Common Voice dataset (English, validated train set) and thought that adding the menu items to the language model would be enough, but the model only outputs words from the vocabulary it was trained on. So I need to train on the vocabulary I am including in the language model. Right?

kdavis · April 16, 2019, 11:46am

Actually, the Gram Vaani data is in Hindi, so I don’t think it will be of much use for a non-Hindi engine. The release, I’d guess, will be sometime this year, but I’m not sure when.

Also, you don’t need to train on audio that contains all the vocabulary in the language model, assuming the language you are targeting has a more-or-less phonemic orthography, i.e. it sounds more-or-less like it is written.

PS: I’d like to hear more about your use case.

sanjay.pandey · April 16, 2019, 12:16pm

Okay @kdavis, actually I am in India myself and wanted to train for Hindi transcription as well, for a different use case.

Right now, using DeepSpeech 0.4.1, I have further trained it on the Common Voice dataset using the import_cv.py file and have reached a loss of around 0.8 on the training dataset. Now I want to use it to take the mobile numbers of different customers and also to take their orders: they will say their phone number,
e.g. 8990993231, which is 10 digits, and then say the food items they want,
like pizza, pasta, any Indian items as well as any global items.
So what I did was create a language model from text like this:
zero
one
two
three
four
…
nine
paneer butter masala
margherita pizza
dessert
tandoori roti
lasagne
russian salad
paneer masala tikka
paneer handi
burger
etc…

The problem I am facing is that sometimes the words get mixed up: if I say “paneer handi” it hears “handi handi”, along with other problems of that sort. Can you please tell me what I can do further?

Also, when I made a language model from the same Common Voice text that I used for training and then ran inference, the results were quite a bit better when I tested live with different people.

Right now I am using a Zoom H1n stereo mic recorder and the audio is at 48 kHz.
I am using the audio transcriber GUI for live inference. Does it convert the audio to 16 kHz, or do I need to convert it to 16 kHz myself?

kdavis · April 17, 2019, 7:45am

Two things come to mind that might help:

Record at 16 kHz, 16-bit, mono, as transcoding can introduce artifacts.
Try adjusting the BEAM_WIDTH, LM_ALPHA, and LM_BETA parameters (for Python, found here) when you use your custom “order taking language model”. These can be used to make the model more likely to give results from the text you trained your language model on.
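
As a quick sanity check on the first point, here is a minimal sketch that reads a WAV header with Python’s standard `wave` module and reports whether the file is already 16 kHz, 16-bit, mono. The helper names `describe_wav` and `is_16k_16bit_mono` are made up for illustration and are not part of DeepSpeech.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, bits_per_sample, channels) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()


def is_16k_16bit_mono(path):
    """True if the file is 16 kHz, 16-bit, mono; False otherwise."""
    return describe_wav(path) == (16000, 16, 1)
```

Calling `is_16k_16bit_mono("order.wav")` on a recording (the file name is hypothetical) tells you whether any transcoding would be needed before inference.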

sanjay.pandey · April 17, 2019, 1:18pm

Can you tell me whether I need to increase or decrease each value, or is it a random call? Is there any methodology for tweaking these three parameters?

kdavis · April 17, 2019, 2:38pm

I’d start by increasing LM_ALPHA, then experiment from there.
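
A rough sketch of why raising LM_ALPHA can help, assuming the decoder ranks candidates by acoustic log-probability plus `lm_alpha` times the language-model log-probability plus `lm_beta` per word. That scoring form is an assumption for illustration, and all the numbers below are made up, not measured.

```python
def combined_score(acoustic_logp, lm_logp, word_count, lm_alpha, lm_beta):
    """Acoustic log-prob + weighted LM log-prob + a per-word bonus (assumed form)."""
    return acoustic_logp + lm_alpha * lm_logp + lm_beta * word_count


# Made-up values for illustration only:
# text -> (acoustic log-prob, LM log-prob, word count)
candidates = {
    "handi handi": (-4.0, -9.0, 2),   # slightly better acoustically, unlikely per the LM
    "paneer handi": (-5.0, -3.0, 2),  # slightly worse acoustically, likely per the LM
}


def best_candidate(lm_alpha, lm_beta=1.0):
    """Return the candidate text with the highest combined score."""
    return max(
        candidates,
        key=lambda text: combined_score(*candidates[text], lm_alpha, lm_beta),
    )
```

With these toy numbers, `best_candidate(0.1)` picks “handi handi” while `best_candidate(1.0)` picks “paneer handi”: a larger alpha lets the language model outvote a small acoustic advantage.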

sanjay.pandey · April 18, 2019, 6:30am

Okay, I will try that. Also, can you suggest any microphone that has such a specification, or the mic that you people have used for recording directly at 16 kHz? I am having a hard time finding one with a frequency response of 16 kHz and 16 bit.

kdavis · April 18, 2019, 7:02am

Truthfully, I don’t know a good mic to select. Maybe someone else can chime in?

sanjay.pandey · April 23, 2019, 11:48am

Okay, thank you for your support @kdavis.
I just wanted to know: the language model will not be effective if there is a single word on every line instead of a sentence, right?

kdavis · April 23, 2019, 11:56am

Correct. An entire sentence needs to be on a line so the language model learns about dependencies between words in a sentence and not about words in isolation.

sanjay.pandey · April 23, 2019, 12:26pm

So if I want to recognize specific words, just including them in the language model won’t work, right? I need to train the DeepSpeech model on those words so the acoustic model works fine?

kdavis · April 23, 2019, 12:37pm

Oh yeah. Sorry, I lost the larger context from earlier messages.

If you want to only recognize specific words, you can create a language model with only a single word per line. Such a language model will not be of use for recognizing longer sentences or phrases, however.

To only recognize specific words you don’t need to re-train the acoustic model with those words. Most of the time creating a new language model should be enough.

sanjay.pandey · April 24, 2019, 7:32am

What if I include a mix of one word per line and more than one word per line in the same language model?
Or do I need to make two separate language models, one for one word per line and one for more than one word per line?

kdavis · April 24, 2019, 7:42am

You can have this mix, but you have to be careful about the ratio of one word per line vs one sentence per line; getting the right balance is an art.
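
One way to experiment with that ratio is to generate the language-model training text from a script. This is only a sketch: the function names are made up, and repeating each phrase `phrase_copies` times is just one hypothetical knob for shifting the balance, not a recommended setting.

```python
import random


def build_corpus(single_words, phrases, phrase_copies=3, seed=0):
    """Return corpus lines: each single word once, each phrase `phrase_copies` times."""
    lines = list(single_words)
    for phrase in phrases:
        lines.extend([phrase] * phrase_copies)
    # Shuffle with a fixed seed so repeated runs give the same file.
    random.Random(seed).shuffle(lines)
    return lines


def write_corpus(lines, path):
    """Write one lowercased utterance per line, as language-model training text."""
    with open(path, "w", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line.strip().lower() + "\n")
```

For example, `build_corpus(["zero", "one", "two"], ["paneer handi", "tandoori roti"], phrase_copies=2)` yields seven lines, three single words and four phrase lines, which `write_corpus` can save to a text file for building the language model.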

sanjay.pandey · April 24, 2019, 12:35pm

1) Okay. Also, after making a custom language model, can I change beam width, LM alpha and LM beta at any time, or do I need to regenerate the binary and trie files every time I change those 3 hyperparameters? I am asking because I am not seeing any change in inference after changing the hyperparameter values.
I have included phrases like “three cheers chocolate” in the language model, but it still gives the result “three cold”; it is taking “cold” from “cold coffee”.

2) What are the value limits of LM alpha, LM beta and beam width?
As I have around 200 words, should I decrease the beam width, which is currently 500?

reuben · April 26, 2019, 1:34pm

You can change the hyperparameters any time; they’re not tied to the files. There are no explicit limits on the values. Beam width applies at the timestep level, not the word level, so it’s unrelated to the size of your vocabulary, but you should definitely experiment with different values to see what works best for your use case.
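
To illustrate the timestep-level point, here is a deliberately simplified beam search over per-timestep symbol probabilities. It is a toy, not DeepSpeech’s CTC decoder: it has no blank symbol, no collapsing of repeats, and no language model. It only shows that pruning to `beam_width` candidates happens once per timestep.

```python
import math


def beam_search(step_probs, beam_width):
    """Toy beam search.

    step_probs: list of {symbol: probability} dicts, one per timestep.
    Returns up to `beam_width` (prefix, log-probability) pairs, best first.
    """
    beams = [("", 0.0)]
    for probs in step_probs:
        # Extend every surviving prefix by every symbol at this timestep.
        expanded = [
            (prefix + symbol, logp + math.log(p))
            for prefix, logp in beams
            for symbol, p in probs.items()
        ]
        # Pruning happens here, once per timestep, whatever the vocabulary size.
        expanded.sort(key=lambda item: item[1], reverse=True)
        beams = expanded[:beam_width]
    return beams
```

With `steps = [{"a": 0.6, "b": 0.4}, {"a": 0.3, "b": 0.7}]`, `beam_search(steps, 2)` keeps the two prefixes “ab” and “bb”, while `beam_search(steps, 1)` keeps only “ab”. In a real decoder, where language-model scores are mixed in, a wider beam gives initially weaker prefixes a chance to recover later.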