Also, please refrain from that; there’s nothing we can do if you don’t give more information. This is painful since we have to ask you for details you should already have given, thus forcing an extra round-trip.
Do you have the Gramvaani data set?
@kdavis No. I thought that, like the other importers, the code would include a download link, but I just saw that there is no link in the code and it is only an importer.
Can you please tell me where to get the data from Gram Vaani? I have visited their site but was not able to find any voice dataset.
@sanjay.pandey No, this data set is, for now, private between Mozilla and Gram Vaani, but it will eventually be released to everyone when all its details are worked out.
Oh! Sad! Can you give me an estimated time (days or months) for when it will be released?
Actually I am training for “restaurant order taking”, so I need my model to understand every single item written on the menu. I trained on the Common Voice dataset (English, validated training set) and I thought adding the menu items to the language model would be enough, but it is actually only producing words from the vocabulary it was trained on. So I need to train on the vocabulary that I am including in the language model. Right?
Actually the Gram Vaani data is in Hindi. So I don’t think it will be of much use for a non-Hindi engine, and the release, I’d guess, will be sometime this year. But I’m not sure when.
Also, you don’t need to train on audio that contains all the vocabulary in the language model, assuming the language you are targeting has a more-or-less phonemic orthography, i.e. it sounds more-or-less like it is written.
PS: I’d like to hear more about your use case.
Okay @kdavis, actually, being in India myself, I also wanted to train for Hindi transcription for a different use case.
Right now, using DeepSpeech 0.4.1, I have further trained it on the Common Voice dataset using the import_cv.py file and have been able to reach a loss of around 0.8 on the training dataset. Now I want to use it for taking the mobile numbers of different customers and also taking orders from them: they will say their phone number,
8990993231, which will be 10 digits, and then say the food items they want,
like pizza, pasta, any Indian items as well as any global items.
So what I did was create a language model like this, which had
paneer butter masala
paneer masala tikka
The problem I am facing is that sometimes the words get mixed up: if I say “paneer handi” it will hear “handi handi”, and other sorts of problems. Can you please tell me what I can do further?
Also, when I made a language model using the same Common Voice text that I used for training and then did inference, it was quite a bit better when I tested it live with different people.
Right now I am using a Zoom H1n stereo mic recorder and the audio is at 48 kHz.
I am using the audio transcriber GUI for live inference. Does it convert the audio to 16 kHz, or do I need to convert it to 16 kHz myself?
Two things come to mind that might help:
- Record at 16 kHz, 16-bit, mono, as transcoding can introduce artifacts.
- Try adjusting the BEAM_WIDTH, LM_ALPHA, and LM_BETA parameters (for Python, found here) when you use your custom “order taking language model”. These can be used to make the model more likely to give results from the text you trained your language model on.
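For the first point, it can help to confirm what format a recording is actually in before feeding it to the model. The snippet below is a minimal sketch using only Python's standard `wave` module; the function names `describe_wav` and `is_deepspeech_ready` are made up for illustration and are not part of DeepSpeech.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, bits_per_sample, channels) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()


def is_deepspeech_ready(path, rate=16000, bits=16, channels=1):
    """True if the file is already 16 kHz, 16-bit, mono (hypothetical helper)."""
    return describe_wav(path) == (rate, bits, channels)
```

A 48 kHz stereo recording would come back as `(48000, 16, 2)` and fail the check, which tells you some resampling step is happening (or needs to happen) before inference.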
Can you tell me whether I need to increase or decrease the value of each, or is it a random call? Is there any methodology for tweaking these three parameters?
I’d start by increasing LM_ALPHA, then experiment from there.
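One rough way to make that experimenting systematic is to sweep a few values and score each combination against clips whose correct transcript you already know. This is only a sketch, not part of DeepSpeech: `transcribe` is a hypothetical function you would write yourself around however you run inference, `sweep` and `word_error_rate` are illustrative names, and the grid values you pass in are up to you.

```python
from itertools import product


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / max(len(ref), 1)


def sweep(transcribe, samples, alphas, betas):
    """Try every (lm_alpha, lm_beta) pair and return results, best first.

    transcribe(audio, lm_alpha, lm_beta) -> str is a hypothetical wrapper
    around your own inference code. samples is a list of
    (audio, reference_text) pairs.
    """
    results = []
    for alpha, beta in product(alphas, betas):
        errors = [word_error_rate(ref, transcribe(audio, alpha, beta))
                  for audio, ref in samples]
        results.append((sum(errors) / len(errors), alpha, beta))
    return sorted(results)
```

The first entry of the returned list is the pair with the lowest average word error rate on your test clips.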
Okay, I will try that. Also, can you suggest any microphone that has such a specification, or the mic that you people have used for recording directly at 16 kHz? I am having a hard time finding one that records at 16 kHz and 16 bit.
Truthfully, I don’t know a good mic to select. Maybe someone else can chime in?
Okay, thank you for your support @kdavis.
I just wanted to know: the language model will not be effective if there is a single word on every line instead of a sentence, right?
Correct. An entire sentence needs to be on a line so the language model learns about dependencies between words in a sentence and not about words in isolation.
So if I want to recognize specific words, just including them in the language model won’t work, right? I need to train the DeepSpeech model on those words so the acoustic model works fine?
Oh yeah. Sorry, I lost the larger context from earlier messages.
If you want to only recognize specific words, you can create a language model with only a single word per line. Such a language model will not be of use for recognizing longer sentences or phrases, however.
To only recognize specific words you don’t need to re-train the acoustic model with those words. Most of the time creating a new language model should be enough.
What if I include a mix of one word per line and more than one word per line in the same language model?
Or do I need to make two separate language models, one for single words per line and one for multiple words per line?
You can have this mix, but you have to be careful about the ratios of one word per line vs one sentence per line. Getting the right balance is an art.
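One way to keep that ratio explicit is to generate the language model text from a script instead of editing it by hand. The sketch below is an illustration only: `build_corpus`, the `templates`, and the `item_repeats` knob are all made up for this example, and lowercasing assumes your model's alphabet is lowercase.

```python
def build_corpus(items, templates, item_repeats=1):
    """Return corpus lines mixing bare menu items with full sentences.

    item_repeats controls how many times each bare item appears on its own
    line, relative to one line per sentence template.
    """
    lines = []
    for item in items:
        item = item.lower()
        lines.extend([item] * item_repeats)                    # item alone
        lines.extend(t.format(item=item) for t in templates)  # item in context
    return lines


if __name__ == "__main__":
    menu = ["paneer butter masala", "paneer masala tikka"]
    sentences = ["i would like {item}", "one {item} please"]
    for line in build_corpus(menu, sentences, item_repeats=2):
        print(line)
```

Writing the returned lines to a text file, one per line, gives you the input for building the language model, and changing `item_repeats` lets you try different balances without retyping the file.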
Okay. Also, tell me: after making a custom language model, can I change beam width, LM alpha, and LM beta at any time, or do I need to regenerate the binary and trie files every time I change those 3 hyperparameters? I am asking because I am not seeing any change in inference after changing the values of the hyperparameters.
I have included words like “three cheers chocolate” in the language model, but it is still giving the result “three cold”; it is taking “cold” from “cold coffee”.
2) What are the value limits of LM alpha, LM beta, and beam width?
As I have around 200 words, should I decrease the beam width, which is currently 500?
You can change the hyperparameters any time; they’re not tied to the files. There are no explicit limits on the values. Beam width applies at the timestep level, not the word level, so it’s unrelated to the size of your vocabulary, but you should definitely experiment with different values to see what works best for your use case.
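To illustrate why beam width has nothing to do with vocabulary size, here is a toy beam search over per-timestep character probabilities. It is deliberately simplified and is not the CTC decoder DeepSpeech uses (no blank symbol, no merging of repeats, no language model scoring); `beam_search` and `step_probs` are names invented for this sketch.

```python
import math


def beam_search(step_probs, beam_width):
    """Keep the beam_width best character prefixes after every timestep.

    step_probs is a list with one dict per audio timestep, mapping each
    character to its probability at that step.
    """
    beams = [("", 0.0)]  # (prefix, log probability)
    for probs in step_probs:
        candidates = [
            (prefix + symbol, score + math.log(p))
            for prefix, score in beams
            for symbol, p in probs.items()
            if p > 0
        ]
        # Pruning happens here, once per timestep, on character prefixes.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams
```

Nothing in the loop refers to words or to how many of them the language model knows; the width only caps how many partial transcripts survive each timestep. In this toy the steps are independent, so the top result matches greedy decoding regardless of width. In a real decoder the language model score changes the ranking, which is why wider beams can matter there.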