When I use the same approach for importing files that I used with the Common Voice importer (import_cv.py), it doesn't work. @lissyx can you please help me with this?
No, I can’t; @kdavis worked on that, as the git history shows: https://github.com/mozilla/DeepSpeech/commits/master/bin/import_gram_vaani.py
Also, please refrain from that; there’s nothing we can do if you don’t give more information. This is painful since we have to ask you for details you should already have given, thus forcing an extra round-trip.
Do you have the Gramvaani data set?
@kdavis No, I thought that, like the other importers, it would have a download link in the code, but I just saw there is no link in the code and it is just an importer.
Can you please tell me where to get the data from Gram Vaani? I have visited their site but was not able to find any voice dataset.
@sanjay.pandey No, this data set is, for now, private between Mozilla and Gram Vaani, but it will eventually be released to everyone when all its details are worked out.
Oh! Sad! Can you give me an estimated time (days or months) for when it will be released?
Actually, I am training for “restaurant order taking”, so I need my model to understand every single item written on the menu. I trained on the Common Voice dataset (English, validation-train) and I thought adding the menu items to the language model would be enough, but it is actually only picking up words from the vocabulary it was trained on. So I need to train on the vocabulary that I am including in the language model. Right?
Actually, the Gram Vaani data is in Hindi, so I don’t think it will be of much use for a non-Hindi engine. The release, I’d guess, will be sometime this year, but I’m not sure when.
Also, you don’t need to train on audio that contains all the vocabulary in the language model, assuming the language you are targeting has a more-or-less phonemic orthography, i.e. it sounds more-or-less like it is written.
PS: I’d like to hear more about your use case.
Okay @kdavis, actually, being in India myself, I also wanted to train for Hindi transcription, for a different use case.
Right now, using DeepSpeech 0.4.1, I have trained it further on the Common Voice dataset using the import_cv.py file and have reached a loss of around 0.8 on the training dataset. Now I want to use it for taking the mobile numbers of different customers and also taking orders from them: they will say their phone number, e.g. 8990993231, which will be 10 digits, and then say the food items they want, like pizza, pasta, any Indian items as well as any global items.
So what I did was create a language model from text like this:
paneer butter masala
paneer masala tikka
The problem I am facing is that sometimes the words get mixed up, e.g. if I say “paneer handi” it will hear “handi handi”, along with other sorts of problems. Can you please tell me what I can do further?
Also, when I made a language model using the same Common Voice text that I used for training and then ran inference, the results were quite a bit better when I tested live with different people.
Right now I am using a Zoom H1n stereo mic recorder and the audio is at 48 kHz.
I am using the audio transcriber GUI for live inference. Does it convert the audio to 16 kHz, or do I need to convert it to 16 kHz myself?
Two things come to mind that might help:
- Record at 16 kHz, 16-bit, mono, as transcoding can introduce artifacts.
- Try adjusting the BEAM_WIDTH, LM_ALPHA, and LM_BETA parameters (for Python, found here) when you use your custom “order taking language model”. These can be used to make the model more likely to give results from the text you trained your language model on.
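For the first point, a quick way to confirm that a recording really is 16 kHz, 16-bit, mono before feeding it to the model is to read the WAV header. The sketch below uses only Python's standard `wave` module; the function name `wav_format_problems` is made up for illustration and is not part of DeepSpeech.

```python
import wave

# Target format from the advice above: 16 kHz, 16-bit, mono.
TARGET_RATE_HZ = 16000
TARGET_SAMPLE_WIDTH_BYTES = 2  # 16 bits = 2 bytes per sample
TARGET_CHANNELS = 1


def wav_format_problems(path):
    """Return a list of format mismatches; the list is empty if the file matches.

    Hypothetical helper for illustration; it only inspects the WAV header.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != TARGET_RATE_HZ:
            problems.append("sample rate is %d Hz, expected %d Hz"
                            % (wav.getframerate(), TARGET_RATE_HZ))
        if wav.getsampwidth() != TARGET_SAMPLE_WIDTH_BYTES:
            problems.append("sample width is %d bytes, expected %d bytes"
                            % (wav.getsampwidth(), TARGET_SAMPLE_WIDTH_BYTES))
        if wav.getnchannels() != TARGET_CHANNELS:
            problems.append("channel count is %d, expected %d"
                            % (wav.getnchannels(), TARGET_CHANNELS))
    return problems
```

A 48 kHz stereo recording, such as the one described earlier in the thread, would be reported with two problems (sample rate and channel count), which is a hint that it needs to be re-recorded or converted first.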
Can you tell me whether I need to increase or decrease the value of each, or is it a judgment call? Is there any methodology for tweaking these three parameters?
I’d start by increasing LM_ALPHA, then experiment from there.
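One way to make "experiment from there" systematic is to score a handful of recorded orders against their known transcripts for each parameter setting and keep the setting with the lowest word error rate. The sketch below is only an illustration under assumptions: `transcribe(audio, lm_alpha, lm_beta)` is a hypothetical callable you would write yourself around whatever inference call your DeepSpeech version provides, and the values to sweep are up to you.

```python
import itertools


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,                            # deletion
                           cur[j - 1] + 1,                         # insertion
                           prev[j - 1] + (ref_word != hyp_word)))  # substitution or match
        prev = cur
    return prev[-1] / max(len(ref), 1)


def sweep(transcribe, samples, alphas, betas):
    """Score every (alpha, beta) pair; return (mean WER, alpha, beta), best first.

    `transcribe` is a hypothetical, caller-supplied function, not a DeepSpeech API.
    `samples` is a non-empty list of (audio, expected_text) pairs.
    """
    results = []
    for alpha, beta in itertools.product(alphas, betas):
        wers = [word_error_rate(text, transcribe(audio, alpha, beta))
                for audio, text in samples]
        results.append((sum(wers) / len(wers), alpha, beta))
    return sorted(results)
```

With this measure, hearing “handi handi” for “paneer handi” counts as one wrong word out of two, so `word_error_rate("paneer handi", "handi handi")` is 0.5. Starting the sweep with a few increasing LM_ALPHA values, as suggested above, keeps the number of runs small.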
Okay, I will try that. Also, can you suggest any microphone that has such a specification, or the mic that you have used for recording directly at 16 kHz? I am having a hard time finding one with a frequency response of 16 kHz and 16-bit.
Truthfully, I don’t know a good mic to select. Maybe someone else can chime in?