Creating DeepSpeech Model for Hindi

I did not understand.

There is likely a bug somewhere that is corrupting your data. Have you written any code of your own to process that data?

No. I am just using it with DeepSpeech’s version 0.5.1 code.

OK. First, it would be better to work on master. Apply https://gist.github.com/reuben/b68b9085f7b293580f8431156a33daa9 if you need to reload a 0.5.1 English checkpoint.

No luck with this. I tried.

I think the fault was in my binary, which I created with the wrong alphabet. I am trying again now.

After cloning DeepSpeech I ran
git checkout v0.5.1
but the version still shows as 0.6.0 alpha 9.

Test on data/test/test.csv - WER: 1.000000, CER: 0.911950, loss: 113.759827
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.666667, loss: 35.086880
 - src: "नाम"
 - res: "का"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.833333, loss: 39.348869
 - src: "सहायता"
 - res: "का"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.909091, loss: 67.961250
 - src: "सहायता2करिए"
 - res: "का"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.875000, loss: 79.533066
 - src: "आपका2नाम2क्या2है"
 - res: "का"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.866667, loss: 83.190369
 - src: "तमहरआ2कयआ2नाम2ह"
 - res: "का"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.900000, loss: 104.061623
 - src: "क्या2बोलना2चाहते2हैं"
 - res: "का"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.944444, loss: 104.943001
 - src: "हिंदी2में2बात2करिए"
 - res: "का"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.931034, loss: 133.641830
 - src: "मै2आपकी2कया2सहायता2कर2सकता2हू"
 - res: "का"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.937500, loss: 159.305466
 - src: "सर2मै2विमवीजयोर2से2बात2कर2रहा2हू"
 - res: "का"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 330.525848
 - src: "नई2दिल्ली"
 - res: "का"
--------------------------------------------------------------------------------
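
For anyone reading the report above, here is a minimal sketch of how the per-sample WER and CER can be reproduced. It assumes CER is the Levenshtein distance over Unicode code points divided by the length of src, and WER is the same over whitespace-separated words. The function names are illustrative and are not DeepSpeech's own. Under that assumption the first two samples come out at 0.666667 and 0.833333, which matches the log.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(src, res):
    """Character error rate, counted per Unicode code point (assumption)."""
    return edit_distance(src, res) / len(src)


def wer(src, res):
    """Word error rate over whitespace-separated words (assumption)."""
    src_words = src.split()
    return edit_distance(src_words, res.split()) / len(src_words)


if __name__ == "__main__":
    print(round(cer("नाम", "का"), 6))
    print(round(cer("सहायता", "का"), 6))
```

Note that Devanagari vowel signs such as "ा" count as separate code points here, which is why "नाम" has length 3.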

SRC is correct now, but why does the digit ‘2’ appear between words instead of a space? Any idea?

It does not look like an ASCII 2, more like some other UTF-8 character. Maybe something with your alphabet? It’s really important to ensure you use the same alphabet everywhere.

I am using the same alphabet everywhere, but at training time it says that some characters present in the train, test or dev files are missing from alphabet.txt, even though those characters are already in alphabet.txt. When I delete such a character and type the same one in again, the check passes and training works properly. But I don’t know what the problem is with the number 2 appearing instead of spaces. I created the alphabet file in UTF-8 using Notepad.

It’s possible Windows line endings are playing a role here.
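
A quick way to test that theory is to look at the raw bytes of alphabet.txt. The sketch below is only an illustration (the helper names are made up): it flags a UTF-8 byte order mark, which Notepad may add when saving as UTF-8, and CRLF line endings, and it can rewrite the content without them.

```python
def inspect_alphabet(raw: bytes):
    """Return a list of problems found in the raw bytes of an alphabet file."""
    problems = []
    if raw.startswith(b"\xef\xbb\xbf"):
        problems.append("file starts with a UTF-8 byte order mark")
    if b"\r\n" in raw:
        problems.append("file uses Windows (CRLF) line endings")
    return problems


def normalize_alphabet(raw: bytes) -> bytes:
    """Strip a leading BOM and convert CRLF line endings to LF."""
    text = raw.decode("utf-8-sig")      # utf-8-sig drops a leading BOM if present
    text = text.replace("\r\n", "\n")
    return text.encode("utf-8")


if __name__ == "__main__":
    # Example bytes: BOM, a space label, then the letter U+0915, with CRLF endings.
    sample = b"\xef\xbb\xbf \r\n\xe0\xa4\x95\r\n"
    print(inspect_alphabet(sample))
    print(normalize_alphabet(sample))
```

To run it on a real file, read the bytes with `open("alphabet.txt", "rb").read()` and write the normalized bytes back in binary mode, keeping a backup of the original.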

If it says it cannot find the character, you need to add it to your alphabet file if it’s a legitimate character, or clean up your dataset if it is not.
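
To see exactly which characters are being rejected, something like the following can help. It is a rough sketch, not DeepSpeech code: it assumes the CSV has a `transcript` column, that each line of alphabet.txt is one label, and that lines starting with `#` are comments. It reports each missing character with its code point, so two characters that look identical on screen can be told apart.

```python
import csv
import io


def load_alphabet(lines):
    """Collect one label per line, skipping '#' comment lines (assumed format)."""
    labels = set()
    for line in lines:
        if line.startswith("#"):
            continue
        label = line.rstrip("\n")   # a stray '\r' from CRLF stays part of the label
        if label:
            labels.add(label)
    return labels


def missing_characters(csv_file, alphabet):
    """Count transcript characters that are not labels in the alphabet."""
    missing = {}
    for row in csv.DictReader(csv_file):
        for ch in row["transcript"]:
            if ch not in alphabet:
                missing[ch] = missing.get(ch, 0) + 1
    return missing


if __name__ == "__main__":
    alphabet = load_alphabet(io.StringIO(" \nक\nा\n"))
    data = io.StringIO("wav_filename,wav_filesize,transcript\na.wav,1,का म\n")
    for ch, count in missing_characters(data, alphabet).items():
        print(f"U+{ord(ch):04X} {ch!r} seen {count} time(s)")
```

For real files, pass objects opened with `open(path, encoding="utf-8", newline="")` in place of the `io.StringIO` examples.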

I’m not sure I get your process here.

It keeps saying the character
(' ')
is not present in your alphabet.
Do I have to add a space after each character in alphabet.txt?

No, but you need it at least once in your dataset. Make sure this is the proper UTF-8 code.

All done.

Hi, what maximum audio length would be best for training data?

Or

what should the length of the audio/words be for training to get the best results from the model?

Can we use something like a 5-10 minute conversation as each audio clip for training?

This is mostly going to be limited by your batch size and your GPU memory. To give you a ballpark: with 11GB of RAM on a GPU, I cannot go above a batch size of 68 with clips of up to 10-15 seconds. If I push further, I run out of GPU memory.
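
If you want to find clips that are too long before training, a small check like this can work. It is only a sketch using Python's standard `wave` module. The 15 second limit is an illustrative default taken from the ballpark above, not a DeepSpeech requirement, and the function names are made up.

```python
import wave


def clip_seconds(wav_path):
    """Duration of a WAV file in seconds, read from its header."""
    with wave.open(wav_path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def split_by_duration(wav_paths, max_seconds=15.0):
    """Separate paths into clips within the limit and clips over it."""
    kept, too_long = [], []
    for path in wav_paths:
        if clip_seconds(path) <= max_seconds:
            kept.append(path)
        else:
            too_long.append(path)
    return kept, too_long
```

Clips that land in `too_long`, such as a 5-10 minute conversation, would need to be cut into shorter segments with matching transcripts before they go into the training CSV.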

Okay, thank you for the support. Great community with great people.

@cryptoaimdy I am working on Hindi ASR for my thesis. Could you please help me with the process or steps to build a Hindi ASR system using DeepSpeech?

Hi, the process is the same as for English, except for the alphabet file and the training data. You need to create an alphabet.txt file in Hindi containing all the Hindi characters that can occur. You also need a Hindi vocabulary file and some audio clips with transcripts to train and test.
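
One way to avoid typing the alphabet by hand is to derive it from the transcripts themselves. The sketch below assumes alphabet.txt holds one character per line and simply collects every distinct code point found in the transcripts. `build_alphabet` is a made-up helper, not part of DeepSpeech.

```python
def build_alphabet(transcripts):
    """Return alphabet file content: one distinct character per line, sorted."""
    chars = set()
    for text in transcripts:
        chars.update(text)
    chars.discard("\n")   # line breaks are separators, not labels
    chars.discard("\r")
    # Sorting by code point puts the ASCII space before the Devanagari letters.
    return "".join(ch + "\n" for ch in sorted(chars))


if __name__ == "__main__":
    content = build_alphabet(["का नाम"])
    # Write with LF line endings and no byte order mark.
    with open("alphabet.txt", "w", encoding="utf-8", newline="\n") as out:
        out.write(content)
```

Because the file is generated from the same text used for training, the characters in it are byte-for-byte the ones in the dataset. If the transcripts contain a literal `#`, check how your DeepSpeech version expects it to be escaped, since lines starting with `#` may be read as comments.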

You can remind me at this email address to send you the Hindi vocab file I have.
mohammadali1ali@gmail.com

@cryptoaimdy Thank you for your response. I have emailed you at the address you gave.

Hi @cryptoaimdy @lissyx ,

I am working on a similar scenario, i.e. using DeepSpeech with a Hindi dataset.

The parameters I am using right now are: LR = 0.00003, DR = 0.2, alpha = 0.75, beta = 1.85, n_hidden = 2048, and train, test and dev batch sizes of 16, 16 and 16.

With these parameters, the last training run completed in 10 hours with 17 epochs, and the results were as follows:

Training loss: 223.638213,
validation loss: 236.106942,
Testing loss: 254.768326,
WER: 0.790118,
CER: 0.593037.

Earlier I was getting a loss in the range 300-400 with other parameter values, so I have been changing the values and retraining again and again to get the best result. The results I pasted above can’t be considered good. Can you suggest which values I should change to reach better results? Any help is appreciated.

Thanks!

Hi, can someone please help me out.
@Sreyan_Ghosh @lissyx

Mozilla isn’t really maintaining this any longer. Check this post.
