Creating DeepSpeech Model for Hindi

This is what I am getting. Is it because I have too little data? I think src should be displayed as it is.

I don’t understand: it looks like you have src == res, which would mean the computed transcription matches the expected transcription.

WER: 1.000000, CER: 0.600000, loss: 33.510384
 - wav: file:///home/yk/hindi-deep/DeepSpeech/data/test/009.wav
 - src: "ैयीखखाखदुखखौखप हखपरा"
 - res: "पखाख खपरा"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 6.128268
 - wav: file:///home/yk/hindi-deep/DeepSpeech/data/test/004.wav
 - src: "ुखत"
 - res: "ुखत"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 6.713559
 - wav: file:///home/yk/hindi-deep/DeepSpeech/data/test/005.wav
 - src: "ापखी ख"
 - res: "ापखी ख"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 7.111163
 - wav: file:///home/yk/hindi-deep/DeepSpeech/data/test/010.wav
 - src: "ापखी खखैुेी"
 - res: "ापखी खखैुेी"
-------------------------------------------------------------------------

At first I got a WER of 1.

Well, this is the test set report showing the worst examples. I don’t see anything strange; please elaborate.

wav_filename,wav_filesize,transcript
001.wav,101000,तमहरआ कयआ नाम ह
002.wav,138000,मै आपकी कया सहायता कर सकता हू
003.wav,125000,सर मै विमवीशयोर से बात कर रहा हू
004.wav,78000,नाम
005.wav,99400,सहायता
006.wav,106000,आपका नाम क्या है
007.wav,80900,नई दिल्ली
008.wav,90700,हिंदी में बात करिए
009.wav,81900,क्या बोलना चाहते हैं
010.wav,88700,सहायता करिए

This is my original test.csv content. Compare it with the src values above: they are totally different.

This is the original transcript:

This is what I get in src:

I would suspect your importer code.

I did not understand.

There is likely a bug somewhere that is mangling your data. Have you written any code of your own to handle this data?

No. I am just using it with DeepSpeech’s version 0.5.1 code.

OK. First, it would be better to work on master. Apply https://gist.github.com/reuben/b68b9085f7b293580f8431156a33daa9 if you need to reload a 0.5.1 English checkpoint.

No luck with this; I tried.

I think the fault was in my binary, which I created with the wrong alphabet. Now I am trying again.

After cloning DeepSpeech, I ran

git checkout v0.5.1

but the version shows as 0.6.0 alpha 9.

Test on data/test/test.csv - WER: 1.000000, CER: 0.911950, loss: 113.759827
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.666667, loss: 35.086880
 - src: "नाम"
 - res: "का"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.833333, loss: 39.348869
 - src: "सहायता"
 - res: "का"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.909091, loss: 67.961250
 - src: "सहायता2करिए"
 - res: "का"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.875000, loss: 79.533066
 - src: "आपका2नाम2क्या2है"
 - res: "का"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.866667, loss: 83.190369
 - src: "तमहरआ2कयआ2नाम2ह"
 - res: "का"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.900000, loss: 104.061623
 - src: "क्या2बोलना2चाहते2हैं"
 - res: "का"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.944444, loss: 104.943001
 - src: "हिंदी2में2बात2करिए"
 - res: "का"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.931034, loss: 133.641830
 - src: "मै2आपकी2कया2सहायता2कर2सकता2हू"
 - res: "का"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.937500, loss: 159.305466
 - src: "सर2मै2विमवीजयोर2से2बात2कर2रहा2हू"
 - res: "का"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 330.525848
 - src: "नई2दिल्ली"
 - res: "का"
--------------------------------------------------------------------------------

src is correct now, but why is a numeral ‘2’ appearing between words instead of a space? Any idea?

It does not look like an ASCII 2, more like some other UTF-8 character. Maybe something with your alphabet? It’s really important to ensure you use the same alphabet everywhere.
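One way to check what that separator really is: print the Unicode code point and name of every character in a src string copied from the test report. This is a minimal sketch using only Python's standard unicodedata module; the function name describe_chars and the sample string are illustrative, not part of DeepSpeech. A genuine ASCII 2 shows up as U+0032 DIGIT TWO; anything else is a look-alike.

```python
import unicodedata

def describe_chars(text):
    """Return (character, code point, Unicode name) for every character in text."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]

# Paste a src string copied from your own test report here (this one is illustrative).
for ch, code, name in describe_chars("नई2दिल्ली"):
    print(repr(ch), code, name)
```

Running the same function on the corresponding lines of alphabet.txt shows whether the two files really contain the same characters.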

I am using the same alphabet everywhere, but at training time it says some characters present in the train, test, or dev files are missing from alphabet.txt, even though those characters are already in alphabet.txt. When I delete such a character and type it in again, it works properly. But I don’t know why the numeral 2 appears instead of spaces. I created the alphabet file as UTF-8 using Notepad.

It’s possible Windows line endings are playing a role here.
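If Windows line endings are the culprit, each alphabet entry carries a trailing \r, so a character typed in the dataset never matches it exactly. Notepad's UTF-8 option may also prepend a byte-order mark (U+FEFF) to the first line. A minimal sketch, assuming alphabet.txt is a UTF-8 file with one entry per line, that rewrites it without a BOM and with plain LF endings; clean_alphabet is a hypothetical helper name and the path in the usage comment is illustrative.

```python
from pathlib import Path

def clean_alphabet(path):
    """Rewrite an alphabet file as BOM-less UTF-8 with LF line endings.

    Returns the entries (one per line) that were written back.
    """
    raw = Path(path).read_bytes()
    # "utf-8-sig" decodes UTF-8 and drops a leading byte-order mark if there is one.
    text = raw.decode("utf-8-sig")
    # Normalise Windows (CRLF) and bare CR line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    lines = text.split("\n")
    # A trailing newline leaves an empty last element; drop it. A line holding a
    # single space is NOT empty and is kept, because that is the space entry.
    if lines and lines[-1] == "":
        lines.pop()
    Path(path).write_bytes(("\n".join(lines) + "\n").encode("utf-8"))
    return lines

# Hypothetical usage (back up the file first):
# clean_alphabet("data/alphabet.txt")
```

The same cleanup applies to the CSV files if they were also edited in Notepad.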

If it says it cannot find a character, you need to fix your alphabet file if it’s a legitimate character, or clean up your dataset if it is not.
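To see exactly which characters are missing, and from which files, you can compare every transcript character against the alphabet entries. This is a minimal sketch assuming the CSV layout shown above (a transcript column) and an alphabet file with one entry per line; skipping lines that start with '#' as comments is an assumption modelled on the stock alphabet.txt, and the function names and paths are hypothetical.

```python
import csv

def read_alphabet(path):
    """Return the set of alphabet entries, one per line.

    Blank lines and lines starting with '#' are skipped (treated as comments);
    a line holding a single space is kept as the space entry.
    """
    # "utf-8-sig" ignores a leading byte-order mark; splitlines() handles CRLF.
    with open(path, encoding="utf-8-sig", newline="") as f:
        lines = f.read().splitlines()
    return {line for line in lines if line and not line.startswith("#")}

def missing_characters(csv_paths, alphabet):
    """Map each transcript character not in the alphabet to the CSV files containing it."""
    missing = {}
    for csv_path in csv_paths:
        with open(csv_path, encoding="utf-8-sig", newline="") as f:
            for row in csv.DictReader(f):
                for ch in row["transcript"]:
                    if ch not in alphabet:
                        missing.setdefault(ch, set()).add(csv_path)
    return missing

# Hypothetical usage; adjust the paths to your own layout:
# alphabet = read_alphabet("data/alphabet.txt")
# report = missing_characters(["data/train/train.csv", "data/dev/dev.csv", "data/test/test.csv"], alphabet)
# for ch, files in sorted(report.items()):
#     print(repr(ch), f"U+{ord(ch):04X}", sorted(files))
```

Printing the code point next to each missing character makes it easy to tell a legitimate Devanagari character (add it to the alphabet) from stray debris (clean it out of the dataset).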

I’m not sure I get your process here.

It keeps saying that (' ') is not present in the alphabet.
Do I have to add a space after each character in alphabet.txt?

No, but you need it at least once in your dataset. Make sure this is the proper UTF-8 code.
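A quick way to confirm the space is the proper character is to check that one alphabet line is exactly U+0020 and that no line hides a non-breaking space, a stray \r, or a byte-order mark. This is a minimal sketch; check_alphabet_entries is a hypothetical helper, and the rule that every entry should be a single code point is an assumption based on transcripts being matched one character at a time.

```python
import unicodedata

def check_alphabet_entries(entries):
    """Return human-readable problems found in a list of alphabet entries (one per line)."""
    problems = []
    # The space must be its own line containing exactly U+0020.
    if " " not in entries:
        problems.append("no line containing exactly one U+0020 SPACE")
    for i, entry in enumerate(entries, start=1):
        # Assumption: transcripts are mapped one code point at a time, so an
        # entry longer than one code point can never match.
        if len(entry) != 1:
            problems.append(f"line {i}: {len(entry)} code points {entry!r}")
        for ch in entry:
            # Flag whitespace other than a plain space, plus the byte-order mark.
            if (ch.isspace() and ch != " ") or ch == "\ufeff":
                name = unicodedata.name(ch, "<unnamed>")
                problems.append(f"line {i}: suspicious U+{ord(ch):04X} {name}")
    return problems

# Hypothetical usage: read raw so a BOM or stray "\r" survives and gets reported.
# with open("data/alphabet.txt", encoding="utf-8", newline="") as f:
#     lines = [l for l in f.read().split("\n") if l and not l.startswith("#")]
# print(check_alphabet_entries(lines))
```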

All done.

Hi, what maximum audio length would be best for training data?

Or, put differently: how long should each training clip be (in duration or number of words) to get the best results from the model?

Can we use 5-10 minute conversations as individual audio clips for training?

This is mostly going to be limited by your batch size and your GPU memory. To give you a ballpark: with 11 GB of RAM on a GPU, I cannot go above a batch size of 68 with clips up to 10-15 seconds long. If I push further, I run out of GPU memory.
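Since the CSVs already carry wav_filesize, you can estimate each clip's duration and drop overly long clips before training. This is a minimal sketch assuming 16 kHz, 16-bit, mono PCM WAV files with a canonical 44-byte header; the 15-second cutoff mirrors the ballpark above, and the function names are hypothetical. Files in any other format need their real header values instead.

```python
import csv

# Assumption: 16 kHz, 16-bit (2-byte), mono PCM WAV with a canonical 44-byte header.
SAMPLE_RATE = 16000
BYTES_PER_SAMPLE = 2
CHANNELS = 1
HEADER_BYTES = 44

def estimated_seconds(wav_filesize):
    """Estimate clip duration in seconds from the wav_filesize column."""
    payload = max(int(wav_filesize) - HEADER_BYTES, 0)
    return payload / (SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS)

def filter_by_duration(in_csv, out_csv, max_seconds=15.0):
    """Copy rows whose estimated duration is <= max_seconds; return (kept, dropped)."""
    kept = dropped = 0
    with open(in_csv, encoding="utf-8", newline="") as src, \
         open(out_csv, "w", encoding="utf-8", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if estimated_seconds(row["wav_filesize"]) <= max_seconds:
                writer.writerow(row)
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

Under these assumptions 001.wav (101000 bytes) is about 3.2 seconds, while a 5-10 minute conversation would be roughly 9.6-19.2 MB per file, far beyond what fits in a batch; such recordings would need to be segmented into short utterances first.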