Query regarding speed of training and issues with convergence

I don't know how your language works; what is the zero-width space used for?

It doesn't need it. It was a remnant of the cleaning process. Do you think that might be the issue? If so, I'll clean up the data again and try.

Is the alphabet format wonky, though? Or is it OK?

Yes, remove it. Please try posting the alphabet again using the forum's preformatted text format (</>).

It will be easier to read.

OK, on it.

I’ve updated the previous comment to have it as preformatted text.

Without the zero-width space, yes.

New alphabet.txt:

उ
प
ा
ळ
न
म
ो
इ
ग
अ
ॲ
ऍ
ह
ऴ
ऊ
त
थ
ज
े
य
भ
ओ
द
ॅ
स
व
ू
ञ
ऋ
।
ख
ि
ब
ध
ी
ु
फ
ऩ
ई
 
ट
ै
ऱ
ः
॰
ऌ
ौ
ॉ
॥
क
ॠ
झ
्
ठ
श
ँ
ल
ऐ
ॻ
़
ऑ
ऽ
ड
औ
ङ
ण
ढ
ए
ष
च
छ
ं
ृ
आ
घ
र

WER: 1.000000, CER: 0.500000, loss: 0.732161

 - src: "याला आपले अबोध हेतू कारण असतात"
 - res: "याउापेअउोहेउतूउाअसउतात"

WER: 1.000000, CER: 0.631579, loss: 0.948461

 - src: "अनोमा बौद्ध साहित्यात निर्देशिलेली भारतातील एक पवित्र नदी"
 - res: "अनउादउसाउतयातऊनउेेउातातउपवतन"

The output is still the same. Any insights on whether the alphabet should be ordered or not?

For the first time, no; but if you train a model and then change the alphabet order, then yes. Did you test it without the LM, and did you retrain the LM?

Looks like उ is being used as space, maybe? Swap space with उ, I mean swap their positions.


I figured it out. " " has to be the first character in alphabet.txt.

 - src: "हा प्राणी सरीसृपांच्या सरपटणाऱ्या प्राण्यांच्या र्हिंकोसीफॅलिया गणातला आहे"
 - res: "हा पान सू स पान या स पना या प ा या या हे तो स तॅ या गा त ा हे"

The output is bad, but the model is not trained enough yet. I got the space.
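
For context on what "used as space" means here: the network only emits integer labels, and characters are recovered purely by position in alphabet.txt, so if any component assumes space sits at a fixed index, whatever character actually occupies that line gets printed instead. A minimal Python sketch of that positional lookup (a simplification for illustration; the tiny alphabets below are hypothetical):

# Sketch of the positional label-to-character mapping (simplified from what
# DeepSpeech's util/text.py does): the network emits integer labels, and a
# label is just an index into the lines of alphabet.txt.
def load_alphabet(path):
    with open(path, encoding="utf-8") as fin:
        return [line.rstrip("\n") for line in fin]

# Hypothetical three-character alphabets to show the failure mode: if labels
# are resolved against a different ordering than the one they were learned
# for, whatever character sits at the space's old index is printed instead.
trained  = [" ", "उ", "प"]   # space first
shuffled = ["उ", "प", " "]   # space moved to the end
labels = [1, 0, 2]           # emitted by the network; "उ प" if space is first
print("".join(trained[l]  for l in labels))  # -> "उ प"
print("".join(shuffled[l] for l in labels))  # -> "पउ " (the space vanished)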

You are right. That's exactly why I tried putting space first.

Testing without the LM for now.

I'll experiment and try to understand how to better standardize the alphabet formatting.

<This was a discussion we had in private because, as a new user, I had to wait 3 hours between posts. Putting it here so people have some context.>

After this conversation and a lot of fine-tuning, I have some results I am satisfied with. I will keep posting if I find anything interesting. Also, after I test the model out, if everything is as it should be, I'll post my protocol too. Thanks for helping me, @lissyx @carlfm01!

Results:

--------------------------------------------------------------------------------
WER: 0.222222, CER: 0.033898, loss: 0.000426
 - src: "मध्यजीवमहाकल्पच्या अखेरपासून हे कुल लुप्त झाले असा समज होता"
 - res: "मध्यजीव महाकल्पाच्या अखेरपासून हे कुल लुप्त झाले असा समज होता"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 0.000007
 - src: "याला आपले अबोध हेतू कारण असतात"
 - res: "याला आपले अबोध हेतू कारण असतात"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 0.000015
 - src: "अवधानाची ही चंचलता जीवनोपयोगी असते"
 - res: "अवधानाची ही चंचलता जीवनोपयोगी असते"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 0.000016
 - src: "यांना समुद्री अवशिष्ट म्हणणे योग्य होईल"
 - res: "यांना समुद्री अवशिष्ट म्हणणे योग्य होईल"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 0.000021
 - src: "तसेच काही ठिकाणी थंड पाण्याची खोल सरोवरेही होती"
 - res: "तसेच काही ठिकाणी थंड पाण्याची खोल सरोवरेही होती"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 0.000021
 - src: "व रेखावृत्त ते पू"
 - res: "व रेखावृत्त ते पू"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 0.000037
 - src: "किमी लोकसंख्या आकारमानाने पोर्तुगालच्या सु"
 - res: "किमी लोकसंख्या आकारमानाने पोर्तुगालच्या सु"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 0.000072
 - src: "यायोगे प्राण्याला परिसरातील सगळ्या गोष्टींशी संपर्क ठेवता येतो"
 - res: "यायोगे प्राण्याला परिसरातील सगळ्या गोष्टींशी संपर्क ठेवता येतो"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 0.000085
 - src: "असा बौद्ध साहित्यात उल्लेख आहे"
 - res: "असा बौद्ध साहित्यात उल्लेख आहे"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 0.000089
 - src: "विमान चालविणाऱ्या वैमानिकाला अनेक गोष्टींकडे सतत अवधान द्यावे लागते"
 - res: "विमान चालविणाऱ्या वैमानिकाला अनेक गोष्टींकडे सतत अवधान द्यावे लागते"
--------------------------------------------------------------------------------

I had some wonky issues with alphabet.txt; I'll clarify those too once I have a concrete understanding of what was wrong.

Can you share some of that? It might be valuable information for others!

@lissyx I'll test the model thoroughly (tomorrow), and if everything goes as expected (read: the results are not a fluke, which is unlikely), I'll post everything in another clean post. Also, I have had some issues with the alphabet handling; I still have to reason out why my initial models failed.

As soon as I am done with this, I'll post the whole config and my complete protocol. I do not want to spread misinformation or miss out on something important. The post will be up in a few days. Meanwhile, if anyone has queries about anything I did in this post, I'll respond to them here.


Thanks, that's perfectly understandable!

Sorry for the delay. I've been trying to understand why I can't train a decent model when space is not the first character in alphabet.txt. I've checked everything from text.py to DS-ctcdecoder's source, and I haven't found the cause. The only part left is _swigwrapper.cpython-36m-x86_64-linux-gnu.so (the .so file in the Python library after pip install). Everything else is working as expected. Any inputs?
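
In case it helps anyone reproduce the text.py check, here is the round-trip test I used, as a sketch (this assumes DeepSpeech's util.text.Alphabet with label_from_string/string_from_label; the exact module path and method names may differ between versions):

# Sanity check of the Python-side mapping (assumes util.text.Alphabet and
# its label_from_string/string_from_label methods; names may vary by release).
from util.text import Alphabet

alphabet = Alphabet("data/alphabet.txt")       # hypothetical path
space_label = alphabet.label_from_string(" ")
print("space maps to label", space_label)
# The round trip should hold no matter where " " sits in the file.
assert alphabet.string_from_label(space_label) == " "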

Can you be more specific? Do you have anything supporting your theory?

@lissyx OK, let me explain step by step.

The first few models I trained had issues where the spaces were missing (the results are discussed in this very thread). @carlfm01 and I noticed that the first letter (also mentioned in this thread) might have been used for space, so I made space the first character, retrained the model, and then the model ran as expected.

We discussed the possibility that I might have trained an initial model with one alphabet.txt and then retrained it with a modified alphabet.txt. I made sure that I deleted my previous models before training. I also trained three models today: one with space first, a second with space in the middle, and finally another with space at the beginning, just to be sure. The space-at-the-beginning models work as expected, while the space-in-the-middle model reproduces the same "spaceless" output mentioned in this thread.

To debug, I checked the scripts where the alphabet is used:

  1. text.py: both the list and the dictionary used to map strings to labels (and vice versa) work as expected.

  2. The Alphabet in config.py also works as expected.

  3. The only part left is ctc_beam_search_decoder_batch in evaluate.py. This comes from the Python module installed via pip. I went to the source code and checked __init__.py, which calls functions from swigwrapper.py, which in turn calls functions from the .so file. That is the one part I haven't been able to check.

  4. I've also checked Alphabet.h in the native client, which shows that it does look for a specific space label; I haven't been able to verify that the index is correct there either (a rough sketch of that logic follows this list).
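
For reference, here is the space handling as I understand it from Alphabet.h, transliterated into a Python sketch (an illustration, not the actual C++): the space label is resolved by content, i.e. whichever non-comment line of alphabet.txt equals " " is remembered as the space label. If the shipped decoder really does this, the position of space should not matter, which is exactly what makes the observed behaviour puzzling.

# Rough Python transliteration (a sketch, not the real C++) of the space
# lookup described above for native_client's Alphabet.h.
class AlphabetSketch:
    def __init__(self, config_file):
        self._label_to_str = []
        self._space_label = None
        label = 0
        with open(config_file, encoding="utf-8") as fin:
            for raw in fin:
                line = raw.rstrip("\n")
                if line.startswith("#"):
                    continue  # comment lines are not labels
                if line == " ":
                    self._space_label = label
                self._label_to_str.append(line)
                label += 1

    def is_space(self, label):
        return label == self._space_label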

Well, Alphabet.h is what is used by the CTC decoder, so it would be better if there's no issue there :slight_smile:

swigwrapper is unlikely to be at play here; it's just code generated to wrap the C++ code for Python. It's not even written by us, but by SWIG; it's generated from native_client/ctcdecode/swigwrapper.i.

@alchemi5t I guess it would be worth filing a GitHub issue now, with your STRs (steps to reproduce). Do you know if this reproduces every time, everywhere? Have you tried with the LDC93S1 sample?

@lissyx
Here's the problem reproduced in English (LDC93S1).

First, training a model with the stock alphabet.txt (space is the first character).

#) 200 epochs 
Test on data/ldc93s1/ldc93s1.csv - WER: 1.000000, CER: 0.846154, loss: 119.647385
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.846154, loss: 119.647385
 - src: "she had your dark suit in greasy wash water all year"
 - res: "he oriana"
--------------------------------------------------------------------------------
#) 400 epochs 
Test on data/ldc93s1/ldc93s1.csv - WER: 0.909091, CER: 0.673077, loss: 71.410652
--------------------------------------------------------------------------------
WER: 0.909091, CER: 0.673077, loss: 71.410652
 - src: "she had your dark suit in greasy wash water all year"
 - res: "had you a swaller"

Second, changing only alphabet.txt so that space is in the middle ('a' is the first character now).

#) 200 epochs

Test on data/ldc93s1/ldc93s1.csv - WER: 1.000000, CER: 0.673077, loss: 117.956841
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.673077, loss: 117.956841
 - src: "she had your dark suit in greasy wash water all year"
 - res: "hearararatisarasaearararar"
--------------------------------------------------------------------------------
#) 400 epochs


--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.538462, loss: 51.921734
 - src: "she had your dark suit in greasy wash water all year"
 - res: "shadyararauieaysashaealyear"
--------------------------------------------------------------------------------

Finally, to show that the first character was being used as space, I made 'x' the first character; space is somewhere in the middle, and 'a' is where 'x' used to be.

#) 200 epochs
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.826923, loss: 116.150482
 - src: "she had your dark suit in greasy wash water all year"
 - res: "haxraxianaxar"
--------------------------------------------------------------------------------

#) 400 epochs
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.711538, loss: 54.978935
 - src: "she had your dark suit in greasy wash water all year"
 - res: "haxraxuinxeahxtrxtlyr"
--------------------------------------------------------------------------------

^^ Does my latest reply still warrant this, or have I missed something?

Well, it seems it's a legit issue, so it's worth investigating. Can I ask you to triple-check that you are not re-using a checkpoint in some way?
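
A minimal sketch of what I mean by a clean run (the paths below are hypothetical; --checkpoint_dir, --alphabet_config_path and --train_files are standard DeepSpeech.py flags): starting from an empty checkpoint directory rules out weights trained against an earlier alphabet ordering being silently restored.

# Guaranteed-fresh training run (sketch; paths are hypothetical).
import os
import shutil
import subprocess

ckpt_dir = "/tmp/ds_ckpt_fresh"
shutil.rmtree(ckpt_dir, ignore_errors=True)  # drop any stale checkpoints
os.makedirs(ckpt_dir)

subprocess.run([
    "python", "DeepSpeech.py",
    "--checkpoint_dir", ckpt_dir,
    "--alphabet_config_path", "data/alphabet.txt",
    "--train_files", "data/ldc93s1/ldc93s1.csv",
    # ...remaining flags as in the runs above
], check=True)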

On it. Will train again.