Query regarding speed of training and issues with convergence

WER: 1.000000, CER: 0.940299, loss: 0.004681

  • src: “विमान चालविणाऱ्या वैमानिकाला अनेक गोष्टींकडे सतत अवधान द्यावे लागते”
  • res: "मला "

Why is the loss this low (given that it started out around 500) when the output is not even close?
What scale is it on, and what do you usually expect?

It could slow down for a lot of reasons, and none of our training runs show any slowdown … We can’t really know for sure without more details on your system: maybe your setup is limited in memory and there’s some leakage (some people complained about that, but we have not been able to reproduce it or get meaningful feedback).

What I’m concerned about is that the two cards don’t have the same speed, so maybe TensorFlow is affected by that and what you see as a slowdown is simply normal for your setup.

Please take inspiration from the LDC93S1 overfitting example.

I thought so too; I’ll have to look into that. Each card has 11 GB and both have HBM2 memory. Titan Vs have considerably more CUDA cores though (~5k on the Titan V vs ~3k on the Titan Xp). I suspect the leak more than the different architectures, if it’s not SortaGrad at work. I’ll post here if I find the answer.

Also, if you mean RAM, I have 256 GB, and I monitor the process; everything is fine on that end.

I managed to get better results with my data. The problem was the language model I built; I didn’t think the LM would deteriorate the results (which was a poor assumption). I now have results that look much more like the source, without using my trie or LM. There is still something I forgot to check today: in the experiment that gave better results, all I did was skip the LM and trie options on the command line, but I know flags.py has defaults pointing to the English LM binary and trie from your repo. I am not sure whether the default LM and trie were used or not (even if they were, since they can’t predict over an unknown charset, they should not affect the output; I’m assuming again, hopefully not wrongly this time), but I was wondering how I would go about disabling the LM and trie altogether for initial experiments.

And about the loss being low: the loss is computed on what the DeepSpeech network predicts before post-processing by the LM and trie, and since the raw prediction was better than the post-processed text, the loss was low.
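For anyone wondering about the scale: the number printed is the CTC loss on the acoustic model’s output, so the LM and trie never enter into it. A minimal sketch of that quantity, assuming TensorFlow 1.x (which DeepSpeech used at the time) with toy shapes and random logits rather than DeepSpeech’s actual graph:

```python
import numpy as np
import tensorflow as tf  # TF 1.x, which DeepSpeech used at the time

# Toy shapes: 50 time steps, batch of 1, ~80 characters plus the CTC blank.
time_steps, batch_size, num_classes = 50, 1, 81
logits = tf.constant(np.random.randn(time_steps, batch_size, num_classes),
                     dtype=tf.float32)

# The target transcript as a SparseTensor of character indices (here 2, 0, 1, 5).
labels = tf.SparseTensor(indices=[[0, 0], [0, 1], [0, 2], [0, 3]],
                         values=[2, 0, 1, 5],
                         dense_shape=[batch_size, 4])
seq_len = tf.constant([time_steps], dtype=tf.int32)

# tf.nn.ctc_loss returns one negative log-likelihood per utterance, so its
# scale depends on utterance length: long clips start in the hundreds and
# shrink towards zero as the network fits the training data.
loss = tf.nn.ctc_loss(labels=labels, inputs=logits, sequence_length=seq_len)

with tf.Session() as sess:
    print(sess.run(loss))
```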

256 GB? That should be more than enough, except if something badly leaks (some people complained; they upgraded some Python package and it got fixed).

If you set alpha and beta to 0.0, it should.
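Roughly like this on the command line; a sketch only, with placeholder paths, and the exact flag names should be checked against flags.py in your checkout:

```python
# Rough sketch of a training invocation with the LM influence zeroed out.
# Flag names follow flags.py; the paths here are placeholders for your data.
import subprocess

subprocess.run([
    "python", "DeepSpeech.py",
    "--train_files", "data/train.csv",
    "--dev_files", "data/dev.csv",
    "--test_files", "data/test.csv",
    "--alphabet_config_path", "data/alphabet.txt",
    "--lm_alpha", "0.0",  # weight of the LM score during decoding
    "--lm_beta", "0.0",   # weight of the word-insertion term
])
```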

yes

I will try that.

That’s expected:

Generally the Titan V is faster on this kind of task, so you should first try training only with the Vs. The weight sync can be a bottleneck limiting the true power of the Vs; the Vs waiting for the Xps to finish the epoch before syncing is not really a good thing.

Try using the NVIDIA-optimized container for automatic mixed precision training (only with the two Titan Vs)?
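For reference, inside the NGC TensorFlow containers automatic mixed precision can be switched on with an environment variable; a small sketch (variable name per NVIDIA’s docs for those containers, so double-check it against your container version):

```python
# Sketch: turn on NVIDIA's automatic mixed precision graph rewrite.
# Set the variable before the TensorFlow session is created; the NGC TF
# containers (and TF >= 1.14) pick it up when the graph is optimized.
import os
os.environ["TF_ENABLE_AUTO_MIXED_PRECISION"] = "1"

import tensorflow as tf  # imported afterwards so the setting is in place
```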

Ah! Thank you for the clarification. So I wasn’t wrong on that part.

True. I’ll try this soon and post timings. Or maybe it’ll be better if I post my entire findings after I manage to get my Marathi model running.

Right now:
WER: 1.000000, CER: 0.579710, loss: 0.000543

  • src: “अवशिष्टाचे एक चांगले उदाहरण स्फीनोडॉन या सरड्यासारख्या प्राण्याचे होय”
  • res: “अवउाताेउातगेउदहउसउतनोतेनउयाउसळउयाउसाउयाउपायओेउहोय”

Note: spaces are missing in the result.
There is very little resemblance between the result and the source, but it’s better than just one word. I am trying what lissyx suggested, setting lm_alpha and lm_beta to 0, and then seeing if the LM was the problem.

The next experiment is to build a heavy-duty language model and then try this again.

That looks like a wrong alphabet format; please share it. A while ago my LM training was failing to read the alphabet properly and was then removing spaces. I suggest trying the native client without the LM to see if you get the same results.


I do believe that might be the issue. I fixed the empty-string res by updating the alphabet. Does space have to be the first character? Also, my charset has a zero-width space.

This is my alphabet set for now.

उ
प
ा
ळ
न
म
ो
इ
ग
अ
ॲ
ऍ
ह
ऴ
ऊ
त
थ
ज
े
य
भ
ओ
द
ॅ
स
व
ू
ञ
ऋ
।
ख
ि
ब
ध
ी
ु
फ
ऩ
ई   # the line below this is a space. (This comment is not in the alphabet file.)
 
ट
ै
ऱ
ः
॰
ऌ
ौ
ॉ
॥
क
ॠ
झ
्
ठ
श
ँ
ल
ऐ
ॻ
़
ऑ
ऽ
ड
औ
ङ
ण
ढ   # the line below this is a zero-width space. (This comment is not in the original file.)

ए
ष
च
छ
ं
ृ
आ
घ
र

I don’t know how your language works; what is the zero-width space used for?

It doesn’t need it. It was a remnant of the cleaning process. Do you think that might be the issue? If so, I’ll clean up the data again and try.

Is the alphabet format wonky though, or is it OK?

Yes, remove it. Please try posting the alphabet again using the forum format, </>

It will be easier to read

Ok. On it.

I’ve updated the previous comment to have it as preformatted text.

Without the zero-width space, yes.

New alphabet.txt:

उ
प
ा
ळ
न
म
ो
इ
ग
अ
ॲ
ऍ
ह
ऴ
ऊ
त
थ
ज
े
य
भ
ओ
द
ॅ
स
व
ू
ञ
ऋ
।
ख
ि
ब
ध
ी
ु
फ
ऩ
ई
 
ट
ै
ऱ
ः
॰
ऌ
ौ
ॉ
॥
क
ॠ
झ
्
ठ
श
ँ
ल
ऐ
ॻ
़
ऑ
ऽ
ड
औ
ङ
ण
ढ
ए
ष
च
छ
ं
ृ
आ
घ
र

WER: 1.000000, CER: 0.500000, loss: 0.732161

  • src: “याला आपले अबोध हेतू कारण असतात”
  • res: “याउापेअउोहेउतूउाअसउतात”

WER: 1.000000, CER: 0.631579, loss: 0.948461

  • src: “अनोमा बौद्ध साहित्यात निर्देशिलेली भारतातील एक पवित्र नदी”
  • res: “अनउादउसाउतयातऊनउेेउातातउपवतन”

Output is still the same. Any insights on whether the alphabet should be ordered or not?

For the first training, no; but if you train a model and then change the alphabet order, then yes. Did you test it without the LM and retrain the LM?

Looks like उ is being used as space, maybe? Swap the positions of the space and उ, I mean.


I figured it out. " " (space) has to be the first character in alphabet.txt.

  • src: “हा प्राणी सरीसृपांच्या सरपटणाऱ्या प्राण्यांच्या र्हिंकोसीफॅलिया गणातला आहे”
  • res: “हा पान सू स पान या स पना या प ा या या हे तो स तॅ या गा त ा हे”

The output is bad, but the model isn’t trained enough yet. I got the space.

You are right; that’s exactly why I tried putting space first.

Testing without the LM for now.

I’ll experiment and try to understand how to better standardize the alphabet formatting.
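For anyone hitting the same thing, a small throwaway script along these lines (a hypothetical helper, not part of DeepSpeech) can catch zero-width characters, duplicate entries, and a missing or misplaced space in alphabet.txt:

```python
# Hypothetical sanity check for alphabet.txt: flags zero-width characters,
# duplicate entries, and reports where the plain space character sits.
import sys

ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}

def check_alphabet(path):
    seen = {}
    with open(path, encoding="utf-8") as f:
        for lineno, raw in enumerate(f, start=1):
            ch = raw.rstrip("\n")
            if ch in ZERO_WIDTH:
                print(f"line {lineno}: zero-width character U+{ord(ch):04X}")
            if ch in seen:
                print(f"line {lineno}: duplicate of line {seen[ch]}: {ch!r}")
            else:
                seen[ch] = lineno
    if " " in seen:
        print(f"space character is on line {seen[' ']}")
    else:
        print("warning: no space character found in the alphabet")

if __name__ == "__main__":
    check_alphabet(sys.argv[1])
```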

<This part of the discussion happened in private because, as a new user, I had to wait 3 hours between posts. Putting it here so people have some context.>

After this conversation and a lot of fine-tuning, I have some results I am satisfied with. I will keep posting if I find anything interesting. Also, after I test the model, if everything is as it should be, I’ll post my protocol too. Thanks for helping me, @lissyx @carlfm01!!

Results:

--------------------------------------------------------------------------------
WER: 0.222222, CER: 0.033898, loss: 0.000426
 - src: "मध्यजीवमहाकल्पच्या अखेरपासून हे कुल लुप्त झाले असा समज होता"
 - res: "मध्यजीव महाकल्पाच्या अखेरपासून हे कुल लुप्त झाले असा समज होता"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 0.000007
 - src: "याला आपले अबोध हेतू कारण असतात"
 - res: "याला आपले अबोध हेतू कारण असतात"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 0.000015
 - src: "अवधानाची ही चंचलता जीवनोपयोगी असते"
 - res: "अवधानाची ही चंचलता जीवनोपयोगी असते"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 0.000016
 - src: "यांना समुद्री अवशिष्ट म्हणणे योग्य होईल"
 - res: "यांना समुद्री अवशिष्ट म्हणणे योग्य होईल"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 0.000021
 - src: "तसेच काही ठिकाणी थंड पाण्याची खोल सरोवरेही होती"
 - res: "तसेच काही ठिकाणी थंड पाण्याची खोल सरोवरेही होती"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 0.000021
 - src: "व रेखावृत्त ते पू"
 - res: "व रेखावृत्त ते पू"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 0.000037
 - src: "किमी लोकसंख्या आकारमानाने पोर्तुगालच्या सु"
 - res: "किमी लोकसंख्या आकारमानाने पोर्तुगालच्या सु"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 0.000072
 - src: "यायोगे प्राण्याला परिसरातील सगळ्या गोष्टींशी संपर्क ठेवता येतो"
 - res: "यायोगे प्राण्याला परिसरातील सगळ्या गोष्टींशी संपर्क ठेवता येतो"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 0.000085
 - src: "असा बौद्ध साहित्यात उल्लेख आहे"
 - res: "असा बौद्ध साहित्यात उल्लेख आहे"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 0.000089
 - src: "विमान चालविणाऱ्या वैमानिकाला अनेक गोष्टींकडे सतत अवधान द्यावे लागते"
 - res: "विमान चालविणाऱ्या वैमानिकाला अनेक गोष्टींकडे सतत अवधान द्यावे लागते"
--------------------------------------------------------------------------------

I had some wonky issues with alphabet.txt; I will also clarify those once I have a concrete understanding of what was wrong.

Can you share some of that? It might be valuable information for others!

@lissyx I’ll test the model thoroughly (tomorrow), and if everything goes as expected (read: the results are not a fluke, which is unlikely anyway), I’ll post everything in another clean post. Also, I have had some issues with the alphabet handling; I still have to work out why my initial models failed.

As soon as I am done with this, I’ll post the whole config and my complete protocol. I do not want to spread misinformation or miss something important. The post will be up in a few days. Meanwhile, if anyone has questions about anything I did in this post, I’ll respond to them here.
