Loss function appears to slowly climb over batches during epoch, reset every epoch

Hey y’all,

Any help understanding this would be much appreciated.

There is a TensorFlow variable called ‘loss’ which is already defined in the train() method (of DeepSpeech.py). Not surprisingly really, since it is what gets passed to the gradient optimizer. :wink:

I added it to my TensorBoard so I could see its progress while training DeepSpeech on a new language.
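(In case it's useful to anyone, this is roughly the TF1-style summary wiring I mean. It's purely a minimal sketch: the placeholder loss and the log directory are made-up stand-ins, not DeepSpeech's actual code.)

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# 'loss' here is a placeholder standing in for the real batch-loss tensor
# that DeepSpeech's train() already defines.
loss = tf.placeholder(tf.float32, shape=(), name='loss')
loss_summary = tf.summary.scalar('loss', loss)

# Point TensorBoard at this directory; the path is just an example.
writer = tf.summary.FileWriter('/tmp/ds_tensorboard')

with tf.Session() as session:
    # In real training you'd run the train op in the same session.run call;
    # this only shows how each batch's loss value reaches TensorBoard.
    for step, batch_loss in enumerate([120.0, 150.0, 210.0, 290.0]):
        summary = session.run(loss_summary, feed_dict={loss: batch_loss})
        writer.add_summary(summary, global_step=step)
writer.close()
```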

Over the first two-and-a-bit epochs of training, the loss looks like this:

[TensorBoard screenshot: loss rising steadily within each epoch, then dropping back at each epoch boundary]

As you can see, it consistently goes up during each epoch, not down as I would’ve expected.

Over many epochs it does what you would want it to do and trends down …

But I’m wondering if anyone can help explain why the loss goes up from batch to batch.

It isn’t just a question of needing to divide the loss by the batch count to get the ‘average’ loss per batch; you can see that from the numbers involved. It starts at about 100 and climbs to only about 300 over many batches (I think roughly ~50 batches per epoch in this case), so it can’t simply be a running sum of the loss over all batches?
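To make that concrete, here's a toy sketch (entirely made-up numbers) of why values like 100 and 300 point to a per-batch mean rather than a running sum:

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# Hypothetical per-example losses within a single batch.
per_example_loss = tf.placeholder(tf.float32, shape=(None,))

# What typically gets optimized and logged is the *mean* over the batch,
# not a sum accumulated across batches.
batch_loss = tf.reduce_mean(per_example_loss)

with tf.Session() as session:
    easy_batch = [90.0, 100.0, 110.0]    # short utterances early in the epoch
    hard_batch = [280.0, 300.0, 320.0]   # long utterances late in the epoch
    print(session.run(batch_loss, {per_example_loss: easy_batch}))  # 100.0
    print(session.run(batch_loss, {per_example_loss: hard_batch}))  # 300.0
```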

I assume I’m missing something super obvious here, but I’d love to know what the story is from anyone who does.

Thanks!

DeepSpeech uses curriculum learning: an epoch starts with easier, shorter sentences and ends with harder, longer ones. The loss is therefore lowest on the easy, short examples at the start of an epoch and higher on the hard, long examples at the end.
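In case it helps, here's a toy sketch of that ordering. It uses transcript length as a stand-in for difficulty; as far as I know, the real DeepSpeech code sorts its training CSVs by audio file size, which is a proxy for utterance length:

```python
# Toy illustration of the curriculum ordering, not DeepSpeech's actual code.
utterances = [
    ('clip_03.wav', 'a much longer and harder sentence to transcribe'),
    ('clip_01.wav', 'hi'),
    ('clip_02.wav', 'hello there'),
]

# Each epoch walks the data from shortest to longest, so per-batch loss
# starts low and climbs as the examples get harder, then resets when the
# next epoch starts over from the short examples.
epoch_order = sorted(utterances, key=lambda u: len(u[1]))
for wav, transcript in epoch_order:
    print(wav, '->', transcript)
```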

Oh, that’s very helpful - thanks!