Any help understanding this would be much appreciated.
There is a TensorFlow tensor called ‘loss’ which is already defined in the train() method (of DeepSpeech.py). Not surprisingly, really, since it is what is passed to the gradient optimizer.
I added it to my TensorBoard so I could see its progress while training DeepSpeech on a new language.
Over the first two-and-a-bit epochs of training, the loss looks like this:
As you can see, it appears to consistently go up during each epoch, not down as I would have expected.
Over many epochs it does what you would want it to do and trends downward…
But I’m wondering if anyone can help explain why the loss goes up from batch to batch.
It isn’t just a question of needing to divide the loss by the batch count to get the ‘average’ loss per batch. You can see that from the numbers involved: it starts around 100 and rises to only about 300 over many batches (I think about ~50 batches per epoch in this case), so it can’t simply be the sum of the loss over all batches.
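To show what I mean, here is a tiny sketch (with made-up per-batch loss values, not from my actual run) contrasting a cumulative sum of batch losses with a running average. A cumulative sum grows roughly in proportion to the batch index, which would give much bigger numbers than I’m seeing by the end of an epoch, while a running average stays near the typical per-batch loss:

```python
# Hypothetical per-batch losses, purely illustrative.
batch_losses = [100.0, 120.0, 90.0, 150.0, 110.0]

running_sum = []
running_avg = []
total = 0.0
for i, loss in enumerate(batch_losses, start=1):
    total += loss
    running_sum.append(total)        # grows with the number of batches
    running_avg.append(total / i)    # stays near the typical batch loss

print(running_sum)  # last value is the sum over all batches
print(running_avg)  # last value is the mean loss per batch
```

With ~50 batches per epoch, a pure sum would end up around 50× a typical batch loss, which doesn’t match going from 100 to only 300.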
I assume I’m missing something obvious here, but I’d love to hear the explanation from anyone who knows.