Saved new best validating model has a worse loss value when re-training

Hi guys,

I'm training on the Mozilla Common Voice corpus for Indonesian using DeepSpeech 0.9.1, but I've run into a problem.

For information:
OS: Ubuntu 18.04.5
Python: 3.6.9
TensorFlow: 1.15.4

I'm using this command:

python3 DeepSpeech.py \
--train_files $HOME/deepspeech/cv-corpus-5.1-2020-06-22/id/clips/train.csv \
--dev_files $HOME/deepspeech/cv-corpus-5.1-2020-06-22/id/clips/dev.csv \
--test_files $HOME/deepspeech/cv-corpus-5.1-2020-06-22/id/clips/test.csv \
--train_batch_size 128 \
--dev_batch_size 128 \
--test_batch_size 128 \
--checkpoint_dir $HOME/deepspeech/checkpoint \
--export_dir $HOME/deepspeech/model \
--n_hidden 2048 \
--learning_rate 0.0001 \
--dropout_rate 0.40 \
--epochs 30 \
--noearly_stop

Problem:
As stated in the https://github.com/mozilla/DeepSpeech/releases/tag/v0.8.2 release notes, I need 125 epochs to train a base model. Because of limited processing power (I'm training on CPU), I divided the training into 4 cycles of 30 epochs each.

First cycle (epochs 0 ~ 29): new best validating model with loss 88.229980
Second cycle (epochs 30 ~ 59): new best validating model with loss 78.402226
But in the third cycle (epochs 60 ~ 89): new best validating model with loss 79.140933

Also, when I reran the training for 5 epochs, I got a new best validating model with loss 82.142130.

Could someone help me? Thanks.

The released English model was trained on thousands of hours of material. If you have just 300 hours, 15 epochs is probably enough to get a model.

The loss is always relative. If you restart training, you don't pick up from the same best loss; you are starting a new training run, so "new best" only means the best dev loss seen in that run. What matters much more is how the train and dev losses change over time. Watch for overfitting!

So you should build a good test set that you can run after each cycle to find out whether your model is still improving. And get a good scorer.
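
To make the train/dev comparison concrete, here is a minimal Python sketch. It is not part of DeepSpeech: the function name `overfitting_start`, the `patience` parameter, and the loss values are all made up for illustration, and you would paste in the per-epoch losses from your own training log instead.

```python
from typing import List, Optional


def overfitting_start(train_loss: List[float],
                      dev_loss: List[float],
                      patience: int = 3) -> Optional[int]:
    """Return the epoch with the lowest dev loss if the dev loss has not
    improved for at least `patience` epochs after it while the train loss
    kept falling. Return None if there is no such sign of overfitting.

    Hypothetical helper for illustration, not a DeepSpeech API.
    """
    if len(train_loss) != len(dev_loss):
        raise ValueError("train_loss and dev_loss must have the same length")
    if not dev_loss:
        return None

    # Epoch (0-based) at which the dev loss was lowest.
    best_epoch = min(range(len(dev_loss)), key=dev_loss.__getitem__)

    # Number of epochs trained after the best dev epoch.
    epochs_since_best = len(dev_loss) - 1 - best_epoch
    if epochs_since_best < patience:
        return None

    # Overfitting pattern: train loss still going down, dev loss not.
    train_still_falling = train_loss[-1] < train_loss[best_epoch]
    return best_epoch if train_still_falling else None


# Made-up example values, NOT taken from a real training run.
train = [120.0, 95.0, 80.0, 70.0, 62.0, 55.0, 50.0]
dev = [110.0, 92.0, 84.0, 80.0, 81.0, 83.0, 86.0]

epoch = overfitting_start(train, dev)
if epoch is None:
    print("No clear overfitting yet, keep training.")
else:
    print(f"Dev loss was best at epoch {epoch}; later epochs look overfitted.")
```

A larger `patience` tolerates more noise in the dev loss before calling it overfitting; a smaller one reacts faster but may stop too early.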

Is there any parameter to continue training?
As the epochs increase, training loss decreases but validation loss increases. Should I stop training the base model and do fine-tuning?

If you use the same checkpoint dir, you continue training automatically.

Probably yes, read about overfitting.

No to fine-tuning. You fine-tune when you already have a good model and want to adapt it to a dialect or another specialty.
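
As a rough illustration of the "should I stop" decision, the sketch below compares the best dev loss of each 30-epoch run reported earlier in this thread. The helper `best_run` is hypothetical, not a DeepSpeech function, and it assumes the per-run "new best" values are comparable because all runs used the same dev set.

```python
from typing import List, Tuple


def best_run(best_dev_losses: List[float]) -> Tuple[int, float]:
    """Return (index, loss) of the run with the lowest best dev loss.

    Hypothetical helper for illustration, not a DeepSpeech API.
    """
    if not best_dev_losses:
        raise ValueError("need at least one run")
    index = min(range(len(best_dev_losses)), key=best_dev_losses.__getitem__)
    return index, best_dev_losses[index]


# Best dev loss per 30-epoch run, as reported in this thread.
cycle_best = [88.229980, 78.402226, 79.140933]

index, loss = best_run(cycle_best)

# If the most recent run is not the best one, more epochs did not help.
improved_in_last_run = index == len(cycle_best) - 1

print(f"Best run: {index + 1} (dev loss {loss})")
print("Keep training." if improved_in_last_run
      else "Dev loss stopped improving; further epochs are likely overfitting.")
```

With these numbers the second run has the lowest dev loss, so the third run did not improve the model on the dev set.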