Saved new best validating model has a worse loss value when re-training

compatc · November 20, 2020, 3:56am

Hi guys,

I'm training on the Mozilla Common Voice corpus for Indonesian using DeepSpeech 0.9.1, but I've run into a problem.

For information:
OS : Ubuntu 18.04.5
Python : 3.6.9
Tensorflow : 1.15.4

I'm using this command:

python3 DeepSpeech.py \
--train_files $HOME/deepspeech/cv-corpus-5.1-2020-06-22/id/clips/train.csv \
--dev_files $HOME/deepspeech/cv-corpus-5.1-2020-06-22/id/clips/dev.csv \
--test_files $HOME/deepspeech/cv-corpus-5.1-2020-06-22/id/clips/test.csv \
--train_batch_size 128 \
--dev_batch_size 128 \
--test_batch_size 128 \
--checkpoint_dir $HOME/deepspeech/checkpoint \
--export_dir $HOME/deepspeech/model \
--n_hidden 2048 \
--learning_rate 0.0001 \
--dropout_rate 0.40 \
--epochs 30 \
--noearly_stop

Problem:
As stated in the release notes at https://github.com/mozilla/DeepSpeech/releases/tag/v0.8.2, I need 125 epochs to train a base model, but because of limited processing power (I'm training on a CPU) I split the training into 4 cycles of 30 epochs each.

On the first cycle (epochs 0 to 29), the new best validating model had a loss of 88.229980.
On the second cycle (epochs 30 to 59), the new best validating model had a loss of 78.402226.
But on the third cycle (epochs 60 to 89), the new best validating model had a loss of 79.140933.

Also, when I reran the training with 5 epochs, I got a new best validating model with a loss of 82.142130.

Could someone help me? Thanks.

othiele · November 20, 2020, 7:51am

The released English model was trained on thousands of hours of material. If you have just 300 hours, 15 epochs is probably OK to get a model.

The loss is always relative. If you restart training, you don’t restart from the same loss; you are starting a new training run. Much more important is how the train and dev loss change over time. Watch out for overfitting!
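To compare cycles by their dev loss rather than by the "new best" message alone, a minimal Python sketch along these lines may help. It uses the three per-cycle best dev losses reported in the first post. The function names `best_cycle` and `cycles_since_improvement` are made up for illustration and are not part of DeepSpeech.

```python
from typing import List, Tuple


def best_cycle(dev_losses: List[float]) -> Tuple[int, float]:
    """Return (index, loss) of the cycle with the lowest dev loss."""
    return min(enumerate(dev_losses), key=lambda pair: pair[1])


def cycles_since_improvement(dev_losses: List[float], min_delta: float = 0.0) -> int:
    """Count how many cycles have passed since the dev loss last improved.

    A cycle counts as an improvement only if it beats the best loss
    seen so far by more than `min_delta`.
    """
    best = float("inf")
    stale = 0
    for loss in dev_losses:
        if loss < best - min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
    return stale


# Best dev loss per 30-epoch cycle, as reported in the first post.
cycle_dev_losses = [88.229980, 78.402226, 79.140933]

index, loss = best_cycle(cycle_dev_losses)
print(f"best cycle: {index + 1}, dev loss: {loss}")
print(f"cycles without improvement: {cycles_since_improvement(cycle_dev_losses)}")
```

With those three values, the second cycle is the best one and the third did not improve on it, which is the point where overfitting becomes a reasonable suspicion.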

So you should build a good test set that you can run after each cycle to find out whether your model is still improving. And get a good scorer.

compatc · November 20, 2020, 10:16am

Is there any parameter to continue training?
As the epochs increase, the training loss decreases but the validation loss increases. Should I stop training the base model and do fine-tuning?

othiele · November 20, 2020, 10:37am

If you use the same checkpoint dir, you continue training automatically.

Probably yes, read about overfitting.
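The pattern described in the question (training loss going down while validation loss goes up) can be checked mechanically. The following is only a rough sketch: the helper `is_overfitting`, its `window` parameter, and the per-epoch loss values are all hypothetical, and you would need to collect the real numbers from your own training log.

```python
from typing import List


def is_overfitting(train_losses: List[float], dev_losses: List[float], window: int = 3) -> bool:
    """Return True if, over the last `window` epoch-to-epoch steps,
    the train loss fell at every step while the dev loss rose at every step.
    """
    if len(train_losses) != len(dev_losses):
        raise ValueError("train and dev loss lists must have the same length")
    if len(train_losses) < window + 1:
        return False  # not enough epochs to judge a trend

    recent_train = train_losses[-(window + 1):]
    recent_dev = dev_losses[-(window + 1):]
    train_falling = all(b < a for a, b in zip(recent_train, recent_train[1:]))
    dev_rising = all(b > a for a, b in zip(recent_dev, recent_dev[1:]))
    return train_falling and dev_rising


# Hypothetical per-epoch losses, for illustration only.
train = [95.0, 80.0, 70.0, 62.0, 55.0, 49.0]
dev = [98.0, 85.0, 79.0, 80.5, 82.0, 84.0]

print(is_overfitting(train, dev))
```

If a check like this fires, further epochs on the same data are unlikely to help, and keeping the checkpoint with the lowest dev loss is the safer choice.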

As for fine-tuning: no. You do fine-tuning if you have a good model and want to train a dialect or other specialties.