Hyper-parameter Selecting, Decreasing Loss

masoud_parpanchi · August 10, 2020, 8:39am

Hello Everyone,
as you may have seen, I am using DeepSpeech 0.7 main release to train speech to text system.

I read a lot of posts here and helped me a lot. Now I appreciate any advice to make better my model. my model now is plateaued. so Many Parameters tested. I have commonvoice English data (600-700h voice)

First Let’s see what I have done before and their result:

1-) first try ROUND 1 ( 25 epoch| 2048 units … ) :
python3 DeepSpeech.py --train_files data/CV/en/clips/train2.csv --dev_files data/CV/en/clips/dev.csv --test_files data/CV/en/clips/test.csv --load_checkpoint_dir data/save_chknpt_2 --save_checkpoint_dir data/save_chknpt_2 --export_dir data/exprt_dir_2 --n_hidden 2048 --epochs 25 --dropout_rate 0.35 --lm_alpha 0.75 --lm_beta 1.85 --learning_rate 0.00001 --train_batch_size 8 --dev_batch_size 8 --test_batch_size 1 --automatic_mixed_precision --train_cudnn True --reduce_lr_on_plateau True --plateau_epochs 2 --plateau_reduction 0.000002

first Try ROUND 2 ( decreased learning rate) :

python3 DeepSpeech.py --train_files data/CV/en/clips/train2.csv --dev_files data/CV/en/clips/dev.csv --test_files data/CV/en/clips/test.csv --load_checkpoint_dir data/save_chknpt_2 --save_checkpoint_dir data/save_chknpt_2 --export_dir data/exprt_dir_2 --n_hidden 2048 --epochs 25 --dropout_rate 0.35 --lm_alpha 0.75 --lm_beta 1.85 --learning_rate 0.000001 --train_batch_size 8 --dev_batch_size 8 --test_batch_size 1 --automatic_mixed_precision --train_cudnn True --reduce_lr_on_plateau True --plateau_epochs 2 --plateau_reduction 0.0000002

Validation loss plateaued at validation loss 80 and train loss 60

=============================
2-) Second Try First Round :
Changes for first Round : I’ll try to reduce n_hidden to 512 . choose to have LR=0.00001 .ReduceLrOnPlateau=0.000001 plateau_epochs 2

python3 DeepSpeech.py --train_files data/CV/en/clips/train2.csv --dev_files data/CV/en/clips/dev.csv --test_files data/CV/en/clips/test.csv --load_checkpoint_dir data/load_chkpnt_dir --save_checkpoint_dir data/save_chkpnt_dir --export_dir data/exprt_dir --n_hidden 512 --epochs 25 --dropout_rate 0.35 --lm_alpha 0.75 --lm_beta 1.85 --learning_rate 0.00001 --train_batch_size 8 --dev_batch_size 8 --test_batch_size 1 --automatic_mixed_precision --train_cudnn True --reduce_lr_on_plateau True --plateau_epochs 2 --plateau_reduction 0.000001

Still Second Try ,Round 2 :
learning_rate 0.000001 , next 30 epoch

python3 DeepSpeech.py --train_files data/CV/en/clips/train2.csv --dev_files data/CV/en/clips/dev.csv --test_files data/CV/en/clips/test.csv --load_checkpoint_dir data/save_chkpnt_dir --save_checkpoint_dir data/save_chkpnt_dir2 --export_dir data/exprt_dir2 --n_hidden 512 --epochs 30 --dropout_rate 0.35 --lm_alpha 0.75 --lm_beta 1.85 --learning_rate 0.00001 --train_batch_size 8 --dev_batch_size 8 --test_batch_size 1 --automatic_mixed_precision --train_cudnn True --reduce_lr_on_plateau True --plateau_epochs 2 --plateau_reduction 0.000001

the last result was in epoch 24.
validation loss was 64
the last result on the test set was awful.

=================================
3-) Third Try. I increased n_hidden to 1024. Round 1

python3 DeepSpeech.py --train_files data/CV/en/clips/train2.csv --dev_files data/CV/en/clips/dev.csv --test_files data/CV/en/clips/test.csv --load_checkpoint_dir data/load_chkpnt_dir1024 --save_checkpoint_dir data/save_chkpnt_dir1024 --export_dir data/exprt_dir1024 --n_hidden 1024 --epochs 35 --dropout_rate 0.35 --learning_rate 0.00001 --train_batch_size 8 --dev_batch_size 8 --test_batch_size 1 --automatic_mixed_precision --train_cudnn True --reduce_lr_on_plateau True --plateau_epochs 2 --plateau_reduction 0.000001

last result in the first 25 epochs. validation loss 59.

Third Try Round 2 :
Let’s continue with this learning rate. But add Early stop.

python3 DeepSpeech.py --train_files data/CV/en/clips/train2.csv --dev_files data/CV/en/clips/dev.csv --test_files data/CV/en/clips/test.csv --load_checkpoint_dir data/save_chkpnt_dir1024 --save_checkpoint_dir data/save_chkpnt_dir1024_2 --export_dir data/exprt_dir1024 --n_hidden 1024 --epochs 30 --dropout_rate 0.35 --learning_rate 0.00001 --train_batch_size 8 --dev_batch_size 8 --test_batch_size 1 --automatic_mixed_precision --train_cudnn True --reduce_lr_on_plateau True --plateau_epochs 2 --plateau_reduction 0.000001 --early_stop True --es_epochs 3 --es_min_delta 0.09

stopped at epoch 5.

Round 3: increased early stop to 10. stopped at epoch number 11

ROUND 4. Decreased learning rate.
python3 DeepSpeech.py --train_files data/CV/en/clips/train2.csv --dev_files data/CV/en/clips/dev.csv --test_files data/CV/en/clips/test.csv --load_checkpoint_dir data/save_chkpnt_dir1024 --save_checkpoint_dir data/save_chkpnt_dir1024_2 --export_dir data/exprt_dir1024 --n_hidden 1024 --epochs 40 --dropout_rate 0.35 --learning_rate 0.000001 --train_batch_size 8 --dev_batch_size 8 --test_batch_size 1 --automatic_mixed_precision --train_cudnn True --reduce_lr_on_plateau True --plateau_epochs 2 --plateau_reduction 0.0000002 --early_stop True --es_epochs 10 --es_min_delta 0.09

now in round 4, epoch number 3, I have validation loss 59. and train loss 48

======================

Questions : ) (NOTE: I don’t need transfer learning for some reasons)
1- )I’m not using some other hyperparameters as mentioned in Mozilla Documentation here. Is it crucial to customize all the hyperparameters to achieve lower than 15 loss?

2-) I really appreciate it if you share with me your own hyperparameters fo your amount of data.

3-) Is validation loss 59, the capacity of my data?

4-) Is there any chance to achieve lower than 10 loss with 0.7 Mozilla Deepspeech?

lissyx · August 10, 2020, 8:54am

There is no actionable result here, loss value in itself is not helpful as it depedns directly on your datasets.

You should present trainign phase as charts, it’s not usable otherwise: we can’t know if it’s overfitting, etc.

You have not shared any test result.

“this” with a link to a moving target and a relative date is not helpful as well: please share exact release infos and amount of data once imported.

masoud_parpanchi · August 10, 2020, 9:11am

First thanks for your response.

I edited the section about data.

But about graphs. I’ve just learnt how to create them yesterday. I’ll share them as soon as possible.

Do you have any Tip for this information I shared in the post for the questions?

Topic		Replies	Views
Loss Not Reducing DeepSpeech	5	2692	April 12, 2018
Training of Epoch x - loss: inf. again decrease LR, stil same issue repeating DeepSpeech	10	839	October 18, 2018
Information on pre-trained Models DeepSpeech	13	2152	October 31, 2018
Pre-trained model become worse when i trained common voice data DeepSpeech	15	1796	September 21, 2019
Training loss of DeepSpeech bigger and bigger DeepSpeech	3	611	April 10, 2020

Hyper-parameter Selecting, Decreasing Loss

Related topics