Hello Everyone,
As you may have seen, I am using the DeepSpeech 0.7 main release to train a speech-to-text system.
I have read a lot of posts here and they helped me a lot. Now I would appreciate any advice on improving my model, which has plateaued; I have already tested many parameters. My data is the Common Voice English set (600-700 hours of audio).
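To sanity-check the dataset size, I estimate the total hours directly from the training CSV. This is a rough sketch that assumes the standard DeepSpeech CSV columns (wav_filename, wav_filesize, transcript) and 16 kHz 16-bit mono WAV clips, i.e. 32000 audio bytes per second plus a 44-byte header:

```python
import csv
import io

def total_hours(csv_text):
    """Estimate total audio duration (in hours) from a DeepSpeech-style CSV.

    Assumes 16 kHz, 16-bit, mono WAV clips, so each second of audio
    is 16000 * 2 = 32000 bytes, after a 44-byte WAV header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    seconds = sum(
        (int(row["wav_filesize"]) - 44) / (16000 * 2) for row in reader
    )
    return seconds / 3600.0
```

Running this over train2.csv, dev.csv, and test.csv together is how I arrived at the 600-700 hour figure above.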
First, let me show what I have done so far and the results:
1-) First try, Round 1 (25 epochs, 2048 units, …):
python3 DeepSpeech.py \
  --train_files data/CV/en/clips/train2.csv \
  --dev_files data/CV/en/clips/dev.csv \
  --test_files data/CV/en/clips/test.csv \
  --load_checkpoint_dir data/save_chknpt_2 \
  --save_checkpoint_dir data/save_chknpt_2 \
  --export_dir data/exprt_dir_2 \
  --n_hidden 2048 --epochs 25 --dropout_rate 0.35 \
  --lm_alpha 0.75 --lm_beta 1.85 \
  --learning_rate 0.00001 \
  --train_batch_size 8 --dev_batch_size 8 --test_batch_size 1 \
  --automatic_mixed_precision --train_cudnn True \
  --reduce_lr_on_plateau True --plateau_epochs 2 --plateau_reduction 0.000002
First try, Round 2 (decreased learning rate):
python3 DeepSpeech.py \
  --train_files data/CV/en/clips/train2.csv \
  --dev_files data/CV/en/clips/dev.csv \
  --test_files data/CV/en/clips/test.csv \
  --load_checkpoint_dir data/save_chknpt_2 \
  --save_checkpoint_dir data/save_chknpt_2 \
  --export_dir data/exprt_dir_2 \
  --n_hidden 2048 --epochs 25 --dropout_rate 0.35 \
  --lm_alpha 0.75 --lm_beta 1.85 \
  --learning_rate 0.000001 \
  --train_batch_size 8 --dev_batch_size 8 --test_batch_size 1 \
  --automatic_mixed_precision --train_cudnn True \
  --reduce_lr_on_plateau True --plateau_epochs 2 --plateau_reduction 0.0000002
Validation loss plateaued at 80, with train loss at 60.
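One thing I am not sure about: as I read the flag descriptions, --plateau_reduction is a multiplicative factor applied to the current learning rate whenever the dev loss has not improved for --plateau_epochs epochs, not a new absolute learning rate. If that reading is right, tiny values like 0.000002 would shrink the learning rate to almost zero after a single plateau. A toy sketch of that reading (my assumption, not the actual DeepSpeech code):

```python
def simulate_plateau(dev_losses, lr, plateau_epochs, plateau_reduction):
    """Toy model of --reduce_lr_on_plateau as I understand the flags:
    when the dev loss fails to improve for `plateau_epochs` epochs in a
    row, multiply the current learning rate by `plateau_reduction`."""
    best = float("inf")
    epochs_without_improvement = 0
    for loss in dev_losses:
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= plateau_epochs:
                lr *= plateau_reduction
                epochs_without_improvement = 0
    return lr
```

With the default-style factor 0.1, three stalled epochs at dev loss 80 would take my LR from 1e-5 to 1e-6; with my value 0.000002 it would collapse to ~2e-11, which may explain why training freezes.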
=============================
2-) Second try, Round 1:
Changes for this round: reduce n_hidden to 512, keep learning_rate at 0.00001, set plateau_reduction to 0.000001 and plateau_epochs to 2.
python3 DeepSpeech.py \
  --train_files data/CV/en/clips/train2.csv \
  --dev_files data/CV/en/clips/dev.csv \
  --test_files data/CV/en/clips/test.csv \
  --load_checkpoint_dir data/load_chkpnt_dir \
  --save_checkpoint_dir data/save_chkpnt_dir \
  --export_dir data/exprt_dir \
  --n_hidden 512 --epochs 25 --dropout_rate 0.35 \
  --lm_alpha 0.75 --lm_beta 1.85 \
  --learning_rate 0.00001 \
  --train_batch_size 8 --dev_batch_size 8 --test_batch_size 1 \
  --automatic_mixed_precision --train_cudnn True \
  --reduce_lr_on_plateau True --plateau_epochs 2 --plateau_reduction 0.000001
Second try, Round 2:
Decreased learning_rate to 0.000001 for the next 30 epochs.
python3 DeepSpeech.py \
  --train_files data/CV/en/clips/train2.csv \
  --dev_files data/CV/en/clips/dev.csv \
  --test_files data/CV/en/clips/test.csv \
  --load_checkpoint_dir data/save_chkpnt_dir \
  --save_checkpoint_dir data/save_chkpnt_dir2 \
  --export_dir data/exprt_dir2 \
  --n_hidden 512 --epochs 30 --dropout_rate 0.35 \
  --lm_alpha 0.75 --lm_beta 1.85 \
  --learning_rate 0.000001 \
  --train_batch_size 8 --dev_batch_size 8 --test_batch_size 1 \
  --automatic_mixed_precision --train_cudnn True \
  --reduce_lr_on_plateau True --plateau_epochs 2 --plateau_reduction 0.000001
The last result was at epoch 24, with a validation loss of 64.
The results on the test set were awful.
=================================
3-) Third try: increased n_hidden to 1024. Round 1:
python3 DeepSpeech.py \
  --train_files data/CV/en/clips/train2.csv \
  --dev_files data/CV/en/clips/dev.csv \
  --test_files data/CV/en/clips/test.csv \
  --load_checkpoint_dir data/load_chkpnt_dir1024 \
  --save_checkpoint_dir data/save_chkpnt_dir1024 \
  --export_dir data/exprt_dir1024 \
  --n_hidden 1024 --epochs 35 --dropout_rate 0.35 \
  --learning_rate 0.00001 \
  --train_batch_size 8 --dev_batch_size 8 --test_batch_size 1 \
  --automatic_mixed_precision --train_cudnn True \
  --reduce_lr_on_plateau True --plateau_epochs 2 --plateau_reduction 0.000001
The last result in the first 25 epochs was a validation loss of 59.
Third try, Round 2:
Continuing with the same learning rate, but adding early stopping.
python3 DeepSpeech.py \
  --train_files data/CV/en/clips/train2.csv \
  --dev_files data/CV/en/clips/dev.csv \
  --test_files data/CV/en/clips/test.csv \
  --load_checkpoint_dir data/save_chkpnt_dir1024 \
  --save_checkpoint_dir data/save_chkpnt_dir1024_2 \
  --export_dir data/exprt_dir1024 \
  --n_hidden 1024 --epochs 30 --dropout_rate 0.35 \
  --learning_rate 0.00001 \
  --train_batch_size 8 --dev_batch_size 8 --test_batch_size 1 \
  --automatic_mixed_precision --train_cudnn True \
  --reduce_lr_on_plateau True --plateau_epochs 2 --plateau_reduction 0.000001 \
  --early_stop True --es_epochs 3 --es_min_delta 0.09
Training stopped at epoch 5.
Round 3: increased es_epochs to 10; training stopped at epoch 11.
Round 4: decreased the learning rate.
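For reference, here is my understanding of how --early_stop interacts with --es_epochs and --es_min_delta: training stops once the dev loss has failed to beat the best value so far by at least es_min_delta for es_epochs consecutive epochs. This is a sketch based on the flag descriptions, not the actual DeepSpeech implementation:

```python
def stop_epoch(dev_losses, es_epochs, es_min_delta):
    """Return the 1-based epoch at which early stopping would trigger,
    or None if it never triggers. An epoch counts as "stalled" when it
    does not improve on the best dev loss by more than es_min_delta."""
    best = float("inf")
    stalled = 0
    for epoch, loss in enumerate(dev_losses, start=1):
        if best - loss > es_min_delta:
            best = loss
            stalled = 0
        else:
            stalled += 1
            if stalled >= es_epochs:
                return epoch
    return None
```

With es_min_delta 0.09, improvements of only 0.05-0.07 per epoch count as stalled, which would explain why my Round 2 run stopped after just 5 epochs.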
python3 DeepSpeech.py \
  --train_files data/CV/en/clips/train2.csv \
  --dev_files data/CV/en/clips/dev.csv \
  --test_files data/CV/en/clips/test.csv \
  --load_checkpoint_dir data/save_chkpnt_dir1024 \
  --save_checkpoint_dir data/save_chkpnt_dir1024_2 \
  --export_dir data/exprt_dir1024 \
  --n_hidden 1024 --epochs 40 --dropout_rate 0.35 \
  --learning_rate 0.000001 \
  --train_batch_size 8 --dev_batch_size 8 --test_batch_size 1 \
  --automatic_mixed_precision --train_cudnn True \
  --reduce_lr_on_plateau True --plateau_epochs 2 --plateau_reduction 0.0000002 \
  --early_stop True --es_epochs 10 --es_min_delta 0.09
Now, in Round 4 at epoch 3, the validation loss is 59 and the train loss is 48.
======================
Questions (note: I don’t need transfer learning, for a few reasons):
1-) I am not using some of the other hyperparameters mentioned in the Mozilla documentation here. Is it crucial to tune all of the hyperparameters to get the loss below 15?
2-) I would really appreciate it if you could share your own hyperparameters for a similar amount of data.
3-) Is a validation loss of 59 the best my data can support?
4-) Is there any chance of getting the loss below 10 with Mozilla DeepSpeech 0.7?