Hello,
I am training a model for Urdu in its native script and have successfully used transfer learning from the English pretrained model to reach a loss of 36.299953 after 80 epochs on this data. I want to improve on this further by adjusting the parameters and applying some augmentation through DeepSpeech.
The one big question I have is: if we are “continuing” training, why is a new best validating model saved when it is not better than the one from the previous run?
The other question I have is: what techniques can we use to decrease this loss? This is the command I am using to “continue” training:
python3 DeepSpeech.py \
  --drop_source_layers 2 \
  --alphabet_config_path $HOME/Uploads/UrduAlphabet_newscrawl2.txt \
  --load_checkpoint_dir $HOME/DeepSpeech/dataset/trained_load_checkpoint \
  --save_checkpoint_dir $HOME/DeepSpeech/dataset/trained_load_checkpoint \
  --train_files $HOME/Uploads/trains55final.csv \
  --dev_files $HOME/Uploads/devs55final.csv \
  --epochs 30 \
  --train_batch_size 32 \
  --export_dir $HOME/DeepSpeech/dataset/urdu_trained \
  --export_file_name urdu \
  --learning_rate 0.00001 \
  --scorer $HOME/Uploads/kenlmnew.scorer \
  --n_hidden 2048 \
  --dropout_rate 0.2 \
  --train_cudnn true
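For the augmentation experiments, my understanding from the training docs is that DeepSpeech 0.9.x takes repeated --augment flags, which could be appended to a command like the one above. A sketch only: the parameter ranges are illustrative, not tuned values, and noise.sdb is a hypothetical noise collection I would still have to build:

```shell
# Appended to the training command above; all ranges are illustrative.
--augment "pitch[p=0.1,pitch=1~0.2]" \
--augment "tempo[p=0.1,factor=1~0.3]" \
--augment "frequency_mask[p=0.1,n=1~3,size=1~5]" \
--augment "time_mask[p=0.1,n=1~3,size=10~50]" \
--augment "overlay[p=0.3,source=$HOME/Uploads/noise.sdb,layers=1,snr=50:20~10]"
```

Here p is the per-sample probability of applying each augmentation, so several can be combined in one run without every sample receiving all of them.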
I now want to adjust the parameters, continue training, and try to improve the loss.
How much difference would one form of augmentation alone make to our data? Or would it be more useful to use multiple augmentations together in the same run?
I know you can’t “think” for me, but I am looking for a pointer to try to improve this. Will running the same data set (around 60 hours) produce a better loss with different augmentation combinations?
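To make the masking augmentations concrete, here is a minimal numpy sketch of the idea behind frequency/time masking: zeroing out random frequency bands and time spans of a spectrogram so the model cannot rely on any single band or frame. This is an illustration of the concept, not DeepSpeech's actual implementation, and the shapes and mask sizes are made up:

```python
import numpy as np

def mask_spectrogram(spec, n_freq_masks=2, freq_size=5,
                     n_time_masks=2, time_size=10, rng=None):
    """Zero random frequency bands and time spans of a (freq, time) spectrogram."""
    rng = rng or np.random.default_rng(0)
    out = spec.copy()
    n_freq, n_time = out.shape
    for _ in range(n_freq_masks):
        f0 = rng.integers(0, max(1, n_freq - freq_size))
        out[f0:f0 + freq_size, :] = 0.0   # mask a band of frequency bins
    for _ in range(n_time_masks):
        t0 = rng.integers(0, max(1, n_time - time_size))
        out[:, t0:t0 + time_size] = 0.0   # mask a span of time frames
    return out

spec = np.ones((80, 200))        # fake 80-bin x 200-frame spectrogram
aug = mask_spectrogram(spec)
print(aug.shape, (aug == 0).any())  # shape is preserved, some entries masked
```

Because the masks are sampled per utterance, re-running the same 60 hours effectively shows the model a different corruption of each clip every epoch.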
The WER at 80 epochs is around 58%, with a validation loss of 36.3; the training loss is at 32. Both continue to decrease, so I know it is not overfitting, and continuing training should reduce this a bit.
On other data sets, the training loss continues to decrease but the validation loss starts increasing. Based on other forum questions, that is overfitting; is my understanding correct?