Fine-tuning DeepSpeech Model (CommonVoice-DATA)

You did, but you also mentioned fine-tuning without any information about what dataset you are using for that.

In your command line, what are train.csv, test.csv and dev.csv? The Common Voice dataset?

Yes, all my .csv files contain CommonVoice data (from the last release - 774 validated hours).

Okay, so are you aware this is not good practice? Your model will have already learned some of that data.

If I understand your question correctly, you mean that the last released model was trained on the Common Voice dataset, so when I fine-tune with the same data I get worse results?

Yes, it was documented in the v0.4 release notes; it looks like it’s not in the latest ones, though.

OK, never mind; it turns out that in this release we did not.

Try lowering the learning rate further.

OK, I will try with lr = 0.00005 and report my results. Thank you for the quick responses!

The problem remains… Up to the 4th epoch everything seems normal (train & val loss ≈ 25 and decreasing gradually). However, at the end of the 4th epoch the train loss suddenly becomes inf, and the training process then stops via early stopping with val_loss = inf. I tried to fine-tune again from the recent checkpoints, but the loss stays stuck at inf. Does anyone have any idea what is going wrong?

  --train_files /home/christina/PycharmProjects/DeepSpeech1/cv_model/train.csv \
  --dev_files /home/christina/PycharmProjects/DeepSpeech1/cv_model/dev.csv \
  --test_files /home/christina/PycharmProjects/DeepSpeech1/cv_model/test.csv \
  --train_batch_size 64 \
  --dev_batch_size 64 \
  --test_batch_size 64 \
  --dropout_rate 0.15 \
  --epochs 30 \
  --validation_step 1 \
  --report_count 20 \
  --early_stop True \
  --learning_rate 0.00005 \
  --export_dir /home/christina/PycharmProjects/DeepSpeech1/mozilla-models/checkpoints \
  --checkpoint_dir /home/christina/PycharmProjects/DeepSpeech1/mozilla-models/checkpoints \
  --alphabet_config_path /home/christina/PycharmProjects/DeepSpeech1/mozilla-models/alphabet.txt \
  --lm_binary_path /home/christina/PycharmProjects/DeepSpeech1/mozilla-models/lm.binary \
  --lm_trie_path /home/christina/PycharmProjects/DeepSpeech1/mozilla-models/trie

I have also tried lr = 0.00001, 0.001 and 0.01, as well as a lower batch size, but the inf loss problem persists…

In some cases, people have had more success with a learning rate of 1e-6.

This is usually a sign that the learning rate is too high. Try lowering it by a factor of 10, say.
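The divergence described above is not specific to DeepSpeech. A minimal, purely illustrative sketch (plain gradient descent on f(w) = w², not the actual CTC training loop) shows how a step size that is too large makes the loss blow up to inf, while a small one converges:

```python
import math

def train(lr, steps=200, w=10.0):
    """Toy gradient descent on f(w) = w**2; returns the final loss."""
    for _ in range(steps):
        grad = 2.0 * w      # derivative of w**2
        w = w - lr * grad   # gradient-descent update
    return w * w            # final loss

print(train(5.0))    # lr far too high: the update overshoots, loss -> inf
print(train(0.01))   # small lr: loss shrinks steadily toward 0
```

With lr = 5.0 each update multiplies w by -9, so the iterates grow without bound and the loss overflows to inf, which is the same symptom as the inf train loss reported above.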

Indeed, lr = 1e-6 solved the problem! I haven’t seen an inf loss since I lowered the value. Thank you!

@ctzogka were you able to complete the fine-tuning of the 0.5.1 pre-trained model with CommonVoice data? How do you see the results w.r.t. WER? Is there a specific accent you are trying to target with the new model?

Hi @sranjeet.visteon!
I haven’t completed fine-tuning yet… Let me explain my plan:
I am trying to build a model that supports decoding of long files such as BBC news clips (10 minutes). These clips contain noise (e.g. phone conversations), varied accents (e.g. Hindi, African, etc.) and sometimes intermediate music segments. So I have been using CV data to fine-tune 0.5.1 (whose training did not include CV) in order to boost accuracy. Initially WER = 58%, and now WER = 51% on these clips. On the CV test set there is also a significant improvement: from WER = 44% to WER = 22%!

Hi @ctzogka

did you use the parameters from your previous posts (epochs 30, lr 1e-6, dropout 0.15, …) to get those results?

Interesting use case you have here. Do you have a git repository where I can follow your progress? I have something similar in mind :slight_smile:

Thanks !

Hello @caucheteux!
I used all the parameters I mentioned before. However, I am currently fine-tuning with lr = 5e-1 / 95e-1. I noticed that WER increased on my BBC-news test clip when I set lr = 95e-1. On the contrary, WER decreased on the CV test set for the same value.
I don’t have a git repository, but I will gradually report my results here for your reference.

My advice for anyone doing something similar is to experiment with lr values in order to find an appropriate one.
"A large learning rate allows the model to learn faster, at the cost of arriving on a sub-optimal final set of weights. A smaller learning rate may allow the model to learn a more optimal or even globally optimal set of weights but may take significantly longer to train."
It’s safe to start from 1e-6 and keep increasing as long as results improve. Always check the train & val loss: if they are close, you are doing a good job (no over-/under-fitting)!
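The train-vs-val check described above can be sketched as a tiny helper. This is my own illustrative heuristic (the gap and "high loss" thresholds are made up for the example, not taken from the DeepSpeech docs):

```python
def diagnose(train_loss: float, val_loss: float,
             gap_tol: float = 5.0, high_loss: float = 100.0) -> str:
    """Rough heuristic: compare train and validation loss after an epoch."""
    if val_loss - train_loss > gap_tol:
        # validation loss far above train loss: the model memorises the train set
        return "possible over-fitting"
    if train_loss > high_loss and val_loss > high_loss:
        # both losses still high: the model has not learned enough yet
        return "possible under-fitting"
    return "losses are close: training looks healthy"

print(diagnose(38.0, 41.0))   # close losses -> "losses are close: training looks healthy"
print(diagnose(10.0, 60.0))   # large gap -> "possible over-fitting"
```

Tune the thresholds to your own loss scale; the point is only to make the "are train and val loss close?" check explicit.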

hi @lissyx,

I have a question. As the v0.5.1 model is not trained on the CV dataset, is it possible that some sentences are not covered by the provided LM?
I found some absurdities such as “he the man” instead of “amen”. Is it an LM problem or just a model problem?

After 20 epochs of fine-tuning (lr = 1e-6, dropout rate = 0.15, dropout layer = 1) on the CV dataset, I got 48% WER, the same as the original model. Am I missing something somewhere? Note that I had to force early stopping off. But there is no overfitting (train_loss ≈ 38.0, dev_loss ≈ 41).

Sorry if it’s not clear :slight_smile:
Thanks !

No, the LM is built from Wikipedia, not from CV.