Fine-tuning DeepSpeech Model (CommonVoice-DATA)

Hi, I am trying to produce a model that can properly transcribe long files (e.g. 10 minutes of BBC news). At first I thought of training my own model on the latest CommonVoice release, but I realized that it demands many hours of training… So I decided to fine-tune the existing model (v0.5.1), which has already been trained on CommonVoice data.
I am using the same hyperparameters as reported in the latest release. However, I am getting worse results (WER ~80%) from the first model exported after early stopping, while the Mozilla release model achieves WER 55%. Let me mention that I am using the streaming API (/examples/ffmpeg_vad_streaming) to decode the long files.
My questions are: could I ever get a better-trained model? Is it reasonable to get worse performance after fine-tuning with the same hyperparameters? Has anybody tried something similar?
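
For reference on the WER numbers discussed here: WER is the word-level edit distance between hypothesis and reference, divided by the number of reference words. A minimal Python sketch for sanity-checking transcripts yourself (the example strings are made up):

  # WER = word-level Levenshtein distance / number of reference words
  def wer(reference: str, hypothesis: str) -> float:
      ref, hyp = reference.split(), hypothesis.split()
      # dp[i][j] = edit distance between ref[:i] and hyp[:j]
      dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
      for i in range(len(ref) + 1):
          dp[i][0] = i
      for j in range(len(hyp) + 1):
          dp[0][j] = j
      for i in range(1, len(ref) + 1):
          for j in range(1, len(hyp) + 1):
              sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
              dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
      return dp[len(ref)][len(hyp)] / len(ref)

  # one substitution + one insertion over a 4-word reference -> 0.5
  print(wer("the quick brown fox", "the quack brown fox jumps"))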


Please make sure you reproduce with the C++ client first; it’s possible this example does extra processing that interferes.

With the C++ client, do you mean the /DeepSpeech/native_client/deepspeech --model... --alphabet ... --lm ... --trie ... command? When I run this command the results are worse too (compared to the Mozilla release model). I’ve been searching for the streaming API, and I was told in a previous post that I should write code. Is there any example for English?
Also, training is still running, but now I see inf train/dev loss (1st & 2nd epochs) and I wonder whether that is normal…
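
For reference, a minimal sketch of streaming inference through the Python bindings, assuming the v0.5.1 API (setupStream/feedAudioContent/finishStream) and a 16 kHz mono 16-bit WAV file; all paths are placeholders, and native_client/python/client.py in the repo is the authoritative example:

  # Streaming sketch -- assumes the v0.5.1 Python bindings; paths are placeholders.
  import wave
  import numpy as np
  from deepspeech import Model

  N_FEATURES, N_CONTEXT, BEAM_WIDTH = 26, 9, 500  # v0.5.1 release defaults
  LM_ALPHA, LM_BETA = 0.75, 1.85

  ds = Model('output_graph.pbmm', N_FEATURES, N_CONTEXT, 'alphabet.txt', BEAM_WIDTH)
  ds.enableDecoderWithLM('alphabet.txt', 'lm.binary', 'trie', LM_ALPHA, LM_BETA)

  wav = wave.open('audio_16k_mono.wav', 'rb')
  sctx = ds.setupStream()  # 16 kHz sample rate by default in v0.5.1
  while True:
      chunk = wav.readframes(1024)  # feed the long file in small pieces
      if not chunk:
          break
      ds.feedAudioContent(sctx, np.frombuffer(chunk, np.int16))
  wav.close()
  print(ds.finishStream(sctx))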

About the VAD example: I have noticed that the inference is pretty much the same. It may use some extra parameters, but I don’t think it can affect the results to any large degree. I think something is probably going wrong with the fine-tuning, or maybe I have to spend more hours training.

There are examples; I’m not really sure I understand your point here.

Yes, well, at some point, without more context on how you train, it’s hard …

  python3 DeepSpeech.py \
  --train_files /home/christina/PycharmProjects/DeepSpeech1/cv_model/train.csv \
  --dev_files /home/christina/PycharmProjects/DeepSpeech1/cv_model/dev.csv \
  --test_files /home/christina/PycharmProjects/DeepSpeech1/cv_model/test.csv \
  --train_batch_size 24 \
  --dev_batch_size 48 \
  --test_batch_size 48 \
  --dropout_rate 0.15 \
  --epochs 30 \
  --validation_step 1 \
  --report_count 20 \
  --early_stop True \
  --learning_rate 0.0001 \
  --export_dir /home/christina/PycharmProjects/DeepSpeech1/mozilla-models/checkpoints \
  --checkpoint_dir /home/christina/PycharmProjects/DeepSpeech1/mozilla-models/checkpoints \
  --alphabet_config_path /home/christina/PycharmProjects/DeepSpeech1/mozilla-models/alphabet.txt \
  --lm_binary_path /home/christina/PycharmProjects/DeepSpeech1/mozilla-models/lm.binary \
  --lm_trie_path /home/christina/PycharmProjects/DeepSpeech1/mozilla-models/trie

I have already mentioned that I used the same hyperparameters as the released 0.5.1 model, together with its checkpoints. I train on a GPU (GeForce RTX, 10989 MiB) with the CommonVoice validated data. Let me know if you need more information.

You did, but you also mentioned fine-tuning without any information about which dataset you are using to do that.

In your command line, what are train.csv, test.csv and dev.csv? The Common Voice dataset?

Yes, all my .csv files contain CommonVoice data (from the latest release: 774 validated hours).

Okay, so are you aware that this is not good practice? Your model will have already learnt some of that data.

If I understand your question correctly, you mean that the last released model was trained on the Common Voice dataset, so when I fine-tune with the same data I get worse results?

Yes, it was documented in the v0.4 release notes; it looks like it’s not in the latest ones, though.

Ok, never mind; it turns out that in this release we did not train on Common Voice after all.

Try lowering the learning rate further.

Ok, I will try lr = 0.00005 and report my results. Thank you for the quick responses!


The problem remains… Through the 4th epoch everything seems normal (train & val loss ≈ 25 and decreasing gradually). However, at the end of the 4th epoch the train loss suddenly becomes inf, and the training process then stops via early stopping with val_loss = inf. I tried to fine-tune again from the most recent checkpoints, but the loss is stuck at inf. Does anyone have any idea what is going wrong?

  python3 DeepSpeech.py \
  --train_files /home/christina/PycharmProjects/DeepSpeech1/cv_model/train.csv \
  --dev_files /home/christina/PycharmProjects/DeepSpeech1/cv_model/dev.csv \
  --test_files /home/christina/PycharmProjects/DeepSpeech1/cv_model/test.csv \
  --train_batch_size 64 \
  --dev_batch_size 64 \
  --test_batch_size 64 \
  --dropout_rate 0.15 \
  --epochs 30 \
  --validation_step 1 \
  --report_count 20 \
  --early_stop True \
  --learning_rate 0.00005 \
  --export_dir /home/christina/PycharmProjects/DeepSpeech1/mozilla-models/checkpoints \
  --checkpoint_dir /home/christina/PycharmProjects/DeepSpeech1/mozilla-models/checkpoints \
  --alphabet_config_path /home/christina/PycharmProjects/DeepSpeech1/mozilla-models/alphabet.txt \
  --lm_binary_path /home/christina/PycharmProjects/DeepSpeech1/mozilla-models/lm.binary \
  --lm_trie_path /home/christina/PycharmProjects/DeepSpeech1/mozilla-models/trie

I have also tried lr = 0.00001 / 0.001 / 0.01 and a lower batch size, but the inf loss problem is still there…

In some cases, people have had more success with a learning rate of 1e-6.

This is usually a sign that the learning rate is too high. Try lowering it by a factor of 10, say.

Indeed, lr = 1e-6 solved the problem! I haven’t seen an inf loss since I lowered the value. Thank you!

@ctzogka were you able to complete the fine-tuning of the 0.5.1 pre-trained model with CommonVoice data? How do the results look with respect to WER? Is there a specific accent you are trying to target with the new model?