Fine-tuning a model with a large dataset: the model can't adapt for transfer learning?


(Murugan R) #1

I fine-tuned the pre-trained model with a YouTube dataset (Indian accent), about 100 hours of audio files. I trained for 35 epochs with batch sizes of 3-3-3; everything else follows the DeepSpeech instructions for continuing training.
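For reference, a continue-training run like the one described above would look roughly like this for DeepSpeech v0.2.x. The checkpoint directory and CSV paths below are placeholders, and the exact flag names should be checked against your checkout; a lowered `--learning_rate` is commonly used for fine-tuning:

```shell
# Sketch only: resume training from the released v0.2.0 checkpoints.
# Paths and the learning rate are illustrative, not the poster's actual values.
python3 DeepSpeech.py \
  --checkpoint_dir ./deepspeech-0.2.0-checkpoint \
  --train_files youtube_train.csv \
  --dev_files youtube_dev.csv \
  --test_files youtube_test.csv \
  --train_batch_size 3 --dev_batch_size 3 --test_batch_size 3 \
  --epoch 35 \
  --learning_rate 0.0001 \
  --export_dir model_export_youtubeV3/
```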

I Testing epoch 35...
I Test of Epoch 35 - WER: 0.500695, loss: 88.6900565696485, mean edit distance: 0.278072
I --------------------------------------------------------------------------------
I WER: 0.125000, loss: 0.063078, mean edit distance: 0.025000
I  - src: " difference don't freak out if you get a"
I  - res: " difference don't freak out if you get "
I --------------------------------------------------------------------------------
I WER: 0.142857, loss: 0.060561, mean edit distance: 0.024390
I  - src: " slice of the retail business that's over"
I  - res: "a slice of the retail business that's over"
I --------------------------------------------------------------------------------
I WER: 0.142857, loss: 0.089889, mean edit distance: 0.027778
I  - src: " question what is it about the first"
I  - res: "a question what is it about the first"
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 0.007418, mean edit distance: 0.142857
I  - src: " change"
I  - res: "i change"
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 0.007418, mean edit distance: 0.142857
I  - src: " change"
I  - res: "i change"
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 0.013327, mean edit distance: 0.250000
I  - src: " company"
I  - res: "a company "
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 0.013327, mean edit distance: 0.250000
I  - src: " company"
I  - res: "a company "
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 0.056036, mean edit distance: 0.125000
I  - src: " project"
I  - res: "a project"
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 0.075017, mean edit distance: 0.250000
I  - src: " project"
I  - res: "a project "
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 0.075017, mean edit distance: 0.250000
I  - src: " project"
I  - res: "a project "
I --------------------------------------------------------------------------------
I Exporting the model...
Converted 12 variables to const ops.
I Models exported at model_export_youtubeV3/

And when I test inference:

actual:how are you
res: a

actual: how can i apply for aadhaar card pan card
res: how can a a bar abdy

actual: you are not working
res: a

deepspeech 0.2.1a1
tensorflow 1.11.0

DeepSpeech v0.2.0 and pretrained model v0.2.0

I trained the model on an EC2 p3 xlarge instance with 8 GPUs; it took 18 hours.

Sir, can you help me please? Is there a problem with the hyperparameters? How should I fine-tune the training to get good accuracy?

I didn't get good accuracy from fine-tuning my model (Indian accent).

thank you,
Murugan R


(Lissyx) #2

It's complicated; we are still only experimenting with transfer learning, and we don't have a lot of feedback yet on the proper steps to get something really good.


(Lissyx) #3

It's complicated to judge with so few elements; maybe you don't have enough data yet. Can you ensure the source material is adequate? 16-bit PCM, 16 kHz, mono? If you perform any conversion, can you ensure it does not add any artifacts?
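One quick way to check the first condition: assuming the clips are WAV files, Python's standard `wave` module can verify the sample rate, channel count, and sample width. The file name below is just an illustration (the script writes its own test clip):

```python
import struct
import wave

def check_wav(path):
    """Return True if the clip matches DeepSpeech's expected input:
    16 kHz sample rate, mono, 16-bit PCM (sample width of 2 bytes)."""
    with wave.open(path, "rb") as w:
        return (w.getframerate() == 16000
                and w.getnchannels() == 1
                and w.getsampwidth() == 2)

# Write one second of silence in the expected format, then verify it.
with wave.open("clip.wav", "wb") as w:
    w.setnchannels(1)       # mono
    w.setsampwidth(2)       # 16-bit PCM
    w.setframerate(16000)   # 16 kHz
    w.writeframes(struct.pack("<h", 0) * 16000)

print(check_wav("clip.wav"))
```

Running this kind of check over every file in the training CSVs is a cheap way to catch clips that a converter silently left at 44.1 kHz or stereo.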

The v0.3.0 model might be worth a try; it should contain more Common Voice data. You could also rebase your work on the current master of DeepSpeech, with the v0.3.0 checkpoints: you would benefit from the new decoder, which might improve things.
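The checkpoint route would look roughly like this. The release URL is an assumption based on the usual GitHub release layout, so verify it against the v0.3.0 release page before relying on it:

```shell
# Fetch and unpack the v0.3.0 checkpoints (URL assumed, check the release page).
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.3.0/deepspeech-0.3.0-checkpoint.tar.gz
tar xzf deepspeech-0.3.0-checkpoint.tar.gz

# Then resume fine-tuning from them on current master, passing your usual
# --train_files/--dev_files/--test_files and batch-size flags as before.
python3 DeepSpeech.py --checkpoint_dir deepspeech-0.3.0-checkpoint ...
```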


(Murugan R) #4

If I fine-tune a pretrained model trained on LibriSpeech only, without these three datasets (Common Voice, Switchboard, Fisher), will it get better accuracy and adapt to accent variations? Have you tested this before, sir?

Thank you for your quick response, sir. :slightly_smiling_face:


(Lissyx) #5

For the n-th time: it might, but we can’t promise anything.


(Murugan R) #6

Thank you for your response, sir :slightly_smiling_face: