Question about Korean transfer learning

Hello, I'm trying transfer learning for Korean.
I have two questions about it.

My setup:
DeepSpeech version: 0.9.0-alpha.3
GPU: 7G
training data: 500+ hours
validation data: almost 9 hours
test data: almost 9 hours

Training parameters:

python3 DeepSpeech.py \
  --drop_source_layers 1 \
  --alphabet_config_path deepspeech_alphabet.txt \
  --load_checkpoint_dir deepspeech-0.8.0-checkpoint \
  --save_checkpoint_dir my_checkpoint_dir \
  --export_dir scorer_model \
  --train_files corpora/ko/clips/train.csv \
  --dev_files corpora/ko/clips/dev.csv \
  --test_files corpora/ko/clips/test.csv \
  --train_cudnn \
  --learning_rate 0.00001 \
  --scorer my_kor_scorer.scorer \
  --summary_dir summary_dir &> log/0914_transfer_learning.log &

alphabet size: 2200

Scorer:
built from the sentences in train.csv, dev.csv, and test.csv

Question 1.

My training loss and validation loss both decrease from epoch 0 to epoch 6;
after that, the validation loss stops improving while the training loss keeps decreasing.

Epoch 0 | Training | Elapsed Time: 16:41:11 | Steps: 481008 | Loss: 28.460519
Epoch 0 | Validation | Elapsed Time: 0:01:02 | Steps: 1070 | Loss: 22.371003 | Dataset: corpora/ko/clips/dev.csv

Epoch 1 | Training | Elapsed Time: 16:33:00 | Steps: 481008 | Loss: 19.757104
Epoch 1 | Validation | Elapsed Time: 0:01:02 | Steps: 1070 | Loss: 19.216509 | Dataset: corpora/ko/clips/dev.csv

Epoch 2 | Training | Elapsed Time: 16:32:23 | Steps: 481008 | Loss: 17.215991
Epoch 2 | Validation | Elapsed Time: 0:00:58 | Steps: 1070 | Loss: 18.027218 | Dataset: corpora/ko/clips/dev.csv

Epoch 3 | Training | Elapsed Time: 16:33:35 | Steps: 481008 | Loss: 15.617158
Epoch 3 | Validation | Elapsed Time: 0:01:01 | Steps: 1070 | Loss: 17.366926 | Dataset: corpora/ko/clips/dev.csv

Epoch 6 | Training | Elapsed Time: 1 day, 0:33:51 | Steps: 481008 | Loss: 12.790936
Epoch 6 | Validation | Elapsed Time: 0:01:34 | Steps: 1070 | Loss: 16.471936 | Dataset: corpora/ko/clips/dev.csv

After epoch 6, the validation loss barely changes.

Epoch 7 | Training | Elapsed Time: 1 day, 1:59:56 | Steps: 481008 | Loss: 12.143217
Epoch 7 | Validation | Elapsed Time: 0:01:35 | Steps: 1070 | Loss: 16.556080 | Dataset: corpora/ko/clips/dev.csv

Epoch 9 | Training | Elapsed Time: 16:33:11 | Steps: 481008 | Loss: 11.009611
Epoch 9 | Validation | Elapsed Time: 0:01:02 | Steps: 1070 | Loss: 16.671552 | Dataset: corpora/ko/clips/dev.csv

Epoch 11 | Training | Elapsed Time: 16:32:19 | Steps: 481008 | Loss: 10.096103
Epoch 11 | Validation | Elapsed Time: 0:01:11 | Steps: 1070 | Loss: 16.909459 | Dataset: corpora/ko/clips/dev.csv

Does this mean overfitting? If so, what should I do?

Question 2.

I exported output_graph.pb from the epoch 6 checkpoint with my scorer (built from the CSV files) and my alphabet (size 2200),
then ran inference with the scorer, but the result is not very accurate and it takes about 9 seconds.

deepspeech \
  --model 0914_1672loss_model/output_graph.pb \
  --scorer my_kor_scorer.scorer \
  --audio my_data/blob.wav

둘 제자리다
Inference took 9.047s for 5.952s audio file.

Inference without the scorer is much more accurate, but it takes far too long.
Why does inference take so much time?

deepspeech \
  --model 0914_1672loss_model/output_graph.pb \
  --audio my_data/blob.wav

둘 제 회이로 작성부탁내리이다
Inference took 2231.380s for 5.952s audio file.

English inference, with or without the scorer, takes less than a second.

deepspeech \
  --model deepspeech-0.8.0-models.pbmm \
  --scorer deepspeech-0.8.0-models.scorer \
  --audio english_voice.wav

english test
Inference took 0.648s for 1.272s audio file.

##########

deepspeech \
  --model deepspeech-0.8.0-models.pbmm \
  --audio english_voice.wav

english test
Inference took 0.792s for 1.272s audio file.

What makes the difference? The alphabet size?

I want either fast inference without a scorer, or accurate inference with a scorer.

Please give me some advice.

that describes overfitting

analyze why, maybe it’s just time to stop :slight_smile:
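
For what it's worth, the training script has early-stopping and plateau flags that can stop a run like that automatically. A minimal sketch using the flags from this thread (flag names as of 0.9.x, check python3 DeepSpeech.py --helpfull for your version; the thresholds are guesses, not recommendations):

python3 DeepSpeech.py \
  --drop_source_layers 1 \
  --alphabet_config_path deepspeech_alphabet.txt \
  --load_checkpoint_dir deepspeech-0.8.0-checkpoint \
  --save_checkpoint_dir my_checkpoint_dir \
  --train_files corpora/ko/clips/train.csv \
  --dev_files corpora/ko/clips/dev.csv \
  --test_files corpora/ko/clips/test.csv \
  --train_cudnn \
  --learning_rate 0.00001 \
  --early_stop \
  --es_epochs 5 \
  --es_min_delta 0.05 \
  --reduce_lr_on_plateau \
  --plateau_epochs 3 \
  --plateau_reduction 0.1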

no idea, you don’t provide context

we don’t know how you built it

ok, so with a big alphabet the scorer is slow, this is known; @reuben is still working on that

Adding to lissyx: a good scorer (at least for English, German, …) will get you better results than just the model.

So, play around with a custom language model and you should get better results. It is complex, but check the generate_lm script and the KenLM documentation.
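
For reference, a rough sketch of that pipeline (the corpus file, top_k and alpha/beta values below are placeholders, not tuned values; check data/lm/generate_lm.py --help in your checkout and the generate_scorer_package binary from the native client):

python3 data/lm/generate_lm.py \
  --input_txt korean_corpus.txt \
  --output_dir lm_out \
  --top_k 500000 \
  --kenlm_bins /path/to/kenlm/build/bin/ \
  --arpa_order 5 \
  --max_arpa_memory "85%" \
  --arpa_prune "0|0|1" \
  --binary_a_bits 255 \
  --binary_q_bits 8 \
  --binary_type trie

./generate_scorer_package \
  --alphabet deepspeech_alphabet.txt \
  --lm lm_out/lm.binary \
  --vocab lm_out/vocab-500000.txt \
  --package my_kor_scorer.scorer \
  --default_alpha 0.93 \
  --default_beta 1.18

The default_alpha/default_beta here are just the English defaults rounded; lm_optimizer.py can tune them for your own data.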

And a simple hack: generate a pbmm from the pb. You won't save much time, but a couple of percent.
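
If it helps, that conversion uses TensorFlow's convert_graphdef_memmapped_format tool (prebuilt binaries are linked from the DeepSpeech docs); roughly:

./convert_graphdef_memmapped_format \
  --in_graph=output_graph.pb \
  --out_graph=output_graph.pbmm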

Give us more information and we might be able to help you more :slight_smile:

Technically, for languages with big alphabets like that, this is where the “Bytes output mode” makes sense, and you don't need to use an alphabet at all. But this is still being refined by @reuben.
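
If you want to experiment with it anyway, it is exposed as a training flag; a minimal sketch, assuming the 0.9.x flag name (check --helpfull for your exact version; note that no --alphabet_config_path is needed and the scorer would also have to be packaged in bytes output mode):

python3 DeepSpeech.py \
  --bytes_output_mode \
  --train_files corpora/ko/clips/train.csv \
  --dev_files corpora/ko/clips/dev.csv \
  --test_files corpora/ko/clips/test.csv \
  --train_cudnn \
  --learning_rate 0.00001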

Thank you for your quick response.

For Question 1:
The transfer learning example in the official docs doesn't use dropout_rate; is it better to use dropout_rate in transfer learning to handle overfitting?

For Question 2:
I created the scorer using KenLM (lmplz, build_binary) and generate_scorer_package from the deepspeech-0.8.0 native client.
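
Roughly like this (the order and quantization options here are illustrative, not my exact commands):

/path/to/kenlm/build/bin/lmplz --order 5 --text korean_sentences.txt --arpa lm.arpa
/path/to/kenlm/build/bin/build_binary -a 255 -q 8 trie lm.arpa lm.binary

and then generate_scorer_package on the resulting lm.binary plus a vocabulary file, together with my alphabet.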

The sentences used to build the scorer are the same ones used to train the model.
Can that affect the scorer's accuracy? Should I use other data? Or does only the amount of text affect the scorer's accuracy?

All of that depends on your needs and dataset. I have no experience with transfer learning from English to Korean, and no knowledge of Korean, so I can't help.

It looks like this might make your model think it is smarter than it actually is, but the devil is in the details: it all depends on exactly what you did there and what kind of application you are working on.

I added a dropout rate and tried again a few days ago; the dropout reduced my validation loss to 15.94 at epoch 14.

But overfitting occurred again after epoch 14.

Here are my training parameters and losses:

python3 DeepSpeech.py \
  --drop_source_layers 1 \
  --alphabet_config_path deepspeech_alphabet.txt \
  --load_checkpoint_dir deepspeech-0.8.0-checkpoint \
  --save_checkpoint_dir save_checkpoint_dir \
  --export_dir export_dir \
  --train_files train.csv \
  --dev_files dev.csv \
  --test_files test.csv \
  --train_cudnn \
  --learning_rate 0.00001 \
  --dropout_rate 0.3 \
  --scorer my_kor_scorer.scorer \
  --summary_dir summary_dir &> log.log &

I Saved new best validating model with loss 24.269239 to: save_checkpoint_dir/best_dev-1213530
I Saved new best validating model with loss 20.687797 to: save_checkpoint_dir/best_dev-1694538
I Saved new best validating model with loss 19.284921 to: save_checkpoint_dir/best_dev-2175546
I Saved new best validating model with loss 18.406521 to: save_checkpoint_dir/best_dev-2656554
I Saved new best validating model with loss 17.785742 to: save_checkpoint_dir/best_dev-3137562
I Saved new best validating model with loss 17.574958 to: save_checkpoint_dir/best_dev-3618570
I Saved new best validating model with loss 17.179890 to: save_checkpoint_dir/best_dev-4099578
I Saved new best validating model with loss 16.957047 to: save_checkpoint_dir/best_dev-4580586
I Saved new best validating model with loss 16.772893 to: save_checkpoint_dir/best_dev-5061594
I Saved new best validating model with loss 16.579672 to: save_checkpoint_dir/best_dev-6023610
I Saved new best validating model with loss 16.313302 to: save_checkpoint_dir/best_dev-6504618
I Saved new best validating model with loss 16.258681 to: save_checkpoint_dir/best_dev-6985626
I Saved new best validating model with loss 15.945202 to: save_checkpoint_dir/best_dev-7466634

Would increasing the proportion of data used for dev help prevent overfitting?