I am trying to train my own Korean DeepSpeech model with DeepSpeech 0.6.1

Hi @lissyx,
I am trying to train my own Korean DeepSpeech model. My datasets together come to about 50 hours, which I split 7:2:1 into train/dev/test. I have trained for 58 epochs, but the train/validation loss and CER stay too high. Could you give me some advice? Thanks! Below are my training parameters and training log:

Training parameters:

python -u DeepSpeech.py --noshow_progressbar \
  --train_files /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/train.csv \
  --test_files /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/test.csv  \
  --dev_files /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/eval.csv  \
  --alphabet_config_path /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/alphabet.txt \
  --lm_binary_path /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/lm.binary \
  --lm_trie_path /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/trie \
  --train_batch_size 64 \
  --test_batch_size 64 \
  --n_hidden 360 \
  --epochs 200 \
  --learning_rate 0.001 \
  --checkpoint_dir /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/model \
  --export_dir /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/model \
  --summary_dir /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/model \
  --test_output_file /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/model/test_result.json \
  --load best \
  --dropout_rate 0.05 \
  --dropout_rate5 0.50 \
  --dropout_rate6 0.90 \
  --export_tflite True \
  --max_to_keep 3 \
  --use_allow_growth True \
  "$@"

Training log:

I Finished training epoch 50 - loss: 75.382558
I Validating epoch 50 on /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/eval.csv...
I Finished validating epoch 50 on /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/eval.csv - loss: 86.900587
I Saved new best validating model with loss 86.900587 to: /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/model/best_dev-20700
I Training epoch 51...
I Finished training epoch 51 - loss: 74.932671
I Validating epoch 51 on /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/eval.csv...
I Finished validating epoch 51 on /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/eval.csv - loss: 87.981283
I Training epoch 52...
I Finished training epoch 52 - loss: 74.522279
I Validating epoch 52 on /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/eval.csv...
I Finished validating epoch 52 on /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/eval.csv - loss: 87.412124
I Training epoch 53...
I Finished training epoch 53 - loss: 74.039336
I Validating epoch 53 on /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/eval.csv...
I Finished validating epoch 53 on /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/eval.csv - loss: 88.433213
I Training epoch 54...
I Finished training epoch 54 - loss: 73.771728
I Validating epoch 54 on /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/eval.csv...
I Finished validating epoch 54 on /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/eval.csv - loss: 87.216289
I Training epoch 55...
I Finished training epoch 55 - loss: 73.068302
I Validating epoch 55 on /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/eval.csv...
I Finished validating epoch 55 on /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/eval.csv - loss: 87.642236
I Training epoch 56...
I Finished training epoch 56 - loss: 72.880096
I Validating epoch 56 on /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/eval.csv...
I Finished validating epoch 56 on /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/eval.csv - loss: 87.771627
I Training epoch 57...
I Finished training epoch 57 - loss: 72.461695
I Validating epoch 57 on /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/eval.csv...
I Finished validating epoch 57 on /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/eval.csv - loss: 88.975556
I Training epoch 58...
I Finished training epoch 58 - loss: 72.195924
I Validating epoch 58 on /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/eval.csv...
^CI FINISHED optimization in 3:46:46.436973
INFO:tensorflow:Restoring parameters from /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/model/best_dev-20700
I0326 15:34:02.412347 139993696040768 saver.py:1284] Restoring parameters from /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/model/best_dev-20700
I Restored variables from best validation checkpoint at /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/model/best_dev-20700, step 20700
Testing model on /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/test.csv
I Test epoch...
Test on /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/test.csv - WER: 0.110630, CER: 0.079688, loss: 91.874428
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.680000, loss: 106.390770
 - wav: file:///home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/test_data_01/003/112/112_003_2940.wav
 - src: "이러한 계획에 의해서 강물을 뺀다 정수근물론 그렇다고 수문을 전부 다 활짝 연 것은 아니다"
 - res: "이런 기지에서 안 때 중국 그러고 신부가 발전 경고합니다"
--------------------------------------------------------------------------------
WER: 0.937500, CER: 0.666667, loss: 178.181534
 - wav: file:///home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/test_data_01/003/118/118_003_0846.wav
 - src: "독일을 보호하기 위하여 회스는 약 이백 만 명의 인명을 앗아 가는 일을 지휘하였는데 그중 대부분이 유대인이었습니다"
 - res: "도를 보인 해서는 이백 사람을 명을 나가는 진행돼 그 정부의 데다"
--------------------------------------------------------------------------------
WER: 0.846154, CER: 0.681818, loss: 142.115295
 - wav: file:///home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/test_data_01/003/118/118_003_0046.wav
 - src: "이 전 회장의 장례식은 별세 시점인 십 사 일을 기준으로 일곱 일장으로 치뤄진다"
 - res: "회장의 정치적인 사이의 일정한 치뤄진다"
--------------------------------------------------------------------------------
WER: 0.750000, CER: 0.588235, loss: 119.248398
 - wav: file:///home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/test_data_01/003/149/149_003_1440.wav
 - src: "애플 입장에선 이천 십 이 년 맥북 프로 레티나 버전 이후 처음으로 맥북 디자인을 확 바꿨다"
 - res: "입장에선 이천 십 이 년을 지난 것은 이 청장 한 하다"
--------------------------------------------------------------------------------
WER: 0.736842, CER: 0.606061, loss: 160.116226
 - wav: file:///home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/test_data_01/003/118/118_003_0063.wav
 - src: "뉴스 일 중국 온라인 쇼핑몰 시장의 성장에 힘입어 택배 물동량이 크게 늘자 이에 따른 택배 쓰레기 문제가 조명되고 있다"
 - res: "뉴스 일 중국 온라인 이에 현장에 거 대통령이 회사 일당은 세대에게 보고 있다"
--------------------------------------------------------------------------------
WER: 0.666667, CER: 0.573333, loss: 375.439178
 - wav: file:///home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/test_data_01/003/147/147_003_2999.wav
 - src: "국립 박물관의 연구원 테레자 본토르치히는 자신의 저서 믿음 때문에 투옥되다 아우슈비츠 강제 수용소의 여호와의 증인 에서 이렇게 썼습니다"
 - res: "국립 방문한 테레자 본토르치히는 자신의 저서 이들 때문에 때다 상태가 수 "
--------------------------------------------------------------------------------
WER: 0.666667, CER: 0.600000, loss: 758.892334
 - wav: file:///home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/test_data_01/003/126/126_003_0376.wav
 - src: "반면 하종강 성공회대 노동아카데미 주임교수는 인간에 대한 이해 부족이 낳은 현상이라며 실직하는 노동자의 비인간적 고통을 생각해야 한다고 강조했다"
 - res: "반면 하종강 성공회대 노동아카데미 주임교수는 인간에 대해 "
--------------------------------------------------------------------------------
WER: 0.636364, CER: 0.422680, loss: 225.058609
 - wav: file:///home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/test_data_01/003/104/104_003_0472.wav
 - src: "때로는 바람이 풀밭을 스치는 소리를 들으면서 그 소리와는 대조적으로 크고도 찢어지는 듯이 날카로운 쌍띠물떼새의 울음소리가 물통 근처에서 들려오는 것에 귀를 기울이곤 하였습니다"
 - res: "때로는 바람이 그치지 전에서 그 소리와는 대조적으로 크고도 지는 지시 가까운 사기 물에 오른 소리 물통 근처에서 주는 것에 귀를 지 용이하다"
--------------------------------------------------------------------------------
WER: 0.619048, CER: 0.512195, loss: 206.999420
 - wav: file:///home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/test_data_01/003/118/118_003_0836.wav
 - src: "소망교회 관계자는 교회 세습을 막기 위해 김 담임목사가 부임했는데 이를 반대하고 교회를 어렵게 하는 사람들로 인해 송사가 끊이지 않고 있다고 말했다"
 - res: "소망교회 관계자는 교회 막기 진단과 보는 데 이 반대하고 결론이다 인해 송사가 끊이지 더 "
--------------------------------------------------------------------------------
WER: 0.615385, CER: 0.413043, loss: 105.757301
 - wav: file:///home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/test_data_01/003/118/118_003_0454.wav
 - src: "그런데 제가 보기에는 자신을 드러내기 위한 튀는 행보다 이렇게 볼 수 밖에 없습니다"
 - res: "그런데 제가 보기에는 자신을 드러내기 김을 해도 것이었습니다"
--------------------------------------------------------------------------------

No, there is already plenty of advice on the forum. And proper training is highly dependent on your dataset; I can't do it for you.

Have you read the documentation and other people's experiences? With 50 hours you can't expect much; you need far more data than that.
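
For reference, one quick way to double-check how much audio a CSV really references is to sum the durations of the wav files it points to. This is only a sketch: it assumes the standard DeepSpeech CSV layout (wav_filename,wav_filesize,transcript), paths without commas or spaces, and that soxi from SoX is installed.

# Sketch: total hours of audio referenced by a DeepSpeech-style CSV.
# Assumes column 1 is wav_filename, paths without spaces, and SoX's `soxi` on PATH.
cut -d, -f1 /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/train.csv \
  | tail -n +2 \
  | xargs soxi -D \
  | awk '{s += $1} END {printf "total: %.1f hours\n", s / 3600}'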

Thanks for your reply. I know that 50 hours cannot do much, but this limited dataset is all I have. Is there any other way to optimize my model? Thanks!

I am new to DeepSpeech and deep learning concepts and am still experimenting with it. I am facing the same issue with the Turkish datasets available in Mozilla's Common Voice. There is a section about Transfer Learning in the docs; you can try that as well (a rough sketch follows below). End-to-end models certainly require many hours of audio, although I have observed that the run-ldc93s1.sh script fits a relatively small dataset quite well.
I think we need a better understanding of the DeepSpeech architecture, or we might be missing some very important details about training on a new language's dataset.
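
To make the transfer-learning idea concrete, a run that re-uses an English checkpoint with a Korean alphabet might look roughly like the sketch below. Treat it as an assumption-laden sketch only: in the 0.6.x era transfer learning lived on a separate branch, the flag names --source_model_checkpoint_dir and --drop_source_layers come from that branch and later releases, and /path/to/deepspeech-0.6.1-checkpoint is a hypothetical location of a released English checkpoint. Check python -u DeepSpeech.py --helpfull in your exact checkout before copying anything.

# Hedged transfer-learning sketch; verify every flag against --helpfull in your checkout.
# --n_hidden must match the geometry of the source (English) model, which is 2048 for
# the released checkpoints; the dropped layers are re-initialised for the Korean alphabet.
python -u DeepSpeech.py --noshow_progressbar \
  --train_files /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/train.csv \
  --dev_files /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/eval.csv \
  --test_files /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/test.csv \
  --alphabet_config_path /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/alphabet.txt \
  --source_model_checkpoint_dir /path/to/deepspeech-0.6.1-checkpoint \
  --drop_source_layers 2 \
  --n_hidden 2048 \
  --learning_rate 0.0001 \
  --train_batch_size 8 \
  --dev_batch_size 8 \
  --test_batch_size 8 \
  --epochs 30

The idea is that the lower acoustic layers start from the English weights, and only the last layer(s), including the output layer sized to the new alphabet, are trained from scratch on the Korean data.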

As for suggestions: what I observe is that you are using a relatively large batch size.

  1. Try making it smaller. Start with 1, since you have a small dataset, then try 4 up to 8.
  2. Try a smaller n_hidden. Start with 100.
  3. Maybe start with a learning rate of 0.0001. (I think this is important. I am new to DL, so I can't say for sure exactly what it does; you can read up on it online.)
  4. Observe 10-30 epochs and watch for over-fitting. (See the command sketch after this list for one way to apply these changes.)
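
To make those suggestions concrete, the original command could be adjusted roughly as below, keeping the same data paths. This is only a sketch: the values are starting points to experiment with, not settings known to work, and the separate checkpoint directory (model-small) is just an assumption so the run does not collide with the existing checkpoints.

# Sketch applying the suggestions above: smaller batch size, smaller n_hidden,
# lower learning rate, fewer epochs. Values are starting points for experiments only.
python -u DeepSpeech.py --noshow_progressbar \
  --train_files /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/train.csv \
  --dev_files /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/eval.csv \
  --test_files /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/test.csv \
  --alphabet_config_path /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/alphabet.txt \
  --lm_binary_path /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/lm.binary \
  --lm_trie_path /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/trie \
  --checkpoint_dir /home/zhangp/tmp/deepspeech-venv/v-6-1/DeepSpeech-0.6.1/data/zeros-korean/model-small \
  --train_batch_size 8 \
  --dev_batch_size 8 \
  --test_batch_size 8 \
  --n_hidden 100 \
  --learning_rate 0.0001 \
  --dropout_rate 0.05 \
  --epochs 30 \
  --max_to_keep 3 \
  --use_allow_growth True

Once the dev loss stops improving within those 30 epochs, you get a rough sense of whether the model is under- or over-fitting before committing to longer runs.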

As for the DeepSpeech heroes: there should be at least one example of training on another language's dataset in the GitHub repo or docs.

Don’t give up. Keep Learning!
Regards!