Getting res="" when training common voice tamil data

tusharmajhi9 · March 6, 2020, 2:57pm

Finally getting some results! I redid the complete training from scratch.
The issue seemed to be some special characters in my vocabulary file. I really appreciate all the help, @lissyx and @othiele. Thank you!
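
For anyone hitting the same symptom, a quick way to look for stray characters in a text file is sketched below. This is only an illustration, not the check actually used in this thread: it assumes the file is UTF-8 and that "special characters" means anything outside the Tamil Unicode block (U+0B80 to U+0BFF) other than a space or newline. The helper names `find_unexpected_chars` and `describe` are hypothetical, and the sample string is made up.

```python
import unicodedata
from collections import Counter

# Assumption: the expected alphabet is the Tamil Unicode block plus space/newline.
TAMIL_BLOCK = range(0x0B80, 0x0C00)
ALLOWED_EXTRA = {" ", "\n"}


def find_unexpected_chars(text):
    """Count every character that is neither Tamil nor explicitly allowed."""
    return Counter(
        ch for ch in text
        if ord(ch) not in TAMIL_BLOCK and ch not in ALLOWED_EXTRA
    )


def describe(ch):
    """Return a readable label such as 'U+200C ZERO WIDTH NON-JOINER'."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}"


if __name__ == "__main__":
    # Made-up sample containing a zero-width non-joiner, '!' and ','.
    sample = "மிக்க நன்றி\u200c!\nகாடு, பள்ளம்\n"
    for ch, count in find_unexpected_chars(sample).most_common():
        print(f"{describe(ch)}: {count}")
```

To scan a real file, read it with `open(path, encoding="utf-8").read()` and pass the result to `find_unexpected_chars`. Invisible characters such as zero-width joiners are easy to miss by eye, which is why the sketch prints the code point and Unicode name instead of the character itself.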

Test on /data/ta/clips/test.csv - WER: 0.999333, CER: 0.810361, loss: 92.791626
--------------------------------------------------------------------------------
  WER: 1.000000, CER: 0.833333, loss: 22.540857
- wav: file:///data/ta/clips/common_voice_ta_19760931.wav
- src: "வெயில்"
- res: "அவன்"
--------------------------------------------------------------------------------
  WER: 1.000000, CER: 0.833333, loss: 27.824286
- wav: file:///data/ta/clips/common_voice_ta_19340193.wav
- src: "இயற்கை"
- res: "என்ன"
--------------------------------------------------------------------------------
  WER: 1.000000, CER: 0.857143, loss: 29.643803
- wav: file:///data/ta/clips/common_voice_ta_19422346.wav
- src: "உழைப்பு"
- res: "அவன்"
--------------------------------------------------------------------------------
  WER: 1.000000, CER: 0.818182, loss: 33.205898
- wav: file:///data/ta/clips/common_voice_ta_19385677.wav
- src: "காடு பள்ளம்"
- res: "மக்கள்"
--------------------------------------------------------------------------------
  WER: 1.000000, CER: 0.818182, loss: 33.705170
- wav: file:///data/ta/clips/common_voice_ta_19340349.wav
- src: "மிக்க நன்றி"
- res: "நான்"
--------------------------------------------------------------------------------
  WER: 1.000000, CER: 0.916667, loss: 34.370171
- wav: file:///data/ta/clips/common_voice_ta_19340218.wav
- src: "அந்திய காலம்"
- res: "குப்பன்"
--------------------------------------------------------------------------------
  WER: 1.000000, CER: 0.666667, loss: 35.381241
- wav: file:///data/ta/clips/common_voice_ta_19195633.wav
- src: "இளமையில் கல்"
- res: "மக்கள்"
--------------------------------------------------------------------------------
  WER: 1.000000, CER: 0.769231, loss: 36.430870
- wav: file:///data/ta/clips/common_voice_ta_19423203.wav
- src: "பீடு பெற நில்"
- res: "குப்பன்"
--------------------------------------------------------------------------------
  WER: 1.000000, CER: 0.750000, loss: 36.771549
- wav: file:///data/ta/clips/common_voice_ta_19140215.wav
- src: "மோகத்தை முனி"
- res: "மக்கள்"
--------------------------------------------------------------------------------
  WER: 1.000000, CER: 0.916667, loss: 37.200134
- wav: file:///data/ta/clips/common_voice_ta_19422359.wav
- src: "ஒரே சிரிப்பு"
- res: "அவன்"
--------------------------------------------------------------------------------
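
As a reading aid for the per-sample numbers above, here is a minimal sketch of how WER and CER are commonly computed: Levenshtein edit distance over words or over Unicode code points, divided by the length of the reference (`src`). That this matches the evaluation code exactly is an assumption; the function names are hypothetical. Under that assumption the sketch reproduces the first sample's CER of 0.833333.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(src, res):
    """Character error rate over code points (assumed definition)."""
    return levenshtein(src, res) / len(src)


def wer(src, res):
    """Word error rate over whitespace-separated tokens (assumed definition)."""
    ref_words = src.split()
    return levenshtein(ref_words, res.split()) / len(ref_words)


if __name__ == "__main__":
    print(round(cer("வெயில்", "அவன்"), 6))   # 0.833333
    print(wer("காடு பள்ளம்", "மக்கள்"))       # 1.0
```

Note that Python strings are sequences of code points, so Tamil vowel signs and the pulli (்) each count as one character here. A WER of 1.0 on every sample means no reference word was recognized exactly, even where some characters overlap.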

lissyx · March 6, 2020, 3:03pm

Thanks for the feedback. To make your life easier and share the effort, please consider joining forces by reusing https://github.com/Common-Voice/commonvoice-fr/blob/master/DeepSpeech/Dockerfile.train, as the Italian community does: https://github.com/MozillaItalia/DeepSpeech-Italian-Model