Training Traditional Chinese for Common Voice using Deep Speech

Hi, I am trying to train Taiwanese Chinese speech recognition using the Common Voice dataset. I have already finished training, and the loss is around 55 using only this Common Voice dataset. But testing is taking a really, really long time. I think I did something wrong when generating the alphabet for Chinese, which resulted in a very large alphabet. I need your help:

  1. Could anyone provide step-by-step instructions for generating the alphabet for Chinese the correct way? I read about UTF-8 in the Deep Speech documentation but could not really understand it.

  2. Do we need to create a language model to train Chinese speech recognition? If yes, how do you generate the language model?

  3. I prefer to use the Taiwanese datasets from Common Voice. If you have any pretrained model in Chinese, it would really help me; maybe I could use transfer learning to train on the Taiwanese dataset.

Thank you, and sorry for the newbie questions. I am really stuck at this point now.

That’s kind of on purpose: this is really experimental until @reuben finishes some things (which are in progress as we speak), so there is little documentation.

What you highlight is expected if you use an alphabet with Mandarin and similar languages.
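To illustrate why the alphabet grows so large, here is a minimal sketch in plain Python. It is not DeepSpeech code, and the sample sentences and function names are made up for the example. It compares the number of labels you get from one label per character with one label per UTF-8 byte, which is the idea behind the UTF-8 mode as I understand the documentation.

```python
def char_labels(sentences):
    """One label per distinct character (whitespace ignored)."""
    return sorted({ch for s in sentences for ch in s if not ch.isspace()})


def byte_labels(sentences):
    """One label per distinct UTF-8 byte value."""
    return sorted({b for s in sentences for b in s.encode("utf-8")})


# Made-up sample transcripts, only for illustration.
samples = ["你好嗎", "今天天氣很好", "謝謝你"]

chars = char_labels(samples)
byts = byte_labels(samples)

# The character set keeps growing with every new character in the data.
print(len(chars), "distinct characters")
# A byte can only take 256 values, so this set is bounded.
print(len(byts), "distinct byte values (never more than 256)")
# Each of these characters is spread over several bytes.
print(len("你".encode("utf-8")), "bytes for one character")
```

With a character alphabet the output layer needs one class per character seen in the transcripts, so a few thousand classes for Chinese is normal. With byte labels the number of classes stays bounded, at the cost of longer label sequences.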

Yes. Please refer to the documentation; the external scorer is covered there.

We don’t have that yet.

I recommend waiting for the upcoming 0.9 release which should make things clearer/easier.

Thank you @lissyx and @reuben, I will wait for the upcoming 0.9 release then. I already trained on the Taiwanese Common Voice dataset and got a loss of around 55 to 57 after 20 epochs (I know this dataset is too small). When I tried testing and inference, it took a really long time and output nothing. I believe this is not only because the dataset is too small, but also because the alphabet I generated for Chinese is too large: it consists of more than 2000 characters.
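For reference, the alphabet size can be checked with something like the sketch below. It assumes the training CSV has a `transcript` column; the helper name `count_characters` and the inline demo data are made up for the example, and nothing here calls DeepSpeech itself.

```python
import csv
import io
from collections import Counter


def count_characters(csv_file, column="transcript"):
    """Count how often each non-whitespace character occurs in one CSV column."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts.update(ch for ch in row[column] if not ch.isspace())
    return counts


# Tiny in-memory stand-in for a real training CSV (hypothetical content).
demo = io.StringIO(
    "wav_filename,wav_filesize,transcript\n"
    "a.wav,1,你好\n"
    "b.wav,1,你好嗎\n"
)

counts = count_characters(demo)
print(len(counts), "distinct characters")

# Characters seen only once give the model almost nothing to learn from.
rare = [ch for ch, n in counts.items() if n == 1]
print("seen only once:", rare)
```

For a real file, the same function can be given an opened file instead of the demo data, for example `open("train.csv", encoding="utf-8", newline="")`, where `train.csv` is a placeholder name.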

I am glad that this will continue in the 0.9 release. Is there an estimate of when that version will come out? By the way, thank you very much for all your kind help.