Hi, I am trying to train Taiwanese Chinese speech recognition using the Common Voice dataset. I already finished training, and the loss is around 55 using only this Common Voice dataset, but testing takes a really, really long time. I think I did something wrong when generating the alphabet for Chinese, resulting in a very large alphabet (I sketch how I built it at the end of this post). I need your help:
Could anyone provide step-by-step instructions for generating the alphabet correctly for Chinese? I read about UTF-8 mode in the DeepSpeech documentation but could not really understand it.
Do we need to create a language model to train Chinese speech recognition? If yes, how do you generate the language model?
I prefer to use the Taiwanese dataset from Common Voice. If you have any pretrained model for Chinese, it would really help me; maybe I could do transfer learning to train on the Taiwanese dataset.
Thank you, and sorry for the newbie questions. I am really stuck at this point.
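For reference, here is roughly how I built the alphabet: I just collected every distinct character that appears in the transcripts and wrote one per line to alphabet.txt, which is what ends up producing more than 2000 entries for Chinese. A sketch (the CSV names and the "transcript" column follow the DeepSpeech importer convention; yours may differ):

import csv

# Collect every distinct character that appears in the transcripts.
chars = set()
for csv_path in ["train.csv", "dev.csv", "test.csv"]:
    with open(csv_path, encoding="utf-8") as f:
        for row in csv.DictReader(f):
            chars.update(row["transcript"])

# alphabet.txt expects one character per line.
with open("alphabet.txt", "w", encoding="utf-8") as out:
    for ch in sorted(chars):
        out.write(ch + "\n")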
lissyx:
That's kind of on purpose; this is really experimental until @reuben finishes some things (which are in progress as we speak), so there is little documentation.
What you highlight is expected if you use an alphabet with Mandarin and similar languages.
Yes. Please refer to the documentation; the external scorer is covered there.
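Concretely, as covered in the external scorer docs, the pipeline is: build a plain-text corpus from your transcripts, train a KenLM model on it (the repo ships data/lm/generate_lm.py for this), then package it with generate_scorer_package; for bytes output mode the scorer has to be built to match, I believe via the bytes output mode option of generate_scorer_package. A minimal sketch of just the corpus-preparation step, assuming DeepSpeech-style CSVs with a "transcript" column:

import csv

# Write every training/dev transcript as one line of the LM corpus.
with open("lm_corpus.txt", "w", encoding="utf-8") as out:
    for csv_path in ["train.csv", "dev.csv"]:
        with open(csv_path, encoding="utf-8") as f:
            for row in csv.DictReader(f):
                out.write(row["transcript"].strip() + "\n")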
Thank you @lissyx and @reuben, I will wait for the upcoming 0.9 release then. I already trained on the Taiwanese Common Voice dataset and got a loss around 55-57 in 20 epochs (I know this dataset is too small). When I tried testing and inference, it took a really long time and output nothing. I believe this is not only because the dataset is too small, but also because the alphabet I generated for Chinese is too large, consisting of more than 2000 characters.
I am glad this will continue into the 0.9 release. When is that version estimated to come out? By the way, thank you very much for all your kind help.
Hi, since v0.9.1 has now been released, I tried to train using --bytes_output_mode with the Common Voice Taiwanese dataset. Training is going well but the loss is still more than 100; I also used the recommended settings for lm_alpha and lm_beta. Here is how I am training:
How do you change the decode step during the training process? Because after the epochs end, training moves on to testing if we specify test data, and I got some errors when it reaches the test data.
Change the Decode function in ds_ctcdecoder's __init__.py:
def Decode(self, input):
    '''Decode a sequence of labels into a string.'''
    res = super(UTF8Alphabet, self).Decode(input)
    # Ignore byte sequences that are not valid UTF-8 instead of raising.
    return res.decode('utf-8', 'ignore')
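Just to illustrate what the extra "ignore" argument does: in bytes output mode the decoder can emit byte sequences that are not valid UTF-8, and "ignore" drops them instead of raising UnicodeDecodeError. For example:

# A valid UTF-8 encoding of 台北 followed by a truncated multi-byte sequence.
raw = "台北".encode("utf-8") + b"\xe5\x8f"
print(raw.decode("utf-8", "ignore"))  # prints 台北; the incomplete tail is dropped
# raw.decode("utf-8") would raise UnicodeDecodeError instead.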
Hi, I already did as you suggested and got my .pbmm model, but when I try to do inference, it takes a really long time and outputs nothing. I trained using the Taiwanese dataset from Common Voice.
Do you face this kind of problem too? I know the dataset is probably not enough to produce a good result, but it should at least output something, right?
I am sorry, that was actually just an example. I already trained for around 15 epochs, and the loss still decreases a little bit (maybe that is still acceptable even though my model outputs nothing). But inference still takes a long time, whereas if I try inference with deepspeech-0.9.1-models-zh-CN.pbmm it does not take that long.
What about the inference time? Is it also because the dataset is still not enough, or am I still missing something about using bytes output mode?
Loading model from file model_result/output_graph.pbmm
Running inference.
Inference took 323.130s for 3.552s audio file.
Here is the inference result using deepspeech-0.9.1-models-zh-CN.pbmm
Loading model from file model_result/deepspeech-0.9.1-models-zh-CN.pbmm
Loaded model in 0.00576s.
Running inference.
在孫区年有逺叿望的
Inference took 6.231s for 3.552s audio file.
Label:
台北去年有破十萬人
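That is roughly 323.130 / 3.552 ≈ 91x real time for my model, versus 6.231 / 3.552 ≈ 1.75x real time for the released zh-CN model.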
I am sorry I did not provide much detail before; thank you very much for your help.
lissyx:
Without even sharing the command line used to run the inference, or the model sizes, I have no idea. Not to mention that @reuben worked on that part, not me.