Common Voice datasets (Mandarin zh-tw)

I’ve tried using the Common Voice dataset with DeepSpeech, and I’m wondering why the train/dev/test split is almost 1:1:1?

Also, the training set doesn’t cover the full character set: some characters in the dev/test sets never appear in the training set. I suspect this may be why the validation loss doesn’t decrease as expected. (Maybe I’m wrong, it’s just a guess.)
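One way to check this is to compare the character sets of the splits directly. A minimal sketch, assuming you have already pulled the `sentence` column out of the Common Voice `train.tsv` and `dev.tsv` files (the sample sentences below are made up for illustration):

```python
def chars_missing_from_train(train_sentences, eval_sentences):
    """Return characters that appear in eval sentences but never in train."""
    train_chars = set("".join(train_sentences))
    eval_chars = set("".join(eval_sentences))
    return eval_chars - train_chars

# Hypothetical stand-ins for the "sentence" column of train.tsv / dev.tsv.
train_sentences = ["你好", "早安"]
dev_sentences = ["你好嗎"]

missing = chars_missing_from_train(train_sentences, dev_sentences)
print(sorted(missing))  # → ['嗎'] — a character the model never saw in training
```

Any character reported here can never be predicted correctly, since DeepSpeech’s output alphabet is built from the training data, so a non-empty result would support the guess about the stuck validation loss.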

Is your question covered in the following threads?


Yes, that’s exactly what I’m asking!