I’ve tried using Common Voice datasets with DeepSpeech, and I’m wondering why the train/dev/test split is almost 1:1:1?
Also, the training set doesn’t cover the full alphabet (some characters in the dev/test sets never appear in the training set), which might be why the validation loss doesn’t decrease as expected. (Maybe I’m wrong, just guessing.)
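For reference, here is a small sketch of how one could check this character-coverage guess. The helper names and the inline sample transcripts are my own for illustration; in real usage you would read the `sentence` column from the Common Voice train/dev/test TSV files instead.

```python
# Sketch: find characters that appear in dev/test transcripts but
# never in the training transcripts. Function names and sample data
# are illustrative, not from DeepSpeech itself.

def chars_in(transcripts):
    """Return the set of characters used across a list of transcripts."""
    return set("".join(transcripts))

def missing_chars(train, other):
    """Characters in `other` that never appear in `train`."""
    return chars_in(other) - chars_in(train)

# Tiny inline demo (real usage would load the sentence column of
# train.tsv / dev.tsv / test.tsv with the csv module):
train = ["hello world", "good morning"]
dev = ["quiz night"]
print(sorted(missing_chars(train, dev)))  # → ['q', 't', 'u', 'z']
```

If this set is non-empty for dev or test, those characters can never be predicted correctly, which could inflate the validation loss.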