May I ask which Chinese wiki pages you mean? For zh-CN, a wiki extraction has already been done, so we can’t redo it: common-voice/server/data/zh-CN/wiki.zh-cn.txt at master · common-voice/common-voice · GitHub
Maybe, maybe not; I can’t say right now. We’re using the punkt sentence tokenizer, and I don’t know offhand whether it supports Chinese punctuation. Given that the (different) extractor used for that export does not use punkt, I wouldn’t be surprised if it doesn’t: cv-sentence-extractor/src/extractor.rs at mandarin · common-voice/cv-sentence-extractor · GitHub
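To illustrate what "supporting Chinese punctuation" would mean in practice, here is a rough sketch of splitting text on full-width sentence-final marks. This is not how punkt or either extractor works; the function name `split_chinese_sentences` and the chosen set of marks (。！？ plus ASCII ! and ?) are assumptions made up for this example.

```python
import re

# Hypothetical helper, NOT part of punkt or cv-sentence-extractor.
# Assumption: a sentence ends at one or more of these marks.
_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]*")


def split_chinese_sentences(text):
    """Return sentences, each keeping its trailing punctuation."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


if __name__ == "__main__":
    for sentence in split_chinese_sentences("今天天气很好。你吃饭了吗？我们走吧！"):
        print(sentence)
```

A tokenizer that only looks for ASCII `.`, `!` and `?` would return that whole input as a single sentence, which is the kind of behaviour worth checking for. The sketch ignores closing quotes and brackets after the final mark, so treat it as a test aid only.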
I’m not aware of any issues, but of course I can’t guarantee it’s bug-free.