Yes. Once we've improved our Wikipedia extractor script and automated its process, we want to start exploring how to use what we've learned to build a tool that parses large public domain corpora and splits them into sentences that follow our established rules for each language.