For German we are talking about almost 500K sentences. I would prefer that we do something similar for Dutch outside the sentence collector.
My advice for Dutch would be:
- Make sure (or help make sure) that at least the Wikipedia process is finished, so we quickly get a lot of diverse sentences. Talk with @Fjoerfoks, who is leading this effort.
- Wait until we work out with @stergro how to handle the Europarl dataset, so we can run a similar QA process for other languages.
Cheers.