Using the Europarl Dataset with sentences from speeches from the European Parliament

For German we are talking about almost 500K sentences. I would prefer that we do something similar for Dutch, outside the sentence collector.

My advice for Dutch would be:

  1. Make sure the Wikipedia process at least gets finished, and help with it if you can, so we quickly get a lot of diverse sentences. Talk with @Fjoerfoks, who is leading this effort.
  2. Wait until we work out with @stergro how to handle the Europarl dataset, so we can run a similar QA process with other languages.

Cheers.