Using the Europarl Dataset with sentences from speeches from the European Parliament

nukeador · December 26, 2019, 12:32pm

For German we are talking about almost 500K sentences. I would prefer that we do something similar for Dutch outside the Sentence Collector.

My advice for Dutch would be:

Make sure (or help make sure) that at least the Wikipedia process is finished, so we quickly get a lot of diverse sentences. Talk with @Fjoerfoks, who is leading this effort.
Wait until we have worked out with @stergro how to handle the Europarl dataset, so we can run a similar QA process for other languages.

Cheers.
