Using the Europarl Dataset with sentences from speeches from the European Parliament

mkohler · December 12, 2019, 7:19pm

Which might still not be what we might want to show on Common Voice though. Even if it’s acceptable language, the context within a sentence might be heavily opinionated and I personally think Mozilla should refrain from displaying potentially weird political issues. Of course some will be submitted through the Sentence Collector. Do we know of any way to filter out some potentially more far left/right politicians from those datasets? (This is my opinion and I’m totally fine if y’all decide differently)

An example (and that could also be about a far left topic, just what came to mind here):

“All foreigners are …” is bad language, “All foreigners should be deported” is not per se bad language, but still might create a weird dissonance for people on Common Voice. I’m sure some assume that the sentences are vetted “by Mozilla” and therefore would associate Mozilla with these sentences.

Just my 2 cents