Basque dataset ready

Can you coordinate with @mkohler to run all your sentences through the sentence collector? We should make sure the PR doesn’t contain these invalid sentences.

/cc @gregor