Thanks for bringing this up and creating a Discourse post instead of an issue.
I would suggest having a look at the DE rules file. We spent quite some time improving it recently and I think it’s in quite a good state these days. It also uses the best practices.
No, because we need to guarantee that the legal constraints around this are met. The official script will be run once the rules file is merged, so no changes are possible.
Overall we want an error rate below 5%, so the quality should not be too bad. Some complicated sentences will slip through, but in most cases this is outweighed by the benefit of having quite a lot of new sentences. This can also be tweaked with the blocklist, depending on how many occurrences you set as the threshold. It takes quite some time to get right, but it is worth it.
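To make the threshold idea a bit more concrete, here is a rough Python sketch of how an occurrence-based blocklist could work. This is not the extractor's actual code: the function names (`tokenize`, `build_blocklist`, `filter_sentences`), the simple regex tokenizer, and the sample sentences are all made up for illustration. It assumes the blocklist is simply "every word that occurs fewer than N times in the extracted sentences".

```python
import re
from collections import Counter

# Hypothetical tokenizer: lowercased runs of word characters.
# The real extractor may split words differently.
WORD_RE = re.compile(r"\w+")


def tokenize(sentence):
    """Split a sentence into lowercased words."""
    return [word.lower() for word in WORD_RE.findall(sentence)]


def build_blocklist(sentences, min_occurrences):
    """Return all words seen fewer than `min_occurrences` times."""
    counts = Counter(word for s in sentences for word in tokenize(s))
    return {word for word, n in counts.items() if n < min_occurrences}


def filter_sentences(sentences, blocklist):
    """Keep only sentences that contain no blocklisted word."""
    return [
        s for s in sentences
        if not any(word in blocklist for word in tokenize(s))
    ]


# Made-up sample data, only to show the effect of the threshold.
SAMPLE = [
    "Der Hund schläft.",
    "Der Hund schläft im Haus.",
    "Der Hund schläft im Park.",
    "Im Park schläft der Hund.",
    "Im Haus schläft der Xylograph.",
]

# A low threshold only removes the sentence with the rare word.
kept_low = filter_sentences(SAMPLE, build_blocklist(SAMPLE, 2))

# A higher threshold removes more sentences along with it.
kept_high = filter_sentences(SAMPLE, build_blocklist(SAMPLE, 3))
```

With a threshold of 2 only the sentence containing the rare word is dropped, while a threshold of 3 also drops perfectly fine sentences. That is the trade-off you need to balance when picking the number.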