I still agree that this is a good idea, but it will require some exploration of how we can achieve it. I will create a new thread specifically for that. Let’s focus on the original topic here, as it is still important whether or not we have a common rules description.
Edit: done here: Common rule files for Sentence Collector / Sentence Extractor
I disagree. If we keep it in your PR as it is now, there is a chance other contributors will look at it, figure they need to do the same, and invest a lot of (for now) unnecessary time. It can also lead to confusion. Now your argument might be “that will also be helpful in the future”. Maybe, but as long as we do not know the data format and how exactly it will be applied, this is just a guess.
That being said, I do not want to lose the work you’ve already done either. Therefore, if you comment out all the lines I mentioned in the PR and add a comment at the top of the file explaining that the commented-out lines are currently not necessary but might be useful in the future, I’d be fine with merging this PR.
Do you agree with this approach?