Why does train.tsv include only a few files (just 3% of the validated set)?

My current understanding is that the train/dev/test sets are completely regenerated with each release, with no guarantee that the previous release's splits will be preserved. I would therefore caution against treating the released splits as a stable reference for academic work. See this thread: Dataset split best practices?
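
If you need splits that stay the same from one release to the next, one option is to derive your own from validated.tsv. Below is a minimal sketch, not the method the release tooling uses. It assumes validated.tsv is tab-separated with a `client_id` column; the function names (`assign_split`, `split_rows`, `load_validated`) and the 80/10/10 proportions are my own hypothetical choices. Each speaker is hashed into a fixed bucket, so a given speaker always lands in the same split and never appears in more than one.

```python
import csv
import hashlib


def assign_split(client_id: str, dev_pct: int = 10, test_pct: int = 10) -> str:
    """Map a speaker ID to 'train', 'dev', or 'test' using a stable hash.

    The percentages are illustrative assumptions, not values taken from
    the released splits.
    """
    digest = hashlib.sha256(client_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + dev_pct:
        return "dev"
    return "train"


def split_rows(rows):
    """Group rows (dicts with a 'client_id' key) by their assigned split."""
    splits = {"train": [], "dev": [], "test": []}
    for row in rows:
        splits[assign_split(row["client_id"])].append(row)
    return splits


def load_validated(tsv_path: str):
    """Read a tab-separated file with a header row into a list of dicts."""
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)
```

Usage would look like `splits = split_rows(load_validated("validated.tsv"))`. Because the assignment depends only on the speaker ID, clips added in a later release do not move existing speakers between splits, although the actual split sizes will only approximate the requested percentages.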
