Here’s what I ran into while reviewing Greek sentences: many of them seem to be fragments taken from a number of books. I couldn’t recognize the source of the ones I got just now (“Ας πούμε το άλφα είναι το πιο τυχερό”, “Μπορεί να ζευγαρώσει με ένα σωρό άλλα γραμματάκια”, “και σύμφωνα και φωνήεντα”; roughly “Let’s say alpha is the luckiest”, “It can pair up with a bunch of other little letters”, “both consonants and vowels”), but they are obviously a single sentence from some existing text, cut into three pieces at punctuation boundaries. The result is pieces of language that are grammatically correct but incomplete and, of course, not the kind of sentences that would appear in a regular speech corpus.
I found the source of a couple of other sentences I got: they are from the book “Παραμύθι χωρίς όνομα”, which seems to be in the public domain (the author died in 1941), but they suffer from the same problem.
I’m not sure it makes sense to continue reviewing Greek sentences if the whole set is like this… I could just downvote everything that’s not a complete sentence, but even the complete ones seem ill-suited for the purpose, and if we can’t identify where a sentence came from, there could be copyright problems with it.