Sentence collection for Belarusian

If the sentences from anibel.net are generally no good, and you are unsure about licensing, I’d just post in Sentence collector copyright issues and have them deleted outright; if they are generally more or less grammatically fine and acceptable, I’d try to inquire about the licensing a bit more, and have them deleted only if CC0 can’t be confirmed.

Having archaic and dialectal words in the dataset is not an issue IMO as long as most people are capable of pronouncing them fine - I for one use such words a lot in my everyday speech, and so would find an STT ML model not recognizing them… lacking, to say the least.

Only 10k sentences being loaded is a technical limitation of Kinto, and while it has its issues, it’s also for the better - if we take a wild guess and say one sentence comes to about 100 bytes of JSON, loading all 100K sentences would mean downloading ~10MB of data on each page load. That’s acceptable, although not great, on desktop PCs, and much worse on mobile devices connected to mobile networks, for example when you want to review a few sentences while riding a train.
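To put rough numbers on that guess, here is a quick Python sketch. The 100 bytes per sentence is the same wild guess as above, not a measurement, and the function name is made up for illustration; it has nothing to do with Kinto's actual API.

```python
def estimated_payload_mb(sentence_count: int, bytes_per_sentence: int = 100) -> float:
    """Rough size of one page load in megabytes (1 MB = 1,000,000 bytes).

    bytes_per_sentence is an assumed average JSON size per sentence,
    not a measured value.
    """
    return sentence_count * bytes_per_sentence / 1_000_000


# Current behaviour: only 10k sentences are loaded.
print(estimated_payload_mb(10_000))

# Hypothetical: all 100K sentences loaded on each page load.
print(estimated_payload_mb(100_000))
```

Under that assumption the current 10k limit works out to about 1MB per page load, against about 10MB for the full set. Real records could well be bigger, since Cyrillic letters take two bytes each in UTF-8 and each record carries some metadata besides the sentence text.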

Editing already validated sentences is definitely possible, but should be done only after review by at least one other native speaker, and preferably in an official capacity.