Sentence collection for Belarusian

Adrijaned · July 7, 2020, 1:30pm

If the sentences from anibel.net are generally no good and you are unsure about their licensing, I’d just post in "Sentence collector copyright issues" and have them deleted outright. If they are more or less grammatically fine and acceptable, I’d try to inquire about the licensing a bit more, and have them deleted only if CC0 can’t be confirmed.

Having archaic and dialectal words in the dataset is not an issue IMO, as long as most people are capable of pronouncing them fine. I for one use such words a lot in my everyday speech, and so would find an STT ML model that doesn’t recognize them… lacking, to say the least.

Only 10k sentences being loaded is a technical limitation of Kinto, and while it has its issues, it’s also for the better. If we take a wild guess and say one sentence equals about 100 bytes of JSON, loading all 100k sentences would mean downloading ~10 MB of data on each page load. That is acceptable, although not great, on desktop PCs, and much worse on mobile devices connected to mobile networks, for example when you want to review a few sentences while riding a train.
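To put rough numbers on that guess, here is a minimal back-of-the-envelope sketch. The 100 bytes per sentence is the wild guess from above, not a measured value, and the link speeds are purely hypothetical figures I picked for illustration; the function names are made up too.

```python
# Back-of-the-envelope estimate only. 100 bytes/sentence is a guess,
# not a measurement of what Kinto actually returns.
BYTES_PER_SENTENCE = 100


def payload_megabytes(sentence_count, bytes_per_sentence=BYTES_PER_SENTENCE):
    """Approximate JSON payload size in MB (1 MB = 1,000,000 bytes)."""
    return sentence_count * bytes_per_sentence / 1_000_000


def download_seconds(megabytes, megabits_per_second):
    """Ideal transfer time, ignoring latency and protocol overhead."""
    return megabytes * 8 / megabits_per_second


# Hypothetical link speeds in Mbit/s, chosen only for illustration.
LINKS = {"desktop broadband": 50.0, "weak mobile signal": 2.0}

for count in (10_000, 100_000):
    size = payload_megabytes(count)
    for name, speed in LINKS.items():
        seconds = download_seconds(size, speed)
        print(f"{count} sentences, about {size:.0f} MB, {name}: about {seconds:.1f} s")
```

Under these assumptions the 10k cap keeps each page load at around 1 MB, while loading everything would be around 10 MB, which is where the mobile case starts to hurt.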

Editing already validated sentences is definitely possible, but should be done only after review by at least one other native speaker, and preferably in an official capacity.
