Hi @danishka, thank you for your interest.
We plan to start localizing the site into multiple languages very soon, and part of that will be figuring out the answers to your questions here. As a first step, though, collecting a large set of public-domain Sinhala sentences would help us kick-start the process. I’ll try to answer the rest of your questions below.
What kind of information you need from our side?
Nothing just yet.
What is the minimum size of Sinhala text required to extract sentences?
I think we need at least a couple thousand sentences to get started, but ideally we would have several hundred thousand, or even a million or more.
What are the recommended recording environments and how many samples required per sentence?
Recording at home or on the go, through the website and app, should cover all the environments we need. We want a large variety: noisy/quiet, computer/phone, home/out, all are good!
Is there a recommended guide line?
Not yet, but part of the process for localizing the site will be coming up with this guideline. Stay tuned!