I would like to contribute to the Hungarian voices, but when I try, Hungarian seems to be only in the Sentence Collection phase. How can I contribute to the Sentence Collection? I guess translating 1000-2000 English sentences to Hungarian would take no time, and then voice collection could start.
Unfortunately I could not find any information about it on the site.
Thanks,
Andrew
nukeador
(Rubén Martín [❌ taking a break from Mozilla])
Hello and welcome to the community!
Please check our pinned readme and let us know if you have additional questions:
Hi András,
We are past 4300 validated sentences, so fewer than 700 are missing before we have enough to start the voice collection. Please log into the Sentence Collector portal and validate some of the sentences: https://common-voice.github.io/sentence-collector/?#/
Thanks!
nukeador
(Rubén Martín [❌ taking a break from Mozilla])
A quick recommendation: please explore running the sentence extraction process for Hungarian to get as many sentences as possible, as soon as possible.
The Sentence Collector is only recommended once you have already done this and imported EuroParl or other large CC-0 sources, and you want to add more sentence diversity. Note that manual sentence collection is slow, and with just the initial 5000 sentences you will run out of sentences to record really soon.
And to put into perspective what “really soon” means: after collecting the initial 5000 sentences for Czech over about a year, they were all recorded within a week of launch, IIRC. All it took was a launch announcement on several Czech sites.
I noticed that the Sentence Collector now has more than 5k sentences. Is it possible to add those while the Wikipedia extractor PR is in progress? In my experience, the Sentence Collector has much higher-quality data than the Wiki extractor at the moment.