How long after adding sentences to the Sentence Collector can they be used?


We added and reviewed about 5k new sentences for Arabic. However, the Common Voice speaking section (https://commonvoice.mozilla.org/ar/speak), as well as the listening one, still says that there are no sentences to show. So our sentences have still not been made available for recording.

How long does it take between adding sentences to the Sentence Collector and being able to use them (speak them)?

We have a contest for students and the deadline is mid-December, so if this process can be accelerated we would highly appreciate it.

https://github.com/mozilla/common-voice/blob/main/server/data/ar/sentence-collector.txt does indeed contain more than 5k sentences, and Arabic has been enabled for contribution since mid-2019: https://github.com/mozilla/common-voice/blob/main/locales/contributable.json#L3. Therefore my guess would be that there are no new sentences available for recording because all the existing sentences have already been recorded. See Single Sentence Record Limit feature release.
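For anyone who wants to check the sentence count on a local copy of that file, here is a minimal sketch. It assumes the file holds one sentence per line, which is my assumption and not something confirmed in this thread; the function name `count_sentences` and the sample text are made up for illustration.

```python
def count_sentences(text: str) -> int:
    """Count non-empty lines, assuming one sentence per line (an assumption)."""
    return sum(1 for line in text.splitlines() if line.strip())


# Illustrative sample only, not real data from the repository.
sample = "sentence one\nsentence two\n\nsentence three\n"
print(count_sentences(sample))
```

To use it on a real file, read the file as UTF-8 text and pass the contents to `count_sentences`. Note that this only counts what is in the repository; it says nothing about how many of those sentences have already been recorded.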

We have launched a contest for students to help with voice collection. Three hours after launching the contest, all the sentences had been consumed. Is there a way to add new sentences quickly?

We have used the Sentence Collector (https://commonvoice.mozilla.org/sentence-collector/#/add) and added and reviewed a lot of sentences (currently there are 26434 validated sentences). How can we add the sentences that were reviewed in the Sentence Collector to Common Voice?

The contest is only running until December 15th. If you can help unblock us, we would highly appreciate it.

I see on GitHub that there is an automatic Sentence Collector export. How often does this automatic export run?

I have just figured out how to do a pull request (by following the steps in https://github.com/common-voice/sentence-collector#exporting-to-the-official-repository) and I sent the following pull request for all the validated sentences (https://github.com/mozilla/common-voice/pull/2943/files).

The PR contains updates for all the languages. If you don’t want that, please feel free to delete the files of all the other languages (I’m mainly interested in Arabic).

Please let me know if you have any comments.

It would be really helpful if you could approve the PR soon.

For anyone reading this in the future, I’ll answer the question and will also repeat what I said on the PR:

The automatic export runs every Friday. However, having the sentences in the common-voice repository is not enough. Sentences only get deployed when the Common Voice website is deployed.
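To estimate when the next export will happen, here is a rough sketch that finds the next Friday on or after a given date. The function name `next_export_day` is hypothetical, the example date is arbitrary, and the time of day of the export is not stated here, so the sketch treats a Friday as its own export day.

```python
from datetime import date, timedelta

FRIDAY = 4  # date.weekday() counts Monday as 0, so Friday is 4


def next_export_day(today: date) -> date:
    """Return the next Friday on or after `today` (hypothetical helper)."""
    days_ahead = (FRIDAY - today.weekday()) % 7
    return today + timedelta(days=days_ahead)


# Arbitrary example date, chosen only for illustration.
print(next_export_day(date(2020, 12, 1)))
```

Keep in mind that this only gives the day the sentences reach the repository. They become available for recording only after the next website deployment, and that date cannot be computed from the export schedule.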

Hi @mkohler,

The sentences are still not available on the Common Voice website (as you can see at https://commonvoice.mozilla.org/ar/speak). Is it just because Common Voice has not been deployed yet? Or is there another problem?

Who should I ask to figure out what the problem is (or when it will be deployed)?

How did you check that? When going to that page, I could record sentences, so there are at least some not-already-recorded sentences there.

This is strange, because when I visit any of the following pages:

It says that there are no sentences to record or listen to, as shown in the screenshots here: https://drive.google.com/file/d/1kxXhqUHYLaxgRyHzz60b_tVbZwJaArx4/view?usp=sharing

I have refreshed my browser many times, tried another browser, and tried both browsers in private/incognito mode to make sure it is not a cookie problem. I have also tried connecting through the MIT VPN (my university), and I still get the same problem.

Can you please double check that it is working on your end?

If it is working on your end, do you have any idea why this is the case?

Wait, never mind, I’m just stupid. @phirework do you have any idea why there are no sentences to record? On November 30th more than 20k new sentences were added to the common-voice repo, which in theory should have been deployed with Release 38?

I’ll look into it - there were some hiccups during the last release because of the dataset issue and it’s possible the imports weren’t able to finish running. Hang tight!

Hi @phirework,

Thanks. Actually, we have a competition for students to take part in recording their voices, and the deadline is December 15th (the 18th is the UN day for the Arabic language). So it would be great if you could do something, as the students have been waiting for the last 10 days.

I’ve kicked off a new sentence importing process and it looks like ar is ready to contribute! Sorry about that.

Great! Thanks @phirework and @mkohler for your big help on this!

Another question: the graph at the bottom of the page https://commonvoice.mozilla.org/ar shows that there are 27h recorded and 14h reviewed. Why do we have this gap (13h are recorded but not available for review)? Is there another process that needs to be run to make all the recorded hours available for review?

I’m not clear on what you’re referring to. The graph for ar shows 84 hours recorded and 41 hours validated, and if you go to https://commonvoice.mozilla.org/ar/listen, clips show up to be validated.

