Hey @Razmik-Badalyan, I hope everything goes well. I have no idea about the capacity of the backbone, but I remember we had problems during the last global campaign two years ago, although that was on Amazon AWS.
I can also think of one additional possible problem:
Common Voice uses a caching mechanism (Redis) to serve the sentences, and once a set is in this cache it is not randomized. I have reported several issues regarding validation (see 1, 2). When I posted those issues I thought the culprit was in the dataset queries, but after checking, I realized it is caused by the implemented “lazyCache”.
It works like this:
- The system caches N sentences (say, at most 50) every T seconds (say, 60 s = 1 minute) and serves them unchanged to users when they visit the website. The data is loaded into the browser and used from there until the whole set is used up; then a new set is requested.
- Say 100 people start recording during that minute: they will all get the same sentences in the same order. If recording a sentence takes 10 s, each of them records 6 sentences in that minute, resulting in the same 6 sentences being recorded 100 times.
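To make the effect concrete, here is a small Python simulation of the behaviour described above. It is a simplified model under my own assumptions, not the actual Common Voice code: `cached_batch` and `simulate` are hypothetical names, each refresh window is assumed to take the next N sentences from the pool in a fixed order, and every visitor in that window receives the same list and works through it in order.

```python
from collections import Counter

# Illustrative parameters taken from the example above (not real config values)
CACHE_SIZE = 50        # N: sentences per cached batch
REFRESH_SECONDS = 60   # T: how often the cache is refilled
SECONDS_PER_CLIP = 10  # time one person needs to record one sentence


def cached_batch(window: int, pool: list[str]) -> list[str]:
    """Hypothetical cache: the batch served during refresh window `window`.

    Each window takes the next CACHE_SIZE sentences from the pool in a fixed,
    non-randomized order; every visitor in the same window gets this same list.
    """
    start = (window * CACHE_SIZE) % len(pool)
    return [pool[(start + i) % len(pool)] for i in range(CACHE_SIZE)]


def simulate(start_times: list[float], session_seconds: float, pool: list[str]) -> Counter:
    """Count how many times each sentence gets recorded.

    Each visitor loads the currently cached batch into the browser, records it
    in order, and only requests a new batch once the local copy is used up.
    """
    counts: Counter = Counter()
    clips = int(session_seconds // SECONDS_PER_CLIP)
    for t0 in start_times:
        batch: list[str] = []
        for k in range(clips):
            if k % CACHE_SIZE == 0:
                # Local copy empty or exhausted: fetch whatever is cached right now
                now = t0 + k * SECONDS_PER_CLIP
                batch = cached_batch(int(now // REFRESH_SECONDS), pool)
            counts[batch[k % CACHE_SIZE]] += 1
    return counts


if __name__ == "__main__":
    pool = [f"sentence-{i}" for i in range(50_000)]
    burst = simulate([i * 0.5 for i in range(100)], 60, pool)  # 100 people in one minute
    spread = simulate([i * 60 for i in range(100)], 60, pool)  # one person per minute
    print(len(burst), max(burst.values()))    # 6 100
    print(len(spread), max(spread.values()))  # 600 1
```

With 100 people starting within the same minute, the model yields only 6 distinct sentences, each recorded 100 times, matching the example above. Spreading the same 100 one-minute sessions one per minute yields 600 distinct sentences recorded once each, which is what distributing the load over time would buy in this model.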
I’m not sure whether this mechanism also applies to recording (the related code is too scattered to follow quickly), but I know it applies to validation. Perhaps the team can shed some light on this (@jesslynnrose).
In any case, I think distributing the load over time would be best, perhaps as a “Language Week”?
Some more thoughts
Validation: You should also think about validating the recordings. If 50,000 people record 100 sentences on average, you will have 5 million recordings to validate. At the minimum (one 5-sentence batch each) you will have 250k. Validating them will take many man-hours (min 1,000, max 25,000), which is a heavy load if only a few people do it.
Maybe “record N, validate 2*N” would be a better goal for the campaign.
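The workload figures above are simple multiplication, so a tiny calculator helps when trying other scenarios. This is a minimal Python sketch: the function names are hypothetical, and the number of votes each clip needs and the time per vote are assumptions you should replace with your own estimates. For example, assuming two votes per clip at 9 s each, `validation_hours(5_000_000, 2, 9)` gives 25,000 hours.

```python
SECONDS_PER_HOUR = 3600


def recordings(contributors: int, clips_each: int) -> int:
    """Total clips if every contributor records `clips_each` sentences."""
    return contributors * clips_each


def validation_hours(clips: int, votes_per_clip: int, seconds_per_vote: float) -> float:
    """Person-hours needed to give every clip `votes_per_clip` votes.

    `votes_per_clip` and `seconds_per_vote` are assumptions, not platform constants.
    """
    return clips * votes_per_clip * seconds_per_vote / SECONDS_PER_HOUR


def validation_keeps_up(validate_factor: int, votes_per_clip: int) -> bool:
    """Whether a 'record N, validate factor*N' rule yields enough votes.

    Each contributor adds N clips (needing N * votes_per_clip votes) and casts
    factor * N votes, so the rule keeps pace when factor >= votes_per_clip.
    """
    return validate_factor >= votes_per_clip


if __name__ == "__main__":
    print(recordings(50_000, 100))            # 5000000
    print(recordings(50_000, 5))              # 250000
    print(validation_hours(5_000_000, 2, 9))  # 25000.0 (assumed 2 votes x 9 s)
    print(validation_keeps_up(2, 2))          # True
```

Under the assumption of two votes per clip, “validate 2*N” is exactly the rate at which validation keeps pace with recording; anything less leaves a growing backlog.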
Text corpus: I can see that you have about 50k unrecorded sentences (see the text corpus tab here), so these will also be consumed rapidly and recorded many times. The existing text corpus of ~77k sentences will also be re-recorded many more times.
Maybe some participants should also write sentences and validate them?
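A quick estimate of how heavily the corpus would be reused: a minimal Python sketch, assuming sentences are handed out as evenly as possible (the caching behaviour described earlier only makes it worse) and treating ~77k as the size of the whole corpus. The function name `corpus_reuse` is hypothetical.

```python
def corpus_reuse(total_clips: int, unrecorded: int, corpus_size: int) -> tuple[float, float]:
    """Return (share of clips that can go to never-recorded sentences,
    average recordings per sentence), assuming a perfectly even distribution."""
    share_new = min(unrecorded, total_clips) / total_clips
    per_sentence = total_clips / corpus_size
    return share_new, per_sentence


if __name__ == "__main__":
    share, per_sentence = corpus_reuse(5_000_000, 50_000, 77_000)
    print(f"{share:.0%} of clips use new sentences")       # 1% of clips use new sentences
    print(f"~{per_sentence:.0f} recordings per sentence")  # ~65 recordings per sentence
```

In other words, with 5 million clips only about 1% could go to unrecorded sentences, and each sentence would be recorded about 65 times on average.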
Support: People will need support and will have questions, and you will need a way to reach them in case something goes wrong. So you need a medium for this, a support channel, possibly an IM service like Telegram.
These are just some ideas…
I really hope everything goes well. I’m happy for you (and a little envious).