Sentence Collector Localization Update

mkohler · October 31, 2021, 4:47pm

Hi everyone

Back in 2019 there was an effort to make Sentence Collector localizable. That effort died, and now we have picked it up again. The initial setup in the project is ready. A huge thanks goes out to @bozden, who extracted all the English strings into the FTL file and adjusted all the usages to use the translation file. This was a massive effort from @bozden and helped to speed up the work tremendously.

We’re not done yet, so there is currently nothing to translate into the different languages. We will eventually get there, but first we still need to figure out a few things. Most notably, we need to figure out how we’re gonna handle the interaction with Pontoon, as Common Voice and the Sentence Collector currently live in two different repositories.

You can see the remaining open tasks here:

Michael

heyhillary · November 3, 2021, 4:50pm

Hey Michael,

It’s so cool to hear that the Sentence Collector is being localised. Congrats to you and @bozden on the effort so far.

So I asked one of the L10N team how we could do this and they suggested the following:

For Firefox this has been possible. What they did was create a separate project, e.g. https://pontoon.mozilla.org/de/firefox-accounts/ for each feature. You can use different file types, but FTL would be ideal.

One important thing: both the web part and the collector code must live in one repo, not in two separate locations.

To give an example for CV, the repo is here: https://github.com/common-voice/common-voice/tree/main/web/locales. For the collector, the strings need to be at least under the CV repo.

On the admin end of Pontoon I can make changes, but I will need to do this with Jenny just to ensure it’s done correctly and doesn’t impact the platform.

There is some information that I would need from you to add this on the admin end of Pontoon. I can follow up via email.

I hope this provides some clarification. If you have any questions I’m happy to ask L10N for you.

mkohler · November 3, 2021, 8:11pm

Thanks Hillary. To me these two blocks sound like two different options. Maybe I’m understanding this wrong though, so here’s what I understand.

Option 1

We set up a new project (analogous to Common Voice) in Pontoon and link it with the Sentence Collector repository
The strings would live in the Sentence Collector repository
We enable all current languages from Common Voice in that new project as well
Every time a new language is enabled in Common Voice Pontoon, we should also enable it in the new project
With this approach we would not have any special setup in the repository and everything would be “standard”

Option 2

We integrate Sentence Collector into the Common Voice Pontoon project
The strings would live in the Common Voice repository
We need to sync the source strings somehow to the Common Voice repository so Pontoon knows about changes
Any translations would end up in the Common Voice repository, where they would not be used. We would need to regularly sync them to the Sentence Collector repository ourselves.
This would lead to quite some work to get the synchronization set up correctly, and would need extensive documentation as it’s not a setup new contributors would expect.

The reason why I think these are two different options is because if we have to have it in the same repository, there would be no benefit of having two different projects on Pontoon.

Even with the additional administrative process effort needed, I would probably still prefer Option 1 as this is way less complex in my opinion. Eventually we can talk again about integrating the Sentence Collector into Common Voice itself, which would then eventually lead to only one project on Pontoon. But that’s nothing that is gonna happen any time soon.

Am I understanding this correctly? When in doubt, feel free to point somebody from the l10n team in my direction

Thanks!
Michael

pmo · November 4, 2021, 5:40pm

Michael, thank you for the summary. I vote for Option 1, having both under one project. This way, contributors would have one click to access both types of content, not two separate ones. Like you said, you enable the language once in Pontoon, and you don’t have to juggle between two projects to make sure they have identical locale lists.

mkohler · November 4, 2021, 9:10pm

@pmo I think what you describe would be Option 2. Could you double check?

pmo · November 5, 2021, 12:48am

I meant Option 2. Thanks for calling that out. Would it be significant dev work to streamline the process now? It might be a lot of work upfront, but downstream it is much easier to manage. I don’t understand the complexity of the syncing issue you raised here.

mkohler · November 5, 2021, 11:28pm

Thanks for the input. I quickly talked to Jenny and she would prefer Option 2. She also reminded me that we don’t need to keep the translation files in sync in this repo, and we can fetch them when building and pushing the deployment. This means that there is less work in terms of synchronization, so now I agree with Option 2 as well. So we’re gonna implement this in the same Pontoon project as the existing Common Voice strings. This will require some work on the Sentence Collector side, for which I will file separate issues. In terms of Pontoon itself there is nothing to be done.
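To illustrate the "fetch them when building" idea, here is a minimal sketch of a build step that keeps only the Sentence Collector messages from a Common Voice locale file. It rests on assumptions: the `sc-` prefix is inferred from the string ID `sc-lang-info-total` quoted later in this thread, the function name is hypothetical, and the line-based scan is a simplification rather than a full Fluent parser (it does not handle blank lines inside multi-line messages). It is not the project's actual build script.

```python
import re

# Assumption: Sentence Collector IDs start with "sc-" (inferred from the
# ID "sc-lang-info-total" quoted in this thread).
SC_PREFIX = "sc-"

# A message starts at column 0 with an identifier followed by "=".
MESSAGE_START = re.compile(r"^([a-zA-Z][a-zA-Z0-9_-]*)\s*=")


def extract_messages(ftl_text: str, prefix: str = SC_PREFIX) -> str:
    """Return only the messages whose ID starts with `prefix`."""
    kept = []
    keeping = False
    for line in ftl_text.splitlines():
        match = MESSAGE_START.match(line)
        if match:
            # A new message begins: keep it only if its ID has the prefix.
            keeping = match.group(1).startswith(prefix)
        elif not line.startswith((" ", "\t")):
            # Comments, blank lines and anything else at column 0 end a message.
            keeping = False
        if keeping:
            # Indented lines are continuations of the current message.
            kept.append(line)
    return "".join(line + "\n" for line in kept)
```

In a real build, the input would be the downloaded `messages.ftl` for each locale, and a proper Fluent parser would be the safer choice for anything beyond a quick filter.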

mkohler · November 16, 2021, 8:20pm

Update

By now most of the underlying tasks have been completed. In a few hours you will see the Sentence Collector strings pop up to be localized in Pontoon.

We are not exposing a language switch dropdown yet, but we will do so soon depending on the progress of the translations.

Remaining tasks: https://github.com/common-voice/sentence-collector/labels/localization

Michael

bozden · November 16, 2021, 9:00pm

That’s awesome!

Eagerly waiting

bozden · November 17, 2021, 1:24pm

@mkohler, is “Sentence Collector” a brand name, or can it be translated?

mkohler · November 17, 2021, 6:49pm

That’s a very good question, I hadn’t thought about that yet. While many resources will eventually be localized, I think some will not be and will therefore refer to Sentence Collector in English. This includes Discourse posts, for example. Treating it as a brand name might be the least confusing option here. I don’t have a strong opinion here though. What do you think?

bozden · November 17, 2021, 9:36pm

Well, “Cümle Toplayıcı” has a good ring to it and means the same. If people write it with capital initials as I did, it will be a localized brand name…

bozden · November 17, 2021, 10:06pm

One important thing for translators though:

There are examples in the original English text which refer to English. These must be replaced with localized counterparts, not with direct translations. Otherwise they will not mean anything.

Examples:

For example, the acronym “ICE” could be pronounced “I-C-E” or as a single word.
For example, an apostrophe is included in English words like “don’t” and “we’re” and should be included in the source text, but it’s unlikely you’ll ever need a special symbol like “@” or “#.”

rprys · November 18, 2021, 9:48am

I’ve just started on the new Common Voice strings (related, I guess, to the Sentence Collector) and I’ve come across this:
Home
COMMENT Don’t rename the following section, its contents are auto-inserted based on the name. These strings are automatically exported from Sentence Collector. [SentenceCollector]
Can you clarify this? What does rename and section mean? If they are not to be translated why are they included in Pontoon? Thanks.

bozden · November 18, 2021, 9:58am

@rprys, I think you are looking at the repo. You should do the translation through Pontoon. That part in the repo will be filled programmatically…
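As a rough illustration of what "filled programmatically" could look like, the sketch below replaces the contents of a named section in an FTL file. Only the section name `[SentenceCollector]` comes from the comment quoted above. The exact marker lines (`# [SentenceCollector]` to open and `# [/SentenceCollector]` to close) and the function name are assumptions made for this example, not the project's confirmed format.

```python
import re


def replace_section(ftl_text: str, name: str, new_body: str) -> str:
    """Replace everything between '# [name]' and '# [/name]' with new_body.

    Assumption: the section is delimited by these two comment lines.
    """
    pattern = re.compile(
        rf"(^# \[{re.escape(name)}\]\n).*?(^# \[/{re.escape(name)}\]$)",
        flags=re.DOTALL | re.MULTILINE,
    )
    if not pattern.search(ftl_text):
        # This is why the section must not be renamed: the name is the lookup key.
        raise ValueError(f"section markers for {name!r} not found")
    body = new_body.rstrip("\n") + "\n"
    # A function replacement avoids backslash handling in the inserted text.
    return pattern.sub(lambda m: m.group(1) + body + m.group(2), ftl_text, count=1)
```

Because the lookup depends on the section name, renaming it would make the export fail, which is presumably what the comment shown in Pontoon is warning about.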

rprys · November 18, 2021, 10:00am

No, I’m in Pontoon…

rprys · November 18, 2021, 10:01am

https://pontoon.mozilla.org/cy/common-voice/all-resources/?status=missing&string=235053

mkohler · November 18, 2021, 10:09am

@rprys thanks for reporting this. I will have a look. This was not meant as a comment on that specific string, you can safely ignore it.

bozden · November 18, 2021, 1:19pm

One more thing to be aware of, a minor issue… If the original sentence has a variable AND its name is a valid English word (e.g. {$sentences}) AND you use Google Translate as a starting point, the variable name gets translated as well. Pontoon will give an error, stating there is no closing “}”. To get past that, you need to restore the original variable name, of course…
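A translator could catch this before pasting into Pontoon with a quick check like the one below. It is a minimal sketch: the regular expression only looks for variable references of the form `{ $name }` or `{$name}`, the function names are hypothetical, and this is not how Pontoon itself validates strings.

```python
import re

# Matches Fluent variable references such as { $sentences } or {$sentences}.
VARIABLE = re.compile(r"\{\s*\$([a-zA-Z][a-zA-Z0-9_-]*)")


def variables(text: str) -> set[str]:
    """Return the names of all variables referenced in an FTL string."""
    return set(VARIABLE.findall(text))


def missing_variables(source: str, translation: str) -> set[str]:
    """Return variables used in the source but absent from the translation."""
    return variables(source) - variables(translation)
```

If `missing_variables` returns a non-empty set, the machine translation has most likely renamed a variable, and the original name needs to be restored.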

rprys · November 18, 2021, 1:48pm

I’m not sure what ‘total sentences’ means. Is it the number of sentences, or whether they are complete?

0	No total sentences.
one	1 total sentence.
other	{ $totalSentences } total sentences.

GROUP COMMENT Validation criteria

CONTEXT sc-lang-info-total

RESOURCE Common Voice•web/locales/en/messages.ftl

https://pontoon.mozilla.org/cy/common-voice/all-resources/?status=missing&string=235049
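For context on how the three variants above relate to each other: in Fluent, a variant keyed by an exact number (here `0`) is matched first, then the plural category of the number (`one`, `other`, ...), and finally the default variant. The sketch below mimics that selection for English only. The dictionary and function names are hypothetical, and treating `other` as the default variant is an assumption; real code would rely on a Fluent library and the plural rules of each locale.

```python
# The three English variants quoted above, keyed as shown in Pontoon.
VARIANTS = {
    "0": "No total sentences.",
    "one": "1 total sentence.",
    "other": "{ $totalSentences } total sentences.",
}


def english_plural_category(n: int) -> str:
    """English cardinal plural rule: 'one' for exactly 1, else 'other'."""
    return "one" if n == 1 else "other"


def format_total(total_sentences: int) -> str:
    """Pick a variant: exact number first, then plural category, then default."""
    exact = str(total_sentences)
    if exact in VARIANTS:
        pattern = VARIANTS[exact]
    else:
        category = english_plural_category(total_sentences)
        # Assumption: "other" is the default variant.
        pattern = VARIANTS.get(category, VARIANTS["other"])
    return pattern.replace("{ $totalSentences }", str(total_sentences))
```

Whatever "total" is meant to convey, the variants are selected by the number passed in as `$totalSentences`, so each language supplies one translation per plural form it needs.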