Sentence Collector Localization Update

Hi everyone

Back in 2019 there was an effort to make Sentence Collector localizable. That effort died, and now we have picked it up again. The initial setup in the project is ready. A huge thanks goes out to @bozden, who extracted all the English strings into the FTL file and adjusted all the usages to use the translation file. This was a massive effort from @bozden and sped up the work tremendously. :tada:

We’re not done yet, so no work has started on actually translating into the different languages. We will eventually get there, but before that we still need to figure out a few things. Most notably, we need to figure out how we’re gonna handle the interaction with Pontoon, as Common Voice and the Sentence Collector currently live in two different repositories.

You can see the remaining open tasks here:

Michael


Hey Michael,

It’s so cool to hear that the Sentence Collector is being localised. Congrats on the effort so far to you and @bozden :sparkles:

So I asked someone on the L10N team how we could do this and they suggested the following:

For Firefox this has been possible: what they did was create a separate team, e.g. https://pontoon.mozilla.org/de/firefox-accounts/, for each feature. You can use different file types, but FTL would be ideal.

One important thing: the code for both the web part and the collector must live under one repo, not in two separate locations.

To give an example for CV, the repo is here: https://github.com/common-voice/common-voice/tree/main/web/locales. The collector needs to be at least under the CV repo.

On the admin end for Pontoon I can make changes, but I will need to do this with Jenny just to ensure it’s done correctly and doesn’t impact the platform.

There is some information that I would need from you to add this on the admin end of Pontoon. I can follow up via email.

I hope this provides some clarification. If you have any questions I’m happy to ask L10N for you.


Thanks Hillary. To me these two blocks sound like two different options. Maybe I’m understanding this wrong though, so here’s what I understand.

Option 1

  • We set up a new project (analogous to Common Voice) in Pontoon and link it with the Sentence Collector repository
  • The strings would live in the Sentence Collector repository
  • We enable all current languages from Common Voice in that new project as well
  • Every time a new language is enabled in Common Voice Pontoon, we should also enable it in the new project
  • With this approach we would not have any special setup in the repository and everything would be “standard”

Option 2

  • We integrate Sentence Collector into the Common Voice Pontoon project
  • The strings would live in the Common Voice repository
  • We need to sync the source strings somehow to the Common Voice repository so Pontoon knows about changes
  • Any translations would end up in the Common Voice repository, where they would not be used. We would need to regularly sync them to the Sentence Collector repository ourselves.
  • This would lead to quite some work to get the synchronization set up correctly, and would need extensive documentation as it’s not a setup new contributors would expect.

The reason I think these are two different options is that if we have to have it in the same repository, there would be no benefit to having two different projects on Pontoon.

Even with the additional administrative effort needed, I would probably still prefer Option 1, as it is way less complex in my opinion. Eventually we can talk again about integrating the Sentence Collector into Common Voice itself, which would then lead to only one project on Pontoon. But that’s nothing that is gonna happen any time soon.

Am I understanding this correctly? When in doubt, feel free to point somebody from the l10n team in my direction :slight_smile:

Thanks!
Michael


Michael, thank you for the summary. I vote for Option 1, having both under one project. This way, contributors would have one-click access to both types of content, not two separate ones. Like you said, you enable the language once in Pontoon, and you don’t have to juggle between two projects to make sure they have identical locale lists.

@pmo I think what you describe would be Option 2. Could you double check? :slight_smile:

I meant Option 2. Thanks for calling that out. Would it be significant dev work to streamline the process now? It might be a lot of work upfront, but downstream it is much easier to manage. I don’t understand the complexity of the syncing issue you raised here.

Thanks for the input. I quickly talked to Jenny and she would prefer Option 2. She also reminded me that we don’t need to keep the translation files in sync in this repo, and that we can fetch them when building and pushing the deployment. This means that there is less work in terms of synchronization, so now I agree with Option 2 as well. So we’re gonna implement this in the same Pontoon project as the existing Common Voice strings. This will require some work on the Sentence Collector side, for which I will file separate issues. In terms of Pontoon itself there is nothing to be done.
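
To make the "fetch them when building" idea a bit more concrete, here is a minimal sketch of what such a build step might look like. It is only an assumption about the setup, not the actual implementation: it supposes a local checkout of the Common Voice repository, the function name `copy_locales` and the destination directory are hypothetical, and only the `web/locales/<locale>/messages.ftl` layout is taken from the repository link earlier in this thread.

```python
import shutil
from pathlib import Path


def copy_locales(cv_checkout: Path, dest: Path) -> list[str]:
    """Copy each locale's messages.ftl from a Common Voice checkout into dest.

    Hypothetical helper: cv_checkout is assumed to be a local clone of the
    Common Voice repository, dest an arbitrary output directory.
    Returns the sorted list of locale codes that were copied.
    """
    source_root = cv_checkout / "web" / "locales"
    copied = []
    for locale_dir in sorted(p for p in source_root.iterdir() if p.is_dir()):
        ftl_file = locale_dir / "messages.ftl"
        if not ftl_file.is_file():
            continue  # skip locales that have no translation file
        target_dir = dest / locale_dir.name
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(ftl_file, target_dir / "messages.ftl")
        copied.append(locale_dir.name)
    return copied
```

Running something like this at build time would mean the translated files never need to be committed to the Sentence Collector repository; they would just be pulled in fresh for each deployment.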


Update

By now most of the underlying tasks have been completed. In a few hours you will see the Sentence Collector strings pop up to be localized in Pontoon.

We are not exposing a language switch dropdown yet, but we will do so soon depending on the progress of the translations.

Remaining tasks: https://github.com/common-voice/sentence-collector/labels/localization

Michael


That’s awesome!

Eagerly waiting :grinning:

@mkohler, is “Sentence Collector” a brand name, or can it be translated?

That’s a very good question, I hadn’t thought about that yet. While many resources will eventually be localized, I think some will not be and will therefore refer to Sentence Collector in English. This includes Discourse posts, for example. Treating it as a brand name might be the least confusing option here. I don’t have a strong opinion though. What do you think?

Well, “Cümle Toplayıcı” has a good ring to it and means the same. If people write it with capital initials as I did, it will be a localized brand name…

One important thing for translators though:

There are examples in the original English text which refer to English. These must be replaced with localized counterparts, not with direct translations. Otherwise they will not mean anything.

Examples:

  • For example, the acronym “ICE” could be pronounced “I-C-E” or as a single word.
  • For example, an apostrophe is included in English words like “don’t” and “we’re” and should be included in the source text, but it’s unlikely you’ll ever need a special symbol like “@” or “#.”

I’ve just started on the new Common Voice strings related, I guess, to the Sentence Collector, and I’ve come across this:
Home
COMMENT Don’t rename the following section, its contents are auto-inserted based on the name. These strings are automatically exported from Sentence Collector. [SentenceCollector]
Can you clarify this? What do “rename” and “section” mean here? If they are not to be translated, why are they included in Pontoon? Thanks.


@rprys, I think you are looking at the repo. You should do the translation through Pontoon. That part in the repo will be filled programmatically…

No, I’m in Pontoon…

https://pontoon.mozilla.org/cy/common-voice/all-resources/?status=missing&string=235053


@rprys thanks for reporting this. I will have a look. This was not meant as a comment on that specific string, so you can safely ignore it.


One more thing to be aware of, a minor issue: if the original sentence has a variable AND its name is a valid English word (e.g. {$sentences}) AND you use Google Translate to start with, the variable name gets translated as well. Pontoon will give an error, stating there is no closing “}”. To get past that, you should of course restore the original variable name…

I’m not sure what ‘total sentences’ means. Is it the number of sentences, or whether they are complete?
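
For anyone who wants to catch this before pasting into Pontoon, here is a small sketch of a check that compares the variable names in the source and in a draft translation. It is not part of Pontoon or Sentence Collector: the function names are made up, the example strings in the note below are invented, and the pattern only covers simple variable references like {$sentences}, not the full FTL syntax.

```python
import re

# Matches simple variable placeables such as {$sentences} or { $totalSentences }.
# The name part is deliberately lenient (any run of non-space, non-brace
# characters) so that machine-translated names like {$cümleler} are caught too.
PLACEABLE = re.compile(r"\{\s*\$([^\s{}]+)\s*\}")


def variable_names(text: str) -> set[str]:
    """Return the set of variable names referenced in a string."""
    return set(PLACEABLE.findall(text))


def check_variables(source: str, translation: str) -> tuple[set[str], set[str]]:
    """Return (missing, unexpected) variable names in the translation."""
    src = variable_names(source)
    dst = variable_names(translation)
    return src - dst, dst - src
```

For example, `check_variables("Added {$sentences} sentences.", "{$cümleler} cümle eklendi.")` reports `sentences` as missing and `cümleler` as unexpected, which is exactly the case where the variable name needs to be restored.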

0 No total sentences.
one 1 total sentence.
other { $totalSentences } total sentences.

GROUP COMMENT Validation criteria

CONTEXT sc-lang-info-total

RESOURCE Common Voice web/locales/en/messages.ftl

https://pontoon.mozilla.org/cy/common-voice/all-resources/?status=missing&string=235049