Pretranslation in Pontoon: looking for 3 locales to opt in for Beta testing (April-June 2023)

It’s been almost 3 years since we started working on the foundation layers to support pretranslation in Pontoon, and we’ve finally reached a point where we can safely include a limited number of locales in a Beta testing phase.

What is pretranslation?

When a new string is added to a project:

  • It will be translated (pretranslated) using a 100% match from translation memory or, should that not be available, using the Google AutoML Translation engine with a custom model.
  • The string will be stored in Pontoon as pretranslated. The pretranslated status shows up in the dashboards, and can be used in filters within the translation interface.
  • The string will also be saved in the repository (e.g. GitHub), and eventually ship in the product.

Pretranslation can be enabled for a set of locales in any project, and the list of locales can differ between projects.
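
The flow described above can be sketched in a few lines of Python. This is only an illustration of the decision order (translation memory first, machine translation as a fallback), not Pontoon's actual code: the `Pretranslation` class, the `pretranslate` function, the `stub_engine` stand-in for the machine translation engine, and the data shapes are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set, Tuple


@dataclass
class Pretranslation:
    text: str
    source: str  # "tm" = translation memory, "mt" = machine translation
    status: str = "pretranslated"  # the status shown in dashboards and filters


def stub_engine(text: str, locale: str) -> str:
    """Hypothetical stand-in for a call to the machine translation engine."""
    return f"[{locale}] {text}"


def pretranslate(
    source_string: str,
    locale: str,
    project: str,
    enabled_locales: Dict[str, Set[str]],
    translation_memory: Dict[Tuple[str, str], str],
    machine_translate: Callable[[str, str], str],
) -> Optional[Pretranslation]:
    """Return a pretranslation, or None if the locale is not enabled for the project."""
    # Pretranslation is enabled per project, and each project has its own locale list.
    if locale not in enabled_locales.get(project, set()):
        return None
    # Assumption: a "100% match" is modeled as an exact lookup of the source string.
    exact = translation_memory.get((locale, source_string))
    if exact is not None:
        return Pretranslation(exact, "tm")
    # No exact match available: fall back to machine translation.
    return Pretranslation(machine_translate(source_string, locale), "mt")
```

For example, with `enabled_locales = {"example-project": {"it", "sl"}}` and a translation memory containing `("it", "Sign in")`, the string "Sign in" would come from translation memory for Italian, any other string would go through the engine, and a locale that is not enabled (such as "fr" here) would get no pretranslation at all.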

Why pretranslation?

The amount of content to translate is constantly growing, with overlapping and often demanding deadlines.

We want to support the community of volunteers by making their lives easier, in particular when it comes to approving translations that have already been approved before, but also by bootstrapping translation for brand-new content.

Why Google AutoML Translation?

We selected Google AutoML Translation based on several criteria, including reliability and quality of results. In particular, Google AutoML Translation allows us to fine-tune the translation engine by training it on our own existing translation memories. Note that locales that opt in to use pretranslation will also benefit from the custom machine translation engine in the Machinery panel.

Was there an Alpha testing phase?

Yes! We’ve been testing pretranslation with 2 locales (Italian and Slovenian) for the past 4 months. We picked these locales because we had staff support to review translations, and we fixed several bugs in the process.

We have tested the feature on 5 different projects so far (Firefox Accounts, Thunderbird, Monitor, VPN, Focus for Android), to cover different file formats, for a total of 1035 strings.

The results[1], especially when accounting for the bugs we fixed over time, have been really promising:

  • 65.10% of pretranslated strings were approved without changes.
  • 94.61% were manually reviewed as “usable”.

The average chrF++ score, the metric we are using to evaluate translations, has been 92.97 (the closer to 100, the better).

[1] Data has been revised after this post was first published. During the Beta testing we discovered an error in the way data was calculated, and applied the same fixes to the data collected during Alpha testing.

What are the requirements to opt in?

We are looking for three locales to opt in for Beta testing of the pretranslation feature. We need one or two contact persons in the localization team who are active and responsive to take charge of this testing:

  • Pretranslated strings end up in the product, so it’s important to ensure that reviews happen in a timely manner while we verify that the system works as planned. We expect reviews to happen within a week of the actual pretranslation.
  • We need help identifying bugs, and responding to requests.
  • At the end of the Beta phase, we will ask you to review a spreadsheet and assign a manual score to each rejected string. For each string, the spreadsheet will include the source string, the accepted translation, and the rejected pretranslation, so it shouldn’t take too much time to complete.

The locale needs to be supported by Google AutoML Translation. This is the list of currently supported locales in Pontoon that are also supported by Google AutoML Translation: af, ar, az, bg, bn, ca, cs, cy, da, de, el, es, es-AR, es-CL, es-ES, es-MX, et, fa, fi, fr, gl, gu, he (iw), hi, hr, ht, hu, id, is, it, ja, jv, ka, km, ko, lt, lv, mr, ms, my, ne, nl, no, pa, pl, ps, pt, ro, ru, sk, sl, sq, sr, sv, sw, ta, te, th, tl (fil), tr, uk, ur, uz, vi, zh-CN, zh-TW, zu.
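
As a quick way to check eligibility, the list above can be turned into a small lookup. This is a minimal sketch, not something Pontoon provides: the names `SUPPORTED_LOCALES`, `ENGINE_ALIASES` and `engine_code` are hypothetical, and it assumes the code outside the parentheses is the one used in Pontoon while the code inside the parentheses is the one the engine expects.

```python
from typing import Optional

# Locale codes copied from the list above (using the code outside the parentheses).
SUPPORTED_LOCALES = set(
    "af ar az bg bn ca cs cy da de el es es-AR es-CL es-ES es-MX et fa fi fr "
    "gl gu he hi hr ht hu id is it ja jv ka km ko lt lv mr ms my ne nl no pa "
    "pl ps pt ro ru sk sl sq sr sv sw ta te th tl tr uk ur uz vi zh-CN zh-TW zu".split()
)

# Assumption: "he (iw)" and "tl (fil)" mean the engine uses a different code.
ENGINE_ALIASES = {"he": "iw", "tl": "fil"}


def engine_code(locale: str) -> Optional[str]:
    """Return the code to use with the engine, or None if the locale is not listed."""
    if locale not in SUPPORTED_LOCALES:
        return None
    return ENGINE_ALIASES.get(locale, locale)
```

With this sketch, `engine_code("he")` returns `"iw"`, `engine_code("cy")` returns `"cy"`, and a locale that is not in the list returns `None`.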

Note that going forward we’ll extend the list of supported locales to also include ones that are supported by generic machine translation engines.

Other aspects to keep in mind:

  • The Beta phase will run between April and June 2023 and will include, on top of the 3 locales selected, the following locales: de, fr, it, sl.
  • The list of projects enabled should be limited to a maximum of 5 per locale, selected among a predefined list (see form).
  • If you’re a translator and want to participate, we suggest you talk with your locale managers about this initiative before opting in.
  • It’s crucial for the locales involved to be responsive in this phase.
  • We might disable the feature if we receive feedback that the quality is insufficient, or if the time required to review pretranslations is too long.

We will hopefully receive more than 3 requests to opt in, so we will pick locales based on the activity frequency and role of the person sending the request, at the same time trying to create a diverse pool (e.g. include different scripts and grammar complexity). Depending on bandwidth, we might open the testing up to more than 3 locales.

The form to opt in is available here, and the deadline to respond is March 17: https://forms.gle/JtioUVLrAgNRRZpv9

Thanks for making it this far into this post. We’re here to answer any questions and doubts, and we’ll update here when we have identified the locales to include in the next phase.

Hi @flod, could you please enable public access to the form? It seems to be restricted to mozilla org right now. Thanks!

Thanks for catching that 🤦‍♂️

Should be fixed now.

Thanks to all the folks who took the time to fill out the survey.

Given the number of locales that volunteered, we decided to expand the testing to 7 locales instead of 5. This is the full list of locales that will be involved in this testing: cy, de, es-AR, fr, hu, id, zh-TW.