The Open Innovation Team ran a two-week experiment with SUMO localizers to understand whether introducing machine translation could improve the current workflow and the contributor experience, and make it easier to cover top-priority articles with less effort.
The results were very satisfactory. By the end of the experiment, 62 of the 72 articles were fully translated, we attracted 2 new contributors, and volunteers reported an overall satisfaction score of 4.75/7. While there is a lot more to learn and experiment with, these results suggest that adding mechanisms like machine translation to traditional workflows can save localizers a lot of time and cover many pending articles. Keep reading to learn more!
The demand for new and updated localization on SUMO has been increasing in the past years. Mozilla is bigger, and we are creating and testing new concepts and products faster than ever (and this pace will keep increasing). This translates into a lot of new and updated documentation for these products, which is as important for the user experience as the products themselves.
We also value and care about volunteer time: contributors are devoting their free time to help make Mozilla and its products better, and to drive the mission in their languages.
We want to ensure that contributing to Mozilla is fun and rewarding, and to help communities grow by engaging existing and new people who might be interested in a low-effort activity that does not require fluency in English, so that locales stay fully localized and content is updated easily.
Our assumption is that if we can improve the localization experience, we will save contributors a lot of time, they will enjoy the experience more, and we will be able to do more with less.
Machine translation technology has evolved a lot in the past years, and today it can produce accurate localization drafts that take a human only a few minutes to proofread and validate. Based on our external research and conversations with other volunteer-based organizations, this can save a lot of time in getting an article ready in a different language.
To validate these assumptions, we wanted to run a test with some locales. We didn’t want to make a decision on whether or not to implement machine translation; we wanted to test, with contributors’ help, whether we were moving in the right direction.
This work will also inform a wider localization strategy for SUMO moving forward.
From our internal research we knew that the locales that represent 90% of Firefox Desktop monthly active users (MAU) were mainly maintained by 1-2 core contributors each, doing 85-95% of the localizations.
We also identified this as a risk that became a reality in some locales where there were no active contributors anymore, exposing users to a bad experience due to a lack of support articles in their language and outdated information. This directly impacted SUMO team goals around consumer satisfaction.
During Q2 2019 we ran an experiment in some locales at risk (less than 100% localization coverage in the top 50 articles for a priority product). Our success indicators were:
- Selected locales have 100% coverage on Top 50 articles for the product selected.
- Community satisfaction with the workflow is at least ⅗
- We are able to engage at least 2 new people in each selected locale.
Locales identified for experiment:
- Dutch → All products
- Korean → Firefox iOS
- Thai → Firefox Lite
- Chinese (Taiwan) → Firefox Lite
After identifying the locales, we reached out to the local communities to let them know about this experiment and asked them to help review the machine-translated articles: proofreading them directly on Kitsune, editing where needed, and approving them.
We developed a communication and engagement plan to make sure we managed the relationship and expectations with these localizers and framed the experiment positively, applying past experiences with Mozilla communities and learnings from conversations with external projects like Wikimedia, which has already dealt with similar machine translation efforts.
For each locale we secured at least 1-2 people who committed to help during the two weeks of the experiment.
We developed a workflow to export the articles that needed localization, applied machine translation on them and imported them back into the SUMO platform “pending for community review”.
This process was very manual and was supported by our technical writers, who ended up exporting 72 articles. We also developed a script to connect to the Google Cloud Translate API and run the translation; then we manually imported the articles back into Kitsune and marked them for review by the community.
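As an illustration of this workflow, here is a minimal sketch of such a pipeline in Python. The `google-cloud-translate` client call is the library’s real v2 API, but the file layout, target locale, and chunking limit are hypothetical assumptions for illustration; the actual script we used may differ.

```python
# Sketch of the pipeline described above: read exported article files,
# machine-translate them with Google Cloud Translate, and write the
# drafts back out for manual import into Kitsune. File paths and the
# "nl" target locale are illustrative assumptions.
import pathlib


def chunk_paragraphs(text, limit=5000):
    """Split an article into paragraph chunks under `limit` characters,
    so each request stays within the API's per-call size bounds."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > limit:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def translate_article(text, target, client):
    """Translate each chunk and reassemble the draft article."""
    out = []
    for chunk in chunk_paragraphs(text):
        result = client.translate(chunk, target_language=target,
                                  source_language="en")
        out.append(result["translatedText"])
    return "\n\n".join(out)


if __name__ == "__main__":
    # Requires `pip install google-cloud-translate` and GCP credentials.
    from google.cloud import translate_v2 as translate
    client = translate.Client()
    for path in pathlib.Path("exported_articles/en").glob("*.txt"):
        draft = translate_article(path.read_text(), target="nl",
                                  client=client)
        outdir = pathlib.Path("drafts/nl")
        outdir.mkdir(parents=True, exist_ok=True)
        (outdir / path.name).write_text(draft)
```

The resulting drafts would still be imported into Kitsune by hand and marked “pending community review”, as described above.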
Between May 13th and May 24th localizers checked the full list of URLs we provided.
Eight localizers engaged in this activity:
- Fjoerfoks, mozbrowser (Dutch)
- Seulgi, dskmori (Korean)
- Chengings (Thai)
- Peter Chen, Bor, Irvin Chen (Chinese - Taiwan)
Thank you so much for your time and contributions! You were fundamental to shape this experiment and keep improving our thinking about how to improve localization at SUMO.
Outcomes and insights
In general, the experiment was very satisfactory and we met ⅔ of the success indicators.
With the help of machine translation, localizers finished articles two to three times faster than it would have taken them to create a localization from scratch.
Of the 72 articles we imported, 62 were covered by the end of the campaign. When we asked localizers how likely they would be to recommend a similar experiment to other localizers, the average score was 4.75/7 (67% satisfaction).
| Success indicator | Result |
|---|---|
| Selected locales have 100% coverage of the top 50 articles for the selected product (72 selected articles). | 86% of the selected articles were fully translated by the end of the campaign (62/72; 10 articles missing for Chinese (Taiwan)). |
| Community satisfaction with the workflow is at least ⅗ (60% satisfaction). | 100%: the community reported a satisfaction rating of 4.75/7, or 67%. |
| We are able to engage at least 2 new people in each selected locale. | 30% of the contributors who participated in the campaign were new (2/7). |
These are some detailed insights:
The export/import process took a lot of time
Exporting and importing the articles and marking them for review took a lot of manual staff work (2-3 days); it’s not scalable.
Recommendation: Invest in creating an export/import mechanism for kitsune.
Machine translation was good, but it can improve
When contributors were asked about the quality and accuracy of the machine translations and whether they “sounded natural”, localizers said that the quality was good (2.75/4). Some comments expressed positive surprise compared with the quality they had expected. The only exception was Chinese (Taiwan), which rated the quality as low.
Additionally, localizers had to edit on average 5 to 15% of the original machine translation to adapt it and make it sound more natural.
This work took on average between 5 and 15 minutes per article, depending on article length. We also asked localizers how long it would have taken them to create a translation from zero; on average they reported at least 20 to 30+ minutes.
It is clear that machine translation can save localizers a lot of time.
Recommendation: Invest in the development of the machine translation code.
The machine translation provider didn’t work for all locales
The provider we used (Google Cloud Translate) was not really effective for Chinese (Taiwan): translators rated the quality as extremely low, making the translation not useful for localizers, who had to redo most of the sentences.
The Chinese (Taiwan) locale struggled to have the articles reviewed in time because the machine translation was reported to be unnatural and to contain too many Chinese (China) expressions, meaning localizers had to redo most paragraphs.
A single provider won’t work for every locale.
SUMO review system is not very friendly
The review system at SUMO is not very straightforward for newcomers. You need someone with review rights to approve revisions (a potential bottleneck), and it’s a multi-step process that is not as agile as some might expect; detailed instructions had to be provided.
The community engagement plan was important
The engagement and communication plan really paid off: reactions to the experiment were very positive, and localizers were open to participating in something new on what has historically been a “tricky topic” among the Mozilla localization communities.
Scalability will be an issue
We know Wikipedia has taken a per-language approach to make sure there is no abuse of the system by localizers and to ensure high quality. We currently have technical limitations for scalability.
- Invest in creating an export/import mechanism for kitsune.
- Invest in the development of the machine translation code.
- Do additional research and testing to understand MT providers limitations.
- Invest in understanding kitsune UX limitations and identify quick wins.
- Iterate the experiment with additional locales to understand scalability.
1. Invest in creating an export/import mechanism for kitsune
Effort: Low (2 days of dev time)
People needs: Dev time, CM time
Not only will we need to automate the export/import process for further experiments around machine translation; it’s also a clearly identified need for other internal processes dealing with content.
2. Invest in the development of the machine translation code
Effort: Low-medium (depending on how much we want to improve)
People needs: Dev time, CM time
If we want to expand our tests around machine translation, there are certain improvements we can apply to the code generating these translations. At least we’ll need minor changes to markup handling (P1) and multi-provider support (P2). Bonus (P3) if we can get formal/informal handling, translation memories and terminology, but not a blocker.
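To make the markup-handling improvement (P1) concrete, here is a hedged sketch of one common approach: masking markup tokens with placeholders before sending text to any provider, then restoring them in the translated draft. The regex is a simplified assumption about SUMO’s wiki syntax (`{for}`/`{note}`-style blocks and `[[links]]`), not a complete grammar, and any placeholder scheme would need testing per provider, since some engines alter even opaque tokens.

```python
# Illustrative sketch of markup protection for machine translation.
# The token pattern is a simplified assumption about SUMO wiki syntax.
import re

# Matches {for ...}/{note}-style blocks and [[wiki links]] (simplified).
MARKUP = re.compile(r"\{[^}]*\}|\[\[[^\]]*\]\]")


def protect_markup(text):
    """Swap markup tokens for numbered placeholders before translation."""
    tokens = []

    def repl(match):
        tokens.append(match.group(0))
        return f"__MT{len(tokens) - 1}__"

    return MARKUP.sub(repl, text), tokens


def restore_markup(text, tokens):
    """Put the original markup back into the translated draft."""
    for i, token in enumerate(tokens):
        text = text.replace(f"__MT{i}__", token)
    return text
```

A multi-provider setup (P2) could then wrap each engine behind a common `translate(text, source, target)` interface and route requests per locale, with the protect/restore step applied around whichever provider is chosen.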
3. Do additional research and testing to understand MT providers limitations.
Effort: Low (depending on how many locales we need to understand)
People needs: CM time
We should definitely understand how the current provider is handling our larger locales (including both Desktop and Mobile products).
Running a sample test for feedback with these communities will help us understand our needs for additional providers, which would influence recommendation 2.
4. Invest in understanding kitsune UX limitations and identify quick wins.
Effort: Low-medium (depending on how deep we want to go)
People needs: Program manager time, dev time
We should work with our communities to understand the main pain points about using kitsune and prioritize the ones that are clearly draining people’s time. I suspect there are a few quick wins we can just do by tweaking the frontend.
5. Iterate the experiment with additional locales to understand scalability.
People needs: CM time, team support
We need to understand whether this experience would work at scale. This would be a combination of (and is blocked by) recommendations 1, 2 (if possible) and 3.
Once we have solved the technical limitations and understand how to provide the best MT to each locale, we should run the experiment with all our larger locales to test how it scales.