Automating page changes with the PUT API

As part of zone removal we need to replace “zone sidebars” with “quicklinks sidebars” in some zones. In those zones we have to edit every page in every affected locale to add the call to the quicklinks macro.

Since that amounts to a lot of page edits, I’ve looked into automating it. I have a script that:

  • gets the raw content of a page using https://developer.mozilla.org/locale/docs/path/to/page?raw

  • adds the call by appending <div>{{ThingSidebar}}</div>

  • writes the result back to the page using the PUT API https://developer.mozilla.org/locale/docs/path/to/page$api
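For anyone who wants to follow along, the three steps above can be sketched roughly like this. This is a minimal sketch, not the actual script: the function names, the `auth_headers` parameter (whatever credentials the PUT API expects) and the `text/html` content type are my assumptions, and the "skip if already present" check is an extra safety measure that is not part of the steps listed.

```python
import urllib.request

BASE = "https://developer.mozilla.org"
SIDEBAR_CALL = "<div>{{ThingSidebar}}</div>"


def raw_url(locale, path):
    """URL that returns the raw content of a page."""
    return f"{BASE}/{locale}/docs/{path}?raw"


def api_url(locale, path):
    """URL of the PUT API endpoint for the same page."""
    return f"{BASE}/{locale}/docs/{path}$api"


def add_sidebar(html, call=SIDEBAR_CALL):
    """Append the macro call, unless the page already contains it."""
    if call in html:
        return html
    separator = "" if html.endswith("\n") else "\n"
    return html + separator + call + "\n"


def update_page(locale, path, auth_headers):
    """GET the raw content, append the call, PUT the result back.

    auth_headers is a hypothetical dict of credential headers;
    the Content-Type below is also an assumption.
    """
    with urllib.request.urlopen(raw_url(locale, path)) as resp:
        html = resp.read().decode("utf-8")
    updated = add_sidebar(html)
    if updated == html:
        return False  # nothing to do, the call is already there
    request = urllib.request.Request(
        api_url(locale, path),
        data=updated.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/html; charset=utf-8", **auth_headers},
    )
    with urllib.request.urlopen(request) as resp:
        return 200 <= resp.status < 300
```

Keeping `add_sidebar` separate from the network calls means the text change can be checked on saved copies of pages before anything is written back.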

I’ve tried running this on a hundred or so pages and manually checked the diffs using the page history. Ideally they would only show the extra <div>{{ThingSidebar}}</div>. In many cases they show additional changes :-/.

These are the extra changes I’ve seen (the linked diffs are just examples, there are many diffs with similar changes):

I’d love to know more about why these extra changes are showing up, and whether they represent a problem for this approach.

The content is going through bleaching upon save, and that not only removes stuff that isn’t permitted on MDN, but also does some “tidying” of the code, which includes stuff like putting attributes in a specific consistent order, tweaking line breaks, updating heading IDs to match current standards, etc.

Older content especially is going to be subject to a lot of changes when being saved, because it was written when the save algorithm did a different set of adjustments during the save operation.

Sheppy

The content is going through bleaching upon save

Do you mean it’s going through bleaching when using the PUT API? Or it’s not going through bleaching, and that is the source of the differences?

If it’s not “going through bleaching”, does that mean the PUT API is not a safe way to change content?

Those automatic changes happen for every edit (even manual). It seems the removal of newlines before <pre> is quite new but the rest is “usual” for manual edits.

In other words, using the PUT API seems to have the same effect as the “manual” way of editing docs.

(but yeah diffs carry a lot more changes than one’s intended so proofreading is not made easy (#TeamL10N) :confused: )

It seems to be safe enough, but yeah, it’s going through the same validation process as content goes through when saved from the editor.

This doesn’t seem quite right though.

Let’s take https://developer.mozilla.org/zh-CN/docs/Tools/Network_Monitor for example.

Before I started changing it we were at revision 1356550. If you look at the source, the first img orders its attributes like this:

<img alt="" src="..." style="..." />

After I PUT the sidebar changes we were at revision 1392945. Now the img attributes are reordered:

<img src="..." alt="" style="...">

Then I reverted the change, to get revision 1393578. The attribute order is also reverted:

<img alt="" src="..." style="..." />

Then I manually edited the page, to add <div>{{ToolsSidebar}}</div>. This gets us to revision 1393579. In this revision, the attributes are not reordered:

<img alt="" src="..." style="..." />

…and the diff is clean: https://developer.mozilla.org/zh-CN/docs/Tools/Network_Monitor$compare?locale=zh-CN&to=1393579&from=1393578.

So… it looks as if I get a different effect using the PUT API than I do manually editing the page. Perhaps I’m worrying about this too much - I don’t think reordering attributes will give us any problems. But it’s a bit scary editing hundreds of pages using a script, and it would be nice to know exactly what’s happening before going ahead with it.

Darn, I never dug that much. Thanks Will.
And yes I share your concern with the potential of mass editing :slight_smile:

This is really interesting, and I think I’ve got to the bottom of this. In this case, there are basically two transformations that the code can perform, one I’ll call (to hide its complexity) the “triage”, while the other is the well-known “bleaching”.

The “triage” is a multi-step process responsible for, among other things, injecting missing section IDs, and it basically parses, modifies (if necessary), and re-generates the HTML. It is responsible for at least some of the HTML attribute re-ordering.

When doing manual edits, I discovered that the editor starts with the content of the current revision, which has been “triaged” but not “bleached”. It’s also interesting that the CK editor’s “source” view is actually a modification of that content (for example, it performs its own re-ordering of the HTML attributes). So when a new revision is created from the manual edit session, its content has not been “bleached”.

However, when using the PUT API, we’re starting with either the raw content of the page (e.g., a GET of https://developer.mozilla.org/en-US/docs/Tools/Tools_Toolbox?raw) or the content returned by the document API (e.g., a GET of https://developer.mozilla.org/en-US/docs/Tools/Tools_Toolbox$api). Both return the exact same content (the document’s “html” attribute, not the “content” attribute of the current revision of the document), and that content has not only been “triaged” but “bleached” as well. So when the PUT is performed to the document API (e.g., PUT to https://developer.mozilla.org/en-US/docs/Tools/Tools_Toolbox$api), the content of the new revision that is created has been “bleached”.

Also, it’s important to know that what the user sees on a document page (e.g., https://developer.mozilla.org/en-US/docs/Tools/Tools_Toolbox) is always derived from the document’s “html” attribute. The document can have one or more revisions, each of which has its own “content” attribute, but when one of those revisions is made the current revision, its “content” attribute is first “bleached” and then stored in the document’s “html” attribute. So the content shown to users is always “bleached”, while the content stored in a revision is usually not “bleached” since it’s usually created manually and not through the PUT API.

So when we compare two revisions (e.g., https://developer.mozilla.org/en-US/docs/Tools/Tools_Toolbox$compare?locale=en-US&to=1389936&from=1389656) we’re usually comparing unbleached content with unbleached content, but after using the PUT API, we’re suddenly comparing “bleached” content with “unbleached”.
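To make that concrete, here is a toy model of the two editing paths. It is only an illustration: `toy_bleach` is a made-up stand-in that moves `src` first and drops the trailing slash, not what the real sanitizer does, and the sample markup and variable names are invented for the example.

```python
import difflib
import re

TAG = re.compile(r'<(\w+)((?:\s+[\w-]+="[^"]*")*)\s*/?>')
ATTR = re.compile(r'([\w-]+)="([^"]*)"')


def toy_bleach(html):
    """Toy stand-in for "bleaching": put src first, drop the trailing slash."""
    def normalise(match):
        name = match.group(1)
        attrs = ATTR.findall(match.group(2))
        attrs.sort(key=lambda kv: (kv[0] != "src", kv[0]))
        rendered = "".join(f' {key}="{value}"' for key, value in attrs)
        return f"<{name}{rendered}>"
    return TAG.sub(normalise, html)


def changed_lines(old, new):
    """Return only the added/removed lines of a diff between two revisions."""
    diff = list(difflib.unified_diff(
        old.splitlines(), new.splitlines(), lineterm="", n=0))
    # Skip the two file-header lines, then drop the "@@" hunk markers.
    return [line for line in diff[2:] if not line.startswith("@@")]


SIDEBAR = "<div>{{ThingSidebar}}</div>"

# Content of the current revision: not "bleached".
revision_content = '<p>Intro</p>\n<img alt="" src="a.png" style="x" />'
# The document's "html" attribute: the "bleached" form of that content.
document_html = toy_bleach(revision_content)

# Manual edit: starts from the revision's content.
manual_edit = revision_content + "\n" + SIDEBAR
# PUT API edit: starts from the document's html (what ?raw and $api return).
put_edit = document_html + "\n" + SIDEBAR

print(changed_lines(revision_content, manual_edit))  # only the sidebar line
print(changed_lines(revision_content, put_edit))     # sidebar line plus the img change
```

In this toy version the manual edit produces a one-line diff, while the PUT edit also shows the `img` line as removed and re-added, which is the same pattern as the extra changes in the real diffs.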

So I think we can safely and confidently use the PUT API. One recommendation, though: start with the content returned by the document API (e.g., a GET of https://developer.mozilla.org/en-US/docs/Tools/Tools_Toolbox$api) rather than the raw document content (e.g., a GET of https://developer.mozilla.org/en-US/docs/Tools/Tools_Toolbox?raw). The only reason is that the document API returns the “ETag” header, which can be used to make the PUT safer by adding an “If-Match” header to avoid collisions with other editors.
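A rough sketch of what I mean by the ETag/If-Match round trip, using only the Python standard library. The function names and the `text/html` content type are assumptions for illustration, and authentication is left out entirely.

```python
import urllib.request


def fetch_with_etag(api_url):
    """GET the document API URL; return (content, ETag header value or None)."""
    with urllib.request.urlopen(api_url) as resp:
        return resp.read().decode("utf-8"), resp.headers.get("ETag")


def build_conditional_put(api_url, new_html, etag):
    """Build (but do not send) a PUT that only applies if the ETag still matches.

    The Content-Type is an assumption; credentials would also need to be added.
    """
    return urllib.request.Request(
        api_url,
        data=new_html.encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "text/html; charset=utf-8",
            "If-Match": etag,
        },
    )
```

The request would then be sent with `urllib.request.urlopen(request)`. If someone else has saved the page in the meantime, a server honoring “If-Match” normally answers 412 Precondition Failed, which `urlopen` raises as an `HTTPError`, so the script can re-fetch and retry instead of overwriting the other edit.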

Wow, that’s a fascinating detail of Kuma internals. Thanks a lot for not just digging into this Ryan, but also taking the time to explain it so well. Much appreciated.

@ryanjohnson

That’s incredibly useful. I’ve added info to the documentation about the PUT API that hopefully gets this more or less correct:

Sheppy