API documentation structure idea

sheppy · September 21, 2018, 4:01pm

I’ve had an idea. We have a few problems that are caused by the current organization of our API documentation:

There’s no structural correlation between an API and the interfaces, types, and so forth that comprise it. You can’t look at a URL like https://developer.mozilla.org/en-US/docs/Web/API/ImageBitmap and go “Oh, that’s part of the Canvas API.”
There’s no structural correlation between reference documentation for an API and the other content for it. Not even the overview page for the API has any structural relationship to the references that are putatively part of a documentation set.
Because all APIs are lumped into one massive “directory,” it’s impossible to isolate individual APIs for monitoring or review by a specific person. This is a big one.
This makes automatic generation of menus, sidebars, and landing pages more problematic than needed. We have to rely on tricks like special tags as well as metadata to do things that should not require any additional information beyond the structure of the site.
If we make a change that means we should rebuild the pages for an API, we have no actual way to do this, since you can’t just say “rebuild this folder and its contents” (even if we had an easy way to do that at all). We have to either rebuild the site or manually find the pages and rebuild.

My proposal for an improved system:

Each API has an overview page whose slug is Web/API/Whatever_API (like we do now) but with titles which are more descriptive, such as “WebRTC: Real-time two-way audio, video, and data” or “Media Capture and Streams” or whatever.
All of the interfaces, types, dictionaries, and so forth are located below this with a slug such as Web/API/Whatever_API/InterfaceName and a title which is more descriptive than our current one-word titles: “RTCPeerConnection: Creating and managing WebRTC connections”, for example.
Properties and methods are then below that, as before: Web/API/Whatever_API/InterfaceName/propertyName. Ideally with a description, too, but see [1] below.
All guides and tutorials are also located under the overview page, at Web/API/WebRTC_API/Introduction_to_data_channels with the title “Introduction to WebRTC data channels” for example.
Make sure that we have redirects for all current pages to their new locations.
Add to the mdn/data repository the needed information to map an interface or type name to its API, and by doing so to that API’s slug.
Update the {{domxref}} macro to use the updated MDN metadata to build the URL for the destination page instead of stupidly assuming everything is just Web/API/Whatever.
Now we get to the good part, where we’re able to redo macros like {{APIRef}} and such to use the new structure to be more efficient.
Tools can be smarter and better. For instance, when we pull an updated BCD, a tool could trigger rebuilds of all the pages for an API by simply rebuilding every page whose slug begins with the overview page’s slug.
Now topic curators can pick and choose the exact APIs they’re responsible for and watch them with a single click instead of having to either watch all APIs and filter in their mail client or manually watch every single page that’s part of the API.
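
As a rough illustration of the lookup a reworked {{domxref}} could do, here is a minimal Python sketch. The INTERFACE_TO_API table, the BASE_URL constant, and the domxref_url() function are all hypothetical names invented for this example; the real macro is KumaScript and the real data would live in mdn/data.

```python
# Hypothetical mapping from interface name to the slug of the API that
# defines it. The proposal suggests keeping data like this in mdn/data.
INTERFACE_TO_API = {
    "RTCPeerConnection": "Web/API/WebRTC_API",
    "RTCDataChannel": "Web/API/WebRTC_API",
}

BASE_URL = "https://developer.mozilla.org/en-US/docs"


def domxref_url(reference: str) -> str:
    """Build a page URL for a reference such as 'RTCPeerConnection.close()'."""
    # Split 'Interface.member()' into its parts; the member is optional.
    interface, _, member = reference.partition(".")
    if member.endswith("()"):
        member = member[:-2]
    # Fall back to the current flat layout when the interface is unmapped.
    api_slug = INTERFACE_TO_API.get(interface, "Web/API")
    parts = [BASE_URL, api_slug, interface]
    if member:
        parts.append(member)
    return "/".join(parts)
```

Falling back to Web/API for unmapped names is one possible design choice; it would let the macro keep working while the mapping data is still incomplete.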

[1] Ideally our page <title> should be a somewhat descriptive string on the order of up to 65-70 characters or so. There’s an actual number which I don’t recall off the top of my head, but it’s about that. But these are currently also used as our <h1>, which we don’t want. What we really want is to have <title>RTCPeerConnection.close(): Close a WebRTC peer connection</title> and <h1>RTCPeerConnection.close()</h1>, which is a known SEO development issue we need to address one day – the separation of <h1> and <title>.
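
To make the <title> versus <h1> split concrete, here is a small Python sketch. The page_headings() function and the 70-character limit are assumptions for illustration only; the post gives the limit as roughly 65-70 characters.

```python
MAX_TITLE_LENGTH = 70  # assumed limit; the post only says "about 65-70"


def page_headings(name: str, summary: str):
    """Return a (title, h1) pair for a reference page."""
    # The <h1> stays short: just the name of the item being documented.
    h1 = name
    # The <title> carries the name plus a short description.
    title = f"{name}: {summary}"
    if len(title) > MAX_TITLE_LENGTH:
        # Trim on a word boundary so the title does not end mid-word.
        title = title[:MAX_TITLE_LENGTH].rsplit(" ", 1)[0]
    return title, h1
```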

exe-boss · September 21, 2018, 6:34pm

That should probably be Web/API/Whatever_API/InterfaceName/propertyName instead (it will make moving the interface documentation easier).

sheppy · September 21, 2018, 6:54pm

Yes, that’s actually what I meant. Good catch.

exe-boss · September 22, 2018, 10:47am

This will also make the API documentation status pages less fragile as they’ll no longer have to rely on page tags.

Also, the {{DOMxRef(…)}} macro will continue to work for old pages, which will be turned into redirects to the new location.

sheppy · September 24, 2018, 2:34pm

Yeah, that’s exactly what I was thinking.

exe-boss · September 25, 2018, 8:55am

Also, another thing, right now, if I need documentation for some interface I don’t know, I can just type https://developer.mozilla.org/docs/Web/API/<InterfaceName>. With this, unless redirects are created for each newly added interface, the above will not always work.
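
One way to keep the short URLs working would be to generate the redirects from the same interface-to-API data. The following Python sketch assumes a hypothetical mapping of interface names to API slugs; flat_redirects() and resolve() are names invented for this example and are not existing Kuma features.

```python
def flat_redirects(interface_to_api: dict) -> dict:
    """Map each current flat slug to its proposed nested slug."""
    redirects = {}
    for interface, api_slug in interface_to_api.items():
        redirects[f"Web/API/{interface}"] = f"{api_slug}/{interface}"
    return redirects


def resolve(slug: str, redirects: dict) -> str:
    """Follow a redirect for a slug or for its nearest redirected ancestor."""
    parts = slug.split("/")
    # Try the longest prefix first so member pages follow their interface.
    for end in range(len(parts), 0, -1):
        head = "/".join(parts[:end])
        if head in redirects:
            return "/".join([redirects[head]] + parts[end:])
    # Slugs without a redirect are returned unchanged.
    return slug
```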

Also, what about interfaces which don’t really fit into any single API, or properties that belong to a different API than the interface itself (for example: Element.requestFullscreen() belongs to the Fullscreen API, but Element belongs to the Document Object Model)?

chrisdavidmills · September 26, 2018, 8:06pm

This sounds really awesome sheppy. I agree with pretty much all of this. Before commenting any further on fitting this into our roadmap, I’d like to see a proper plan with estimates of how long you think this’ll take, roughly.

Since we are also talking about doing a sweep of the content for other reasons such as fixing up API titles, putting event docs in the right places, and doing maintenance audits, we could perhaps kill multiple birds with one stone with this process. Food for thought anyhow.

sheppy · October 5, 2018, 5:02pm

That’s a good question about methods or properties defined in one API but implemented on an interface defined in another. One idea I’ve had is to handle those like this (using Element.requestFullScreen() as an example):

Create Web/API/Fullscreen/Element; this is a short page, in the style of an API overview page, describing what the Fullscreen API adds to Element. It should include the Properties and Methods sections as well, with a link then to…
Create Web/API/Fullscreen/Element/requestFullScreen and document requestFullScreen() there in its entirety.
Create a redirect from Web/API/DOM/Element/requestFullScreen to Web/API/Fullscreen/Element/requestFullScreen and make sure it’s listed on the Element overview page, probably just along with the other methods.

This way, we keep the relationship between the method and its parent API, but it’s available where it’s expected to be as well.
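
The three steps above can be sketched in Python as follows. The Member class and the plan_pages() function are hypothetical names for this example; the slugs it produces follow the Element.requestFullScreen() case described in the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    interface: str     # interface the member appears on, e.g. "Element"
    name: str          # member name, e.g. "requestFullScreen"
    home_api: str      # API that defines the interface, e.g. "DOM"
    defining_api: str  # API that defines the member, e.g. "Fullscreen"


def plan_pages(member: Member):
    """Return (canonical_slug, redirect), where redirect is (from, to) or None."""
    # The full documentation lives under the API that defines the member.
    canonical = f"Web/API/{member.defining_api}/{member.interface}/{member.name}"
    # Readers expect to find it under the interface's own API as well.
    expected = f"Web/API/{member.home_api}/{member.interface}/{member.name}"
    redirect = None if canonical == expected else (expected, canonical)
    return canonical, redirect
```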

(Note above using “Fullscreen” instead of “Fullscreen_API” in the URLs; since we have this improved structure, it’s no longer needed in order to avoid one-word page names that could conflict with interface names).

We could simplify this by leaving out the stuff under Web/API/Fullscreen, but we do lose some of that correlation that could be helpful.

sheppy · October 5, 2018, 5:10pm

Some other questions for people to offer opinions on around this…

Should we take this opportunity to drop the requirement to include “_API” at the ends of API names in slugs? After all, we only had it in the first place to avoid the risk of having a one-word page name collision with an interface, and now that they will be at different levels in the hierarchy that risk is gone.
Alternatively, do we need the /API component in the slug even? Why not just Web/Fullscreen_API/InterfaceName?
Do we want to separate out the types of content within an API’s documentation, such as Web/Fullscreen_API/Reference/InterfaceName, Web/Fullscreen_API/Reference/InterfaceName/method and Web/Fullscreen_API/Guide/Full_screen_for_profit? Or just put the interfaces and guides in the same level, like Web/Fullscreen_API/InterfaceName/method and Web/Fullscreen_API/Full_screen_for_profit?

I’m sure I have more, but let’s start with these.

I’m working on a document to serve as a proposal and plan for making a transition like this; it will evolve with the conversations we have, I’m sure.

sheppy · October 5, 2018, 6:31pm

For what it’s worth, my opinions on the questions posed above:

If we keep the “API” portion of the URL, I feel that we should drop the “_API” from the ends of the API names in the reorganization, leaving URLs like Web/API/Fullscreen and Web/API/WebSocket.
I feel that we should keep the API/ level of the hierarchy, to separate out the APIs from other Web technologies.
I think I like the idea of having separate Reference and Guide trees below each API: Web/API/WebRTC/Reference/RTCPeerConnection/close and Web/API/WebRTC/Guide/Peer_connection_guide. However, I worry that this adds unnecessary depth to the hierarchy so am unsure if it is wise, even if I like it on an organizational level. Would love other opinions.
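
To compare the two layouts side by side, here is a small Python sketch. The api_slug() and depth() helpers are hypothetical and exist only to show how the optional Reference or Guide segment changes the slug and its depth.

```python
def api_slug(api: str, *path: str, kind=None) -> str:
    """Build a slug under Web/API, optionally with a Reference or Guide level."""
    segments = ["Web", "API", api]
    if kind is not None:
        # The extra level separates content types at the cost of depth.
        segments.append(kind)
    segments.extend(path)
    return "/".join(segments)


def depth(slug: str) -> int:
    """Count the number of path segments in a slug."""
    return slug.count("/") + 1
```

With these helpers, the method page in the split layout sits six levels deep, compared with five in the flat layout.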

fscholz · October 10, 2018, 2:31pm

Thanks for starting to think about API docs and their organization. I think we have room for improvement, but I think I have different opinions on what the problems are and what we should be trying to improve.

There’s no structural correlation between an API and the interfaces, types, and so forth that comprise it. You can’t look at a URL like https://developer.mozilla.org/en-US/docs/Web/API/ImageBitmap and go “Oh, that’s part of the Canvas API.”

I’m not sure this is a problem at all. Information Architecture is not just URL design, or navigating. Canvas is mentioned in the sidebar and perhaps the body text could mention this as being part of Canvas more prominently, but I don’t think it is a requirement for it to be reflected in the URL. Also, I think there is now a createImageBitmap() method on Window, so ImageBitmaps can be used outside of canvases as well. That makes it more of a universal API that doesn’t always need to be used in the context of canvas, but also works for normal images in HTML, if I understand it correctly.

There’s no structural correlation between reference documentation for an API and the other content for it. Not even the overview page for the API has any structural relationship to the references that are putatively part of a documentation set.

I’m not sure what you mean by this. Are you just talking about URL design again? I’m not sure it is a problem if these pages are under different trees, if the navigation and the content interlinks reference pages, tutorials, guides and overview pages.

Because all APIs are lumped into one massive “directory,” it’s impossible to isolate individual APIs for monitoring or review by a specific person. This is a big one.

I think this is a problem with the implementation of how we monitor content, not with the chosen URL design or organization of docs. Monitoring a wiki is not easy. I hope we will have “moving towards a PR model” on our roadmap, where “monitoring published changes” will become “review before publishing”.

This makes automatic generation of menus, sidebars, and landing pages more problematic than needed. We have to rely on tricks like special tags as well as metadata to do things that should not require any additional information beyond the structure of the site.

I think it is true that our sidebars are a mess and that we should aim for a better implementation of them. I think we should talk about this problem separately, though, as it is not just API docs that are affected here.

If we make a change that means we should rebuild the pages for an API, we have no actual way to do this, since you can’t just say “rebuild this folder and its contents” (even if we had an easy way to do that at all). We have to either rebuild the site or manually find the pages and rebuild.

I think regenerating MDN pages is also its own problem. Many pages need regenerating more often, because of BCD and other things. I think we should talk about this more generally instead of treating it as a sub problem of API doc organization.

So, I’m not yet really sold that the above things are problems we should be solving right now. I think moving thousands of API docs around should have really strong reasons and should also address serious structural and information architectural problems our readers really have. I think it would be worth doing user research into what those problems really are. Some other problems you’ve mentioned are more general problems separate from API docs organization (other areas have the same problems), and I would advise against thinking about re-organization as duct tape for other long-term Kuma issues.

I think better landing pages that outline what kind of things a specific API (or a set of Web APIs) can do for a web developer would probably be of more help than moving pages around. Also, many things fall into more than just one API (see the example above by ExeBoss), plus a thing on the web is not just APIs: some things come with HTTP headers or CSS properties… I don’t think we want these under the same URL tree as well. Categorizing is hard. Specs do change, get merged or split up. It would be a nightmare to follow their categorization.

It’s not like I think the current organization is perfect. I do think we could do better organizing ourselves when it comes to (API) docs, but I think there are some bigger problems which we should try to solve first and which would probably be more beneficial for the readers.

The whole thing about prototype vs. static (see other thread).
Mixins: Specs have set A of mixins, browsers have set B of mixins, MDN documents set C of mixins and web developers never see mixins directly.
Events are documented poorly in a different tree, with outdated and repetitive information from the main API docs.
Interfaces vs Dictionaries: The API/ tree is flat at the URL level but also in-content it is unclear what is an interface and what is a dictionary (or a mixin or something else).
The API doc pages are inconsistent. Many still use a legacy formatting. In other doc projects, reference docs are (semi) auto-generated, but we have no tools, because we have no consistency.
Moved… is the language that all Web APIs follow (iirc), but we have not investigated aligning us towards WebIDL so that we are consistent in our reference docs and that we could generate docs from WebIDL files in the future. Our doc structures aren’t based off of WebIDL. I would like to investigate if this would be possible. If we change URL trees, I would rather have URL trees dictated by WebIDL (API/Interface/XY, API/Dictionary/AB, …) than by (random) categorization.
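
As a sketch of what URL trees dictated by WebIDL types could look like, the following Python snippet builds slugs of the form API/Interface/XY and API/Dictionary/AB. The KIND_SEGMENT table and the idl_slug() function are hypothetical; in particular the "Mixin" segment and the "Web/" prefix are assumptions, and nothing here parses real WebIDL.

```python
# Hypothetical mapping from a WebIDL definition kind to a URL segment.
KIND_SEGMENT = {
    "interface": "Interface",
    "dictionary": "Dictionary",
    "mixin": "Mixin",  # assumed name; not given in the list above
}


def idl_slug(name: str, kind: str) -> str:
    """Build a slug whose second level is the WebIDL kind of the definition."""
    try:
        segment = KIND_SEGMENT[kind]
    except KeyError:
        raise ValueError(f"unknown WebIDL kind: {kind!r}") from None
    return f"Web/API/{segment}/{name}"
```

Because the kind is part of the slug, a reader or a tool could tell an interface from a dictionary without opening the page.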

Now, it’s hard to tell whether “my” structural problems really affect more readers than the problems you’ve presented. Also, as much as it sounds like a good idea to tackle a bunch of problems at once with one big reorganization (re Chris: “kill multiple birds with one stone”), this is really hard to achieve on MDN. I would suggest user testing and tackling these problems individually, and finishing single problems instead of trying to do everything at once. We’re talking about thousands of pages and the wiki doesn’t allow mass changes easily.

That said, I think a good next step would be finding out what the most annoying structural problem with our API docs is, by asking our users. At the same time, it might be worth looking into WebIDL and understanding how we could use it to bring (testable) consistency into our reference docs.

Thanks,
Florian

sheppy · October 10, 2018, 9:05pm

@fscholz –

You make a number of good points. And we do definitely have different opinions of the value of the physical structure of the content. I very strongly feel that the URL structure is important for several reasons, including both SEO and automation, but also for human navigability of content.

So, yes, ImageBitmap is used in several APIs. But it is defined in a specific one, and that’s what matters structurally. There’s nothing preventing cross-linking from going on, and in fact it absolutely should be, in order to establish the relationships across topics and to create SEO value.

You’re right, of course, that information architecture isn’t just about URL structure. That said, a good URL hierarchy can make building that IA easier. Part of the IA is provided outright by the URL structure, but more importantly, a good URL structure makes automation much easier. If you’re able to rely on the fact that everything under a given base URL is related to the base URL, you can just walk through that hierarchy and generate your UI elements from it.
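
Here is a minimal Python sketch of walking a hierarchy to generate a UI element such as a sidebar. It assumes we already have a flat list of page slugs; build_tree() and render() are hypothetical helpers, not existing KumaScript functions.

```python
def build_tree(slugs, root: str) -> dict:
    """Nest every slug found below root into a dict of dicts."""
    tree = {}
    prefix = root.rstrip("/") + "/"
    for slug in slugs:
        # Anything outside the base slug is unrelated and skipped.
        if not slug.startswith(prefix):
            continue
        node = tree
        for part in slug[len(prefix):].split("/"):
            node = node.setdefault(part, {})
    return tree


def render(tree: dict, depth: int = 0) -> list:
    """Flatten the tree into indented lines, sorted by name at each level."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * depth + name)
        lines.extend(render(tree[name], depth + 1))
    return lines
```

No tags or extra metadata are consulted here: the slugs alone determine what appears and how it is nested.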

That’s in fact the biggest reason, for me, to do this: we can then rely on the content hierarchy in macros that automatically generate content. Macros such as the InterfaceOverview one I’ve been fiddling with for some time now would be able to correctly assume that all articles about a given API are found and included in the lists of guides and tutorials on the generated page. There’s a lot of neat stuff we can do automatically that would save writers a huge amount of time – but only if we have reliable content organization that separates out unrelated material and places related material together in predictable ways.

This one I disagree on. I think we can make monitoring enormously easier with proper content structure. By having related pages nested under the same parent, content owners can monitor the entire API without having to hunt down individual pages, and it can be done without any changes to the Kuma codebase.

I also feel that even if we go toward a PR-style system for change handling, having content organized in a proper hierarchy will make tracking changes and managing content easier. Even if changes are handled on a push and review type of basis, content hierarchy makes it easier to identify related content.

We can certainly talk about sidebars separately. But I still feel strongly that a good content hierarchy makes generating sidebars much easier. If you can create an accurate sidebar by simply presenting the titles of pages in a hierarchy, you’ve saved a lot of time and effort.

I’m not suggesting it’s a sub-problem of API doc organization. I’m just saying that having related content organized well will make it easier, when the time comes, to implement the ability to regenerate related content without having to manually find the pages that need to be updated.
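
Finding the pages to regenerate could then be as simple as the Python sketch below. The pages_under() function is hypothetical, and it assumes a list of all page slugs is available; how a rebuild would actually be triggered is left out.

```python
def pages_under(slugs, api_slug: str) -> list:
    """Return the API overview page and every page nested below it."""
    root = api_slug.rstrip("/")
    # Match on a segment boundary so that "Web/API/Fullscreen_API" does not
    # also select a sibling such as "Web/API/Fullscreen_API_extras".
    prefix = root + "/"
    return [s for s in slugs if s == root or s.startswith(prefix)]
```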

This is why I think content hierarchy is important, actually. Moving pages into a proper organization isn’t a fix in and of itself. It’s a means to an end. Having content organized properly makes all the other problems easier to solve, and improves our ability to optimize for SEO at the same time.

Yes, this is a problem. It would be partially fixed by updating the URL structure to include information about this, although the in-content part would still have to be addressed.

So… I’m not entirely sure what you mean here by “aligning us towards WebIDL.” I’ve been documenting based on the specifications’ WebIDL for at least a year now, if not longer (and from the WebIDL in the Firefox source tree for over a decade). I do agree that coming up with a system to do more automation based off the IDL would be a logical thing to do and is something we should be invested heavily in, except so far finding the resources to do it has not been in the offing.

I’m not sure where you get “random categorization” from. I’m talking about grouping by the APIs as defined by the specifications, not by randomly collecting stuff and calling it a group.

The idea of including the type of item being described is an interesting one worth considering, although sorting out how it works out with whatever other organizational changes are made could require some thought.

We certainly could look at running a survey to gauge users’ feelings about the structure of MDN and where the problem spots are. I’m not convinced this is a great way to plan work on content structure changes, but it can be a useful data point.

At any rate, I see many of your points, and agree with many of them, but I still feel that improved structural organization of the content is an important step on the way toward a better and more manageable MDN.

fscholz · October 11, 2018, 11:30am

So, yes, ImageBitmap is used in several APIs. But it is defined in a specific one, and that’s what matters structurally.

Well, it is defined in HTML, right? I’m not sure a category “HTML” is useful, along with categories “DOM”, “URL” and many other specs that aren’t really categorizing things usefully for web developers. And as I said, specs change, get merged or split up. We can try this out and audit all Interfaces/Dictionaries/… we have under the API/ tree and then see if we can categorize them clearly. I doubt this is possible, and I’m not sure it is a useful use of our time to sort them this way and re-sort them whenever the APIs or their specs change. And then it is still unclear to me why certain generic APIs like the ones defined in HTML/DOM/URL/Fetch/XHR and many more should be in categories. Web developers use all of this together anyway, right? MDN is powerful because you don’t have to open all these specs: the information is together in one place. MDN reflects the whole standard library that is available to you as a web developer no matter the spec. My concern is that “categories” will disrupt this for no good reason. I haven’t yet seen categories, so I’m eager to see what you’ve come up with. Creating them from specs seems fragile and “random” to me.

That’s in fact the biggest reason, for me, to do this: we can then rely on the content hierarchy in macros that automatically generate content. Macros such as the InterfaceOverview one I’ve been fiddling with for some time now would be able to correctly assume that all articles about a given API are found and included in the lists of guides and tutorials on the generated page.

To me, this macro is at the idea stage and I’m not sure it’s something we have agreed to move forward with yet. I don’t think we should let implementation details constrain the way we organize documentation. Same for sidebars, monitoring content, or page regeneration.

So… I’m not entirely sure what you mean here by “aligning us towards WebIDL.” I’ve been documenting based on the specifications’ WebIDL for at least a year now, if not longer (and from the WebIDL in the Firefox source tree for over a decade).

Of course, we consult IDLs to write docs, but they have never been used to enforce style rules or how we organize things. Someone had a mixin in front of them for the first time and documented it in a way they thought was correct. Then a few weeks later others maybe followed this and copied from it, or created their own style for the mixin they had to document. We never user tested whether web developers actually understand the way mixins are documented. Same with many other things WebIDL defines. Some people do dictionaries inline, others create new pages for them. Again, no idea what our audience prefers.

So, no, we never had agreed page templates based on WebIDL types and there are no consistent API docs, which, again, in my opinion, is a much bigger structural problem than how things are categorized in our IA or under which URL tree things are living. I could be wrong about this, so maybe we could survey our users about what the most pressing structural problem with our API docs is. If it is lack of categories and better URLs, I will be more convinced to move forward with moving around thousands of API pages. I’m no SEO expert, but I believe even more redirects are a problem for SEO and for Kuma.

sheppy · October 11, 2018, 2:58pm

@chrisdavidmills - Your input would be welcome at this point, since I’m working on writing up this document. If you wind up convinced by Florian’s points, we need to discuss whether or not I should continue working on this thing at this time.

chrisdavidmills · October 11, 2018, 3:36pm

Florian does have some very good points, and I think this shows that it is a delicate area that needs a lot of discussion before we commit to doing anything. Rather than stopping writing up the doc, I think it would be useful to include a matrix that tries to show what the advantages and disadvantages are of moving to your proposed new model, so we can think about it and discern whether it is worth the effort at this point. What problems is it going to solve?

Maybe we could mitigate or get rid of a bunch of these problems with a bunch of smaller steps, rather than having to go with a full reorg.

sheppy · October 11, 2018, 3:57pm

Sounds good. I’ll work on that.

wbamberg · October 11, 2018, 6:48pm

It looks to me as if there are a couple of things getting entwined here:

what sort of high-level organization should we present to users of MDN? What are they looking for and what kinds of navigational paths do they need to follow?
how should we implement that organization?

It’s quite possible for a given presentation to be implemented in various different ways (URL structure, tagging, some other metadata solution). I think it would be much better to start with understanding what we want the user experience to be like, before proposing an implementation.

On URL structure: it’s very tempting to rely on URL structure for IA, because given the way we use Kuma this is about the only reliable way to do it. For example, it’s frustrating that the CSS docs don’t have any URL structure (i.e. no “CSS/Properties/” and “CSS/Selectors”) but instead rely on tags. This is a problem because tags are a totally unreliable way to organize docs, because anyone can add and edit tags and we have no easy tools or processes for making sure they are consistent. So for example we have tags like: “CSS Property” but also “Property”. We have “CSS Data Type” but also “CSS Data Types” and also “Type”. And these tags are misapplied, and attached to non-reference pages, and so on. So if you want to get a list of all the CSS properties that have reference pages on MDN… it’s not easy.
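
A small Python sketch shows why the tag variants make such a list hard to build. The page slugs and their tag assignments below are hypothetical sample data, and pages_tagged() is a name invented for this example; only the tag names come from the paragraph above.

```python
# Hypothetical pages, tagged with the inconsistent variants named above.
PAGE_TAGS = {
    "Web/CSS/example-property-a": {"CSS", "CSS Property"},
    "Web/CSS/example-property-b": {"CSS", "Property"},
    "Web/CSS/example-type": {"CSS", "CSS Data Types"},
}


def pages_tagged(page_tags: dict, *variants: str) -> list:
    """Return the slugs of pages carrying at least one of the given tags."""
    wanted = set(variants)
    return sorted(slug for slug, tags in page_tags.items() if tags & wanted)
```

Querying for "CSS Property" alone finds only one of the two property pages; finding both requires knowing every variant in use ahead of time.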

Also, KumaScript meshes well with URL structure: it’s easy to get a list of pages in a hierarchy, so macros we use in landing pages and sidebars want IA to be reflected in the URL structure.

But although this is our present it might not be our future. We might want to look at different ways to define the IA, and different tools to specify sidebars and landing pages. For a big change like this, I think we should try not to be constrained by our current implementation.

sheppy · October 11, 2018, 8:59pm

Those are good points, Will. Certainly we could do this kind of information architecture work without physically migrating pages around, relying instead on internal structures to establish relationships among pages. That would be amazing. Unfortunately, I don’t really see that happening anytime soon, and there are definite problems I would like to try to address sooner rather than later because getting this stuff done will make SEO improvements so much easier and more resilient in the future. Basically it’s a matter of trying to do the groundwork needed to be prepared for the eventual resumption of work on projects like SEO and improved site navigation.

I’m increasingly concerned about the need to postpone substantial documentation organization and structure improvements because of other projects or because the “timing isn’t right,” so to speak. It’s never a convenient time to deal with this stuff, so we need to just bite the bullet and pick the changes we most feel will be helpful and just do them.

I’m not necessarily arguing this is the one we should take on – but I am convinced we need to make time now to review our content architecture, make decisions, and implement the changes needed.
