Proposal: Have MDN API provide data for all specs linked to in Specifications tables

sideshowbarker · September 14, 2018, 4:39am

I’m interested in discussing with others here the utility (and feasibility) of having the MDN API provide data for all specs that are linked to in Specifications tables in MDN articles.

I filed an enhancement request/issue here:

Quoting that issue’s description:

A concrete use case for this data is, I want to add annotations in the HTML spec for all defined terms in the spec which have a corresponding MDN article that contains a Specifications table with a link to that term’s URL (with fragment).

Currently, lacking a way to use the MDN API to get that, I need to crawl the MDN site and parse/scrape articles in order to get the data and put it into a form I can use in the build process that generates the published HTML spec.

If the MDN API provided the data, I wouldn’t need to do that crawling/scraping.

Here’s a proposed JSON format for the structure of the exposed data:

{
  "https://html.spec.whatwg.org/multipage/#2dcontext": [
    {
      "path": "API/CanvasRenderingContext2D",
      "title": "CanvasRenderingContext2D",
      "summary": "The CanvasRenderingContext2D interface is used for…"
    },
    {
      "path": "API/Canvas_API",
      "title": "Canvas API",
      "summary": "Added in HTML5, the HTML <canvas> element can be…"
    }
  ],
  "https://html.spec.whatwg.org/multipage/#abstractworker": [
    {
      "path": "API/AbstractWorker",
      "title": "AbstractWorker",
      "summary": "The AbstractWorker interface…"
    }
  ],
  …
}

So the structure is:

the JSON object has spec URLs as keys
the value for each key is an array containing an item for each MDN article that has a Specifications link to that spec URL
each item in the array is an object with “path”, “title” and “summary” keys
the “path” value is a URL segment for that article’s path within the https://developer.mozilla.org/en-US/docs/Web/ tree
the “title” value is the article’s display title
the “summary” value is the article’s class=seoSummary content — or, if the article has no class=seoSummary, the contents of the article’s first non-empty paragraph

Of course the data would not just have https://html.spec.whatwg.org/ spec URLs, but would also have URLs for referenced terms in CSS, etc., specs too.
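
As a rough sketch of how a spec build script might consume data in this shape (Python here, purely for illustration): the sample entries are copied from the excerpt above, while the helper name `annotations_for` and the `MDN_BASE` constant are assumptions of this example, not part of any existing MDN API.

```python
import json

# Base of the tree that each "path" value is relative to (per the description above).
MDN_BASE = "https://developer.mozilla.org/en-US/docs/Web/"

# Sample data in the proposed format, taken from the excerpt in this post.
SAMPLE = json.loads("""
{
  "https://html.spec.whatwg.org/multipage/#2dcontext": [
    {"path": "API/CanvasRenderingContext2D",
     "title": "CanvasRenderingContext2D",
     "summary": "The CanvasRenderingContext2D interface is used for…"},
    {"path": "API/Canvas_API",
     "title": "Canvas API",
     "summary": "Added in HTML5, the HTML <canvas> element can be…"}
  ],
  "https://html.spec.whatwg.org/multipage/#abstractworker": [
    {"path": "API/AbstractWorker",
     "title": "AbstractWorker",
     "summary": "The AbstractWorker interface…"}
  ]
}
""")


def annotations_for(spec_url, data):
    """Return (title, full MDN URL, summary) tuples for one spec URL.

    Hypothetical helper: an unknown spec URL simply yields an empty list.
    """
    return [
        (article["title"], MDN_BASE + article["path"], article["summary"])
        for article in data.get(spec_url, [])
    ]
```

With this, a build step could loop over the IDs defined in a spec and call `annotations_for` on each full URL to decide whether to emit an MDN annotation.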

fscholz · September 14, 2018, 8:59am

Thanks for the suggestion, Mike.

Over at mdn/browser-compat-data, it has also been requested to add spec links to features. It would allow mapping MDN docs, compat data, and specifications. The current idea is to add "spec_urls" next to the already existing “mdn_url”.
Issues:

(Detail: it might replace “standard_track” in the bcd schema, as providing a link to a standard answers the question whether it is a standard feature or not).

I think this approach would help to link to MDN docs from specs and also display detailed compat data in specs, which is something that Marcos Cáceres was interested in for the Payment spec.
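
To make the idea concrete, here is a sketch in Python of a trimmed compat entry carrying a `spec_urls` list next to `mdn_url`. The `spec_urls` key is only the proposal under discussion, not an existing field, and the helper `spec_to_mdn` is hypothetical.

```python
# A trimmed, BCD-like entry. "spec_urls" is the proposed (not yet existing) key.
entry = {
    "api": {
        "AbstractWorker": {
            "__compat": {
                "mdn_url": "https://developer.mozilla.org/docs/Web/API/AbstractWorker",
                "spec_urls": ["https://html.spec.whatwg.org/multipage/#abstractworker"],
                "status": {"standard_track": True},
            }
        }
    }
}


def spec_to_mdn(tree, index=None):
    """Walk a BCD-like tree and map each spec URL to the MDN URLs that cite it."""
    index = {} if index is None else index
    for key, value in tree.items():
        if key == "__compat":
            mdn_url = value.get("mdn_url")
            if mdn_url is None:
                continue  # nothing to link to for this feature
            for spec_url in value.get("spec_urls", []):
                index.setdefault(spec_url, []).append(mdn_url)
        elif isinstance(value, dict):
            spec_to_mdn(value, index)  # descend into sub-features
    return index
```

A spec tool could run something like `spec_to_mdn` once over the whole data set and then look up each of its own anchors in the resulting index.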

chrisdavidmills · September 14, 2018, 9:58am

I like this idea too. This would answer the need to have a place to give feedback on specs in MDN pages.

Could we update the Spec2 macro to perhaps show a link to the spec as well as the standardization process badge? This’d be a lot quicker than having to manually update every page.

sheppy · September 14, 2018, 5:49pm

I like this idea in general, setting aside the details of how it would fit in with other proposals for data we’re discussing. I do have thoughts on this JSON structure, though; the form you’ve suggested seems useful only for your specific use case.

First, some observations and questions, then some proposed changes.

Questions:

This proposal looks like all specs and items within specs are in one huge JSON file (otherwise, you wouldn’t need the keys to be full URLs). That seems impractical and unnecessary. Instead, why not have each spec in its own file?
Could the specs be identified using the same strings we use to identify them on MDN? We could provide a way to map these bidirectionally, from our identifying string to URL and back.

Proposed changes:

Have each specification in its own JSON file.
The JSON is a series of objects whose keys are the names of the individual objects in the API (interfacename, dictionaryname, elementname, css-attribute-name, etc)
Each object may contain nested objects if the item has members of its own, such as properties, attributes, values for CSS attributes, and so forth
Each object has the following members as well:
– "slug" - the same as “path” in the original, but using the same terminology we use elsewhere in our data
– "title" - same as the original; the name of the item. This is really only needed if the keys are not using correct capitalization though.
– "summary" - same as before; the SEO summary of the item or appropriate substitute as needed
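
A rough sketch of what one per-spec file could look like under these changes, written as a Python dict. The `members` nesting key and the `fillStyle` entry are placeholders chosen for illustration; the proposal above does not fix how nested items are keyed.

```python
# One hypothetical per-spec file, keyed by item name rather than by spec URL.
html_spec = {
    "CanvasRenderingContext2D": {
        "slug": "Web/API/CanvasRenderingContext2D",
        "title": "CanvasRenderingContext2D",
        "summary": "The CanvasRenderingContext2D interface is used for…",
        # "members" is an assumed name for the nested objects.
        "members": {
            "fillStyle": {
                "slug": "Web/API/CanvasRenderingContext2D/fillStyle",
                "title": "fillStyle",
                "summary": "…",
            }
        },
    }
}


def iter_items(items, prefix=""):
    """Yield (dotted name, slug) for every item, descending into nested members."""
    for name, item in items.items():
        dotted = prefix + name
        yield dotted, item["slug"]
        yield from iter_items(item.get("members", {}), dotted + ".")
```

Note that this layout is keyed by name, so a consumer that starts from a spec URL with a fragment would still need some mapping from fragment to item name.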

sheppy · September 14, 2018, 5:50pm

One problem with replacing "standard_track" with the spec URL link instead of having both is that you lose the ability to cover the case where “this was once part of the specification but was removed in later specification updates”, which we currently can easily handle.

sideshowbarker · September 15, 2018, 12:39pm

Yes — having the spec_urls data in BCD itself would address the main need here.

But for my use case at least, in order to avoid the need for me to scrape MDN myself, what would also need to be included is the MDN article titles and summaries.

However, since we’d not want to maintain the article titles and summaries in BCD (I think), one way they could be provided together with the BCD data is by regularly running some kind of build that pulls the data from MDN itself, integrates it with the BCD data, and then incorporates it into a published distribution:

As I noted in that issue, https://github.com/epistemex/mdncomp-data already provides such a distribution, with the MDN summaries and spec URLs — and may eventually also include the article titles — but I hope the MDN staff team can consider providing the same thing upstream in some way.
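
The kind of build step I have in mind could be sketched roughly like this in Python. Both input shapes are hypothetical (neither is an existing MDN or BCD export format), and `spec_urls` is the proposed key rather than an existing one.

```python
# Hypothetical inputs: compat entries with the proposed "spec_urls" key, and
# titles/summaries pulled from MDN, keyed by article URL.
bcd_entries = [
    {
        "mdn_url": "https://developer.mozilla.org/docs/Web/API/AbstractWorker",
        "spec_urls": ["https://html.spec.whatwg.org/multipage/#abstractworker"],
    }
]
mdn_articles = {
    "https://developer.mozilla.org/docs/Web/API/AbstractWorker": {
        "title": "AbstractWorker",
        "summary": "The AbstractWorker interface…",
    }
}


def build_distribution(entries, articles):
    """Attach MDN titles and summaries to each spec URL found in the compat entries."""
    out = {}
    for entry in entries:
        article = articles.get(entry["mdn_url"])
        if article is None:
            continue  # nothing was pulled from MDN for this entry
        for spec_url in entry.get("spec_urls", []):
            out.setdefault(spec_url, []).append({"mdn_url": entry["mdn_url"], **article})
    return out


distribution = build_distribution(bcd_entries, mdn_articles)
```

The output is keyed by spec URL, so it stays close to the JSON format proposed at the top of this thread while leaving BCD itself free of titles and summaries.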

sideshowbarker · September 15, 2018, 12:38pm

I’d be happy to have it in any form at all. For my use case, it wouldn’t need to be distributed as one big file — or even as a file at all. What I’d really prefer is an API that exposed it.

exe-boss · September 15, 2018, 12:46pm

Another solution might be to create an npm package containing specification data (it might also make it possible to move SpecData.json out of the KumaScript repository).

sheppy · September 16, 2018, 3:41pm

Yeah, that would be a good approach.

I think that would be awesome, except for the issues of hosting and the costs of maintaining the service. I don’t think we have the bandwidth on a personnel level to handle it.

sideshowbarker · September 17, 2018, 12:29am

So I’m actually now realizing that having the spec_urls data in BCD wouldn’t be sufficient for my use cases — and I think for other use cases as well.

The reason is that BCD only contains data for a subset of the material having MDN articles with Specifications tables.

Take for example the following article:

That MDN article has a link to this HTML spec section:

https://html.spec.whatwg.org/multipage/webappapis.html#event-handler-attributes

But since the MDN article itself isn’t an article for a single interface/object or method or property or whatever, there’s never going to be an item in BCD with an mdn_url for that article.

Among the ~750 IDs in the HTML spec that are linked to from MDN articles, it looks like there are about 50 of them that don’t (won’t) have corresponding data in BCD.

So I think even if/when the spec_url data is added to BCD, we still have use cases that would benefit from having the kind of which-articles-link-to-which-specification-URLs data proposed here.
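
To illustrate the gap, a small Python sketch of how one might compute which spec URLs are reachable only through articles that BCD does not cover. The function name, the input shapes, and the `GUIDE_ARTICLE_PLACEHOLDER` key are all assumptions made for this example.

```python
def uncovered_spec_urls(article_links, bcd_articles):
    """Return spec URLs cited only by MDN articles that have no BCD entry.

    article_links: {article id: [spec URL, ...]} gathered from Specifications tables.
    bcd_articles:  set of article ids that appear as an mdn_url in BCD.
    """
    covered, uncovered = set(), set()
    for article, spec_urls in article_links.items():
        target = covered if article in bcd_articles else uncovered
        target.update(spec_urls)
    # A URL also cited by a BCD-backed article is not a gap.
    return uncovered - covered


article_links = {
    "API/AbstractWorker": ["https://html.spec.whatwg.org/multipage/#abstractworker"],
    # Placeholder id standing in for an article with no BCD counterpart.
    "GUIDE_ARTICLE_PLACEHOLDER": [
        "https://html.spec.whatwg.org/multipage/webappapis.html#event-handler-attributes"
    ],
}
gaps = uncovered_spec_urls(article_links, {"API/AbstractWorker"})
```

Here `gaps` ends up holding only the event-handler-attributes URL, which is the sort of entry that BCD-only data would miss.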

Along with this being useful for the HTML spec, it would also enable the Bikeshed and Respec tools to add MDN annotations to specs.

Given that Bikeshed and Respec are the tools used for generating the published versions for most all the current specs for the web platform, that’d mean we’d have most all specs with annotations linking back to MDN articles.