Localization communities indicators - Programmatic access to /dashboards/revisions

I would like to build stats around activity for different localization communities in order to be able to get something like:

  • number of distinct users who contributed to locale X in the last 1, 6, 12, 24 months
  • “top” users (number of edits as a basic metric) for locale X for the last 1, 6, 12, 24 months
    (or even build some graphs around those)
    N.B. those numbers are not intended to be vanity metrics but rather tools to help identify “at risk” locales. In other words, I don’t care about “who”, I care about “how many people/how much activity”.
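As a rough illustration of the first two metrics, here is a minimal Python sketch. It assumes the revision history is already available as a list of records holding a user ID, a locale code and a timestamp; the `Revision` type, its field names and the helper functions are hypothetical and not part of any MDN API. A month is approximated as 30 days.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta

# Look-back windows from the list above, in months.
WINDOWS_MONTHS = (1, 6, 12, 24)


@dataclass(frozen=True)
class Revision:
    """Hypothetical revision record; the real data source may use other fields."""
    user_id: int
    locale: str
    created: datetime


def in_window(rev, locale, now, months):
    """True if `rev` targets `locale` and falls within the last `months` months.

    A month is approximated as 30 days for simplicity.
    """
    cutoff = now - timedelta(days=30 * months)
    return rev.locale == locale and cutoff <= rev.created <= now


def distinct_contributors(revisions, locale, now, months):
    """Number of distinct users who edited `locale` during the window."""
    return len({r.user_id for r in revisions if in_window(r, locale, now, months)})


def top_contributors(revisions, locale, now, months, n=5):
    """(user_id, edit_count) pairs for the n most active users in the window."""
    counts = Counter(r.user_id for r in revisions if in_window(r, locale, now, months))
    return counts.most_common(n)


def locale_report(revisions, locale, now):
    """Distinct-contributor count for each standard window, keyed by months."""
    return {m: distinct_contributors(revisions, locale, now, m) for m in WINDOWS_MONTHS}


if __name__ == "__main__":
    now = datetime(2019, 3, 1)
    sample = [
        Revision(1, "fr", now - timedelta(days=10)),
        Revision(1, "fr", now - timedelta(days=20)),
        Revision(2, "fr", now - timedelta(days=100)),
        Revision(3, "fr", now - timedelta(days=400)),
        Revision(4, "de", now - timedelta(days=5)),
    ]
    print(locale_report(sample, "fr", now))  # {1: 1, 6: 2, 12: 2, 24: 3}
    print(top_contributors(sample, "fr", now, 24))  # [(1, 2), (2, 1), (3, 1)]
```

Since the goal is “how many” rather than “who”, the user IDs only need to be stable, not identifying, so anonymized IDs would work just as well here.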

I tried to crawl the revision dashboard with different parameters, but without being logged in, the view is not accessible.

Hence my questions:

  • is it possible to authenticate client-side without a web browser in order to crawl it programmatically (e.g. from a Node.js script)? What would the rate limit and other access conditions be?
  • is it possible to gather such information in any other way?

If you think filing a bug or taking another route is preferable, please tell me.

Examples of equivalent views for other tools/l10n around Mozilla projects:


@hmitsch - is this something participation systems would be interested in generating?


Hi @jwhitlock,

thanks for looping me in. When you say “interested in generating”, are you referring to building the API/view that could be crawled?

PS: CC’ing @nukeador as he does similar work to what @sphinx_knight describes in other parts of Mozilla.

Best regards,
Henrik

We recently added the user login requirement to the revision dashboard because the database load from scrapers was causing issues that impacted “regular” visitors. It would have taken time to make this view less resource-intensive. Because we’re severely resource-constrained and have an aggressive schedule for 2019, we went with the quick option of requiring a login, which restricts the view to browser use. I think we’d make a similar decision about adding contributor stats as an MDN feature: it would have to compete with the rest of the 2019 deliverables, and probably wouldn’t make the cut.

Participation systems recently asked for an anonymized copy of the database for a yearly report. This database could be used to generate the requested report, and wouldn’t require a new MDN feature. It doesn’t help with generating this data every month, but I’m guessing @sphinx_knight would get value from a one-time report.


Being able to build this data yearly or every 6 months is sufficient, yes.
To me, it is intended to be a decision tool for keeping or removing a locale, and I don’t think this kind of operation would/should happen on a monthly basis anyway.

If it is possible to have access to the anonymized database (with revision info), I don’t mind building the data myself at all 🙂
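If an anonymized dump were available, the same counts could be computed directly in SQL. Below is a minimal sketch using Python’s built-in `sqlite3` module against a hypothetical `revisions` table with `creator_id`, `locale` and `created` columns; the real dump’s database engine, tables and column names may well differ. Sorting by contributor count puts the least active (potentially “at risk”) locales first.

```python
import sqlite3

# Hypothetical schema for illustration only; the real dump's tables and columns may differ.
SCHEMA = """
CREATE TABLE revisions (
    id INTEGER PRIMARY KEY,
    creator_id INTEGER NOT NULL,  -- anonymized user ID
    locale TEXT NOT NULL,
    created TEXT NOT NULL         -- ISO 8601 date, so string comparison orders correctly
);
"""

# Distinct contributors and total edits per locale since a cutoff date,
# least active locales first.
ACTIVITY_QUERY = """
SELECT locale,
       COUNT(DISTINCT creator_id) AS contributors,
       COUNT(*) AS edits
FROM revisions
WHERE created >= ?
GROUP BY locale
ORDER BY contributors ASC, locale ASC;
"""


def locale_activity(conn, since):
    """Return (locale, contributors, edits) rows for revisions created on or after `since`."""
    return conn.execute(ACTIVITY_QUERY, (since,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO revisions (creator_id, locale, created) VALUES (?, ?, ?)",
        [
            (1, "fr", "2019-02-10"),
            (2, "fr", "2019-01-15"),
            (1, "fr", "2018-12-01"),
            (3, "de", "2019-02-20"),
            (4, "de", "2017-05-01"),
        ],
    )
    for row in locale_activity(conn, "2018-09-01"):
        print(row)  # ('de', 1, 1) then ('fr', 2, 3)
```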

I think a periodic report generated by Mozilla would be preferable to sharing the anonymized database with everyone who wants to run a query.


I’m currently helping the SUMO team build a localization strategy and am facing similar problems extracting information.

My approach has been to work with vendors to analyze a database dump, plus a bit of manual scraping.

If there is a conversation about localization and MDN, I would be extremely interested in having a chat with the person leading it. I already have a lot of insights and recommendations from my work with SUMO that I think would translate well to MDN as another documentation platform.

Shoot me an email and we can set up a call.

Thanks!
