Pontoon API - MUC 2018

stas · March 15, 2018, 11:53pm

Part of Pontoon meetup reports - MUC 2018.

Late last year I deployed the first iteration of Pontoon API to production. I set out to base the API design work on a series of use-cases submitted by the rest of the team. My goal was to keep the scope small and well-defined. The result of this work is a GraphQL endpoint which currently exposes aggregate statistics about projects and locales working on Pontoon.

As of late, the Pontoon API has been getting more attention because of Translate.Next. As we apply modern best practices to the development of Translate.Next, the API has become an important part of its design. At the same time, Translate.Next has become the single most important use-case driving the future API work. I plan to reshuffle the milestones listed on the wiki to accommodate this new priority.

GraphQL for Translate.Next

GraphQL is a great fit for Translate.Next. It excels at fetching just the data required by a particular front-end view. Developers can define precise sets of data fields which are required to render each view. The shape of the data is described using the GraphQL query language, and the data is fetched in a single request to the API endpoint. This is in contrast to REST APIs, in which related objects must be fetched in subsequent requests by following hyperlinks (a pattern often optimized via non-RESTful parameters such as ?include_fields=foo,bar), and to tightly coupled RPC APIs, which expose special-cased endpoints tailored to the needs of the views calling into them.
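To make the contrast concrete, here is a minimal sketch in plain JavaScript that loads the same view both ways against a stubbed server and counts the round-trips. The `/locales` and `/projects/<slug>` URLs, the field names, and the data are hypothetical rather than Pontoon's actual API, and the GraphQL side is a stub that returns the requested nested shape instead of a real GraphQL server.

```javascript
// Hypothetical data behind a stubbed server: locales link to their projects.
const DATA = {
  locales: [
    { code: 'de', name: 'German', projects: ['/projects/firefox', '/projects/pontoon'] },
    { code: 'sl', name: 'Slovenian', projects: ['/projects/firefox'] },
  ],
  projects: {
    '/projects/firefox': { slug: 'firefox', name: 'Firefox' },
    '/projects/pontoon': { slug: 'pontoon', name: 'Pontoon' },
  },
};

// A fake transport that counts every round-trip to the server.
function makeTransport() {
  const transport = {
    requests: 0,
    // REST: one request per URL.
    get(url) {
      transport.requests += 1;
      return url === '/locales' ? DATA.locales : DATA.projects[url];
    },
    // GraphQL: one POST per query. A real server would parse and execute the
    // query; this stub just returns the nested shape LOCALES_VIEW_QUERY asks for.
    postGraphQL(_query) {
      transport.requests += 1;
      const locales = DATA.locales.map((l) => ({
        code: l.code,
        name: l.name,
        projects: l.projects.map((url) => ({ name: DATA.projects[url].name })),
      }));
      return { data: { locales } };
    },
  };
  return transport;
}

// REST: fetch the list, then follow one hyperlink per related project.
function loadViewRest(transport) {
  return transport.get('/locales').map((locale) => ({
    code: locale.code,
    name: locale.name,
    projects: locale.projects.map((url) => ({ name: transport.get(url).name })),
  }));
}

// GraphQL: the view declares every field it needs in a single query.
const LOCALES_VIEW_QUERY = `
  query LocalesView {
    locales {
      code
      name
      projects { name }
    }
  }
`;

function loadViewGraphQL(transport) {
  return transport.postGraphQL(LOCALES_VIEW_QUERY).data.locales;
}
```

With the sample data (three locale-to-project links), the REST version makes four requests, one for the list plus one per link, while the GraphQL version makes a single request; both hand the view the same data.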

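The paradigm of describing the data required by a front-end view as a GraphQL query turns out to work really well with React components. Lifecycle methods make it easy to call the API and update the component’s state (or inject props) when the data reaches the client. I’m particularly impressed by Apollo, a GraphQL client for JavaScript front-ends, which effortlessly abstracts this pattern into a declarative higher-order component (usable as a decorator) which takes a GraphQL query, as demonstrated by the following example, adapted from Apollo’s docs:

```jsx
import React from 'react';
import { graphql } from 'react-apollo';
import gql from 'graphql-tag';

function TodoApp({ data: { loading, error, todos } }) {
  // `todos` is undefined until the query result arrives.
  if (loading) {
    return <p>Loading…</p>;
  }
  if (error) {
    return <p>Error: {error.message}</p>;
  }
  return (
    <ul>
      {todos.map(({ id, text }) => (
        <li key={id}>{text}</li>
      ))}
    </ul>
  );
}

// graphql() wraps TodoApp and injects the query result as the `data` prop.
export default graphql(gql`
  query TodoAppQuery {
    todos {
      id
      text
    }
  }
`)(TodoApp);
```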

I’m looking forward to learning more about Apollo. It might turn out to be a good solution for bridging the gap between Translate.Next’s front-end and the API.

Challenges on the Back-end

The ease of fetching data on the client side comes at a cost on the back-end. GraphQL doesn’t magically know how to get all the data from the database. Without special care, even a simple GraphQL query might result in tons of database queries, hurting the performance of the entire app. As we plan to use the API for Translate.Next’s needs, the question of performance becomes pressing.

There are two big classes of performance problems which any GraphQL back-end has to mitigate:

Duplication of requests: the same record might be retrieved from the database multiple times if it’s related to more than one object retrieved by the query. For instance, a query might retrieve the list of locales working on Pontoon together with the active projects of each locale. Without optimization, a project will be retrieved from the database multiple times, once for every locale working on it.
The N+1 queries problem: in a naïve implementation, querying a list of N objects together with a one-to-many relationship on each of them results in 1 database round-trip to retrieve the list and then N additional queries, one per object, to retrieve its related objects. Back-ends usually solve this problem by using JOINs to reduce the number of database queries. This turns out to be non-trivial with GraphQL, because each field is computed by its own resolver, so the JOINs a query needs have to be worked out by analyzing the whole query up front.
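Both problems are easy to reproduce with an in-memory stand-in for the database. The sketch below, in plain JavaScript, counts round-trips and project rows read for a "locales with their projects" query. The tables, method names, and the batched strategy (fetch the link rows for all locales at once, then each distinct project once) are illustrative assumptions, not Pontoon's implementation.

```javascript
// Hypothetical in-memory tables standing in for the database.
const TABLES = {
  locales: [{ id: 1, code: 'de' }, { id: 2, code: 'fr' }, { id: 3, code: 'sl' }],
  projects: [{ id: 10, name: 'Firefox' }, { id: 11, name: 'Pontoon' }],
  projectLocales: [
    { localeId: 1, projectId: 10 },
    { localeId: 1, projectId: 11 },
    { localeId: 2, projectId: 10 },
    { localeId: 3, projectId: 10 },
  ],
};

// Each method stands in for one SQL round-trip; `stats` counts them.
function makeDb() {
  const stats = { queries: 0, projectRowsRead: 0 };
  const readProjects = (ids) => {
    const rows = TABLES.projects.filter((p) => ids.includes(p.id));
    stats.projectRowsRead += rows.length;
    return rows;
  };
  return {
    stats,
    allLocales() {
      stats.queries += 1;
      return TABLES.locales;
    },
    // Like: SELECT projects.* FROM projects JOIN project_locales ... WHERE locale_id = ?
    projectsForLocale(localeId) {
      stats.queries += 1;
      const ids = TABLES.projectLocales
        .filter((pl) => pl.localeId === localeId)
        .map((pl) => pl.projectId);
      return readProjects(ids);
    },
    // Like: SELECT * FROM project_locales WHERE locale_id IN (...)
    linksForLocales(localeIds) {
      stats.queries += 1;
      return TABLES.projectLocales.filter((pl) => localeIds.includes(pl.localeId));
    },
    // Like: SELECT * FROM projects WHERE id IN (...)
    projectsByIds(ids) {
      stats.queries += 1;
      return readProjects(ids);
    },
  };
}

// Naive resolvers: one projects query per locale (1 + N round-trips), and a
// project shared by several locales is read once per locale.
function resolveNaive(db) {
  return db.allLocales().map((locale) => ({
    code: locale.code,
    projects: db.projectsForLocale(locale.id).map((p) => p.name),
  }));
}

// Batched resolvers: a fixed number of queries regardless of N, and each
// distinct project is read exactly once.
function resolveBatched(db) {
  const locales = db.allLocales();
  const links = db.linksForLocales(locales.map((l) => l.id));
  const distinctIds = [...new Set(links.map((pl) => pl.projectId))];
  const byId = new Map(db.projectsByIds(distinctIds).map((p) => [p.id, p]));
  return locales.map((locale) => ({
    code: locale.code,
    projects: links
      .filter((pl) => pl.localeId === locale.id)
      .map((pl) => byId.get(pl.projectId).name),
  }));
}
```

With three locales, the naive resolvers make 1 + 3 = 4 queries and read the Firefox row three times (four project rows in total); the batched resolvers make 3 queries no matter how many locales there are and read each of the two projects once.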

In the initial iteration of the Pontoon API I was able to apply common Django ORM optimization techniques to address the issues listed above. By leveraging select_related and prefetch_related, the number of queries made to the database can be reduced significantly. I wrote a helper which introspects the query to predict which relationships should be followed by the Django Queryset. This approach solved both of the above problems at once. It also allowed me to write simple guards protecting against cyclic queries. As the API exposes more data, we’ll try to factor these guards and optimizations out so that they scale.
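The helper itself lives on the Python side and isn't reproduced here, but the idea can be sketched. This plain-JavaScript version walks a simplified selection tree (a nested object standing in for the parsed GraphQL query) and derives Django-style lookup paths for `select_related` and `prefetch_related`. Because `select_related` can only follow single-valued (foreign-key or one-to-one) relationships, anything reached through a to-many relationship goes to `prefetch_related` instead. It also refuses queries that nest relationships deeper than a fixed limit, a crude guard against cyclic queries such as locales → projects → locales. The `RELATIONS` table, field names, lookups, and `MAX_DEPTH` are hypothetical.

```javascript
// Hypothetical relationship metadata: GraphQL field -> target type, whether
// the relationship is to-many, and the Django lookup name it maps to.
const RELATIONS = {
  Locale: {
    projects: { type: 'Project', many: true, lookup: 'projects' },
  },
  Project: {
    locales: { type: 'Locale', many: true, lookup: 'locales' },
    latestTranslation: { type: 'Translation', many: false, lookup: 'latest_translation' },
  },
  Translation: {},
};

const MAX_DEPTH = 2; // hypothetical limit on nested relationships

// selection: e.g. { code: true, projects: { name: true } }
function planQuery(typeName, selection) {
  const plan = { selectRelated: [], prefetchRelated: [] };

  function walk(type, sel, prefix, depth, underMany) {
    for (const [field, sub] of Object.entries(sel)) {
      const rel = RELATIONS[type][field];
      if (!rel) continue; // scalar field: nothing to follow
      if (depth >= MAX_DEPTH) {
        throw new Error(`Query nests relationships deeper than ${MAX_DEPTH} at "${prefix}${rel.lookup}"`);
      }
      const path = prefix + rel.lookup;
      // select_related cannot cross a to-many relationship, so everything
      // at or below one must be prefetched instead.
      const many = underMany || rel.many;
      (many ? plan.prefetchRelated : plan.selectRelated).push(path);
      walk(rel.type, sub, path + '__', depth + 1, many);
    }
  }

  walk(typeName, selection, '', 0, false);
  return plan;
}
```

For example, `planQuery('Locale', { code: true, projects: { name: true, latestTranslation: { string: true } } })` yields `{ selectRelated: [], prefetchRelated: ['projects', 'projects__latest_translation'] }`, paths that the Python side would pass to `prefetch_related()`.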

I found out, however, that adding even the simplest new features to the GraphQL schema often made the Django ORM bail out of the Queryset optimizations. It looks like the laziness of Django Querysets might not be a good fit for the eagerness of GraphQL schema resolvers. I plan to examine this in detail in the coming weeks.

The GraphQL.js Experiment

In parallel, I’d like to run another experiment. I plan to write a minimal GraphQL schema in JavaScript which directly taps into Pontoon’s PostgreSQL database. GraphQL.js is the reference implementation of GraphQL and the JavaScript community has produced many useful tools for improving the performance of GraphQL queries. Two tools caught my attention:

Facebook’s own DataLoader aims to address the problem of duplicated requests. It provides a caching and batching layer which can be easily used in schema resolvers.
Join Monster translates GraphQL queries into the optimized SQL, with JOINs, needed to fetch the data they describe.
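DataLoader's core idea fits in a few lines. The following is a hand-rolled sketch of the pattern in plain JavaScript, not DataLoader's source: `load(key)` calls made during the same synchronous run of code are collected into one call to a batch function, and each key's promise is cached so that repeated requests for the same record are served once. The real library is used in a similar way, via `new DataLoader(batchFn)` and `loader.load(key)`, with a batch function that receives an array of keys and resolves to values in the same order; the `BatchLoader` name here is hypothetical.

```javascript
// Hand-rolled sketch of DataLoader-style batching and caching (hypothetical
// BatchLoader class, not the DataLoader library itself).
class BatchLoader {
  // batchFn(keys) must return a Promise of values in the same order as keys.
  constructor(batchFn) {
    this.batchFn = batchFn;
    this.cache = new Map(); // key -> Promise of value
    this.queue = []; // pending { key, resolve, reject } since the last dispatch
  }

  load(key) {
    // Cached: every request for the same key shares one promise.
    if (this.cache.has(key)) return this.cache.get(key);
    const promise = new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first key of a new batch schedules one dispatch after the current
      // synchronous code, so loads made by sibling resolvers are grouped.
      if (this.queue.length === 1) queueMicrotask(() => this.dispatch());
    });
    this.cache.set(key, promise);
    return promise;
  }

  async dispatch() {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((item) => item.key));
      batch.forEach((item, i) => item.resolve(values[i]));
    } catch (err) {
      // Forget failed keys so that a later load() can retry them.
      batch.forEach((item) => {
        this.cache.delete(item.key);
        item.reject(err);
      });
    }
  }
}
```

In a schema, a loader created per request around a hypothetical `fetchProjectsByIds(ids)` function would let each locale's `projects` resolver call `load(id)` for its project IDs: all locales then share one batched query, and a project related to several locales is fetched once.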

To do its magic, Join Monster needs a schema describing the data in the database. Such a schema would need to be maintained alongside Pontoon’s current Django models. The Pontoon team agreed that maintaining two independent schemas, one in Python and one in JavaScript, wouldn’t be viable at this time. For now, the value of trying out Join Monster (and GraphQL.js) lies in the opportunity to better understand the limitations of our current Python implementation of the API.
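To see why that extra schema is needed, consider what a tool has to know to turn a selection set into a single SQL statement: the table behind each type, a unique key per table, the column behind each scalar field, and the join condition behind each relationship. The plain-JavaScript sketch below hard-codes such metadata and compiles a nested selection into one `SELECT` with `LEFT JOIN`s. It is loosely modeled on the kind of information Join Monster asks for (table names, unique keys, join conditions), but it is not Join Monster's API, and the table and column names are illustrative guesses rather than Pontoon's actual models.

```javascript
// Hypothetical metadata mapping GraphQL types onto tables: the "second schema"
// that would have to be kept in sync with the Django models.
const SQL_SCHEMA = {
  Project: {
    table: 'base_project',
    uniqueKey: 'id',
    columns: { name: 'name', slug: 'slug' },
    relations: {
      // One-to-many: a project has one localization per locale.
      localizations: { type: 'ProjectLocale', on: (parent, child) => `${child}.project_id = ${parent}.id` },
    },
  },
  ProjectLocale: {
    table: 'base_projectlocale',
    uniqueKey: 'id',
    columns: { totalStrings: 'total_strings', approvedStrings: 'approved_strings' },
    relations: {
      // Many-to-one: each localization belongs to a single locale.
      locale: { type: 'Locale', on: (parent, child) => `${child}.id = ${parent}.locale_id` },
    },
  },
  Locale: {
    table: 'base_locale',
    uniqueKey: 'id',
    columns: { code: 'code', name: 'name' },
    relations: {},
  },
};

// Compile a selection (a nested object standing in for a parsed GraphQL
// query) into one SQL statement that fetches everything it asks for.
function compileQuery(typeName, selection) {
  const columns = [];
  const joins = [];
  let aliasCounter = 0;

  function walk(type, sel, alias) {
    const meta = SQL_SCHEMA[type];
    const selected = new Set();
    const addColumn = (column) => {
      if (selected.has(column)) return;
      selected.add(column);
      columns.push(`${alias}.${column} AS "${alias}__${column}"`);
    };
    // Always fetch the unique key: it is what lets flat rows be regrouped.
    addColumn(meta.uniqueKey);
    for (const [field, sub] of Object.entries(sel)) {
      const relation = meta.relations[field];
      if (relation) {
        const childAlias = `t${++aliasCounter}`;
        const childTable = SQL_SCHEMA[relation.type].table;
        joins.push(`LEFT JOIN ${childTable} AS ${childAlias} ON ${relation.on(alias, childAlias)}`);
        walk(relation.type, sub, childAlias);
      } else if (meta.columns[field]) {
        addColumn(meta.columns[field]);
      }
    }
  }

  walk(typeName, selection, 't0');
  return [
    `SELECT ${columns.join(', ')}`,
    `FROM ${SQL_SCHEMA[typeName].table} AS t0`,
    ...joins,
  ].join('\n');
}
```

A second step (not shown) would regroup the flat result rows into nested objects, using each table's unique key to tell rows apart. Every change to a Django model would also require a matching change to `SQL_SCHEMA`, which is exactly the maintenance cost of keeping two schemas.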