Migrating databases with minimal downtime

Hi! I’m Coop, and I manage the Services Engineering team here at Mozilla. We’re responsible for a number of different Mozilla services, but our primary focus right now is Taskcluster.

Over the past year, we migrated the back-end data store for Taskcluster from Azure entities to Postgres. We did this for two main reasons:

  1. Entities are opaque blobs that must be deserialized and introspected before they can be compared or joined. These operations are expensive and limit the kinds of real-time queries we can do (see the sketch after this list).

  2. Azure is yet another cloud service required to run Taskcluster, with associated costs and administrative overhead. By removing Azure from the picture, we’ve made Taskcluster easier to set up, manage, and possibly even adopt by third parties.
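
To make the first point concrete, here’s a minimal sketch in TypeScript (using the node-postgres `pg` library) contrasting the two query styles. The table and column names (`entities`, `tasks`, `state`, `task_id`) are hypothetical stand-ins, not Taskcluster’s actual schema:

```typescript
import { Pool } from "pg";

const pool = new Pool();

// Entity-style: fetch every opaque blob, deserialize it in application
// code, and only then filter. The database can't help with the predicate.
async function pendingTasksFromBlobs(): Promise<string[]> {
  const { rows } = await pool.query("SELECT value FROM entities");
  return rows
    .map((r) => JSON.parse(r.value))            // introspect each blob
    .filter((task) => task.state === "pending") // filter client-side
    .map((task) => task.taskId);
}

// Relational style: push the predicate down to Postgres, where a typed,
// indexed column makes the same query cheap enough to run in real time.
async function pendingTasksFromColumns(): Promise<string[]> {
  const { rows } = await pool.query(
    "SELECT task_id FROM tasks WHERE state = $1",
    ["pending"],
  );
  return rows.map((r) => r.task_id);
}
```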

Although the Services Engineering team is no stranger to migrations, this particular migration was a little different: we needed to ensure business continuity while manipulating data between multiple iterations of an evolving database schema. @dmitchell and @helfi92 did the hard work of planning, organizing, and leading the sprints that got us through this difficult task.

Now that we have successfully migrated our data store, @dmitchell is writing a series of blog posts about what we learned and how we can apply that knowledge to future migrations. We even have some tooling and approaches that might be useful to you if you need to migrate large amounts of data between database schemas (or even database platforms) but can’t afford significant downtime; a generic sketch of one such pattern follows.
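
To give a flavor of what “minimal downtime” involves, here’s a minimal sketch of one common pattern, dual writes with fallback reads, during the window when both stores are live. All names here are hypothetical, and this is not necessarily the exact scheme the blog posts describe:

```typescript
// A toy key/value interface standing in for whatever store API you have.
interface Store {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

class MigratingStore implements Store {
  constructor(private oldStore: Store, private newStore: Store) {}

  // Writes go to both stores so neither falls behind while a background
  // backfill job copies historical rows into the new store.
  async put(key: string, value: string): Promise<void> {
    await this.newStore.put(key, value);
    await this.oldStore.put(key, value);
  }

  // Reads prefer the new store, falling back to (and backfilling from)
  // the old one for rows the backfill hasn't reached yet.
  async get(key: string): Promise<string | null> {
    const fromNew = await this.newStore.get(key);
    if (fromNew !== null) return fromNew;
    const fromOld = await this.oldStore.get(key);
    if (fromOld !== null) {
      await this.newStore.put(key, fromOld);
    }
    return fromOld;
  }
}
```

Once the backfill completes and reads stop falling through to the old store, the old store can be retired without ever taking a read/write outage.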

The first two blog posts in the series are here:

The third post is coming soon. We’ll add a link when it’s ready.

Enjoy!
