Two Sides For Salvation

Posted by on April 20, 2012

Note: In 2020 we updated this post to adopt more inclusive language. Going forward, we’ll use “primary/replica” in our Code as Craft entries.

How do you change the structure of a database that's getting hammered 24×7, without any disruption? If you use Oracle and paid millions for it, the capability is built in. If you use MySQL, it's one of the holy grails of database operations, and one we've learned to achieve here at Etsy.

We have a sharded architecture: data is scattered across several “shards”, each holding a different slice of the data. Each shard is a primary-primary (MM) pair, meaning both servers act as primaries and replicas at the same time. MM pairs not only give you fault tolerance, they also divide the read and write load between the two servers in a way that's impossible in the common primary-with-replicas setup. They do, however, come with their own set of challenges.

Don’t Let Your Database Generate Anything

The main problem with MM pairs is non-deterministic values generated by the database engine itself, such as autoincrement fields, random numbers and timestamps. Our solution is simple: we don't let the DB generate anything. Every value inserted or modified comes from the application. This lets us write to either side of an MM pair knowing the change will replicate correctly to the other side. I've heard the objection that MM pairs don't make sense because you're executing everything twice. That's true, but you're already doing that in a primary-replica(s) setup, and the benefits of MM pairs are huge. In addition to fault tolerance and load balancing, they are the key to being able to do non-disruptive, live schema changes.
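As a concrete illustration, here's a minimal Python sketch of the pattern; the helper names and table are hypothetical, not Etsy's actual ORM. The point is that every value in the INSERT, including the ID and the timestamp, is produced by the application rather than by AUTO_INCREMENT, NOW() or RAND() on the server:

```python
import itertools
import time

# Hypothetical stand-in for a ticket server that hands out globally
# unique IDs; a real deployment would use separate infrastructure.
_ticket = itertools.count(1)

def next_object_id():
    return next(_ticket)

def build_insert(table, values):
    """Build an INSERT whose every value comes from the application:
    no AUTO_INCREMENT, no NOW(), no RAND() evaluated by the server."""
    cols = ", ".join(values)
    placeholders = ", ".join("%s" for _ in values)
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (table, cols, placeholders)
    return sql, tuple(values.values())

row = {
    "id": next_object_id(),          # generated in the app, not AUTO_INCREMENT
    "created_at": int(time.time()),  # app clock, not the DB's NOW()
    "title": "hand-knit scarf",
}
sql, params = build_insert("listings", row)
```

Because the statement carries only literal, deterministic values, it produces identical rows no matter which side of the pair executes it.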

The other part of the puzzle is our shard-aware software layer: our in-house ORM. Among its many jobs, the one relevant here is finding where in our shards a particular object's data lives. Whenever we need an object's data, the ORM first goes to one of our two “index” servers, then to the shard that holds the data. The index servers are also an MM pair. They receive a very large number of queries, but each one is extremely fast, on the order of 10-100 microseconds. It's common for sharded architectures not to have an index server: you decide on a sharding scheme when you start, say by user ID, then divide the data among your shards knowing where each range of users lives. Everything works great until the number of users on a shard grows beyond what it can handle, and by then you're already in trouble. With an index server, we can move data between shards and simply update the index to point to the new location.
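The two-step lookup can be sketched like this (a toy Python model with hypothetical names; the real ORM talks to actual MySQL index and shard servers):

```python
# Toy model of the ORM's two-step lookup: ask an index server which
# shard owns an object, then query that shard directly.
index = {42: "shard_3"}  # user_id -> shard name, kept on the index MM pair
shards = {"shard_3": {"user_42_listings": ["scarf", "mug"]}}

def shard_for(user_id):
    # In production this is a query to an index server,
    # typically answered in 10-100 microseconds.
    return index[user_id]

def fetch_listings(user_id):
    shard = shards[shard_for(user_id)]
    return shard["user_%d_listings" % user_id]

def migrate(user_id, new_shard):
    # Moving a user's data between shards: copy the rows (elided here),
    # then simply point the index at the new location.
    index[user_id] = new_shard
```

The migration step is the payoff: no range-based scheme to outgrow, just an index row to update once the data has been copied.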

Our ORM reads a configuration file at startup that, among other things, contains the list of shard servers available to it. We can add shards as needed over time: add them to the configuration file to start writing data to them, then migrate users onto them so the new shards aren't idle at first and the load stays balanced across all shards.
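A hypothetical sketch of what such a configuration might look like, with hostnames and structure invented for illustration. Taking one side of each pair out of production amounts to shrinking each pair's host list and gracefully restarting the app so the ORM re-reads the file:

```python
# Hypothetical shape of the shard configuration the ORM reads at
# startup; hostnames are made up for illustration.
config = {
    "shards": {
        "shard_1": ["db1a.example.com", "db1b.example.com"],
        "shard_2": ["db2a.example.com", "db2b.example.com"],
    }
}

def active_hosts(cfg):
    """All database hosts the application is allowed to connect to."""
    return [host for pair in cfg["shards"].values() for host in pair]

# For a schema change, keep only one side of every pair; after a
# graceful restart the ORM knows nothing about the inactive side.
maintenance_config = {
    "shards": {name: pair[:1] for name, pair in config["shards"].items()}
}
```

After the restart the application runs on half the database servers, while replication between the halves continues untouched.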

The kicker: when we do schema changes, we remove one server of each MM pair from the configuration file and gracefully restart the application. The ORM re-reads its configuration and knows only about the active shard sides. This leaves the application running on half of our database servers. Nobody notices. We immediately see in our many graphs that one side's traffic plummets while the other side takes all the load.

Note that half of the servers does not mean half of the data. All data lives on both sides of an MM pair. Replication keeps flowing in both directions; we never break it. The active side simply stops getting inserts/updates/deletes from the inactive side because nothing is connecting to it, while the inactive side still receives inserts/updates/deletes from the active side, since it's still a replica. We could break replication for the ALTERs, but there's no benefit in doing so, and it adds an unnecessary step. (The one exception is the session in which we run the ALTERs themselves: we don't want those statements to replicate.)

At this point we are ready to make as many changes as we need on the inactive side. In MySQL terms: ALTERs. These ALTERs can take anywhere from minutes to hours and lock the tables they modify, but we're operating on the inactive side. We definitely don't want any of this work to replicate to the other side, so we prepend the ALTERs with SET SQL_LOG_BIN=0.
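A session on the inactive side might look like this (the table and column names are hypothetical examples, not our actual schema):

```sql
-- Run on the inactive side only. Disabling binary logging for this
-- session keeps the ALTERs out of the binlog, so they never
-- replicate to the active side.
SET SQL_LOG_BIN = 0;
ALTER TABLE listings ADD COLUMN materials VARCHAR(255);

-- Replication *from* the active side keeps flowing in all the while;
-- only this session's statements are excluded.
```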

When these ALTERs are done, they have been applied to the inactive side only. Another change to the config file places these servers back into active mode. We wait for the load to stabilize between both sides and for replication to catch up if it has lagged behind, and then we're ready to repeat the process for the side that hasn't been ALTERed.

Taking sides out of production is not only useful for schema changes, but for upgrades, configuration changes, and any other necessary downtime.

So this is all great and works well for us; we routinely do schema changes with no user impact. But what if you don't have an ORM? MySQL Proxy may be your answer. It makes it very simple to have web servers connect to a pool of available backend database servers, and you can read its documentation on MySQL's website. An important feature of MySQL Proxy is that it allows you to change configuration on the fly, so you can take servers in and out without even having to stop or reload your application.

MM pairs have a bad reputation for being quirky. They can be, but as long as you don't let your database generate anything, they work. When you need to do frequent schema changes in a 24×7 environment, they are the key to no-downtime schema changes.

If you want more details on our database architecture, you can also check here.

Category: databases, infrastructure, operations


How does replication work between two MM pairs with different schemas (after you bring the inactive side back up)?

Interesting write up. Having a lot of experience on standard master/slave architectures, it is interesting to hear the pros and cons of an MM layout from somebody that is using it for something real.

It would be interesting to hear about your process of creating unique table ids across your machines if you are not depending on mysql’s auto incrementing feature. There are probably as many solutions to this problem as there are programmers in the world, but it’s always good to hear about the ones that, you know….work.

Christopher, check my colleague’s talk slides and you’ll see how we create globally unique id’s:

Justincwatt, it doesn’t cause a problem because we don’t use any foreign keys. All data relationships are handled in the application.

@Justincwatt: You just have to make sure that your replication doesn’t break *given your incoming updates* assuming you’re using statement replication. Adding a field is pretty simple. Removing one is easy too as long as you aren’t trying to update it (hopefully a safe bet:)).

Akachler, I guess I’m not sure what foreign keys have to do with it. If you drop a column on the inactive side, wouldn’t MySQL replication barf if that column got updated on the active side in the interim? Or do you anticipate that by updating the code in advance of the db change to prevent changes to soon-to-be-dropped columns?

Justincwatt, pretty much what Justinjhuff said. We don’t use the changed schema until a few days later, way after the new schema is in place. But you definitely have to be careful regarding fields. Replication will not break if you don’t reference any nonexisting fields and use statement-based repl.

Justinjhuff and Akachler, thanks, yeah, totally makes sense now. I suppose if your code safely avoids the columns that are out of sync, replication should happily chug along. Cool.

Interesting article but what I find more interesting is my wife is always complaining about issues with the Etsy site and showing them to me. You guys have outages almost nightly, and at least weekly!

So it’s nice to write articles about the uber kung fu you guys have, but it’s a let down to learn its all smoke and mirrors.

user101: Sorry you’re having trouble. We don’t have uber kung fu, we’re just trying to share things that we’ve learned.

The goal isn’t smoke and mirrors. I’m surprised to hear that you’re having issues daily/weekly, this isn’t the case with availability and performance at the moment.

One of the reasons why we’re looking to share what we’ve learned here is to actually be transparent. Since you posted as an anonymous user101, I can’t help you have a better experience.

So we’d love to help if we had some information about your usernames or shop names. If you get a message into Etsy support mentioning that we had this exchange on CodeAsCraft, then maybe we can chase down the reasons for your trouble.

I’ll keep an eye out for your report!

[…] Etsy: Deployments using Master-Master MySQL […]

[…] our sharded master-master pair database architecture and how that and the Etsy ORM allows us to perform schema changes while keeping the site up. That we can do this at all is really awesome but to actually do this is still hard, risky and […]

Can’t give enough praise for sharing information like this guys. Unbelievably valuable insight.