MongoDB at Etsy

Posted by Dan McKinley | Filed under data, databases, infrastructure, operations

Hi! Dan McKinley and Wil Stuckey from the Etsy Curation team here. We’ll be your hosts for a three-part series about the use of MongoDB here at Etsy.

The Curation Team. Well, half of it. (Photo credit: Elizabeth Weinberg.)

In this, the first entry, we’ll give some background on how and why we use MongoDB, and explain our initial impressions as developers. In the second post, John Allspaw will talk about how well MongoDB is working out operationally. And not to build this up excessively or give too much away, but the third post will be a harrowing look inside web operations gone horribly awry, chock full of fear, loathing, and remediation checklists.

The Application

The first project using MongoDB is the new Treasury. For the unfamiliar, the Treasury is a member-curated browsing tool, originally built as a Flash application. Our team has rewritten it as a modern Ajax application in order to solve the scaling issues inherent in the original FMS design.

Here’s a G-rated example of a Treasury:

A treasury on Etsy.

Etsy homepages are most frequently chosen from the treasuries, and there are quite a few other existing or proposed features that are at least vaguely similar. So we wanted our backend to be flexible and to make something like a polymorphic, generalized “list” object as easy as possible. It was thinking along those lines that first made us consider using a schemaless database, although we would not currently consider that to be the primary benefit.
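
To make the idea of a polymorphic, generalized “list” a little more concrete, here is a minimal sketch in Python. The field names (`list_type`, `owner`, `items`, `listing_id`) and the `make_list` helper are hypothetical, chosen for illustration only, and are not the actual Treasury schema. The point is simply that list-like objects of different shapes can share one loose structure without a schema change.

```python
import json

def make_list(list_type, owner, items, **extra):
    """Build a generic "list" document; extra fields vary by list type."""
    doc = {"list_type": list_type, "owner": owner, "items": list(items)}
    doc.update(extra)
    return doc

# Two hypothetical list-like features sharing one document shape.
treasury = make_list(
    "treasury",
    "some_member",
    [{"listing_id": 1, "position": 0}, {"listing_id": 2, "position": 1}],
    title="Autumn colors",  # only treasuries carry a title in this sketch
)
favorites = make_list("favorites", "some_member", [{"listing_id": 2}])

# Both are plain JSON-like documents, so a schemaless store can hold
# them in the same collection even though their fields differ.
serialized = [json.dumps(doc) for doc in (treasury, favorites)]
```

A new list type here is just another call with different extra fields, which is roughly the flexibility we were after.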

Why not use a relational database?

For us, building a read-heavy social application, this question should really be rephrased as “why not use MySQL?” This was actually a pretty difficult decision. MySQL is very well-understood operationally (especially by the people on our team). Replication in MySQL is pretty easy, and we love replication.

Treasury response time just before and just after enabling reads from replicated slaves.

There is absolutely no part of this project that is technically impossible with MySQL or, for that matter, any relational database. For us, this just came down to development speed. Ignoring everything else, the following two solutions are roughly equivalent in terms of performance.

Relational Solution
  • Use a relational database, with a normalized or semi-normalized schema.
  • When rendering a response, run a handful of queries and then aggregate the data for the object.
  • Cache the resulting aggregate object, either with a TTL or with explicit invalidation.
  • Return the cached copy of the aggregate object.

Document Store Solution
  • Use a document datastore, and embed sub-objects or child lists within their parents.
  • When rendering a response, retrieve the document by key and return it.
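
The difference between the two approaches can be sketched in a few lines of Python. This is an illustration under stated assumptions, not our production code: an in-memory SQLite database stands in for MySQL, plain dicts stand in for memcached and for the document store, and the table, column, and function names are all made up for the example.

```python
import sqlite3

# --- Relational approach: normalized tables, aggregate on read, then cache ---
conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
conn.executescript("""
    CREATE TABLE treasury (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE treasury_item (
        treasury_id INTEGER, position INTEGER, listing_id INTEGER);
    INSERT INTO treasury VALUES (1, 'Autumn colors');
    INSERT INTO treasury_item VALUES (1, 0, 101);
    INSERT INTO treasury_item VALUES (1, 1, 102);
""")

cache = {}  # stand-in for an external cache such as memcached

def get_treasury_relational(treasury_id):
    """Run a handful of queries, aggregate the rows, and cache the result."""
    if treasury_id in cache:
        return cache[treasury_id]
    row = conn.execute(
        "SELECT id, title FROM treasury WHERE id = ?", (treasury_id,)
    ).fetchone()
    if row is None:
        return None
    items = conn.execute(
        "SELECT listing_id FROM treasury_item "
        "WHERE treasury_id = ? ORDER BY position",
        (treasury_id,),
    ).fetchall()
    doc = {
        "_id": row[0],
        "title": row[1],
        "items": [{"listing_id": r[0]} for r in items],
    }
    cache[treasury_id] = doc  # invalidation on write is omitted here
    return doc

# --- Document approach: child lists are embedded, so a read is one lookup ---
documents = {  # stand-in for a document collection keyed by primary key
    1: {
        "_id": 1,
        "title": "Autumn colors",
        "items": [{"listing_id": 101}, {"listing_id": 102}],
    }
}

def get_treasury_document(treasury_id):
    """Retrieve the whole aggregate by key and return it."""
    return documents.get(treasury_id)
```

Both functions return the same aggregate object. The relational version needs the extra aggregation and cache-management code, while the document version is a single lookup by key. With a real MongoDB driver such as pymongo, that lookup would be something like `collection.find_one({"_id": treasury_id})`.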

In our case, the development time saved using a document database is worth the risks. Caching at many levels is of course still a part of our application, but so far we’ve not found any reason to cache a single MongoDB document retrieved by primary key in an external cache like memcached, a practice that is currently common for us when we use relational databases.

Why MongoDB?

The databases that could be used for this kind of project have, um, proliferated somewhat recently. Why would we choose MongoDB over all of the others? Well, there were a few characteristics that we knew we wanted:

  • The database should be safe to use as the system of record. In other words, it will not be storing data that is essentially replicated from other locations. We need the data on disk and backed up, and we need reasonable operational guarantees when hardware fails or the process is killed. This requirement also imposes certain constraints on the database’s maturity: we had to rule out CouchDB because of the possibility that the storage format would change before it came out of alpha.
  • The database performance should degrade gracefully when the data volume exceeds available RAM (this rules out some contenders, such as Tokyo Cabinet).

Our tests found MongoDB to be a sweet spot between reliability, speed, and maturity. But to be clear, this was the picture six months ago when we started prototyping. The world of document datastores has changed significantly even in that short time. Today, the choice would probably be more difficult.

And to be perfectly honest, the proximity of 10gen to Etsy’s Brooklyn headquarters and the responsiveness of Eliot and his team to our questions were also factors in our decision. (In the interest of full disclosure: 10gen shares investors with Etsy.)

Stay Tuned

In upcoming posts, we’ll dive deeper into our production experience with MongoDB. In short, things are going well, but we have learned some lessons the hard way. Hopefully, we can help you avoid making the same mistakes. Next time, John Allspaw will talk about MongoDB from the perspective of an operations professional. See you then!


11 responses to MongoDB at Etsy

  • Katie Danger says:

    Is a rugged beard required to be an etsy developer?

  • Alex Miller says:

    I would be very interested in seeing a talk submission to the Strange Loop conference in St. Louis Oct 14-15 about Etsy and their use of MongoDB.

    http://strangeloop2010.com/pages/38735

  • I look forward to the rest of the posts about MongoDB too. I’ve started prototyping with it at Craigslist to replace one of our MySQL clusters and it’s going pretty well so far. I especially love having good Perl client libraries. ;-)

  • Chris Munns says:

    Jeremy, if you have any questions feel free to let us know! It has been an interesting ride so far with MongoDB.

    – Chris Munns
    Etsy Sys-Ops Team.

  • Toby Hede says:

    How have you managed MongoDB’s approach to single-server durability? Have you found this an issue at all? Do you simply rely on replication to handle it and find it “good enough”?

  • Dan McKinley says:

    @Toby – we replicate and use a virtual IP to track the master. If master dies we restart a slave as the new master. Membership of the nodes in the master or slave VIPs is determined automatically via a health check. We will consider switching to replica sets when they’re available.

    So yeah, that is basically what we do. Allspaw might have more to say about this in his post.

  • julian says:

    Hey, that’s interesting. I’m using mongo on Craft Cult basically just for the fun of it. Mongo seemed to stand out from among the other options for a few reasons. I’ll be looking forward to the next couple of articles!

  • jason says:

    I have a db design question.

    I assume your items are in a collection. And comments are an embedded document inside that collection?

    If so, how would you extract a list of comments made by a particular user?

    Or, as a general question, how do you query embedded documents across all documents in a collection?

    I find many web applications are very relational.

    Denormalising a relational document solves a lot of issues, but I’m not sure Mongo’s approach of piling everything into one big doc is useful for many current web schemas.

    For example, in a forum application, a user can make many posts in many topics….. if each topic owns their own posts, this takes the querying power away from the user objects…. so it is difficult to present a list of posts made by a user across the whole forum.

    Just my 2c

  • Arush says:

    Anyone got a response for @Jason ? I’d quite like to know if there is a performant workaround for this?

