MongoDB at Etsy

Posted on May 19, 2010

Hi! Dan McKinley and Wil Stuckey from the Etsy Curation team here. We’ll be your hosts for a three-part series about the use of MongoDB here at Etsy.

The Curation Team. Well, half of it. (Photo credit: Elizabeth Weinberg.)

In this, the first entry, we’ll give some background on how and why we use MongoDB, and explain our initial impressions as developers. In the second post, John Allspaw will talk about how well MongoDB is working out operationally. And not to build this up excessively or give too much away, but the third post will be a harrowing look inside web operations gone horribly awry, chock full of fear, loathing, and remediation checklists.

The Application

The first project using MongoDB is the new Treasury. For the unfamiliar, the Treasury is a member-curated browsing tool, originally built as a Flash application. Our team has rewritten it as a modern Ajax application in order to solve the scaling issues inherent in the original FMS design.

Here’s a G-rated example of a Treasury:

A treasury on Etsy (click to see the live version).

Etsy homepages are most frequently chosen from the treasuries, and there are quite a few other existing or proposed features that are at least vaguely similar. So we wanted our backend to be flexible and to make something like a polymorphic, generalized “list” object as easy as possible. It was thinking along those lines that first made us consider using a schemaless database, although we would not currently consider that to be the primary benefit.
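To make the idea of a generalized "list" object concrete, here is a minimal sketch of how two different kinds of list could share one shape in a schemaless store. Every field name here (`kind`, `curator`, `featured_on`, `items`, `listing_id`) is a hypothetical illustration, not the actual Treasury schema, and plain Python dicts stand in for stored documents.

```python
# Two hypothetical "list" documents that could live in the same collection.
# All field names are assumptions made for illustration only.
treasury = {
    "_id": 1,
    "kind": "treasury",
    "title": "Autumn finds",
    "curator": "some_member",
    "items": [{"listing_id": 101}, {"listing_id": 102}],
}

homepage = {
    "_id": 2,
    "kind": "homepage",
    "title": "Front page",
    "featured_on": "2010-05-19",  # a field only this kind of list carries
    "items": [{"listing_id": 102}],
}


def listing_ids(doc):
    """Return the listing ids of any list-like document, whatever its kind."""
    return [item["listing_id"] for item in doc["items"]]


for doc in (treasury, homepage):
    print(doc["kind"], listing_ids(doc))
```

Code that only cares about the shared `items` field works for both kinds, while each kind remains free to carry fields of its own.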

Why not use a relational database?

For us, building a read-heavy social application, this question should really be rephrased as “why not use MySQL?” This was actually a pretty difficult decision. MySQL is very well-understood operationally (especially by the people on our team). Replication in MySQL is pretty easy, and we love replication.

Treasury response time just before and just after enabling reads from replicated slaves.

There is absolutely no part of this project that is technically impossible with MySQL or, for that matter, any relational database. For us, this just came down to development speed. Ignoring everything else, the following two solutions are roughly equivalent in terms of performance.

Relational Solution
  • Use a relational database, with a normalized or semi-normalized schema.
  • When rendering a response, run a handful of queries and then aggregate the data for the object.
  • Cache the resultant aggregate object, either on a TTL or with explicit invalidation.
  • Return the cached copy of the aggregate object.

Document Store Solution
  • Use a document datastore, and embed sub-objects or child lists within their parents.
  • When rendering a response, retrieve the document by key and return it.

In our case, the development time saved using a document database is worth the risks. Caching at many levels is of course still a part of our application, but so far we’ve not found any reason to cache a single MongoDB document retrieved by primary key in an external cache like memcached, a practice that is currently common for us when we use relational databases.
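The two approaches compared above can be sketched side by side. This is an illustration under stated assumptions, not our production code: an in-memory SQLite database stands in for MySQL, a plain dict stands in for memcached, another dict stands in for a MongoDB collection keyed by `_id`, and the table and field names are hypothetical.

```python
import sqlite3


def build_relational_db():
    """Create a tiny normalized schema (hypothetical table names)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE treasury (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE treasury_item (
            treasury_id INTEGER, position INTEGER, listing_id INTEGER
        );
        INSERT INTO treasury VALUES (1, 'Autumn finds');
        INSERT INTO treasury_item VALUES (1, 0, 101);
        INSERT INTO treasury_item VALUES (1, 1, 102);
    """)
    return db


cache = {}  # stand-in for an external cache such as memcached


def get_treasury_relational(db, treasury_id):
    """Run several queries, aggregate the rows, and cache the result."""
    if treasury_id in cache:
        return cache[treasury_id]
    title = db.execute(
        "SELECT title FROM treasury WHERE id = ?", (treasury_id,)
    ).fetchone()[0]
    rows = db.execute(
        "SELECT listing_id FROM treasury_item "
        "WHERE treasury_id = ? ORDER BY position",
        (treasury_id,),
    ).fetchall()
    aggregate = {
        "_id": treasury_id,
        "title": title,
        "items": [row[0] for row in rows],
    }
    cache[treasury_id] = aggregate  # invalidation is omitted in this sketch
    return aggregate


# Document approach: the child list is embedded in its parent document.
documents = {1: {"_id": 1, "title": "Autumn finds", "items": [101, 102]}}


def get_treasury_document(store, treasury_id):
    """Retrieve the whole object with a single lookup by key."""
    return store[treasury_id]


db = build_relational_db()
print(get_treasury_relational(db, 1) == get_treasury_document(documents, 1))
```

Both functions return the same object. The difference is how much code, and how many moving parts, sit between the request and the response.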

Why MongoDB?

The number of databases that could be used for this kind of project has, um, proliferated somewhat recently. Why would we choose MongoDB over all of the others? Well, there were a few characteristics that we knew that we wanted:

Our tests found MongoDB to be a sweet spot between reliability, speed, and maturity. But to be clear, this was the picture six months ago when we started prototyping. The world of document datastores has changed significantly even in that short time frame. Today, the choice would probably be more difficult.

And to be perfectly honest, the proximity of 10gen to Etsy’s Brooklyn headquarters and the responsiveness of Eliot and his team to questions were also factors in our decision. (In the interest of full disclosure: 10gen shares investors with Etsy.)

Stay Tuned

In upcoming posts, we’ll dive deeper into our production experience with MongoDB. In short, things are going well, but we have learned some lessons the hard way. Hopefully, we can help you avoid making the same mistakes. Next time, John Allspaw will talk about MongoDB from the perspective of an operations professional. See you then!

Category: data, databases, infrastructure, operations

11 Comments

Is a rugged beard required to be an Etsy developer?

I would be very interested in seeing a talk submission to the Strange Loop conference in St. Louis Oct 14-15 about Etsy and their use of MongoDB.

http://strangeloop2010.com/pages/38735

I look forward to the rest of the posts about MongoDB too. I’ve started prototyping with it at Craigslist to replace one of our MySQL clusters and it’s going pretty well so far. I especially love having good Perl client libraries. ;-)

Jeremy, if you have any questions feel free to let us know! It has been an interesting ride so far with MongoDB.

- Chris Munns
Etsy Sys-Ops Team.

How have you managed MongoDB’s approach to single-server durability? Have you found this an issue at all? Do you simply rely on replication to handle it and find it “good enough”?

@Toby – we replicate and use a virtual IP to track the master. If the master dies we restart a slave as the new master. Membership of the nodes in the master or slave VIPs is determined automatically via a health check. We will consider switching to replica sets when they’re available.

So yeah, that is basically what we do. Allspaw might have more to say about this in his post.

Hey, that’s interesting. I’m using Mongo on Craft Cult basically just for the fun of it. Mongo seemed to stand out from among the other options for a few reasons. I’ll be looking forward to the next couple of articles!

I have a db design question.

I assume your items are in a collection. And comments are an embedded document inside that collection?

If so, how would you extract a list of comments made by a particular user?

Or, as a general question, how do you query embedded documents across all documents in a collection?

I find many web applications are very relational.

Denormalising a relational document solves a lot of issues, but I’m not sure Mongo’s approach of piling everything into one big doc is useful for many current web schemas.

For example, in a forum application, a user can make many posts in many topics. If each topic owns its own posts, this takes the querying power away from the user objects, so it is difficult to present a list of posts made by a user across the whole forum.

Just my 2c

Anyone got a response for @Jason? I’d quite like to know if there is a performant workaround for this.
