What Hardware Powers Etsy.com?

Posted by Laurie Denness on August 31, 2012

Traditionally, discussing the hardware configurations behind a large website is something done inside private circles, and normally to discuss how vendor X did something very poorly, or how much vendor Y’s support sucks.

With the advent of the “cloud”, this has changed slightly. Suddenly people are talking about how big their instances are, and how many of them they run. And I think this is a great practice to get into with physical servers in datacenters too. After all, none of this is intended to be some sort of competition; it’s about helping out people in similar situations to us, and broadcasting solutions that others may not know about… pretty much like everything else we post on this blog.

The great folks at 37signals started this trend recently by posting about their hardware configurations after attending the Velocity conference… one of the aforementioned places where hardware gossiping will have taken place.

So, in the interest of continuing this trend, here are the classes of machine we use, which powered over $69.5 million of sales for our sellers in July.

Database Class

As you may already know, we have quite a few pairs of MySQL machines to store our data, and as such we’re relying on them heavily for performance and (a certain amount of) reliability.

For any job that requires an all-round performant box, with good storage, good processing power, and a good level of redundancy, we utilise HP DL380 servers. These clock in at 2U of rack space, with 2x 8 core Intel E5630 CPUs (@ 2.53GHz), 96GB of RAM (for that all-important MySQL buffer cache) and 16x 15,000RPM 146GB hard disks. This gives us the right balance of disk space to store user data, and spindles/RAM to retrieve it quickly enough. The machines have 4x 1Gbit ethernet ports, but we only use one.

Why not SSDs?

We’re just starting to test our first round of database class machines with SSDs. Up until now we’ve had other issues to solve first, such as getting the right balance of the amount of user data (e.g. the amount of disk space used on a machine) vs the CPU and memory. However, as you’ll see in our other configs, we have plenty of SSDs throughout the infrastructure, so we’re certainly going to give them a good test for databases too.

A picture of our various types of hardware, with the HP to the left/middle and web/utility boxes on the right

Web/Gearman Worker/Memcache/Utility/Job Class

This is a pretty wide catch-all, but in general we try to favour as few machine classes as possible: for most of our tasks, from handling web traffic (Apache/PHP) to any box that performs a task where there are many of them and redundancy is solved at the app level, we use one type of machine. This way hardware re-use is promoted and machines can change roles quickly and easily. Having said that, there are some slightly different configurations in this category for components that are easy to change, e.g. the amount of memory and disks.

We’re pretty much in love with this 2U Supermicro chassis, which allows for 4x nodes that share two power supplies and 12x 3.5″ disks on the front.

Supermicro Chassis with 4 easily serviceable nodes

A general configuration for these would be 2x 8 core Intel E5620 CPUs (@ 2.40GHz), 12GB–96GB of RAM, and either a 600GB 7200rpm hard disk or an Intel 160GB SSD.

Note the lack of RAID on these configurations; we’re pretty heavily reliant on Cobbler and Chef, which means rebuilding a system from scratch takes just 10 minutes. In our view, why power two drives when our datacenter staff can replace the drive, rebuild the machine and have it back in production in under 20 minutes? Obviously this only works where it is appropriate: clusters of machines where the data on each individual machine is not important. Web servers, for example, have no important data since logs are sent constantly to our centralised logging host, and the web code is easily deployed back onto the machine.

We have Nagios checks to let us know when the filesystem becomes unwriteable (and SMART checks too), so we know when a machine needs a new disk.
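
A check like this doesn’t need to be anything fancy. Here’s a minimal sketch, in Python, of a Nagios-style plugin that simply attempts a write and reports the standard exit codes; it’s just an illustration of the idea (the test path and filename are made up), not our actual plugin.

    #!/usr/bin/env python
    # Minimal sketch of a Nagios-style "is the filesystem writeable?" check.
    # Hypothetical example; the test path/filename are made up for illustration.
    import os
    import sys

    OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios exit codes

    def check_writeable(path):
        test_file = os.path.join(path, ".nagios_write_test")
        try:
            with open(test_file, "w") as f:
                f.write("ok\n")
            os.remove(test_file)
            return OK, "OK: %s is writeable" % path
        except (IOError, OSError) as err:
            return CRITICAL, "CRITICAL: cannot write to %s (%s)" % (path, err)

    if __name__ == "__main__":
        path = sys.argv[1] if len(sys.argv) > 1 else "/"
        code, message = check_writeable(path)
        print(message)
        sys.exit(code)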

Each machine has 2x 1Gbit ethernet ports; in this case we’re only using one.

Hadoop

In the last 12 months we’ve been working on building up our Hadoop cluster, and after evaluating a few hardware configurations we ended up with a very similar chassis design to the one used above. However, we’re using a chassis with 24x 2.5″ disk slots on the front, instead of the 12x 3.5″ design used above.

Hadoop nodes… and a lot of disk lights

Each node (with 4 in a 2U chassis) has 2x 12 core Intel E5646 CPUs (@ 2.40GHz), 96GB of RAM, and 6x 1TB 2.5″ 7200rpm disks. That’s 96 cores, 384GB of RAM and 24TB per 2U of rack space.
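
If you want to sanity-check those per-chassis totals, the arithmetic is simple enough; here’s a quick back-of-the-envelope sketch in Python (core counts include HyperThreading, as above, and the disk figure is raw capacity before any HDFS replication):

    # Back-of-the-envelope totals for one 2U Hadoop chassis of 4 nodes,
    # using the figures above. Core counts include HyperThreading, and the
    # disk total is raw capacity (HDFS replication will reduce what's usable).
    nodes_per_chassis = 4
    threads_per_node = 2 * 12        # 2x Intel E5646, counted as 12 cores each
    ram_gb_per_node = 96
    disk_tb_per_node = 6 * 1         # 6x 1TB 2.5" disks

    print(nodes_per_chassis * threads_per_node)   # 96 cores
    print(nodes_per_chassis * ram_gb_per_node)    # 384GB of RAM
    print(nodes_per_chassis * disk_tb_per_node)   # 24TB of raw disk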

Our Hadoop jobs are very CPU heavy, and storage and disk throughput are less of an issue, hence the small amount of disk space per node. If we had more I/O and storage requirements, we would have also considered 2U Supermicro servers with 12x 3.5″ disks per node instead.

As with the above chassis, each node has 2x 1Gbit ethernet ports, but we’re only utilising one at the moment.

This graph illustrates the power usage on one set of machines showing the difference between Hadoop jobs running and not

Search/Solr

Just a month ago, this would’ve been grouped into the general utility boxes above, but we’ve got something new and exciting for our search stack. We’re using the same chassis as in our general example, but this time with the awesome new Sandy Bridge line of Intel CPUs. We’ve got 2x 16 core Intel E5-2690 CPUs in these nodes, clocked at 2.90GHz, which gives us machines that can handle over 4 times the workload of the generic nodes above, whilst using the same density configuration and not that much more power. That’s 128x 2.9GHz CPU cores per 2U (granted, that includes HyperThreading).
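
To unpack that number (and why the “top” screenshot below shows 32 cores per node): the E5-2690 is an 8-core part, so with HyperThreading each socket shows up to the OS as 16 logical cores. A quick sketch of the arithmetic:

    # How the "128 cores per 2U" figure breaks down for the search nodes.
    # The E5-2690 has 8 physical cores; HyperThreading doubles what the OS sees.
    physical_cores_per_socket = 8
    threads_per_socket = physical_cores_per_socket * 2   # with HyperThreading
    sockets_per_node = 2
    nodes_per_chassis = 4

    threads_per_node = sockets_per_node * threads_per_socket
    print(threads_per_node)                      # 32 -- what "top" reports on one node
    print(threads_per_node * nodes_per_chassis)  # 128 logical cores per 2U chassis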

This works so well because search is really CPU bound; we’ve been using SSDs to get around I/O issues in these machines for a few years now. The nodes have 96GB of RAM and a single 800GB SSD for the indexes. This follows the same pattern of not bothering with RAID; the SSD is perfectly fast enough on its own, and we have BitTorrent index distribution, which means getting the indexes onto the machine is super fast.

Fewer machines = less to manage, less power, and less space.

Output of the “top” command with 32 cores on Sandy Bridge architecture

Backups

Supermicro wins this game too. We’re using the catchily named 6047R-E1R36N. The 36 in the model number is the important part… this is a 4U chassis with 36x 3.5″ disks. We load these chassis up with 2TB 7200rpm drives, which, when coupled with an LSI RAID controller with 1GB of battery-backed write-back cache, gives a blistering 1.2 gigabytes/second of sequential write throughput and a total of 60TB of usable disk space across two RAID6 volumes.

36 disk Supermicro chassis. Note the disks crammed into the back of the chassis as well as the front!

Why two RAID6 volumes? Well, it means a little more waste (four drives for parity instead of two), but in return you get a bit more resiliency against losing multiple drives, and rebuild times are halved if you lose a single drive. Obviously RAID monitoring is pretty important, and in Nagios we have checks for either SMART (on single-disk machines) or the various RAID utilities on all our other machines.
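
To make the trade-off concrete, here’s a rough sketch of the capacity arithmetic, assuming the 36 bays are split evenly into two 18-disk RAID6 volumes (that even split is an assumption for illustration; the actual layout, any hot spares and formatting overhead are why the usable figure quoted above comes in a little lower):

    # Rough RAID6 capacity arithmetic for the backup chassis, assuming an even
    # split of the 36 bays into two 18-disk RAID6 volumes. This is an assumed
    # layout for illustration; spares and formatting overhead will reduce it.
    disk_tb = 2
    disks_per_volume = 18
    volumes = 2
    parity_per_volume = 2            # RAID6 spends two disks on parity per volume

    usable_per_volume = (disks_per_volume - parity_per_volume) * disk_tb
    print(usable_per_volume)            # 32TB per volume, before overhead
    print(usable_per_volume * volumes)  # 64TB raw usable across both volumes
    print(parity_per_volume * volumes)  # 4 disks on parity vs 2 for a single volume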

In this case we’re taking advantage of the 2x 1Gbit ethernet connections, bonded together to the switch to give us redundancy and the extra bandwidth we need. In the future we may even run fiber to these machines, to get the full potential out of the disks, but right now we don’t get above a gigabit/second for all our backups.

Special cases

Of course, there are always exceptions to the rules. The only other hardware profile we have is the HP DL360 server (1U, 4x 2.5″ 15,000rpm 146GB SAS disks), which we use for roles that don’t need much horsepower but that we deem important enough to have RAID. For example, DNS servers, LDAP servers and our Hadoop Namenodes are all machines that don’t require much disk space, but need RAID for more data safety than our regular single-disk configurations provide.

Networking

I didn’t go into too much detail on the networking side of things in this post. Consider this part 1, and watch this space for our networking gurus to take you through our packet shuffling infrastructure at a later date.

Continue the trend

If you’re anything like us, you love a good spot of hardware porn. What cool stuff do you have?

 

This post was written by Laurie Denness (@lozzd), who would love it if you came and helped us make Etsy.com even better using this hardware. Why not come and help us?

Category: engineering, infrastructure, operations

36 Comments

Thanks for the post guys! It’s always cool to see other configs out there and how a specific problem has been solved.

I’m wondering why you are going with a full-on hardware stack though, rather than virtualization for at least certain tasks such as web-service? We substantially decreased our own hardware requirements while adding more and more overall resources by moving toward a VM stack. This would especially be useful for the web servers where they generally (at least in our experience) are memory limited, rather than CPU bound tasks… which is something virtualization is very very good at.

Perhaps something to think about in the future, especially since you utilize Chef!

    In our case, our web serving is actually 100% CPU bound. We only have 12-24GB of RAM in our webservers, and this is plenty for the number of Apache children we run to be able to max out our CPUs.
    We do have some virtualisation.. For example, Jenkins slaves and developer VMs, but that’s probably a story for another post 🙂 Most of the things we run require the full demands of the hardware though.

This is awesome, what OS do you guys use?

    We’re using CentOS.. A mixture of 5 and 6 right now, trying to get to 6 🙂

Thanks for the writeup! We just ordered the 24 bay Supermicro chassis to test for similar purposes. Which LSI controller are you using and are you using a single controller? We ended up going with the packaged Supermicro server bundle which comes with a LSI 2108 chipset controller.

    Yeah, it’s a single controller, also showing as an LSI 2108 chipset. Seems to work fairly well, just remember to get the optional battery backup unit if you want super fast speeds with the write back cache!

do you have some specifics on the backup server? What raid controller/cpu/hdd model?

Sounds like you guys really have your act together on the hardware/infrastructure side – now what’s your plan for getting IPv6 out there for the world to use (and to help you converge your addressing plans/management/etc)?

    We’ll discuss that in the second part, which will be all about our network infrastructure. Hopefully we’ll get that written up soon 🙂

Awesome post. Looking forward to the networking post as well.

What about hardware costs? Compared to current cloud providers. Also, any data center failover?

Do you push the backup data out to something else (tapes, whatever) or is the 60 TB enough to hold all your data? Do you do the usual incr/diff/full-split to get multiple restore points?

The 1.2GB/s number for raid controller throughput looks a lot like what we get, so it’s nice to know we’re not alone in that 🙂

Thanks for posting this! Any chance I could get you to throw your /var/lib/cobbler/kickstarts/* and /var/lib/cobbler/snippets/* up in a github repo? Or even just a few selected gists of how you’ve customized. Pretty please 🙂

I’m actually looking forward to the software part of this – e.g. which web server, database server, and scripting language powers etsy.

Also, this website receives approximately 1 million uniques/day – so optimization at the software level must have been very challenging – looking forward to a post on optimization as well!

Not sure if you guys take “requests” here but one thing I’ve been curious about is how the setup you guys have now differs from the vision of when you started.

What I mean by that is: did you guys set out right from the beginning to build such a large, sharded, replicated distributed system, and what you have now is just a scaled-up version of that, or has it evolved significantly in design since the early days?

Would be very interesting to hear about …
Nick

Under the heading ‘Search/Solr’ you don’t actually write anything at all about using Solr 😉 The article would probably benefit from expanding that section; I’d certainly be interested because we use Solr too.

Before you go on to networking, it would be good to hear

1) Which vendor do you use for supermicro? Do you assemble barebones systems? Do you manage repairs in-house?

2) How does power usage for supermicro compare to name-brand gear?

Have you considered building your own ‘Backblaze Storage Pod’ for backups or one of those SeaMicro SM10000 servers with 512 cpus?

Hey,

Thanks for another really interesting and open write up, really great stuff to read.

I’m very interested in what backup software you guys use to manage the movement of data to your disk-based solution?

– Jamie

great post — thanks for sharing! I have a few questions @ your backups…what software stack are you using for your backups? Also, what file system are you using?

thanks!

Do you allocate swap space, and what’s the recommended size? Someone told me that swap is useless on big-mem servers nowadays, however I am not sure whether it is still common practice to allocate double-the-memory-size disk space.

    kc, if your server has 64GB of RAM and you “double-the-memory-size”, that means your apps would need to try to consume 192GB of RAM to utilize all of swap and all of RAM. If you are ever in that situation, you have severely misconfigured servers. 🙂

    Back in sub-2GB days that was a somewhat more realistic scenario, but I think on servers with 16GB+ it’s unnecessary.
    My (not Etsy) 2 cents.

    As Mxx says, that “double the RAM” rule is severely broken nowadays. If you have an app gone wrong and it starts to chew up swap, your server is going to be basically unusable until the swap is all used up, and on spinning disks at 192GB or bigger, that’s quite some time. You may as well just reboot the box at that point.
    We basically allocate 2GB on every server, regardless of the amount of RAM, since having none is still inadvisable (at some point there was a flawed formula in the kernel that meant you could never allocate all your physical RAM if swap == 0; not sure if that still exists…) although we aim to never use a single byte of it.
