Making it Virtually Easy to Deploy on Day One

Posted on March 13, 2012

At Etsy we have one hard and fast rule for new engineers on their first day: deploy to production. We’ve talked a lot in the past about our deployment, metrics, and testing processes. But how does the development environment help someone come in on day one and contribute something that takes them through committing code, running it through our tests, and deploying it with Deployinator?

A new engineer’s first task is to snap a photo using our in-house photo booth (handmade, of course) and upload it to the about page. Everyone gets a shiny new virtual machine with a working development version of the site, along with their LDAP credentials, GitHub write access, and laptop. We use an internal cloud system for the VMs, mostly because it was the most fun thing to build, but it also gives us the advantage of our fast internal network and dedicated hardware. The goal is a consistent environment that mirrors production as closely as possible. So what is the simplest way to build something like this in-house?

We went with a KVM/QEMU-based solution, which allows for native virtualization. As an example of how you might go about building an internal cloud, here’s a little bit about our hardware setup. The hypervisors run on HP DL380 G7 servers that provide a total of 72 GB of RAM and 24 cores per machine. We provision 11 guests per server, which gives each VM 2 CPU cores, 5 GB of RAM, and a 40 GB hard drive. Libvirt supports live migrations across non-shared storage (in QEMU 0.12.2+) with zero downtime, which makes it easy to allocate and balance VMs across hosts if adjustments need to be made throughout the pool.
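To make the sizing arithmetic concrete, here is a small Python sketch that computes what is left over for the hypervisor once 11 guests are placed on one host. The class and function names are made up for illustration, and it assumes no CPU or memory overcommit, which the post does not actually state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    cores: int
    ram_gb: int


@dataclass(frozen=True)
class Guest:
    cores: int
    ram_gb: int


def headroom(host, guest, guests_per_host):
    """Return (spare_cores, spare_ram_gb) left on the host.

    Assumes every guest gets dedicated cores and RAM (no overcommit).
    """
    spare_cores = host.cores - guest.cores * guests_per_host
    spare_ram_gb = host.ram_gb - guest.ram_gb * guests_per_host
    return spare_cores, spare_ram_gb


# Figures from the post: 24 cores / 72 GB per host, 2 cores / 5 GB per guest.
host = Host(cores=24, ram_gb=72)
guest = Guest(cores=2, ram_gb=5)
print(headroom(host, guest, guests_per_host=11))  # (2, 17)
```

Under those assumptions, 11 guests leave 2 cores and 17 GB of RAM on each host for the hypervisor itself.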

We create CentOS-based VMs from a disk template that is maintained via OpenStack Glance, a tool that provides services for discovering, registering, and retrieving virtual machine images. The most recent version of the disk image is kept in sync via Glance and exists locally on each server for use in the creation of a new VM. This is faster than pulling the image over the network at creation time or building it from scratch using Kickstart like we do in production. The image itself may have been kickstarted to match our production baseline, and we template a few key files, such as the network and hosts information, whose values are substituted on creation. In the end, though, the template is just a disk image file that we copy and reuse.
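As a rough illustration of the "template a few key files" idea, the Python sketch below fills in a hosts file and a CentOS-style network interface file for a new VM. The placeholder names, file contents, and the example domain are assumptions made for this sketch; the post does not show the real templates.

```python
from string import Template

# Hypothetical template contents; the real files are not shown in the post.
HOSTS_TEMPLATE = Template(
    "127.0.0.1 localhost\n"
    "$ip $fqdn $hostname\n"
)
IFCFG_TEMPLATE = Template(
    "DEVICE=eth0\n"
    "BOOTPROTO=static\n"
    "ONBOOT=yes\n"
    "IPADDR=$ip\n"
    "NETMASK=$netmask\n"
    "GATEWAY=$gateway\n"
)


def render_vm_files(hostname, domain, ip, netmask, gateway):
    """Return a mapping of file path to rendered content for one VM."""
    values = {
        "hostname": hostname,
        "fqdn": f"{hostname}.{domain}",
        "ip": ip,
        "netmask": netmask,
        "gateway": gateway,
    }
    return {
        "/etc/hosts": HOSTS_TEMPLATE.substitute(values),
        "/etc/sysconfig/network-scripts/ifcfg-eth0":
            IFCFG_TEMPLATE.substitute(values),
    }


files = render_vm_files("jdoe", "dev.example.com",
                        "10.0.0.5", "255.255.255.0", "10.0.0.1")
for path, content in files.items():
    print(f"--- {path}\n{content}")
```

Because `Template.substitute` raises a `KeyError` on any placeholder that has no value, a half-filled template fails loudly instead of producing a VM with broken networking.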

The VM creation process involves pushing a button on an internal web page that executes a series of steps. Similar to our one-button deployment system, this allows us to iterate on the underlying system without disrupting the overall process. The web form only requires a username, which must be valid in LDAP so that the user can later log in. From there the process is logged so that it provides realtime feedback to the browser via websockets. First we find an available IP in the subnet range and use nsupdate to add the DNS information for the VM. We then make a copy of the disk template, which serves as the new VM image, and use virt-install to provision the new machine. Knife bootstrap is then kicked off, which does the rest of the VM initialization using Chef. Chef is responsible for getting the machine into a working state, configuring it so that it runs the same versions of libraries and services as the other VMs, and getting a checkout of the running website.
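The steps above can be sketched in Python roughly as follows. This is not our actual tool: the paths, domain, DNS server address, bridge name, and Chef environment name are all placeholders, the LDAP check and websocket logging are left out, and the script only builds the list of commands unless `run_plan` is called.

```python
import ipaddress
import subprocess


def first_free_ip(subnet, in_use):
    """Return the first host address in `subnet` that is not in `in_use`."""
    taken = {ipaddress.ip_address(addr) for addr in in_use}
    for candidate in ipaddress.ip_network(subnet).hosts():
        if candidate not in taken:
            return str(candidate)
    raise RuntimeError(f"no free address left in {subnet}")


def nsupdate_script(fqdn, ip, dns_server, ttl=3600):
    """Build the text that is fed to nsupdate on stdin."""
    return f"server {dns_server}\nupdate add {fqdn} {ttl} A {ip}\nsend\n"


def build_plan(username, subnet, in_use,
               dns_server="10.0.0.2",            # placeholder
               domain="dev.example.com",         # placeholder
               template="/var/lib/libvirt/images/template.img"):
    """Return the ordered steps as (label, argv, stdin_text) tuples."""
    ip = first_free_ip(subnet, in_use)
    fqdn = f"{username}.{domain}"
    disk = f"/var/lib/libvirt/images/{username}.img"
    return [
        # 1. Register the new hostname in DNS.
        ("dns", ["nsupdate"], nsupdate_script(fqdn, ip, dns_server)),
        # 2. Copy the local disk template to become the VM's disk.
        ("disk", ["cp", "--sparse=always", template, disk], None),
        # 3. Define and boot the guest from the copied image.
        ("vm", ["virt-install", "--name", username, "--ram", "5120",
                "--vcpus", "2", "--disk", f"path={disk}", "--import",
                "--network", "bridge=br0", "--noautoconsole"], None),
        # 4. Hand the machine over to Chef.
        ("chef", ["knife", "bootstrap", fqdn, "-N", username,
                  "-E", "development"], None),
    ]


def run_plan(plan):
    """Execute each step in order, stopping at the first failure."""
    for label, argv, stdin_text in plan:
        print(f"[{label}] {' '.join(argv)}")
        subprocess.run(argv, input=stdin_text, text=True, check=True)


plan = build_plan("jdoe", "10.0.0.0/29", in_use=["10.0.0.1", "10.0.0.2"])
for label, argv, _ in plan:
    print(label, argv[0])
```

Keeping the plan as plain data separate from its execution is one way to get the property described above: the steps behind the button can change without the button itself changing.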

Chef is a really important part of managing all of the systems at Etsy, and we use Chef environments to keep cookbooks similar between development and production. It is extremely important that development does not drift from production in its configuration. Chef also makes it much easier to roll out new module dependencies or software version updates. The environment automatically stays in sync with the code, which is a prime way to avoid strange bugs when moving changes from development to production. It strikes a good balance between keeping things centralized, controlled, and in a known state and giving developers flexibility over what they need to do.
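One way to picture "drift" is as a difference in cookbook version pins between two Chef environments. The Python sketch below compares the `cookbook_versions` maps of two environment documents, as they appear in Chef's environment JSON. The cookbook names and versions are invented, and this check is only an illustration, not something the post describes us running.

```python
def cookbook_drift(dev_env, prod_env):
    """Return {cookbook: (dev_pin, prod_pin)} for every pin that differs.

    A pin of None means the environment does not constrain that cookbook.
    """
    dev = dev_env.get("cookbook_versions", {})
    prod = prod_env.get("cookbook_versions", {})
    drift = {}
    for name in sorted(set(dev) | set(prod)):
        if dev.get(name) != prod.get(name):
            drift[name] = (dev.get(name), prod.get(name))
    return drift


# Invented example data in the shape of Chef environment JSON.
development = {
    "name": "development",
    "cookbook_versions": {"php": "= 1.4.0", "apache2": "= 2.0.1"},
}
production = {
    "name": "production",
    "cookbook_versions": {"php": "= 1.3.2", "apache2": "= 2.0.1"},
}

print(cookbook_drift(development, production))
# {'php': ('= 1.4.0', '= 1.3.2')}
```

An empty result means both environments pin the same cookbook versions.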

At this point the virtual machine is functional, and the website on it can be loaded using the DNS hostname we just created. Our various tools can immediately be run from the new VM, such as the try server, which is a cluster of around 60 LXC-based instances that run tests in parallel against your upcoming patch. Given this ability to modify and test the code easily, the only thing left is to overcome any fear of deployment by hopping in line and releasing those changes to the world. Engineers can be productive from day one because we can quickly create a consistent environment for them to write code in.

Category: engineering, infrastructure

30 Comments

[…] At Etsy we have one hard and fast rule for new Engineers on their first day: deploy to production [codeascraft.etsy.com] 0 points | Posted March 13 by Erik Starck […]

Awesome post, thanks for sharing 🙂. By the way, are you also provisioning the LXC instances with OpenStack, inside the VMs?

    We’ll have another post on our LXC setup; we don’t currently use OpenStack for that.

Cool, looking forward to it. I wrote a basic Ohai plugin for LXC and pushed it upstream; it still needs some spec/unit tests.

Do you store the KVM guest on the local hard drive of each hypervisor host? If so do you notice any performance issues? Currently have a similar setup (Same server hardware, KVM and Chef) but using multipath-IO and having issues. Considering putting the guest images locally on the KVM host.

    Yes, we store them locally, and no, we don’t have any performance issues.

The posts are excellent and insightful! Thank you.

Most of the content seems to be directed toward the client-facing tier (web/php). Do you have services/daemons that run in the middle tier used by multiple front-ends?

How do you handle database changes (sprocs, data, structure) that may contain breaking changes? Are there unit tests for the DB changes?

    Our architecture is mostly non-service oriented, with the exception of search. For the most part we run a monolithic PHP app, and that’s how this is set up for us. But services can be handled in much the same way.

    Our database changes are made in such a way that code works before and after the changes (no deletion of fields or field name changes). We roll these out behind config flags and ramp them up. We do write dbunit tests or tests that mock the ORM to test these types of changes.

[…] have a blog post describing how their infrastructure can scale and pump out new VMs so that newly hired employees can […]

What about developers getting their modified code into VMs? Supposedly they are not working with vim on the server so there must be some better, easy to use, way that doesn’t involve millions of commits.

    A lot of people do use vim or emacs on the server, but using a graphical IDE locally is also supported. We have a simple script that people can use to rsync code over or they can mount the filesystem locally. We try to support any method people are most comfortable with, and generally this isn’t a problem.

I noticed your mention of libvirt as a foundation for your VM setups. Would you like to add an entry to http://libvirt.org/apps.html that best describes how you use libvirt?

    I’d be happy to add to this, however it looks like all of these are actual open sourced apps and we haven’t released any of our stuff (yet). Let me know if I’m missing something!

I was informed of your usage of libvirt from this email:
https://www.redhat.com/archives/libvir-list/2012-April/msg00317.html

Are you interested in being listed on http://libvirt.org/apps.html as a client of libvirt? If so, could you please provide a summary to be included there?

If a developer has mounted the filesystem locally (over SMB?), how does that work with git? We’ve just migrated to using git (from CVS), but we’re having major problems with git on Windows accessing dev servers over SMB.

    I can’t really say; nobody at Etsy is using Windows, but people have had success with sshfs.

Very interesting post; thanks for sharing.

Out of interest, what OS are you running on your hypervisor, and how is the networking set up? Are you bridging the guests on to the hypervisor’s physical NIC, or are you using NAT?

The reason I ask is that I’ve been attempting to set up multiple guests on a hypervisor running on Debian, and am running into issues relating to multicast and IPv6. Specifically, I’d like my guests to have static IPv6 addresses configured. Unfortunately, it appears that bridged interfaces on Linux don’t implement multicast correctly; multicast frames transmitted from a bridge are forwarded to all members of the bridge, including the originator. This causes IPv6 DAD to fail, so the statically configured IPv6 addresses are not used by the guests. This happens whether I use VirtualBox or QEMU/KVM as the hypervisor. Running the same VirtualBox guests on a Windows 7 hypervisor results in correct behaviour.

Most frustrating. There is a QEMU bug filed: https://bugs.launchpad.net/qemu/+bug/761469

But since this happens with both VirtualBox and QEMU/KVM, I suspect this is a lower level problem, perhaps with bridge-utils.

    We use bridged networking on a Red Hat-based OS, IPv4 only so far.

[…] Making it Virtually Easy to Deploy on Day One: At Etsy we have one hard and fast rule for new Engineers on their first day: deploy to production. […]

Hey John,
Great article, thanks for sharing!
I was wondering how you handle the database side of things. Do developers work on the same database, or does each developer have their own smaller dataset running in their VM?

Thanks again for sharing
Patrik

[…] ability to acclimate and impact change in your organization. Companies like Etsy actually have a hard-and-fast rule that all engineers should deploy to production on day […]

[…] This week, one of Etsy’s Staff Engineers is traveling to San Francisco to spend a week at Twitter, observing and helping out, learning what Twitter does particularly well, and seeing differences that may reinforce or refute beliefs we’ve held as core. Likewise, a Twitter Platform Engineer is traveling to Brooklyn for the week, and watching what Etsy does well and poorly, all while helping out (and, of course, deploying on her first day). […]

[…] also keen on getting developers to deploy on day one, which they do with some clever virtualisation http://codeascraft.etsy.com/2012… […]

[…] ability to acclimate and impact change in your organization. Companies like Etsy actually have a hard-and-fast rule that all engineers should deploy to production on day one. 2 – Assign Mentors- Lots of […]

[…] every engineer and designer also have their own VM to develop on the Etsy stack (see this post for details). This brings us to over 1000 hosts managed by Chef. They are all connected to one Chef […]

Hey! Which LDAP server do you use to store user credentials (or recommend to use) and is there any replication in place to improve availability?

[…] already have much tooling around making it easy to create a virtual machine for each developer, so it made sense to build our LXC virtualization tools into the same […]