Scaling CI at Etsy: Divide and Concur, Revisited

Posted on March 12, 2012

In a past post, Divide and Concur, we told you how we approached dividing our large test suite into smaller test suites by keeping similar tests together rather than arbitrarily dividing.

Dividing tests by common points of error made triaging systemic failures quick and enticed everyone to write faster, more deterministic tests, but not all was perfect. Our Jenkins dashboard was quite verbose.

The numerous jobs on our dashboard were great for pinpointing where the failures were, but it was difficult to determine at which stage of the deploy pipeline the failures existed. Some tests were executed on every commit. Some tests were executed when the QA button was pushed. Some tests were executed against a freshly pushed Princess or Production build. We were using the Jenkins IRC plugin, and the number of messages per hour was drowning out necessary communication in the #push channel.

We needed some way to communicate the test status at each stage of the deployment pipeline.

We considered using Downstream Jobs, but fingerprinting was awkward and difficult to set up, and all in all it wasn’t quite what we were looking for.

We also considered Matrix Jobs, but a Matrix Job is designed to execute the same job several times with parameters varied along one or more configuration axes, e.g. build node, operating system, browser, or an arbitrary parameter. This did not fit our purpose because our jobs had wildly different configurations that could not be coerced into mere parameter differences.
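
To make the limitation concrete, here is a minimal sketch of what a matrix-style expansion amounts to: one job definition run once per combination of axis values. The axis names and values are hypothetical (the post does not list our actual configurations), and this is plain Python for illustration, not Jenkins code.

```python
from itertools import product

# Hypothetical axes, for illustration only.
axes = {
    "os": ["linux", "osx"],
    "browser": ["firefox", "chrome"],
}


def expand_matrix(axes):
    """Return one parameter dict per combination of axis values."""
    names = list(axes)
    value_lists = [axes[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


combinations = expand_matrix(axes)
for combo in combinations:
    # Every combination runs the SAME job; only the parameters differ.
    print(combo)
```

Every cell of the matrix shares a single job configuration, which is why jobs that differ in more than their parameters do not fit this model.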

What we needed was a way to create a Jenkins job type that would execute a selection of arbitrary Jenkins jobs, wait for the jobs to finish, and report a single result while still making it possible to drill down to sub-jobs to determine the sources of failures.
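
The idea can be sketched in a few lines: start every sub-job, wait for all of them, and report the worst result while keeping the per-job results around for drill-down. This is a simplified illustration in Python and not the plugin's actual code; the function names, the job names, and the three-level result scale are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Result levels, mildest first (assumed scale for this sketch).
SEVERITY = ["SUCCESS", "UNSTABLE", "FAILURE"]


def _run_safely(job):
    """Run one sub-job; treat an unexpected exception as a failure."""
    try:
        return job()
    except Exception:
        return "FAILURE"


def run_master(sub_jobs):
    """Run all sub-jobs concurrently, wait for every one to finish,
    and return (overall_result, per_job_results)."""
    with ThreadPoolExecutor(max_workers=max(1, len(sub_jobs))) as pool:
        futures = {name: pool.submit(_run_safely, job)
                   for name, job in sub_jobs.items()}
        # result() blocks until that sub-job has finished.
        results = {name: future.result() for name, future in futures.items()}
    # The single reported result is the most severe sub-job result.
    overall = max(results.values(), key=SEVERITY.index, default="SUCCESS")
    return overall, results


# Hypothetical sub-jobs; each callable stands in for a real Jenkins job.
sub_jobs = {
    "unit-tests": lambda: "SUCCESS",
    "integration-tests": lambda: "UNSTABLE",
    "functional-tests": lambda: "FAILURE",
}
overall, results = run_master(sub_jobs)
print(overall)   # FAILURE
print(results)   # per-job results remain available for drill-down
```

Reporting the worst result keeps the top-level status honest: one red sub-job turns the whole stage red, and the per-job dictionary tells you where to look.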

So we wrote a Jenkins plugin to achieve this, the Jenkins Master Project Plugin.

Now our Jenkins dashboard represents the deployment pipeline:

When a stage turns red (or yellow), you can click through to that particular Master Build, see what tests failed and drill through the results (or even rebuild).

We also wrote a Triggering User Plugin for determining the user who triggered the build and a Deployinator Plugin to link key Deployinator information to a particular Jenkins build.

The Triggering User and Master Project plugins are both integral to our latest version of Try.

We have also made our Nagios plugin for Jenkins readily available on the Etsy GitHub account. We used this for experimenting with alerting on the health of Jenkins.
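
For readers unfamiliar with Nagios checks, the general shape is simple: a check prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). The sketch below shows that convention only; it is not the plugin's code, and the metric (share of failing jobs), the thresholds, and the function name are all assumptions chosen for illustration.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_jenkins_health(failing_jobs, total_jobs,
                         warn_ratio=0.10, crit_ratio=0.25):
    """Map a hypothetical failing-job ratio to (exit_code, status_line).

    The thresholds are illustrative defaults, not values from the post.
    """
    if total_jobs <= 0:
        return UNKNOWN, "JENKINS UNKNOWN - no jobs reported"
    ratio = failing_jobs / total_jobs
    summary = "%d of %d jobs failing" % (failing_jobs, total_jobs)
    if ratio >= crit_ratio:
        return CRITICAL, "JENKINS CRITICAL - " + summary
    if ratio >= warn_ratio:
        return WARNING, "JENKINS WARNING - " + summary
    return OK, "JENKINS OK - " + summary


code, line = check_jenkins_health(failing_jobs=3, total_jobs=10)
print(line)  # JENKINS CRITICAL - 3 of 10 jobs failing
# A real check would finish with sys.exit(code) so Nagios sees the status.
```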

All of these plugins are freely available on GitHub under the Etsy organization. Enjoy!

9 Comments

One of the links in your Jenkins screen grab says “php code duplication” – is this something you can link to/share/write about?

Love your transparency, thank you for sharing!

Very interesting idea and plugin to divide your projects into logical jobs. Your master jobs are deployments to prod, QA, and princess for each of your tabs/projects: web, search, mobile, etc. Each master job consists of several smaller jobs. Very cool, thanks for sharing on GitHub. I will give it a try for my build jobs.

Do you have plans to put this in the jenkins plugin browser? I’m pretty clueless about java and can’t seem to do the right things to get the environment set up

I vaguely remember seeing something similar in one of the enterprise front-end additions for Hudson. It’s a great idea–a ton of people probably still don’t yet realize how much they want this.

[…] Scaling CI at Etsy: Divide and Concur, Revisited is the lightsaber heuristic in action — only around Jenkins plugins this time. […]

Hi,

Could you contribute this plugin to the Jenkins community? This won’t have an impact on your development process; it will just help distribute the plugin as part of the official Jenkins update center. Just let me know, and I’d be happy to help you migrate to the Jenkins community infrastructure.