The Art of the Dojo

Posted by on February 17, 2015 / 3 Comments

According to the Wikipedia page I read earlier while waiting in line for a latte, dojo literally means place of the way in Japanese, but in a modern context it’s the gathering spot for students of martial arts. At Etsy and elsewhere, dojos refer to collaborative group coding experiences.

 “Collaborative group coding experience” is a fancy way of saying “we all take turns writing code together” in service of solving a problem. At the beginning of the dojo we set an achievable goal, usually unrelated to our work at Etsy (but often related to a useful skill), and we try to solve that goal in about two hours.

What that means is a group of us (usually around five at any one time) goes into a conference room with a TV on the wall, plugs one of our laptops into the TV, and each of us takes turns writing code to solve a problem. We literally take turns sitting at the keyboard writing code. We keep a strict three-minute timer at the keyboard; after three minutes are up, the person at the keyboard has to get up and let another person use the keyboard. We pick an editor at least one person knows and stick with it–invariably someone will use an editor that isn’t their first choice and that’s fine.

 I often end up organizing the dojos and I’m a sucker for video games, so I usually say, “Hey y’all, let’s make a simple video game using JavaScript and HTML5’s Canvas functionality,” and people say, “kay.” HTML5 games work very well in a short dojo environment because there is instantaneous feedback (good for seeing change happen in a three minute period), there are a variety of challenges to tackle (good for when there are multiple skillsets present), the games we decide to make usually have very simple game mechanics (good for completing the task within a couple of hours), and there is an enormous amount of reward in playing the game you just built from scratch. Some examples of games we’ve successfully done include Pong, Flappy Bird, and Snake.

 That’s it. Easy peasy lemon squeezy. Belying this simplicity is an enormous amount of value to be derived from the dojo experience. Some of the potential applications and benefits of dojos:

 Onboarding. Hook a couple of experienced engineers up with a batch of new hires and watch the value flow. The newbies get immediate facetime with institutional members, they get to see how coding is done in the organization, and they get a chance to cut loose with the cohort they got hired with. The veterans get to meet the newest members of the engineering team, they get to impart some good practices, and they get to share knowledge and teach. These kinds of experiences have a tendency to form strong bonds that stick with people for their entire time at the company. It’s like a value conduit. Plus it’s fun.

 In-House Networking. Most of Etsy’s engineers, PMs, and designers work on teams divided along product or infrastructural lines. This is awesome for incremental improvements to the product and for accumulating domain knowledge, but not so great when you need to branch out to solve problems that exist outside your team. Dojos put engineers, PMs or designers from different teams (or different organizations) together and have them solve a problem together. Having a dojo with people you don’t normally interact with also gives you points of contact all over the company. It’s hard to overstate the value of this–knowing I can go talk to people I’ve dojo-ed with about a problem I’m not sure how to solve makes the anxiety about approaching someone for help evaporate.

 Knowledge Transfer. Sharing knowledge and techniques gained through experience (particularly encoded knowledge) is invaluable within an organization. Being on the receiving end of knowledge transfer is kind of like getting power leveled in your career. Some of my bread and butter skills I use every day I learned from watching other people do them and wouldn’t have figured them out on my own for a long, long time. The most exciting part of a dojo is when someone watching the coder shouts “whoa! waitwaitwaitwait, how did you do that!? Show me!”

 Practice Communicating. Dojos allow for tons of practice at the hardest part of the job: communicating effectively. Having a brilliant idea is useless if you can’t communicate it to anyone. Dojo-ing helps hone communication skills because for the majority of the dojo, you’re not on the keyboard. If you want to explain an idea or help someone who’s stuck on the keyboard, you have to be able to communicate with them.

 Training. I work with a lot of really talented designers who are always looking to improve their skills on the front end (funny how talent and drive to improve seem to correlate). Rather than me sitting over their shoulder telling them what to do, or worse, forcing them to watch me do some coding, a couple of engineers can dojo up with a couple of designers and share techniques and knowledge. This is also applicable across engineering disciplines. We’re currently exploring dojos as a way for us web folks to broaden our iOS and Android skills.

The Sheer Fun of it All. I’m a professional software engineer in the sense that I get paid to do what I love, and I take good engineering seriously within the context of engineering and the business it serves. Thinking hard about whether I should name this variable in the past tense or present tense, while important, is exhausting. Kicking back for a couple of hours (often with a beer in hand–but there’s no pressure to drink either) and just hacking together a solution with duct tape and staples, knowing I’m not going to look at it after the dojo, is a welcome break. It reminds me of the chaos and fun of when I first learned to program. It also reinforces the idea that the dojo is less about the code and more about the experience. We play the games after we finish them, sometimes as a competition. Playing the game you just built with the people you built it with is rewarding and satisfying, and also serves to give a sense of purpose and cohesion to the whole experience.

Our process evolved organically. Our first dojo was a small team of us solving an interview question we asked candidates. We realized that in addition to being helpful, the dojo was also really fun. So we went from there. We moved on to typical coding katas before we finally hit our stride on video games. I encourage anyone starting out with dojos to iterate and find a topic or style that works for you.

Some other considerations when running a dojo: there will likely be different skill levels involved. Be sure to encourage people who want to contribute – designers, PMs, or anyone in between. It’s scary to put yourself out there in front of people whose opinions you respect; a little bit of reassurance will go a long way towards making people comfortable. Negative feedback and cutting down should be actively discouraged, as they do nothing in a dojo but make people feel alienated, unwelcome, and stupid. Be sure to have a whiteboard; dojos are a great reminder that a lot of the actual problem solving of software engineering happens away from the keyboard. Make sure you have a timer that everyone can see, hear, and use (if it’s a phone, make sure it has enough charge; the screen usually stays on for the whole dojo, which drains the battery quick-like). Apply what you learn to your next dojo and see if you can’t find something that works for you. Most importantly, have fun.



Sahale: Visualizing Cascading Workflows at Etsy

Posted by on February 11, 2015 / 9 Comments

The Problem

If you know anything about Etsy engineering culture, you know we like to measure things. Frequent readers of Code As Craft have become familiar with Etsy mantras like “If it moves, graph it.”

When I arrived at Etsy, our Hadoop infrastructure, like most systems at Etsy, was well instrumented at the operational level via Ganglia and the standard suite of exported Hadoop metrics, but was still quite opaque to our end users, the data analysts.

Our visibility problem had to do with a tooling decision made by most Hadoop shops: to abandon tedious and counterintuitive “raw MapReduce” coding in favor of a modern Domain Specific Language like Pig, Hive, or Scalding.

At Etsy, DSL programming on Hadoop has been a huge win. A DSL enables developers to construct complex Hadoop workflows in a fraction of the time it takes to write and schedule a DAG of individual MapReduce jobs. DSLs abstract away the complexities of MapReduce and allow new developers to ramp up on Hadoop quickly.

Etsy’s DSL of choice, Scalding, is a Scala-based wrapper for the Cascading framework. It features a vibrant community, an expressive syntax, and a growing number of large companies like Twitter scaling it out in production.

However, DSLs on Hadoop share a couple of drawbacks that must be addressed with additional tooling. The one that concerned users at Etsy the most was a lack of runtime visibility into the composition and behavior of their Scalding jobs.

The core of the issue is this: Hadoop only understands submitted work at the granularity of individual MapReduce jobs. The JobTracker (or ResourceManager on Hadoop2) is blissfully unaware of the relationships among tasks executed on the cluster, and its web interface reflects this ignorance:

Figure 1: YARN Resource Manager. It’s 4am - do you know where your workflow is?


A bit dense, isn’t it? This is fine if you are running one raw MapReduce job at a time, but the Cascading framework compiles each workflow down into a set of interdependent MapReduce jobs. These jobs are submitted to the Hadoop cluster asynchronously, in parallel, and ordered only by data dependencies between them.
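To make that ordering concrete, here is a small sketch (plain JavaScript, with invented job names – this is an illustration of the scheduling Cascading performs for you, not Cascading’s code) of how a compiled workflow’s jobs become ready only as the jobs producing their inputs complete:

```javascript
// A compiled workflow: each MapReduce job lists the jobs whose output it consumes.
// Job names are invented for illustration.
var jobs = {
  extract:   [],
  cleanA:    ['extract'],
  cleanB:    ['extract'],
  joinAB:    ['cleanA', 'cleanB'],
  aggregate: ['joinAB']
};

// Kahn's algorithm: repeatedly submit every job whose dependencies are all done.
// Jobs with no ordering constraint between them (cleanA, cleanB) can run in parallel.
function submissionWaves(deps) {
  var done = {};
  var waves = [];
  var remaining = Object.keys(deps);
  while (remaining.length > 0) {
    var ready = remaining.filter(function (j) {
      return deps[j].every(function (d) { return done[d]; });
    });
    if (ready.length === 0) throw new Error('cycle in job graph');
    ready.forEach(function (j) { done[j] = true; });
    remaining = remaining.filter(function (j) { return !done[j]; });
    waves.push(ready);
  }
  return waves;
}
```

Here `submissionWaves(jobs)` yields four “waves” of submission – `extract` first, then `cleanA` and `cleanB` in parallel, then `joinAB`, then `aggregate` – none of which is visible as a single unit in the JobTracker interface above.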

The result: Etsy users tracking a Cascading workflow via the JobTracker/ResourceManager interface did not have a clear idea how their source code became a Cascading workflow plan, how their Cascading workflows mapped to jobs submitted to Hadoop, which parts of the workflow were currently executing, or how to determine what happened when things went badly.

This situation also meant any attempt to educate users about optimizing workflows, best practices, or MapReduce itself was relegated to a whiteboard, Confluence doc, or (worse) an IRC channel, soon to be forgotten.

What Etsy needed was a user-facing tool for tracking cluster activity at the granularity of Cascading workflows, not MapReduce jobs – a tool that would make core concepts tangible and allow for proactive, independent assessment of workflow behavior at runtime.

Etsy’s Data Platform team decided to develop some internal tooling to solve these problems.

Fast forward to now: I am pleased to announce Etsy’s open-source release of Sahale, a tool for visualizing Cascading workflows, to you, the Hadooping public!

Design Overview

Let’s take a look at how it works under the hood. The design we arrived at involves several discrete components which are deployed separately, as illustrated below:

Figure 2: Design overview.


The tool’s components, and the workflow among them, are as follows:

  1. A User launches a Cascading workflow, and a FlowTracker is instantiated.
  2. The FlowTracker begins to dispatch periodic job metric updates to Sahale.
  3. Sahale commits incoming metrics to the database tables.
  4. The user points her browser to the Sahale web app.
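Steps 2 and 3 above can be sketched like this (a plain-JavaScript illustration; the field names are invented and are not Sahale’s actual schema): the tracker sends periodic per-job snapshots, and the server upserts each snapshot keyed by flow and job id, so the tables always hold the latest state of the workflow.

```javascript
// Client side: build one metrics update from the current per-job statuses.
// Field names here are invented for illustration.
function buildUpdate(flowId, jobStatuses) {
  return jobStatuses.map(function (job) {
    return {
      flow_id: flowId,
      job_id: job.id,
      status: job.status,            // e.g. RUNNING, SUCCESSFUL, FAILED
      map_progress: job.mapProgress,
      reduce_progress: job.reduceProgress
    };
  });
}

// Server side: later snapshots for the same job overwrite earlier ones,
// so a browser polling the web app always sees the most recent state.
function commitUpdate(tables, rows) {
  rows.forEach(function (row) {
    tables[row.flow_id] = tables[row.flow_id] || {};
    tables[row.flow_id][row.job_id] = row;
  });
  return tables;
}
```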

About FlowTracker 

That’s all well and good, but how does the FlowTracker capture workflow metrics at runtime?

First, some background. In the Cascading framework, each Cascade is a collection of one or more Flows. Each Flow is compiled by Cascading into one or more MapReduce jobs, which the user’s client process submits to the Hadoop cluster. With a reference to the running Flow, we can track various metrics about the Hadoop jobs the Cascading workflow submits.

Sahale attaches a FlowTracker instance to each Flow in the Cascade. The FlowTracker can capture its reference to the Flow in a variety of ways depending on your preferred Cascading DSL.

By way of an example, let’s take a look at Sahale’s TrackedJob class. Users of the Scalding DSL need only inherit from TrackedJob to visualize their own Scalding workflows with Sahale. In the TrackedJob class, Scalding’s run method is overridden to capture a reference to the Flow at runtime, and a FlowTracker is launched in a background thread to handle the tracking:

/**
 * Your jobs should inherit from this class to inject job tracking functionality.
 */
class TrackedJob(args: Args) extends com.twitter.scalding.Job(args) {
  @transient private val done = new AtomicBoolean(false)

  override def run(implicit mode: Mode) = {
    mode match {
      // only track Hadoop cluster jobs marked "--track-job"
      case Hdfs(_, _) => if (args.boolean("track-job")) runTrackedJob else super.run
      case _ => super.run
    }
  }

  def runTrackedJob(implicit mode: Mode) = {
    try {
      val flow = buildFlow
      trackThisFlow(flow)
      flow.complete
      flow.getFlowStats.isSuccessful // return Boolean
    } catch {
      case t: Throwable => throw t
    } finally {
      // ensure all threads are cleaned up before we propagate exceptions or complete the run.
      done.set(true)
    }
  }

  private def trackThisFlow(f: Flow[_]): Unit = { (new Thread(new FlowTracker(f, done))).start }
}

The TrackedJob class is specific to the Scalding DSL, but the pattern is easy to extend. We have used the tool internally to track Cascading jobs generated by several different DSLs.

Sahale selects a mix of aggregated and fine-grained metrics from the Cascading Flow and the underlying Hadoop MapReduce jobs it manages. The FlowTracker also takes advantage of the Cascading client’s caching of recently-polled job data whenever possible.

This approach is simple, and has avoided incurring prohibitive data transfer costs or latency, even when many tracked jobs are executing at once. This has proven critical at Etsy, where it is common to run tens or hundreds of jobs simultaneously on the same Hadoop cluster. The data model is also easily extended if additional metrics are desired. Support for a larger range of default Hadoop counters is forthcoming.
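The caching side of this can be sketched as a freshness-window cache (the interface below is invented for illustration, not the Cascading client’s API): polls for the same job inside the window reuse the last fetched counters instead of going back to the cluster.

```javascript
// A freshness-window cache in the spirit of the client-side caching
// described above. Polls for a given job within `ttlMs` of the last
// fetch are served from memory.
function makeCachedPoller(fetchCounters, ttlMs, now) {
  var cache = {}; // jobId -> { at: timestamp, value: counters }
  return function poll(jobId) {
    var entry = cache[jobId];
    if (entry && now() - entry.at < ttlMs) return entry.value;
    var value = fetchCounters(jobId);
    cache[jobId] = { at: now(), value: value };
    return value;
  };
}
```

With the window matched to the tracker’s update interval, many concurrent watchers of the same workflow cost roughly one fetch per interval rather than one per watcher.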

About Sahale

Figure 3: Workflow overview page.


The Sahale web app’s home page consists of two tables. The first enumerates running Cascading workflows. The second displays all recently completed workflows. Each table entry includes a workflow’s name, the submitting user, start time, job duration, the number of MapReduce jobs in the workflow, workflow status, and workflow progress.

The navigation bar at the top of the page provides access to the help page to orient new users, and a search feature to expose the execution history for any matching workflows.

Each named workflow links to a detail page that exposes a richer set of metrics:

Figure 4: Workflow details page, observing a job in progress.


The detail page exposes the workflow structure and job metrics for a single run of a single Cascading workflow. As before, the navigation bar provides links to the help page, back to the overview page, and to a history view for all recent runs of the workflow.

The detail page consists of five panels.

Users can leverage these displays to access a lot of actionable information as a workflow progresses, empowering them to quickly answer their own questions about its structure and behavior.

In the event of a workflow failure, even novice users can identify the offending job stage or stages as easily as locating the red nodes in the graph view. Selecting failed nodes provides easy access to the relevant Hadoop job logs, and the Sources/Sinks tab makes it easy to map a single failed Hadoop job back to the user’s source code. Before Sahale this was a frequent pain point for Hadoop users at Etsy.

Figure 5: A failed workflow. The offending MapReduce job is easy to identify and drill down into.


If a user notices a performance regression during iterative workflow development, the Job History link in the navigation bar will expose a comparative view of all recent runs of the same workflow:

Figure 6: History page, including aggregated per-workflow metrics for easy comparisons.


Here, users can view a set of bar graphs comparing various aggregated metrics from historical runs of the same workflow over time. As in the workflow graphs, color is used to indicate the final status of each charted run or the magnitude of the I/O involved with a particular run.

 Hovering on any bar in a chart displays a popup with additional information. Clicking a bar takes users to the detail page for that run, where they can drill down into the fine-grained metrics. The historical view makes mapping changes in source code back to changes in workflow performance a cinch.

Sahale at Etsy

After running Sahale at Etsy for a year, we’ve seen some exciting changes in the way our users interact with our Big Data stack. One of the most gratifying for me is the way new users can ramp up quickly and gain confidence self-assessing their workflows’ performance characteristics and storage impact on the cluster. Here’s one typical example timeline, with workflow and user name redacted to protect the excellent:

Around January 4th, one of our new users got a workflow up and running. By January 16th, this user had simplified the work needed to arrive at the desired result, cutting the size of the workflow nearly in half. By removing an unneeded aggregation step, the user further optimized the workflow down to a tight 3 stages by Feb. 9th. All of this occurred without extensive code reviews or expert intervention, just some Q&A in our internal IRC channel.

Viewing the graph visualizations for this workflow illustrates its evolution across the timeline much better:


Figure 7a: Before (Jan 4th)


Figure 7b: During (Jan. 16th)


Figure 7c: After (Feb. 9th)

It’s one thing for experienced analysts to recommend that new users try to filter unneeded data as early in the workflow as possible, or to minimize the number of aggregation steps in a workflow. It’s another thing for our users to intuitively reason about these best practices themselves. I believe Sahale has been a great resource for bridging that gap.


Sahale, for me, represents one of many efforts in our company-wide initiative to democratize Etsy’s data. Visualizing workflows at runtime has enabled our developers to iterate faster and with greater confidence in the quality of their results. It has also reduced the time my team spends determining if a workflow is production-ready before deployment. By open sourcing the tool, my hope is that Sahale can offer the same benefits to the rest of the Big Data community.

Future plans include richer workflow visualizations, more charts, additional metrics/Hadoop counter collection on the client side, more advanced mappings from workflow to users’ source code (for users of Cascading 2.6+), and out-of-the-box support for more Cascading DSLs.

Thanks for reading! As a parting gift, here’s a screenshot of one of our largest workflows in progress:

Figure 8: Etsy runs more Hadoop jobs by 7am than most companies do all day.




Q4 2014 Site performance report

Posted by and on February 9, 2015 / 2 Comments

Happy 2015!  We’re back with the site performance report for the fourth quarter of 2014.

We’re doing this report a bit differently than we have in the past. In our past reports, we compared metrics collected on one Wednesday in the current quarter to metrics collected on one Wednesday in the previous quarter. In an effort to get a more normalized look at our performance, for this report we’ve taken data across an entire week in December and are comparing it with data from an entire week in October. We plan to continue to explore new methods for discussing our site’s performance over time in future reports, so stay tuned!

In the meantime, let’s see how the numbers look for this quarter. Like last time, different members of the Performance team will write different sections – Natalya will discuss our server side performance, I will cover our synthetic front-end monitoring, and Jonathan will tell us about our real user front-end monitoring.

Server Side Performance – from Natalya Hoota

“Server side” means literally that: the time it takes for a request to execute on the server. We obtained this data by querying our web logs for requests made on five popular pages and our baseline page (a bare-bones page with no content except a header and footer). The median and 95th percentile numbers for our server side performance are presented below.

Q4 2014 ServerSide

We did not see any significant trend in the data; both the median and 95th percentile results were relatively flat, with deltas of less than 5% for each metric.
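For concreteness, the two reported statistics can be computed from a list of raw per-request timings like this (a sketch using the nearest-rank percentile definition, which is one common choice and not necessarily the exact method behind the report):

```javascript
// Median and 95th percentile from raw per-request server times (ms),
// using the nearest-rank definition of a percentile.
function percentile(times, p) {
  var sorted = times.slice().sort(function (a, b) { return a - b; });
  var rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```

Calling `percentile(serverTimes, 50)` gives the median and `percentile(serverTimes, 95)` the 95th percentile for a page’s week of requests.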

What we came to realize, however, is that we need to reevaluate both the statistical validity of our data and our measurement approach, to make sure we can separate signal from noise. We plan to do so in the Q1 2015 report. There are a few open questions here: for example, what counts as a significant difference, how do we deal with seasonality, and what is a better way to represent our findings?

Synthetic Front-end Performance – from Allison McKnight

Here is the synthetic front-end performance data for Q4.  This data is collected by Catchpoint running tests on IE9 in New York, London, Chicago, Seattle, and Miami every two hours.  Catchpoint’s webpage response metric measures the time from the request being issued until the last byte of the final element of the page is received.
Q4 2014 RUM

Most of the changes in the median values were attributable to noise. Catchpoint gives us a standard deviation, which we used to calculate error ranges for our median values. The only statistically significant differences were improvements in the homepage webpage response time and in start render time for the listing and search pages.
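One simplified way to turn a standard deviation into such an error range (a sketch assuming roughly normal sampling error; the team’s exact method may differ) is to build a 95% interval around each value and call a change significant only when the two intervals do not overlap:

```javascript
// 95% interval around a measured value, given the standard deviation of
// the underlying samples and the sample count.
function interval(value, sd, n) {
  var halfWidth = 1.96 * sd / Math.sqrt(n);
  return { lo: value - halfWidth, hi: value + halfWidth };
}

// Two measurements are treated as significantly different only when
// their intervals are disjoint.
function significantChange(a, b) {
  return a.hi < b.lo || b.hi < a.lo;
}
```

This is deliberately conservative: a quarter-over-quarter delta smaller than the combined error ranges gets reported as noise, not as a regression or a win.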

These improvements can be attributed in part to the removal of some resources – serving fewer images on the homepage and the listing page led to a decrease in webpage response time, and two webfonts were removed from the search page.

Real User Monitoring – from Jonathan Klein

Here is the RUM (Real User Monitoring) data collected by mPulse. We measure median and 95th percentile page load time (literally the “Page Load” timer in mPulse).
Q4 2014 Synthetic

The news here is good, albeit unsurprising. The medians effectively haven’t changed since Q3, and the 95th percentiles are down noticeably. The steady medians match what we saw on the synthetic side, which is great to see.

As we mentioned in our last report, we had an issue at one of our CDNs where some CSS files were not being gzipped. This issue was resolved before we pulled this data, and our assumption is that this brought the 95th percentile down. One obvious question is why the homepage had such a large decrease relative to the other pages. We believe this is due to product level changes on our signed-in homepage, but as we mentioned last time we are currently unable to filter signed-in and signed-out users in mPulse. We’ve been told that a feature is on the way to enable that segmentation, and we’ll start differentiating in this report when that happens.

Conclusion – from Natalya Hoota

The last quarter of 2014 was a relatively quiet time in terms of infrastructure changes, and it showed in the performance metrics. We have seen a slight improvement throughout the site as a result of an experiment with including fewer images and fonts. Fixing a CSS compression bug at one of our CDNs helped our user experience as well.

We would like to find a simple and comprehensive way to represent performance trends over the quarter while also giving a snapshot of how we are doing at its end. Using averages loses the details of our users’ experience on the site, while a daily (or even weekly) median taken at the end of the quarter ignores effects that occurred earlier in the quarter. In the reports to come, we will focus on exploring the methodology of our data collection and interpretation.


Rebuilding the Foundation of Etsy’s Seller Tools

At its core, the Etsy marketplace is powered by its sellers. The Shop Management team at Etsy is comprised of engineers, designers, product managers, usability researchers, and data analysts who work together on the tools that sellers use every day to manage their shops.  These tools are a critical part of how the members of our community run their businesses, and as a result, are a critical part of Etsy’s business as well.

At the start of 2014, we were at an impasse. After years of iteration by many different teams, our seller tools were starting to show their age. New tools that had been added over time were often disconnected from core user workflows. This made new features hard to discover and often cumbersome to integrate into a seller’s daily activities. Design patterns had evolved, making the tools inconsistent and unintuitive, even to experienced shop owners. And despite increasing mobile traffic, few of our seller tools were optimized for mobile web.

From a technical perspective, some of our most important pages had also grown to be our slowest. Much of the business logic of our code was built into continually expanding page-specific web controllers, making it difficult to build and share functionality between the web stack and our native apps.

All of this led to a gradual degradation of our user experience. It was difficult for us to make tools that were intuitive and fit our sellers’ workflows, and slowed our ability to rapidly develop and execute on new ideas.

We decided to re-evaluate how our seller tools are built. One year later, we are extremely excited to announce the release of our new Listings Manager. The task wasn’t simple and it required our team to collaborate on a whole new level to pull it off. Here’s a brief overview of how it all came together.


Rewrites are difficult

Anyone who has been in the software industry for a substantial amount of time knows that full-scale rewrites are generally a bad idea. In a famous article, Joel Spolsky referred to them as “the single worst strategic mistake that any software company can make.”

And with good reason! Rewrites take much longer than anyone estimates, they can cause new feature development to grind to a halt while chasing parity with your existing project, and, worst of all, you are trading your existing, battle-tested production code for brand new code with its own new bugs and overlooked nuances.

Aside from the technical challenges, rewritten products are often built in isolation from their users. Without regular feedback from real users to keep you on track, you can reach the end of the project and discover that you’ve drifted into building the wrong product — something that makes sense to you, but which fails to actually address the day-to-day problems faced by your users.

Faced with the challenge of solving some massive and fundamental problems with our seller tools, we needed to craft an approach that kept our rewrite technically sustainable and focused on the needs of our seller community. We built a plan centered around a few major tenets.

Rethinking our CSS

Once the project kicked off, we wanted to get feedback as soon as possible. We started by creating static mockups and put them in front of sellers during remote research sessions. As the form of what we were going to build began to solidify, we found ourselves wanting to prototype features and rapidly iterate on them for the next round of testing. However, creating HTML prototypes was going to be cumbersome with the way we had previously been writing CSS.

In the past, building new features had always involved writing a fair amount of CSS, little of which was reusable. Our CSS was written in terms of pages and features instead of built around much broader patterns. The design team had an existing style guide that documented basic patterns, but the patterns were beginning to show their age; they weren’t responsive and reflected outdated visual styles. A redesign of the seller tools provided us with the perfect opportunity to experiment with mobile-optimized, patternized ways of constructing CSS.


Inspired by frameworks like Bootstrap and methodologies like object-oriented CSS, we built a brand new style guide that powered the development of the new seller tools. This style guide was focused around two patterns: component classes and utility classes. Components are abstracted visual patterns reused in many places, like the grid system or a listing card:

// Base alert styles
.alert {
    padding: 20px;
}

// Alert layouts
.alert--inline {
    display: inline-block;
}
.alert--fixed-top {
    position: fixed;
    top: 0;
}

// Alert themes
.alert--red {
    background-color: $red;
}
.alert--green {
    background-color: $green;
}

<div class="alert alert--inline alert--green">
    Your listing has been updated!
</div>

Utilities change visual details in one-off situations, like adding margin or making text bold:

.text-gray-lighter {
    color: $gray-lighter;
}
.text-smaller {
    font-size: 12px;
}
.strong {
    font-weight: bold;
}

<p class="strong">
    Listings Manager
</p>
<p class="text-smaller text-gray-lighter">
    Manage the listings in your Etsy shop
</p>

Everything was written to be responsive and independent of its surroundings. We wanted to be able to quickly build fully responsive pages using components and utilities while using little to no page-specific CSS.

With the style guide in place, spinning up a static HTML prototype for a usability testing session could be done in under an hour. Building multiple versions of a page was no sweat. When user reactions suggested a change in design direction, it was cheap to rip out old UI and build something new in its place, or even iterate on the style guide itself. And when we had settled on a direction after a round of research, we could take the prototype and start hooking it up in the product immediately with little design time needed.

A very intentional side effect of creating building blocks of CSS was that it was easy for engineers to style their work without depending on a designer. We invested a lot of time up front in documenting how to use the style guide so there was a very low barrier to entry for creating a clean, consistent layout. The style guide made it easy to build layouts and style design elements like typography and forms, allowing designers more time to focus on some of the more complex pieces of UI.

Architecting a rich web application

Early on in the design and usability testing process, we saw that part of what didn’t work with the current app was the isolation of features into standalone pages rather than workflows. We began thinking about highly interactive modules that could be used wherever it made sense in a flow. The idea of a single page app began to take hold. And when we thought about the shared API framework and infrastructure we were already building to support mobile apps, it became easy to think of our JavaScript client as just another consumer.

A thick-client JavaScript page was new territory for us. Over the course of the project, we had to address numerous challenges as we tried to integrate this new approach into our current architecture.

One of the most daunting parts of a rewrite is working with unknown techniques and technologies. We evaluated some frameworks and libraries and eventually found ourselves leaning toward Backbone. At Etsy, we value using mature, well-understood technologies, and Backbone was already our go-to solution for adding more structure to the client-side code. We also really liked the simplicity and readability of the library, and we could leverage the existing Backbone code that we had already written across the site.

As we began talking about the app architecture we found ourselves looking for a little more structure, so we began to look at some layers on top of Backbone. We arrived at Marionette, a library whose classes fit the vocabulary we were already using to describe parts of the app: Regions, Lists, Layouts, and Modules. In digging deeper, we found an easy-to-read codebase and a great community. Shortly after we started to build our new platform with Marionette, one of our engineers became a core contributor to the framework. Marionette struck a great balance between providing structure and not being too opinionated. For example, using Behaviors (a way of building reusable interactions), a flexible modal pattern and a particular instance might be defined as:

var OverlayBehavior = Backbone.Marionette.Behavior.extend({
    ui: {
        close: '[data-close]'
    },

    events: {
        'click @ui.close': 'onClickClose'
    },

    onClickClose: function() {
        Radio.commands.execute('app', 'mask:close');
    }
});

var RenewOverlay = Backbone.Marionette.ItemView.extend({
    behaviors: {
        Overlay: {
            behaviorClass: OverlayBehavior
        }
    }
});
And launched using Marionette’s command method:

Radio.commands.execute('app', 'overlay:show', new RenewOverlay());

The app will respond by passing this newly constructed view into an overlay region and taking care of the mask and animations.
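Radio's command channel is, at heart, a lightweight message bus: modules register handlers for named commands, and anyone can execute those commands without knowing who handles them. As a rough, framework-free illustration of that pattern (this is a sketch, not Radio's actual implementation), a minimal command registry might look like:

```javascript
// A minimal stand-in for a Radio-style command channel (illustrative only).
var commands = {
    _handlers: {},

    // Register a handler for a named command, e.g. 'overlay:show'.
    setHandler: function(name, handler) {
        this._handlers[name] = handler;
    },

    // Execute a command by name, forwarding any extra arguments to the handler.
    execute: function(name) {
        var handler = this._handlers[name];
        if (!handler) {
            throw new Error('No handler registered for command: ' + name);
        }
        return handler.apply(null, Array.prototype.slice.call(arguments, 1));
    }
};

// The app registers what 'overlay:show' means once...
commands.setHandler('overlay:show', function(view) {
    return 'showing overlay for ' + view.name;
});

// ...and any module can trigger it without knowing the implementation.
console.log(commands.execute('overlay:show', { name: 'RenewOverlay' }));
```

The decoupling is the point: the view that fires 'overlay:show' never touches the mask or animation code directly.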

With these decisions in place, we began prototyping for our user research while fleshing out some of the open questions we had about our application.

Shipping data to the client efficiently

Our old seller tools were some of the slowest parts of the site, making performance one of our primary concerns. Part of what made these pages slow was the amount of data we needed to access from across our cluster of database shards. Fortunately, at this time the API team was hard at work building a new internal API framework to power fast, custom views to all of our clients by concurrently fanning out from a single endpoint to many others:

class Api_BespokeAjax_Shop_Listings_Form {
    public function handle($client, $input, $response = null) {
        $listing = $client->shop->listingsGet($input->listing_id);

        $proxies = [
            'listing' => $listing,
            'shipping' => $client->shop->listingsGetShipping($input->listing_id),
            'processing' => $client->shop->listingsGetProcessing($input->listing_id),
            'variations' => $client->shop->listingsGetVariations($input->listing_id),
        ];

        $response_data = Callback_Orchestrator::runAndExtract($proxies, __CLASS__);
        Api_Response::httpCode($response, 200);
        return $response_data;
    }
}

In the above example, the orchestrator resolves the results of the fanout and provides us with a single, composed result. Wrapping existing endpoints meant we didn’t have to rewrite chunks of functionality. The new API endpoints and the style guide gave us the flexibility to put live, high-fidelity working prototypes in front of users with about the same ease as faking it.

So what’s next?

Now that we’ve gone through building a major product on this new platform, we’re already setting our sights on how to improve these foundational components, and evaluating other projects that might make sense to approach in a similar fashion in the future.

Even more exciting is seeing how other features at Etsy can start leveraging some of these innovations today, whether it’s building out features on our new API to share functionality between the web and our apps, or using the client-side architecture and style guide to quickly build rich, interactive experiences on Etsy.

With today’s launch of the new Listings Managers, we’ve reached a milestone with this new tooling, but there is still much to do. As we keep iterating on these components and architectures and apply them to new and different products, we’ll no doubt find new problems to solve. We’ll let you know what we find.


This article was written as a collaboration between Diana Mounter, Jason Huff, Jessica Harllee, Justin Donato, and Russ Posluszny.


Transitioning to SCSS at Scale

Posted by on February 2, 2015 / 18 Comments

Naively, CSS appears easy to comprehend — it doesn’t have many programming constructs, and it’s a declarative syntax that describes the appearance of the DOM rather than an executable language. Ironically, it’s this lack of functionality that can make CSS difficult to reason about. The inability to add scripting around where and when selectors are executed can make wide-reaching changes to CSS risky.

CSS preprocessors introduce advanced features to CSS that the current iteration of the CSS specification does not.  This functionality commonly includes variables, functions, mixins and execution scope, meaning that developers can embed logic that determines how CSS is written and executed.  If correctly applied, preprocessors can go a long way towards making CSS more modular and DRY, which in turn results in long-term maintainability wins for a codebase.

One of the goals of the Front-end Infrastructure Team for 2014 was to fully transition the CSS codebase at Etsy to SCSS [1]. SCSS is a mature, versatile CSS preprocessor, and Etsy’s designers and developers decided to integrate it into our tech stack.  However, we knew that this effort would be non-trivial with a codebase of our size.  As of October 2014, we had 400,000+ lines of CSS split over 2000+ files.

In tandem with a team of designers, the Front-end Infrastructure Team began developing the processes to deploy SCSS support to all development environments and our build pipeline. In this post I’ll cover the logic behind our decisions, the potential pitfalls of a one-time CSS-to-SCSS conversion, and how we set up tooling to optimize for maintainability moving forward.


The biggest validation of the potential for SCSS at Etsy was the small team of designers beta-testing it for more than six months before our work began.  Since designers at Etsy actively push code, a single product team led the initial charge to integrate SCSS into their workflow.  They met regularly to discuss what was and was not working for them and began codifying their work into a set of best practices for their own project.

It was through the initial work of this product team that the rest of the company began to see the value and viability of introducing a CSS preprocessor.  The input of these designers proved invaluable when the Front-end Infrastructure Team began meeting to hatch a plan for the deployment of SCSS company-wide, and their list of best practices evolved into an SCSS style guide for future front-end development.

After evaluating the landscape of CSS preprocessors we decided to move forward with SCSS. SCSS is an extremely popular project with an active developer community, it is feature rich with excellent documentation, and because the SCSS (Sassy CSS) syntax is a superset of CSS3, developers wouldn’t have to learn a new syntax to start using SCSS immediately. With regards to performance, the Sass team prioritizes the development and feature parity of libsass, a C/C++ port of the Sass engine [2]. We assumed that using libsass via the NodeJS bindings provided by node-sass would enable us to integrate an SCSS compilation step into our builds without sacrificing speed or build times.

We were also excited about software released by the larger SCSS community, particularly tools like scss-lint. In order to compile SCSS we knew that a conversion of CSS files to SCSS meant remedying any syntactical bugs within our CSS code base.  Since our existing CSS did not have consistently-applied coding conventions, we took the conversion as an opportunity to create a consistent, enforceable style across our existing CSS.  Coupling this remediation with a well-defined style guide and a robust lint, we could implement tooling to keep our SCSS clean, performant and maintainable moving forward.

An Old and New CSS Pipeline

Our asset build pipeline is called “builda” (short for “build assets”). It was previously a set of PHP scripts that handled all JS and CSS concatenation/minification/versioning. When using libraries written in other languages (e.g. minification utilities), builda would shell out to those services from PHP. On developer virtual machines (VMs) builda would build CSS dynamically per request, while it would write concatenated, minified and versioned CSS files to disk in production builds.

We replaced the CSS component of builda with an SCSS pipeline written in NodeJS. We chose Node for three reasons. First, we had already re-written the JavaScript build component in Node a year ago, so the tooling and strategies for deploying another Node service internally were familiar to us. Second, we’ve found that writing front-end tools in JavaScript opens the door for collaboration and pull requests from developers throughout the organization. Finally, a survey of the front-end software ecosystem reveals a strong preference towards JavaScript, so writing our build tools in JS would allow us to keep a consistent codebase when integrating third-party code.

One of our biggest worries in the planning stages was speed. SCSS would add another compilation step to an already extensive build process, and we weren’t willing to lengthen development workflows or build times. Fortunately, we found that by using libsass we could achieve a minimum of 10x faster compilation speeds over the canonical Ruby gem.

We were dedicated to ensuring that the SCSS builda service was a seamless upgrade from the old one. We envisioned writing SCSS in your favorite text editor, refreshing the browser, and having the CSS render automatically from a service already running on your VM — just like the previous CSS workflow. In production, the build pipeline would still output properly compiled, minified and versioned CSS to our web servers.

Despite a complete rewrite of the CSS service, with a robust conversion process and frequent sanity checking, we were able to replace CSS with SCSS and avoid any disruptions. Workflows were identical to before the rewrite and developers began writing SCSS from day one.

Converting Legacy Code

In theory, converting CSS to SCSS is as simple as changing the file extension from .css to .scss.  In practice it’s much more complicated.

Here’s what’s hard about CSS: It fails quietly.  If selectors are malformed or parameters are written incorrectly (e.g. #0000000 instead of #000000), the browser simply ignores the rule. These errors were a blocker on our conversion because when SCSS is compiled, syntax errors will prevent the file from compiling entirely.
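To make the "fails quietly" problem concrete, here is a trivial check (hypothetical and illustrative, not part of Etsy's tooling) that flags a malformed hex color like #0000000, which a browser would silently drop:

```javascript
// Validate that a CSS color value is a well-formed 3- or 6-digit hex color.
function isValidHexColor(value) {
    return /^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})$/.test(value);
}

console.log(isValidHexColor('#000000'));  // true
console.log(isValidHexColor('#0000000')); // false: one digit too many
```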

But errors were only one part of the problem. What about intentionally malformed selectors in the form of IE-hacks? Or, what about making changes to legacy CSS in order to conform to new lint rules that we’d impose on our SCSS? For example, we wanted to replace every instance of a CSS color-keyword with its hex value.

Our conversion was going to touch a lot of code in a lot of places. Would we break our site by fixing our CSS?  How could we be confident that our changes wouldn’t cause visual regressions?

Conventionally there are some patterns to solve this problem. A smaller site might remedy the syntax bugs, crawl every page with a headless browser and create visual diffs for changes. Alternatively, below a certain size it might even be possible to manually regression test each page to make sure the fixes render smoothly.

Unfortunately our scale and continuous experimentation infrastructure make both options impossible, as there are simply far too many different combinations of pages/experiments to test against, all subject to change at a moment’s notice.  A back-of-the-envelope calculation puts the number of possible page variants at any time at ~1.2M.

We needed to clean any incorrect CSS and enforce new lint rules before we performed the SCSS rename, and we needed to confirm that those fixes wouldn’t visually break our site without the option to look at every page. We broke the solution into two distinct steps: the “SCSS Clean” and the “SCSS Diff.”

SCSS Clean

We evaluated various ways to perform the CSS fixes, initially involving an extensive list of regular expressions to transform incorrect patterns we identified in the code. But that method quickly became untenable as our list of regular expressions was difficult to reason about.

Eventually we settled on our final method: using parsers to convert any existing source CSS/SCSS code into Abstract Syntax Trees (ASTs), which we could then manipulate to transform specific types of nodes.  For the unfamiliar, an AST is a representation of the structure of parsed source code.  We used the Reworkcss CSS parser to generate CSS ASTs and gonzales-pe to generate SCSS ASTs, and wrote a custom adapter between the two formats to streamline our style and syntax changes.  For a sense of what a generated AST looks like, the Reworkcss documentation includes a great example.

By parsing our existing CSS/SCSS into ASTs, we could correct errors at a much more granular level by targeting selectors or errors of specific types. Going back to the color-keyword example, this gave us a cleaner way to replace properties that specified color values as color-keywords (“black”) with their equivalent hexadecimal representation (#000000).  By using an AST we could perform the replacement without running the risk of replacing color words in unintended locations (e.g. selectors: “.black-header”) or navigating a jungle of regular expressions.

In summary, our cleaning process was:

  1. Generate an AST for the existing CSS/SCSS file.
  2. Run a script we created to operate over the AST to identify and fix errors/discrepancies on a per-property level.
  3. Save the output as .scss.
  4. Run the .scss file through the libsass compiler until the successful compilation of all files.
  5. Iterate on steps #2-4, including manual remediation efforts on specific files as necessary.
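To make step 2 concrete, here is a simplified sketch of the color-keyword replacement using a hand-built tree in the general shape Reworkcss produces (the real pipeline parsed actual source files, and the keyword table here is abbreviated and illustrative):

```javascript
// Map of CSS color keywords to their hex equivalents (abbreviated).
var COLOR_KEYWORDS = { black: '#000000', white: '#ffffff' };

// Walk a Reworkcss-style AST and rewrite color keywords in declaration
// values only, leaving selectors like '.black-header' untouched.
function replaceColorKeywords(ast) {
    ast.stylesheet.rules.forEach(function(rule) {
        (rule.declarations || []).forEach(function(decl) {
            if (COLOR_KEYWORDS.hasOwnProperty(decl.value)) {
                decl.value = COLOR_KEYWORDS[decl.value];
            }
        });
    });
    return ast;
}

// A tiny AST for: .black-header { color: black; }
var ast = {
    stylesheet: {
        rules: [{
            type: 'rule',
            selectors: ['.black-header'],
            declarations: [{ type: 'declaration', property: 'color', value: 'black' }]
        }]
    }
};

replaceColorKeywords(ast);
console.log(ast.stylesheet.rules[0].declarations[0].value); // '#000000'
console.log(ast.stylesheet.rules[0].selectors[0]);          // '.black-header' (unchanged)
```

Because the transform targets declaration nodes specifically, there is no risk of a regex accidentally rewriting "black" inside a selector or class name.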


Cleaning our CSS was only half the battle. We also needed a way to confirm that our cleaned CSS wouldn’t break our site in unexpected ways, and to make that determination automatically across thousands of files.

Again we turned to ASTs.  ASTs strip away superficial differences in source code to core language constructs.  Thus we could conclude that if two ASTs were deeply equivalent, regardless of superficial differences in their source, they would result in the same rendered CSS.

We used our Continuous Integration (Jenkins) server to execute the following process and alert us after each push to production:

  1. Run the old builda process with the original, untouched CSS, resulting in the minified, concatenated and versioned CSS that gets deployed to production servers and the live site.  Build an AST from this output.
  2. Concurrently to step 1, run the SCSS conversion/clean, generating SCSS files from CSS.  Run these SCSS files through SCSS builda, resulting in minified, concatenated, and versioned CSS from which we could generate an AST.
  3. Diff the ASTs from steps 1 and 2.
  4. Display and examine the diff.  Iterate on steps 1-3, modifying the cleaning script, the SCSS builda code or manually addressing issues in CSS source until the ASTs are equivalent.


With equivalent ASTs we gained confidence that, despite touching thousands of CSS files across the site, it would look exactly the same before and after our SCSS conversion.  Integrating the process into CI gave us a quick and safe way to surface the limits of our cleaning/SCSS implementations by using live code but not impacting production.

Going Live

With sufficient confidence via AST diffing, our next step was to determine how to deploy to production safely. Here was our deployment strategy:


Using the original CSS as source, we added the SCSS builda process to our deployment pipeline. On a production push it would take the original CSS, clean it, create SCSS files and then compile them to CSS files in a separate directory on our production servers. We continued to serve all traffic the CSS output of our existing build process and kept the new build flagged off for production users.  This allowed us to safely run a dress rehearsal of the new conversion and build systems during deployments and monitor the system for failures.


Once the SCSS builda process ran for several days (with 25-50 pushes per day) without incident, we used our feature flagging infrastructure to ramp up 50% of our production users to use the new SCSS builda output. We monitored graphs for issues.

After several days at 50%, we ramped up SCSS builda output to 100% of users and continued to monitor graphs.


The final step was to take a few hours to hold production pushes and convert our CSS source to the converted SCSS.  Since our SCSS builda process generated its own cleaned SCSS, transitioning our source was as simple as replacing the contents of our css/ directory with those generated SCSS files.

One 1.2M-line deployment later, Etsy was running entirely on SCSS source code.

Optimizing for Developer Productivity and Code Maintainability

We knew that integrating a new technology into the stack such as SCSS would require up-front work on our end with regards to communication, teaching and developer tools. Beyond just the work related to the build pipeline it was important to make sure developers felt confident writing SCSS from day one.

Communication and Teaching

The style guide and product work by the initial SCSS design team was key in showing the value of adopting SCSS to others throughout the organization. The speed at which new, consistent and beautiful pages could be created with the new style guide was impressive.  We worked with the designers closely on email communication and lunch-and-learn sessions before the official SCSS launch day and crafted documentation within our internal wiki.

Developer Tools and Maintainability

Beyond syntax differences, there are a couple of core pitfalls/pain points for developers when using SCSS:

  1. SCSS is compiled, so syntax errors explode compilation and no CSS hits the page.
  2. You can accidentally bloat your CSS by performing seemingly harmless operations (here’s looking at you, @extend).
  3. Nested @import’s within SCSS files can complicate tracing the source files for specific selectors.

We found the best way to remedy these was to integrate feedback into development environments.

For broken SCSS, a missing/non-compiling CSS file becomes an error message at the top of the document:


For maintainability, the integration of a live, in-browser SCSS lint was invaluable:


The lint rules defined by our designers help keep our SCSS error-free and consistent and are used within both our pre-deployment test suites and in-browser lint.  Luckily the fantastic open source project scss-lint has a variety of configurable lint rules right out of the box.

Lastly, due to nested SCSS file structures, source maps for inspecting file dependencies in browser developer tools were a must. These were straightforward to implement since libsass provides source map support.

With the SCSS build processes, live lint, source maps, test suite upgrades and education around the new style guide, our final internal conversion step was pushing environment updates to all developer VMs. Similarly to the SCSS production pipeline the developer environments involved rigorous testing and iteration, and gathering feedback from an opt-in developer test group was key before rolling out the tooling to the entire company.

Conclusions and Considerations

The key to making any sweeping change within a complex system is building confidence, and transitioning from CSS to SCSS was no different.  We had to be confident that our cleaning process wouldn’t produce SCSS that broke our site, and we had to be confident that we built the right tools to keep our SCSS clean and maintainable moving forward. With proper education, tooling and sanity checks throughout the process, we were able to move Etsy to SCSS with minimal disruption to developer workflows or production users.

  1. We use SCSS to refer to the CSS-syntax version of Sass. For all intents and purposes, SCSS and Sass are interchangeable throughout this post.
  2. In order to maintain our old system’s build behavior and prevent redundant CSS imports, we forked libsass to support compass-style import-once behavior.
  3. Graphics from Daniel Espeset – Making Maps: The Role of Frontend Infrastructure at Etsy – Fronteers 2014 Presentation.

You can follow Dan on Twitter at @dxna.


Announcing Hound: A Lightning Fast Code Search Tool

Posted by and on January 27, 2015 / 31 Comments

Today we are open sourcing a new tool to help you search large, complex codebases at lightning speed. We are calling this tool Hound. We’ve been using it internally for a few months, and it has become an indispensable tool that many engineers use every day.

The Problem Hound Solves

Before Hound, most engineers used ack or grep to search code, but with our growing codebase this started taking longer and longer. Even worse, searching across multiple repositories was so slow and cumbersome that it was becoming frustrating and error prone. Since it is easy to overlook dependencies between repositories, and hard to even know which you might need to search, this was a big problem. Searching multiple repositories was especially painful if you wanted to search a repo that you didn’t have cloned on your machine. Due to this frustration, Kelly started working on a side project to try out some of the ideas and code from this article by Russ Cox. The code was robust, but we wanted to tweak some of the criteria for excluding files, and create a web UI that could talk to the search engine.

The end goal was to provide a simple web front-end with linkable search results that could perform regular expression searches on all of our repositories quickly and accurately. Hound accomplishes this with a static React front-end that talks to a Go backend. The backend keeps an up-to-date index for each repository and answers searches through a minimal API.

Why Create a New Tool?

Some people might point out that tools like this already exist – OpenGrok comes to mind immediately. Our main beef with OpenGrok is that it is difficult to deploy, and it has a number of hard requirements that are not trivial to install and configure. To run Hound you just need Go 1.3+. That’s it. A browser helps to see the web UI, but with a command line version on the way and a Sublime Text plugin already live, the browser is optional. We wanted to lower the barrier to entry for this tool so much that anyone, anywhere, could start using Hound to search their code in seconds or minutes, not hours.

Get It

Hound is easy enough to use that we recommend you just clone it and see for yourself. We have committed to using the open source version of Hound internally, so we hope to address issues and pull requests quickly. That’s enough jibber jabber – check out the quick start guide and get searching!


Hound trained at Etsy by  Jonathan Klein and Kelly Norton


Introducing statsd-jvm-profiler: A JVM Profiler for Hadoop

Posted by on January 14, 2015 / 13 Comments

At Etsy we run thousands of Hadoop jobs over hundreds of terabytes of data every day.  When operating at this scale optimizing jobs is vital: we need to make sure that users get the results they need quickly, while also ensuring we use our cluster’s resources efficiently.  Actually doing that optimizing is the hard part, however.  To make accurate decisions you need measurements, and so we have created statsd-jvm-profiler: a JVM profiler that sends the profiling data to StatsD.

Why Create a New Profiler?

There are already many profilers for the JVM, including VisualVM, YourKit, and hprof.  Why do we need another one?  Those profilers are all excellent tools, and statsd-jvm-profiler is not intended to entirely supplant them.  Instead, statsd-jvm-profiler, inspired by riemann-jvm-profiler, is designed for a specific use-case: quickly and easily profiling Hadoop jobs.

Profiling Hadoop jobs is a complex process.  Each map and reduce task gets a separate JVM, so one job could have hundreds or even thousands of distinct JVMs, running across the many nodes of the Hadoop cluster.  Using frameworks like Scalding complicates it further: one Scalding job will run multiple Hadoop jobs, each with many distinct JVMs.  As such it is not trivial to determine exactly where the code you want to profile is running.  Moreover, storing and transferring the snapshot files produced by some profilers has also been problematic for us due to the large size of the snapshots.  Finally, at Etsy we want our big data stack to be accessible to as many people as possible, and this includes tools for optimizing jobs.  StatsD and Graphite are used extensively throughout Etsy, so by sending data to StatsD, statsd-jvm-profiler enables users to use tools they are already familiar with to explore the profiling data.

Writing the Profiler

For simplicity, we chose to write statsd-jvm-profiler as a Java agent, which means it runs in the same JVM as the process being instrumented.  The agent code runs before the main method of that process.  Implementing an agent is straightforward: define a class that has a premain method with this signature:

package com.etsy.agent;

import java.lang.instrument.Instrumentation;

public class ExampleAgent {
    public static void premain(String args, Instrumentation instrumentation) {
        // Agent code here
    }
}

The agent class should be packaged in a JAR whose manifest specifies the Premain-Class attribute:

Premain-Class: com.etsy.agent.ExampleAgent

We are using Maven to build statsd-jvm-profiler, so we use the maven-shade-plugin’s ManifestResourceTransformer to set this property, but other build tools have similar facilities.

Finally, we used the JVM’s management interface to actually obtain the profiling data.  The java.lang.management package provides a number of MXBeans that expose information about various components of the JVM, including memory usage, the garbage collector, and running threads.  By pushing this data to StatsD, statsd-jvm-profiler removes the need to worry about where the code is running – all the metrics are available in a central location.


There were some issues that came up as we developed statsd-jvm-profiler.  First, statsd-jvm-profiler uses a ScheduledExecutorService to periodically run the threads that actually perform the profiling.  However, the default ScheduledExecutorService runs as a non-daemon thread, which means it will keep the JVM alive even after the main thread has exited, continuing to report profiling data when nothing else is happening.  Guava provides a way to create a ScheduledExecutorService that exits when the application is complete, which statsd-jvm-profiler uses to work around this issue.

Safepoints are another interesting aspect of profiling the JVM.  A thread is at a safepoint when it is in a known state: all roots for garbage collection are known and all heap contents are consistent.  At a safepoint, a thread’s state can be safely observed or manipulated by other threads.  Garbage collection must occur at a safepoint, but a safepoint is also required to sample the thread state like statsd-jvm-profiler does.  However, the JVM can optimize safepoints out of hot methods, so statsd-jvm-profiler’s sampling can be biased towards cold methods.  This is not unique to statsd-jvm-profiler – any profiler that samples the thread state would have the same bias.  The bias is important to be aware of, but in practice it may not be that meaningful: an incomplete view of application performance that still enables you to make improvements is better than no information.

How to Use statsd-jvm-profiler

statsd-jvm-profiler will profile heap and non-heap memory usage, garbage collection, and the aggregate time spent executing each function.  You will need the statsd-jvm-profiler jar on the host where the JVM you want to profile will run.  Since statsd-jvm-profiler is a Java agent, it is enabled with the -javaagent argument to the JVM.  You are required to provide the hostname and port number for the StatsD instance to which statsd-jvm-profiler should send metrics.  You can also optionally specify a prefix for the metrics emitted by statsd-jvm-profiler as well as filters for the functions to profile. 


An example of using statsd-jvm-profiler to profile Scalding jobs is provided with the code.

statsd-jvm-profiler will output metrics under the “statsd-jvm-profiler” prefix by default, or you can specify a custom prefix.  Once the application being profiled has finished, all of the data statsd-jvm-profiler produced will be available in whatever backend you are using with StatsD.  What do you do with all that data? Graph it!  We have found flame graphs to be a useful method of visualizing the CPU profiling data, and a script to output data from Graphite into a format suitable for generating a flame graph is included with statsd-jvm-profiler:

Example Flame Graph

The memory usage and garbage collection metrics can be visualized directly:

Example Metrics

Using the Profiler’s Results

We’ve already used the data from statsd-jvm-profiler to determine how best to optimize jobs.  For example, we wanted to profile a job after some changes that had made it slower.  The flame graph made it obvious where the job was spending its time.  The wide bars on the left and right of this image are from data serialization/deserialization.  As such we knew that speeding up the job would come from improving the serialization or reducing the amount of data being moved around – not in optimizing the logic of the job itself. 

Flame Graph with Serialization

We also made a serendipitous discovery while profiling that job: it had been given 3 GB of heap, but it was not using anywhere near that much, so we could reduce its heap size.  Such chance findings are a great advantage of making profiling simple.  You are more likely to make these chance discoveries if you profile often and make analysis of your profiling data easier.  statsd-jvm-profiler and Graphite solve this problem for us.

Get statsd-jvm-profiler

Want to try it out yourself?  statsd-jvm-profiler is available on Github now!


Q3 2014 Site Performance Report

Posted by , and on December 22, 2014 / 6 Comments

We’re well into the fourth quarter of 2014, and it’s time to update you on how we did in Q3. This is either a really late performance report or an early Christmas present – you decide! The short version is that server side performance improved due to infrastructure changes, while front-end performance got worse because of third party content and increased amounts of CSS/JS across the site. In most cases the additional CSS/JS came from redesigns and new features that were part of winning variants that we tested on site. We’ve also learned that we need some better front-end monitoring and analysis tools to detect regressions and figure out what caused them.

We are also trying something new for this post – the entire performance team is involved in writing it! Allison McKnight is authoring the server side performance section, Natalya Hoota is updating us on the real user monitoring numbers, and Jonathan Klein is doing synthetic monitoring, this introduction, and the conclusion. Let’s get right to it – enter Allison:

Server Side Performance – From Allison McKnight

Here are median and 95th percentile times for signed-in users on October 22nd:



The homepage saw a significant jump in median load times as well as an increase in 95th percentile load times. This is due to our launch of the new signed-in homepage in September. The new homepage shows the activity feed of the signed-in user and requires more back-end time to gather that data. While the new personalization on the homepage did create a performance regression, the redesign had a significant increase in business and engagement metrics. The redesign also features items from a more diverse set of shops on the homepage and displays content that is more relevant to our users. The new version of the homepage is a better user experience, and we feel that this justifies the increase in load time. We are planning to revisit the homepage to further tune its performance in the beginning of 2015.

The rest of the pages saw a nice decrease in both median and 95th percentile back-end times. This is due to a rollout of some new hardware – we upgraded our production memcached boxes.

We use memcached for a lot of things at Etsy. In addition to caching the results of queries to the database, we use memcached to cache things like a user’s searches or activity feed. If you read our last site performance report, you may remember that in June we rolled out a dedicated memcached cluster for our listing cards, landing a 100% cache hit rate for listing data along with a nice drop in search page server side times.
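The cache-aside pattern described above can be sketched roughly as follows. This is a toy illustration, not Etsy's code: a plain dict stands in for the memcached cluster, and `get_user_feed_from_db` is a hypothetical slow database call.

```python
# Minimal cache-aside sketch. The dict stands in for a memcached
# cluster; a real deployment would use a memcached client library.
cache = {}

def get_user_feed_from_db(user_id):
    # Placeholder for an expensive database query / fan-out.
    return ["item-%d" % i for i in range(user_id, user_id + 3)]

def get_user_feed(user_id):
    key = "feed:%d" % user_id
    if key in cache:                        # cache hit: skip the database
        return cache[key]
    feed = get_user_feed_from_db(user_id)   # cache miss: do the slow work...
    cache[key] = feed                       # ...and populate the cache
    return feed
```

With this shape, cache latency sits directly on the request path for every hit, which is why faster memcached hardware translates into visibly faster pages.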

Our old memcached boxes had 1Gb networking and 48GB of memory. We traded them out for new machines with 10Gb networking, 128GB of memory, faster clocks, and a newer architecture. After the switch we saw a drop in memcached latency. This resulted in a visible drop in both the median and 95th percentile back-end times of our pages, especially during our peak traffic times. This graph compares the profile page’s median and 95th percentile server side times on the day after the new hardware was installed and on the same day a week before the switch:


All of the pages in this report saw a similar speedup.

At this point, you may be wondering why the search page has seen such a small performance improvement since July if it was also affected by the new memcached machines. In fact, the search page saw about the same speedup from the new boxes as the rest of the pages did, but this speedup was balanced out by other changes to the page throughout the past three months. A lot of work has been going on with the search page recently, and it has proven hard to track down any specific changes that have caused regressions in performance. All told, in the past three months we’ve seen a small improvement in the search page’s performance and many changes and improvements to the page from the Search team. Since this is one of our faster pages, we’re happy with a small performance improvement along with the progress we’ve made towards our business goals in the last quarter.

Synthetic Front-End Performance – From Jonathan Klein

As a reminder, these tests are run with Catchpoint. They use IE9, and they run from New York, London, Chicago, Seattle, and Miami every 30 minutes (so 240 runs per page per day). The “Webpage Response” metric is defined as the time it took from the request being issued to receiving the last byte of the final element on the page. These numbers are medians, and here is the data for October 15th, 2014.
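As a rough illustration of how a day's runs collapse into the single reported number, here is how a median is taken over a set of synthetic measurements (the run times below are invented, not Catchpoint data):

```python
import statistics

# Hypothetical Webpage Response times (ms) from one page's synthetic runs.
runs_ms = [1820, 1905, 1750, 2400, 1880, 1790, 3100, 1860]

# Report the median: unlike the mean, it is barely moved by the
# occasional outlier run (like the 3100 ms one above).
median_ms = statistics.median(runs_ms)
print(median_ms)  # 1870.0
```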


Unfortunately load time increased across all pages, with a larger increase on the search page. The overall increase is due to some additional assets that are served on every page. Specifically we added some CSS/JS for header/footer changes that we are experimenting with, as well as some new third party assets.

The larger increase on the search page is due to a redesign of that page, which increased the overall amount of CSS/JS on the page significantly. This pushed up both start render (extra CSS) and Webpage Response (additional page weight, more JS, etc.).

Seeing an increase of 100ms across the board isn’t a huge cause for concern, but the large increase on search warrants a closer look. Our search team is looking into ways to retain the new functionality while reducing the amount of CSS/JS that we have to serve.

Real User Front-End Performance – From Natalya Hoota

So, what are our users experiencing? Let us look at the RUM (Real User Monitoring) data collected by mPulse. First, median and 95% total page load time, measured in seconds.


As you see, real user data in Q3 showed an overall, and quite significant, performance regression on all pages.

Page load time can be viewed as a sum of back-end time (time until the browser receives the first byte of the response) and front-end time (time from the first byte until the page is finished rendering). Since we had a significant discrepancy between our RUM and synthetic data, we need a more detailed analysis.
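In Navigation Timing terms, that decomposition looks roughly like this (the timestamps are invented for illustration):

```python
# Splitting total page load into back-end and front-end portions,
# using invented Navigation-Timing-style timestamps (ms since
# navigation start).
navigation_start = 0
response_start = 320     # first byte received from the server
load_event_end = 2450    # page finished loading/rendering

backend_ms = response_start - navigation_start   # "back-end time"
frontend_ms = load_event_end - response_start    # "front-end time"
total_ms = backend_ms + frontend_ms

print(backend_ms, frontend_ms, total_ms)  # 320 2130 2450
```

When back-end time is flat but total load time grows, as in our Q3 data, the regression has to live in the front-end portion.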

Back-End Time – RUM Data

Our RUM metric for server-side performance, defined in mPulse as time to first byte, did not show a significant change since Q2, which matches closely with both our server-side performance analysis and our synthetic data. The differences are small enough to fall within rounding error, so there was effectively no regression.


Front-End Time – RUM Data

While synthetic data showed only a slight increase in load time, our real user numbers showed this regression amplified. It could be due to a number of things: internal changes such as our experiments with the header and footer, an increased number of assets and their handling on the client side, or external factors. To figure out the reasons behind our RUM numbers, we asked ourselves a number of questions:

Q. Was our Q3 RUM data significantly larger than Q2 due to an increase in site traffic?
A. No. We found no immediate relation between the performance degradation and a change in beacon volume. We double-checked against our internally recorded traffic patterns and found that beacon counts were consistent with them, so we crossed this hypothesis off the list.

Q. Has our user base breakdown changed?
A. Not according to the RUM data. We broke the dataset down by region and operating system and, yet again, found that our traffic patterns did not change geographically. It also appears that we had more RUM signals coming from desktop in Q3 than in Q2.

What remains is taking a closer look at internal factors.

One clue was our profile page metrics. From the design perspective, the only change on that page was the global header/footer experiments. For the profile page alone, those resulted in a 100% and 45% increase in the number of page CSS files and JS assets, respectively. The profile page also suffered one of the highest increases in median and 95th percentile load time.

We already know that the global header and footer experiments, along with the additional CSS assets served, affected all pages to some degree. It is possible that on other pages the change was balanced out by architecture and design improvements, while on the profile page we are seeing the isolated impact of this change.

To confirm or dismiss this hypothesis, we've learned that we need better analysis tooling than we currently have. Our synthetic tests run only on signed-out pages, so it would be more accurate to compare them to a similar RUM data set. However, we are currently unable to filter signed-in and signed-out RUM user data per page. We are planning to add this capability early in 2015, which will give us the ability to better connect a particular performance regression to the page experiment that caused it.

Lastly, for much of the quarter we were experiencing a persistent issue where assets failed to be served in a gzipped format from one of our CDNs. This amplified the impact of the asset growth, causing a significant performance regression in the front-end timing. This issue has been resolved, but it was present on the day when the data for this report was pulled.
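The impact of losing gzip on text assets is easy to demonstrate with a quick sketch. The CSS payload below is fabricated, but the compression ratio is typical for repetitive text like stylesheets:

```python
import gzip

# A fabricated, repetitive CSS payload; real stylesheets compress
# similarly well because selectors and property names repeat heavily.
css = (".listing-card { margin: 0; padding: 4px; color: #333; }\n" * 200).encode()

compressed = gzip.compress(css)

# When the CDN fails to gzip, the full raw size goes over the wire,
# so every kilobyte of new CSS/JS costs several times what it should.
print(len(css), len(compressed))
assert len(compressed) < len(css) / 5
```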

Conclusion – From Jonathan Klein

The summary of this report is “faster back-end, slower front-end”. As expected the memcached upgrade was a boon for server side numbers, and the continued trend of larger web pages and more requests didn’t miss Etsy this quarter. These results are unsurprising – server hardware continues to get faster and we can reap the benefits of these upgrades. On the front-end, we have to keep pushing the envelope from a feature and design point of view, and our users respond positively to these changes, even if our pages don’t respond quite as quickly. We also learned this quarter that it’s time to build better front-end performance monitoring and data analysis tools for our site, and we’ll be focusing on this early in 2015.

The continuing challenge for us is to deliver the feature rich, beautiful experiences that our users enjoy while still keeping load time in check. We have a few projects in the new year that will hopefully help here: testing HTTP/2, a more nuanced approach to responsive images, and some CSS/JS/font cleanup.


We Invite Everyone at Etsy to Do an Engineering Rotation: Here’s why

Posted by on December 22, 2014 / 16 Comments

At Etsy, it’s not just engineers who write and deploy code – our designers and product managers regularly do too. And now any Etsy employee can sign up for an “engineering rotation” to get a crash course in how Etsy codes, and ultimately work with an engineer to write and deploy the code that adds their photo to our about page. In the past year, 70 employees have completed engineering rotations. Our engineers have been pushing on day one for a while now, but it took a bit more work to get non-coders prepared to push as soon as their second week. In this post I’ll explain why we started engineering rotations and what an entire rotation entails.

What are rotations and why are they important?

Since 2010, Etsy employees have participated in “support rotations” every quarter, where they spend about two hours replying to support requests from our members. Even our CEO participates. What started as a way to help our Member Operations team during their busiest time of year has evolved into a program that facilitates cross-team communication, builds company-wide empathy, and provides no shortage of user insights or fun!

This got us thinking about starting up engineering rotations, where people outside of the engineering organization spend some time learning how the team works and doing some engineering tasks. Armed with our excellent continuous deployment tools, we put together a program that could have an employee with no technical knowledge deploying code live to the website in three hours. This includes time spent training and the deploy itself.

The Engineering Rotation Program

The program is split into three parts: homework; an in-person class; then hands-on deployment. The code that participants change and deploy will add their photos to the Etsy about page. It’s a nice visual payoff, and lets new hires publicly declare themselves part of the Etsy team.

Before class begins, we assign homework in order to prepare participants for the code change they'll deploy. We ask them to complete interactive tutorials, including HTML levels 1 and 2 on Codecademy and our in-house Unix Command Line 101. We also ask them to read Marc Cohen's excellent article "How the Web Works – In One Easy Lesson," followed by a blog post discussing how the engineering organization deals with outages. These resources help familiarize each participant with the technologies they'll work with, and introduce them to some of Etsy's core engineering tenets, such as blameless post-mortems.


Next up is a class for all participants. It has five sections. The first section picks up where How The Web Works left off and explains how Etsy works. Then we introduce the standard three-tier architecture and walk through some example requests: viewing, creating, searching for and purchasing a listing. Next we take a deep-dive into database sharding. We explain what it is, why it’s necessary, why we shard by data owner and how we rebalance our shards. We then explain Content Delivery Networks and why we use them. After that, we move away from the hard technical discussion to talk about continuous deployment. We discuss the philosophy behind it, and describe why it’s safe to change the website fifty times per day and how we ensure that each change does exactly what we expect. We wrap up this session by giving an overview of all the engineering teams at Etsy and their responsibilities.
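Sharding by data owner means all of a user's rows live on one shard, so a query only needs the owner's id to find its data. A toy version of that routing step might look like the following (all names are hypothetical; as the class explains, Etsy's real scheme uses an explicit lookup so shards can be rebalanced, not a bare modulo):

```python
NUM_SHARDS = 4

# Simplest possible scheme: shard is a pure function of the owner id.
# A production system keeps an explicit user_id -> shard mapping
# instead, so users can be moved when shards are rebalanced.
def shard_for(user_id):
    return user_id % NUM_SHARDS

def fetch_listings(user_id, shards):
    # All of a user's listings live on that user's shard, so a single
    # shard can answer the whole query.
    return shards[shard_for(user_id)].get(user_id, [])

# Toy "shards": one dict per shard, keyed by owner id.
shards = [dict() for _ in range(NUM_SHARDS)]
shards[shard_for(42)][42] = ["hand-knit scarf", "ceramic mug"]
print(fetch_listings(42, shards))  # ['hand-knit scarf', 'ceramic mug']
```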

At this point we pair each participant with an engineer who will guide them through the process of making and testing the code change, and ultimately pressing the big green Deploy to Production button. These one-on-one sessions can take up to two hours as the pair discuss the different tools that exist at each step – some as simple as IRC or as complex as our Continuous Integration cluster. As the participant begins the process of deploying their code change, they’ll see their name appear atop our dashboards.

What have we learned?

The benefits of engineering rotations parallel those of our support rotations in many ways. It’s an opportunity for Admin throughout Etsy to work with people they normally wouldn’t, and to learn more about each other personally and professionally. To an outsider, the more technical aspects of Etsy might feel unapproachable – even a bit mysterious – but demystifying them encourages even more collaboration. Here’s what some of the participants of Etsy’s Engineering Rotations have said:

“Understanding the work of your colleagues breeds empathy and it’s been great having a better understanding of what working at Etsy means to others.”

“The best part about this program is that it pulls back the curtain on how software development works at Etsy. For people who don’t work with code every day, this can appear to be some sort of magic, which it’s not – it’s just a different kind of work, with different kinds of tools. Without this program, we would miss out on a huge opportunity for different groups to empathize with each other, which I think is crucial for a company to feel like a real team.”

“The internet goes under the sea in a cable. Whoa.”

Participants in both the engineering and support rotations come away with many lessons beyond the curriculum. More than a few times, support rotations have exposed engineers to parts of the site that generate lots of inquiries, and they were able to fix them immediately. And in one engineering rotation, someone pointed out that a lot of IRC tools we’ve built can’t be used by all employees because they don’t have access to our internal code snippet service. So we’re now looking at how we can give everyone that access. I led one session with our International Counsel, and we ended up having a fascinating discussion about the legality of deleting data. That sprang from my explanation of how we do database migrations!

But perhaps the biggest thing we’ve learned from the engineering rotations is that everyone involved likes doing them. They get to meet new people, learn new things, and use a tool called “the Deployinator.” What’s not to like?

You can follow Dan on Twitter @jazzdan.


Make Performance Part of Your Workflow

Posted by on December 11, 2014 / 1 Comment

The following is an excerpt from Chapter 7, "Weighing Aesthetics and Performance", from Designing for Performance by Lara Callender Hogan (Etsy's Senior Engineering Manager of Performance), which has just been released by O'Reilly.

One way to minimize the operational cost of performance work is to incorporate it into your daily workflow by implementing tools and developing a routine of benchmarking performance.

There are a variety of tools mentioned throughout this book that you can incorporate into your daily development workflow:

By making performance work part of your daily routine and automating as much as possible, you’ll be able to minimize the operational costs of this work over time. Your familiarity with tools will increase, the habits you create will allow you to optimize even faster, and you’ll have more time to work on new things and teach others how to do performance right.

Your long-term routine should include performance as well. Continually benchmark improvements and any resulting performance gains as part of your project cycle so you can defend the cost of performance work in the future. Find opportunities to repurpose existing design patterns and document them. As your users grow up, so does modern browser technology; routinely check in on your browser-specific stylesheets, hacks, and other outdated techniques to see what you can clean up. All of this work will minimize the operational costs of performance work over time and allow you to find more ways to balance aesthetics and performance.

Approach New Designs with a Performance Budget

One key to making decisions when weighing aesthetics and page speed is understanding what wiggle room you have. By creating a performance budget early on, you can make performance sacrifices in one area of a page and make up for them in another. In Table 7-3 I’ve illustrated a few measurable performance goals for a site.

TABLE 7-3. Example performance budget

Metric | Goal | Measurement | Applies to
Total page load time | 2 seconds | WebPagetest, median from five runs on 3G | All pages
Total page load time | 2 seconds | Real user monitoring tool, median across geographies | All pages
Total page weight | 800 KB | WebPagetest | All pages
Speed Index | 1,000 | WebPagetest using Dulles location in Chrome on 3G | All pages except home page
Speed Index | 600 | WebPagetest using Dulles location in Chrome on 3G | Home page

You can favor aesthetics in one area and favor performance in another by defining your budget up front. That way, it's not always about making choices that favor page speed; you have an opportunity to favor more complex graphics, for example, if you can find page speed wins elsewhere that keep you within your budget. You can add a few more font weights because you found equivalent savings by removing some image requests. You can negotiate killing a marketing tracking script in order to add a better hero image. By routinely measuring how your site performs against your goals, you can continue to find that balance.
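A budget like the one in Table 7-3 is most useful when it's checked automatically. A minimal checker could look like this sketch (the limits come from the example budget; the "measured" values are invented):

```python
# Budget entries: (metric, limit, measured). Limits are from the
# example budget above; the measured values are invented.
budget = [
    ("total page load time (s)", 2.0,  1.8),
    ("total page weight (KB)",   800,  910),
    ("speed index",              1000, 950),
]

def over_budget(entries):
    # Return the names of every metric whose measurement exceeds its limit.
    return [name for name, limit, measured in entries if measured > limit]

failures = over_budget(budget)
print(failures)  # ['total page weight (KB)']
```

Wired into a build or deploy pipeline, a check like this turns the budget from a document into a gate that flags regressions before they ship.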

To decide on what your performance goals will be, you can conduct a competitive analysis. See how your competitors are performing and make sure your budget is well below their results. You can also use industry standards for your budget: aim for two seconds or less total page time, as you know that’s how fast users expect sites to load.

Iterate upon your budget as you start getting better at performance and as industry standards change. Continue to push yourself and your team to make the site even faster. If you have a responsively designed site, determine a budget for your breakpoints as well, like we did in Chapter 5.

Your outlined performance goals should always be measurable. Be sure to detail the specific number to beat, the tool you'll use to measure it, as well as any details of what or whom you're measuring. Read more about how to measure performance in Chapter 6, and make it easy for anyone on your team to learn about this budget and measure his or her work against it.

Designing for Performance by Lara Callender Hogan
ISBN 978-1-4919-0251-6
Copyright 2014 O’Reilly Media, Inc. All rights reserved. Used with permission.
