Web Experimentation with New Visitors

Posted by Diego Alonso | Filed under data, engineering

We strive to build Etsy with science, and therefore love how web experimentation and A/B testing help us drive our product development process. Several months ago we started a series of web experiments in order to improve Etsy’s homepage experience for first-time visitors. Testing against a specific population, like first-time visitors, allowed us to find issues and improve our variants without raising concerns in our community. This is how the page used to look for new visitors:


We established both qualitative and quantitative goals to measure improvements for the redesign. On the qualitative side, our main goal was to successfully communicate to new buyers that Etsy is a global marketplace made by people. On the quantitative side, we primarily cared about three metrics: bounce rate, conversion rate, and retention over time. Our aim was to reduce bounce rate (percentage of visits who leave the site after viewing the homepage) without affecting conversion rate (proportion of visits that resulted in a purchase) and visit frequency. After conducting user surveys, usability tests, and analyzing our target web metrics, we have finally reached those goals and launched a better homepage for new visitors. Here’s what the new homepage looks like:


Bucketing New Visitors

This series of web experiments marked the first time at Etsy where we tried to consistently run an experiment only for first-time visitors over a period of time. While identifying a new visitor is relatively straightforward, the logic to present that user with the same experience on subsequent visits is something less trivial.

Bucketing a Visitor

At Etsy we use our open source Feature API for A/B testing. Every visitor is assigned a unique ID when they arrive to the website for the first time. In order to determine in which bucket of a test the visitor belongs to, we generate a deterministic hash using the visitor’s unique ID and the experiment identifier. The main advantage of using this hash for bucketing is that we don’t have to worry about creating or managing multiple cookies every time we bucket a visitor into an experiment.

Identifying New Visitors

One simple way to identify a new visitor is by the absence of etsy.com cookies in the browser. On our first set of experiments we checked for the existence of the __utma cookie from Google Analytics, which we also used to define visits in our internal analytics stack.

Returning New Visitors

Before we define a returning new visitor, we need first to describe the concept of a visit. We use the Google Analytics visit definition, where a visit is a group of user interactions on our website within a given time frame. One visitor can produce multiple visits on the same day, or over the following days, weeks, or months. In a web experiment, the difference between a returning visitor and a returning new visitor is the relationship between the experiment start time and the visitor’s first landing time on the website. To put it simply, every visitor who landed on the website for the first time after the experiment start date will be treated as a new visitor, and will consistently see the same test variant on their first and subsequent visits.

As I mentioned before, we used the __utma cookie to identify visitors. One advantage of this cookie is that it tracks the first time a visitor landed on the website. Since we have access to the first visit start time and the experiment start time, we can determine if a visitor is eligible to see an experiment variant. In the following diagram we show two visitors and their relation with the experiment start time.


Feature API

We added the logic to compare a visitor’s first landing time against an experiment start time as part of our internal Feature API. This way it’s really simple to set up web experiments targeting new visitors. Here is an example of how we set up an experiment configuration and an API entry point.

Configuration Set-up

$server_config['new_homepage'] => [
   'enabled' => 50,
   'eligibility' => [
       'first_visit' => [
           'after' => TIMESTAMP

API Entry Point

if (Feature::isEnabled('new_homepage')) {
   $controller = new Homepage_Controller();

Unforeseen Events

When we first started analyzing the test results, we found that more than 10% of the visitors in the experiment had first visit landing times prior to our experiment start day. This suggested that old, seasoned Etsy users were being bucketed into this experiment. After investigating, we were able to correlate those visits to a specific browser: Safari 4+. The visits were a result of the browser making requests to generate thumbnail images for the Top Sites feature. These type of requests are generated any time a user is on the browser, even without visiting Etsy. On the web analytics side, this created a visit with a homepage view followed by an exit event. Fortunately, Safari provides a way to identify these requests using the additional HTTP header “X-Purpose: preview”. Finally, after filtering these requests, we were able to correct this anomaly in our data. Below you can see the experiment’s bounce rates significantly decreased after getting rid of these automated visits.


Although verifying the existence of cookies to determine whether a visitor is new may seem trivial, it is hard to be completely certain that a visitor has never been to your website before based on this signal alone. One person can use multiple browsers and devices to view the same website: mobile, tablet, work or personal computer, or even borrow any other device from a friend. Here is when more deep analysis can come in handy, like filtering visits using attributes such as user registration and signed-in events.


We are confident that web experimentation with new visitors is a good way to collect unbiased results and to reduce product development concerns such as disrupting existing users’ experiences with experimental features. Overall, this approach allows us to drive change. Going forward, we will use what we learned from these experiments as we develop new iterations of the homepage for other subsets of our members. Now that all the preparatory work is done, we can ramp-up this experiment, for instance, to all signed-out visitors.

You can follow Diego on Twitter at @gofordiego

1 response

Responsive emails that really work

Posted by kevingessner | Filed under engineering, mobile

If you’ve ever written an HTML email, you’ll know that the state of the art is like coding for the web 15 years ago: tables and inline styles are the go-to techniques, CSS support is laughably incomplete, and your options for layout have none of the flexibility that you get on the “real web”.

Just like everywhere online, more and more people are using mobile devices to read their email.  At Etsy, more than half of our email opens happen on a mobile device!  Our desktop-oriented, fixed-width designs are beautiful, but mobile readers aren’t getting the best experience.

We want our emails to provide a great experience for everyone, so we’re experimenting with new designs that work on every client: iPhone, iPad, Android, Gmail.com, Outlook, Yahoo Mail, and more.  But given the sorry state of CSS and HTML for email, how can we make an email look great in all those places?

Thanks to one well-informed blog commenter and tons of testing across many devices we’ve found a way to make HTML emails that work everywhere.  You get a mobile-oriented design on phones, a desktop layout on Gmail, and a responsive, fluid design for tablets and desktop clients.  It’s the best of all worlds—even on clients that don’t support media queries.

A New Scaffolding

I’m going to walk you through creating a simple design that showcases this new way of designing HTML emails.  It’s a two-column layout that wraps to a single column on mobile:


For modern browsers, this would be an easy layout to implement—frameworks like Bootstrap provide layouts like this right out of the box.  But the limitations of HTML email make even this simple layout a challenge.

Client Limitations

What limitations are we up against?

  • Android’s Gmail.app only supports inline CSS in HTML style attributes—no <style> tags, no media queries, and no external stylesheets.
  • Gmail.com supports a limited subset of HTML, stripping out many tags (including all of the tags in HTML5) and attributes (including classes and IDs), and only allows some inline CSS and a very limited subset in <style> tags (more on this later).
  • The iOS and Mac OS X Mail apps support media queries and a large selection of CSS.  Of all clients, these are most like a modern browser.

On to the Code

Let’s start with a simple HTML structure.

   <table cellpadding=0 cellspacing=0><tr><td>
     <table cellpadding=0 cellspacing=0><tr><td>
         <h2>Main Content</h2>
         <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec gravida sem dictum, iaculis magna ornare, dignissim elit.</p>
         <p>Donec tincidunt tincidunt nunc, eget pulvinar risus sodales eu.</p>

It’s straightforward: a header and footer with two content areas between, main content and a sidebar.  No fancy tags, just divs and tables and paragraphs—we’re still partying like it’s 1999.  (As we apply styling, we’ll see why both wrapping tables are required.)

Initial Styling

Android is the least common denominator of CSS support, allowing only inline CSS in style attributes and ignoring all other styles.  So let’s add inline CSS that gives us a mobile-friendly layout of a fluid single column:

 <body style="margin: 0; padding: 0; background: #ccc;">
   <table cellpadding=0 cellspacing=0 style="width: 100%;"><tr><td style="padding: 12px 2%;">
     <table cellpadding=0 cellspacing=0 style="margin: 0 auto; background: #fff; width: 96%;"><tr><td style="padding: 12px 2%;">
       <h2 style="margin-top: 0;">Main Content</h2>
       <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec gravida sem dictum, iaculis magna ornare, dignissim elit.</p>
       <h2 style="margin-top: 0;">Sidebar</h2>
       <p>Donec tincidunt tincidunt nunc, eget pulvinar risus sodales eu.</p>
     <div style="border-top: solid 1px #ccc;">

It honestly doesn’t look that different from the unstyled HTML (but the groundwork is there for your beautiful ideas!).  The table-within-a-table wrapping all the content lets us have our content area in a colored background, with a small (2%) gutter on each side.  Don’t forget the cellspacing and cellpadding attributes, too, or you’ll get extra spacing that can’t be removed with CSS!

Screen Shot 2014-03-12 at 12.03.47 PM.png

Dealing with Gmail

This design is certainly adequate for both mobile and desktop clients, but it’s not the best we can do.  Desktop clients and large tablets have a lot of screen real estate that we’re wasting.

Our main target here is Gmail—desktop and laptop screens keep getting bigger, and we want Gmail users to get a full-width experience.  But Gmail doesn’t support media queries, the go-to way of showing different layouts on different-sized clients.  What can we do?

I mentioned earlier that Gmail supports a small subset of CSS inside <style> tags.  This is not a widely known feature of Gmail—most resources you’ll find tell you that Gmail only supports inline styles.  Only a handful of blog comments and forum posts mention this support.  I don’t know when Gmail’s CSS support was quietly improved, but I was certainly pleased to learn about this new way of styling my emails.

The subset of CSS that Gmail supports is that you are limited to only using tag name selectors—no classes or IDs are supported.  Coupled with Gmail’s limited whitelist of HTML elements, your flexibility in styling different parts of your email differently is severely limited.  Plus, the <style> tag must be in the <head> of your email, not in the <body>.

The trick is to make judicious use of CSS’s structural selectors: the descendant, adjacent, and child selectors.  By carefully structuring your HTML and mixing and matching these selectors, you can pinpoint elements for providing styles.  Here are the styles I’ve applied to show a two-column layout in Gmail:

          <style type="text/css">
/*  1 */    table table {
/*  2 */      width: 600px !important;
/*  3 */    }
/*  4 */    table div + div { /* main content */
/*  5 */      width: 65%;
/*  6 */      float: left;
/*  7 */    }
/*  8 */    table div + div + div { /* sidebar */
/*  9 */      width: 33%;
/* 10 */      float: right;
/* 11 */    }
/* 12 */    table div + div + div + div { /* footer */
/* 13 */      width: 100%;
/* 14 */      float: none;
/* 15 */      clear: both;
/* 16 */    }

In the absence of classes and IDs to tell you what elements are being styled, comments are your friend!  Let’s walk through each of these selectors.

Lines 1-3 lock our layout to a fixed width.  Remember, this is our style for Gmail on the desktop, where a fluid design isn’t our goal.  We apply this to the inner wrapping table so that padding on the outer one remains, around our 600-pixel-wide content.  Without having both tables, we’d lose the padding that keeps our content from running into the client’s UI.

Next, we style the main content.  The selector on line 4, reading right to left, finds a div immediately following another div, inside a table.  That actually matches the main content, sidebar, and footer divs, but that’s OK for now.  We style it to take up the left two thirds of the content area (minus a small gutter between the columns).

The selector on line 8 styles the sidebar, by finding a div following a div following a div, inside a table. This selects both the footer and the sidebar, but not the main content, and overrides the preceding styles, placing the sidebar in the right-hand third of the content.

Finally, we select the footer on line 12—the only div that follows three others—and make it full-width.  Since the proceeding selectors and styles also applied to this footer div, we need to reset the float style back to none (on line 14).

With that, we have a two-column fixed layout for Gmail, without breaking the one-column view for Android:

Screen Shot 2014-03-12 at 11.30.41 AM.png

The styles we applied to the outer wrapping table keep our content centered, and the other inline styles that we didn’t override (such as the line above the footer) are still rendered.

For Modern Browsers

Finally, let’s consider Mail.app on iOS and Mac OS X.  I’m lumping them together because they have similar rendering engines and CSS support—the media queries and full CSS selectors you know and love all work.  The styles we applied for Gmail will be also applied on iPhones, giving a mobile-unfriendly fixed-width layout.  We want Android’s single-column fluid layout instead.  We can target modern, small-screen clients (like iPhones) with a media query:

/* Inside <style> in the <head> */
@media (max-width: 630px) {
  table table {
    width: 96% !important;
  table div {
    float: none !important;
    width: 100% !important;

These styles override the earlier ones to restore the single-column layout, but only for devices under 630 pixels wide—the point at which our fixed 600-pixel layout would begin to scroll horizontally.  Don’t forget the !important flag, which makes these styles override the earlier ones.  Gmail.com and Android will both ignore this media query.  iPads and Mail.app on the desktop, which are wider than 630 pixels, will also show the desktop style.

This is admittedly not the prettiest approach. With multiple levels of overriding selectors, you need to think carefully about the impact of any change to your styles.  As your design grows in complexity, you need to keep a handle on which elements will be selected where, particularly with the tag-only selectors for Gmail.  But it’s nearly the holy grail of HTML email: responsive, flexible layouts even in clients with limited support for CSS.

The biggest caveat of this approach (besides the complexity of the code) is the layout on Android tablets: they will display the same single-column layout as Android phones.  For us (and probably for you, too), Android tablets are a vanishingly small percentage of our users.  In any case, the layout isn’t unusable, it’s just not optimal, with wide columns and needlessly large images.

Bringing it All Together

You can find the complete code for this example in this gist: https://gist.github.com/kevingessner/9509148

You can extend this approach to build all kinds of complex layouts.  Just keep in mind the three places where every item might be styled: inline CSS, tag-only selectors in a <style> tag, and one or more media query blocks.  Apply your styles carefully, and don’t forget to test your layout in every client you can your hands on!

I hope that in the future, Gmail on the web and on Android will enter the 21st century and add support for the niceties that CSS has added in recent years.  Until then, creating an HTML email that looks great on every client will continue to be a challenge.  But with a few tricks like these up your sleeve, you can make a beautiful email that gives a great experience for everyone on every client.

You can follow me on Twitter at @kevingessner

Want to help us make Etsy better, from email to accounting? We’re hiring!

No responses

Etsy’s Journey to Continuous Integration for Mobile Apps

Posted by nassimkammah | Filed under engineering, infrastructure, mobile

Positive app reviews can greatly help in user conversion and the image of a brand. on the other hand, bad reviews can have dramatic consequences; as Andy Budd puts it: “Mobile apps live and die by their ratings in an App Store”.


Screen Shot 2014-02-28 at 3.51.58 PM

The above reviews are actual reviews of the Etsy iOS App. As an Etsy developer, it is sad to read them, but it’s a fact: bugs sometimes sneak through our releases. On the web stack, we use our not so secret weapon of Continuous Delivery as a safety net to quickly address bugs that make it to production. However, releasing mobile apps requires a 3rd party’s approval (the app store) , which takes five days on average; once an app is approved, users decide when to upgrade – so they may be stuck with older versions. Based on our analytics data, we currently have 5 iOS and 10 Android versions currently in use by the public.

Through Continuous Integration (CI), we can detect and fix major defects in the development and validation phase of the project, before they negatively impact user experience: this post explores Etsy’s journey to implementing our CI pipeline for our android and iOS applications.

“Every commit should build the mainline on an integration machine”

This fundamental CI principle is the first step to detecting defects as soon as they are introduced: failing to compile. Building your app in an IDE does not count as Continuous Integration. Thankfully, both iOS and Android are command line friendly: building a release of the iOS app is as simple as running:

xcodebuild -scheme "Etsy" archive

Provisioning integration machines

Integration machines are separate from developer machines – they provide a stable, controlled, reproducible environment for builds and tests. Ensuring that all the integration machines are identical is critical – using a provisioning framework to manage all the dependencies is a good solution to ensure uniformity and scalability.

At Etsy, we are pretty fond of Chef to manage our infrastructure – we naturally turned to it to provision our growing Mac Mini fleet. Equipped with the homebrew cookbook for installing packages and rbenv cookbook for managing the ruby environment in a relatively sane way, our sysops wizard Jon Cowie sprinkled a few hdiutil incantations (to manage disk images) and our cookbooks were ready. We are now able to programmatically install 95% of Xcode (some steps are still manual), Git, and all the Android packages required to build and run the tests for our apps.

Lastly, if you ever had to deal with iOS provisioning profiles, you can relate to how annoying they are to manage and keep up to date; having a centralized system that manages all our profiles saves a lot of time and frustration for our engineers.

Building on push and providing daily deploys

With our CI machines hooked up to our Jenkins server, setting up a plan to build the app on every git push is trivial. This simple step helps us detect missing files from commits or compilation issues multiple times a week – developers are notified in IRC or by email and build issues are addressed minutes after being detected. Besides building the app on push, we provide a daily build that any Etsy employee can install on their mobile device – the quintessence of dogfooding. An easy way to encourage our coworkers to install pre-release builds is to nag them when they use the app store version of the app.



iOS devices come in many flavors, with seven different iPads, five iPhones and a few iPods; when it comes to Android, the plethora of devices becomes overwhelming. Even when focusing on the top tier of devices, the goal of CI is to detect defects as soon as they are introduced: we can’t expect our QA team to validate the same features over and over on every push!

Our web stack boasts a pretty extensive collection of test suites and the test driven development culture is palpable. Ultimately, our mobile apps leverage a lot of our web code base to deliver content: data is retrieved from the API and many screens are web views. Most of the core logic of our apps rely on the UI layer – which can be tested with functional tests. As such, our first approach was to focus on some functional tests, given that the API was already tested on the web stack (with unit tests and smoke tests).

Functional tests for mobile apps are not new and the landscape of options is still pretty extensive; in our case, we settled down on Calabash and Cucumber. The friendly format and predefined steps of Cucumber + Calabash allows our QA team to write tests themselves without any assistance from our mobile apps engineers.

To date, our functional tests run on iPad/iPhone iOS 6 and 7 and Android, and cover our Tier 1 features, including:

  • searching for listings and shops
  • registering new accounts
  • purchasing an item with a credit card or a gift card

Because functional tests mimic the steps of an actual user, the tests require that certain assumed resources exist. In the case of the Checkout test, these are the following:

  • a dedicated test buyer account
  • a dedicated seller test account
  • a prepaid credit card associated with the account

Our checkout test then consists of:

  1. signing in to the app with our test buyer account
  2. searching for an item (in the seller test account shop)
  3. adding it to the cart
  4. paying for the item using the prepaid credit card

Once the test is over, an ad-hoc mechanism in our backend triggers an order cancellation and the credit card is refunded.

A great example of our functional tests catching bugs is highlighted in the following screenshot from our iPad app:


Our registration test navigates to this view, and fills out all the visible fields. Additionally, the test cycles through the “Female“, “Male” and “Rather Not Say” options; in this case, the tests failed (since the “male” option was missing).

By running our test suite every time an engineer pushes code, we not only detect bugs as soon as they are introduced, we detect app crashes. Our developers usually test their work on the latest OS version but Jenkins has their back: our tests run simultaneously across different combinations of devices and OS versions.

Testing on physical devices

While our developers enjoy our pretty extensive device lab for manual testing, maintaining a set of devices and constantly running automated tests on them is a logistical nightmare and a full time job. After multiple attempts at developing an in-house solution, we decided to use Appthwack to run our tests on physical devices. We run our tests for every push on a set of dedicated devices, and run nightly regression on a broader range of devices by tapping into Appthwack cloud of devices. This integration is still very recent and we’re still working out some kinks related to testing on physical devices and the challenges of aggregating and reporting test status from over 200 devices.

Reporting: put a dashboard on it

With more than 15 Jenkins jobs to build and run the tests, it can be challenging to quickly surface critical information to the developers. A simple home grown dashboard can go a long way to communicating the current test status across all configurations:

Mobile apps dashboard

Static analysis and code reviews

Automated tests cannot catch all bugs and potential crashes – similar to the web stack, developers rely heavily on code reviews prior to pushing their code. Like all code at Etsy, the apps are stored in Github Enterprise repositories, and code reviews consist of a pull request and an issue associated to it. By using the GitHub pull request builder Jenkins plugin, we are able to systematically trigger a build and do some static analysis (see static analysis with OCLint post ) on the review request and post the results to the Github issue:

pull request with lint results

Infrastructure overview summary

All in all, our current infrastructure looks like the following:Mobile Apps Infrastructure overview

Challenges and next steps

Building our continuous integration infrastructure was strenuous and challenges kept appearing one after another, such as the inability to automate the installation of some software dependencies. Once stable, we always have to keep up with new releases (iOS 7, Mavericks) which tend to break the tests and the test harness. Furthermore, functional tests are flaky by nature, requiring constant care and optimization.

We are currently at a point where our tests and infrastructure are reliable enough to detect app crashes and tier 1 bugs on a regular basis. Our next step, from an infrastructure point of view, is to expand our testing to physical devices via our test provider Appthwack. The integration has just started but already raises some issues: how can we concurrently run the same checkout test (add an item to cart, buy it using a gift card) across 200 devices – will we create 200 test accounts, one per device? We will post again on our status 6 months from now, with hopefully more lessons learned and success stories – stay tuned!

You can follow Nassim on Twitter at @kepioo 

6 responses

Reducing Domain Sharding

Posted by Jonathan Klein | Filed under performance

This post originally appeared on the Perf Planet Performance Calendar on December 7th, 2013.

Domain sharding has long been considered a best practice for pages with lots of images.  The number of domains that you should shard across depends on how many HTTP requests the page makes, how many connections the client makes to each domain, and the available bandwidth.  Since it can be challenging to change this dynamically (and can cause browser caching issues), people typically settle on a fixed number of shards – usually two.

An article published earlier this year by Chromium contributor William Chan outlined the risks of sharding across too many domains, and Etsy was called out as an example of a site that was doing this wrong.  To quote the article: “Etsy’s sharding causes so much congestion related spurious retransmissions that it dramatically impacts page load time.”  At Etsy we’re pretty open with our performance work, and we’re always happy to serve as an example.  That said, getting publicly shamed in this manner definitely motivated us to bump the priority of reinvestigating our sharding strategy.

Making The Change

The code changes to support fewer domains were fairly simple, since we have abstracted away the process that adds a hostname to an image path in our codebase.  Additionally, we had the foresight to exclude the hostname from the cache key at our CDNs, so there was no risk of a massive cache purge as we switched which domain our images were served on.  We were aware that this would expire the cache in browsers, since they do include hostname in their cache key, but this was not a blocker for us because of the improved end result.  To ensure that we ended up with the right final number, we created variants for two, three, and four domains.  We were able to rule out the option to remove domain sharding entirely through synthetic tests.  We activated the experiment in June using our A/B framework, and ran it for about a month.


After looking at all of the data, the variant that sharded across two domains was the clear winner.  Given how easy this change was to make, the results were impressive:

  • 50-80ms faster page load times for image heavy pages (e.g. search), 30-50ms faster overall.
  • Up to 500ms faster load times on mobile.
  • 0.27% increase in pages per visit.

As it turns out, William’s article was spot on – we were sharding across too many domains, and network congestion was hurting page load times.  The new CloudShark graph supported this conclusion as well, showing a peak throughput improvement of 33% and radically reduced spurious retransmissions:

Before – Four Shards


After – Two Shards


Lessons Learned

This story had a happy ending, even though in the beginning it was a little embarrassing.  We had a few takeaways from the experience:

  • The recommendation to shard across two domains still holds for us (YMMV depending on page composition).
  • Make sure that your CDN is configured to leave hostname out of the cache key.  This was integral to making this change painless.
  • Abstract away the code that adds a hostname to an image URI in your code.
  • Measure everything, and question assumptions about existing decisions.
  • Tie performance improvements to business metrics – we were able to tell a great story about the win we had with this change, and feel confident that we made the right call.
  • Segment your data across desktop and mobile, and ideally international if you can.  The dramatic impact on mobile was a huge improvement which would have been lost in an aggregate number.

Until SPDY/HTTP 2.0 comes along, domain sharding can still be a win for your site, as long as you test and optimize the number of domains to shard across for your content.

3 responses

December 2013 Site Performance Report

Posted by Jonathan Klein | Filed under performance

It’s a new year, and we want to kick things off by filling you in on site performance for Q4 2013. Over the last three months front-end performance has been pretty stable, and backend load time has increased slightly across the board.

Server Side Performance

Here are the median and 95th percentile load times for signed in users on our core pages on Wednesday, December 18th:

Server Side Performance December 2013

There was an across the board increase in both median and 95th percentile load times over the last three months, with a larger jump on our search results page. There are two main factors that contributed to this increase: higher traffic during the holiday season and an increase in international traffic, which is slower due to translations. On the search page specifically, browsing in US English is significantly faster than any other language. This isn’t a sustainable situation over the long term as our international traffic grows, so we will be devoting significant effort to improving this over the next quarter.

Synthetic Front-end Performance

As usual, we are using our private instance of WebPagetest to get synthetic measurements of front-end load time. We use a DSL connection and test with IE8, IE9, Firefox, and Chrome. The main difference with this report is that we have switched from measuring Document Complete to measuring Speed Index, since we believe that it provides a better representation of user perceived performance. To make sure that we are comparing with historical data, we pulled Speed Index data from October for the “old” numbers. Here is the data, and all of the numbers are medians over a 24 hour period:

Synthetic Front-End Performance December 2013

Start render didn’t really change at all, and speed index was up on some pages and down on others. Our search results page, which had the biggest increase on the backend, actually saw a 0.2 second decrease in speed index. Since this is a new metric we are tracking, we aren’t sure how stable it will be over time, but we believe that it provides a more accurate picture of what our visitors are really experiencing.

One of the downsides of our current wpt-script setup is that we don’t save waterfalls for old tests – we only save the raw numbers. Thus when we see something like a 0.5 second jump in Speed Index for the shop page, it can be difficult to figure out why that jump occurred. Luckily we are Catchpoint customers as well, so we can turn to that data to get granular information about what assets were on the page in October vs. December. The data there shows that all traditional metrics (render start, document complete, total bytes) have gone down over the same period. This suggests that the jump in speed index is due to loading order, or perhaps a change in what’s being shown above the fold. Our inability to reconcile these numbers illustrates a need to have visual diffs, or some other mechanism to track why speed index is changing. Saving the full WebPagetest results would accomplish this goal, but that would require rebuilding our EC2 infrastructure with more storage – something we may end up needing to do.

Overall we are happy with the switch to speed index for our synthetic front-end load time numbers, but it exposed a need for better tooling.

Real User Front-end Performance

These numbers come from mPulse, and are measured via JavaScript running in real users’ browsers:

Real User Front-end Performance December 2013

There aren’t any major changes here, just slight movement that is largely within rounding error. The one outlier is search, especially since our synthetic numbers showed that it got faster. This illustrates the difference between measuring onload, which mPulse does, and measuring speed index, which is currently only present in WebPagetest. This is one of the downsides of Real User Monitoring – since you want the overhead of measurement to be low, the data that you can capture is limited. RUM excels at measuring things like redirects, DNS lookup times, and time to first byte, but it doesn’t do a great job of providing a realistic picture of how long the full page took to render from the customer’s point of view.


We have a backend regression to investigate, and front-end tooling to improve, but overall there weren’t any huge surprises. Etsy’s performance is still pretty good relative to the industry as a whole, and relative to where we were a few years ago. The challenge going forward is going to center around providing a great experience on mobile devices and for international users, as the site grows and becomes more complex.

4 responses

Static Analysis with OCLint

Posted by Akiva Leffert | Filed under mobile

At Etsy, we’re big believers in making tools do our work for us.

On the mobile apps team we spend most of our time focused on building new features and thinking about how the features of Etsy fit into an increasingly mobile world. One of the great things about working at Etsy is that we have a designated period called Code Slush around the winter holidays where product development slows down and we can take stock of where we are and do things that we think are important or useful, but that don’t fit into our normal release cycles. Even though our apps team releases significantly less frequently than our web stack, making it easier to continue developing through the holiday season, we still find it valuable to take this time out at the end of the year.

This past slush we spent some of that time contributing to the OCLint project and integrating it into our development workflow. OCLint, as the name suggests, is a linter tool for Objective-C. It’s somewhat similar to the static analyzer that comes built into Xcode, and it’s built on the same clang infrastructure. OCLint is a community open source project and all of the changes to it we’re discussing have been contributed back and are available with the rest of OClint on their github page.

If you run OCLint on your code it will tell you things like, “This method is suspiciously long” or “The logic on this if statement looks funny”. In general, it’s great at identifying these sorts of code smells. We thought it would be really cool if we could extend it to find definite bugs and to statically enforce contracts in our code base. In the remainder of this post, we’re going to talk about what those checks are and how we take advantage of them, both in our code and in our development process.


Objective-C is a statically typed Object Oriented language. Its type system gets the job done, but it’s fairly primitive in certain ways. Often, additional contracts on a method are specified as comments. One thing that comes up sometimes is knowing what methods a subclass is required to implement. Typically this is indicated in a comment above the method.

For example, UIActivity.h contains the comment // override methods above a list of several of its methods.

This sort of contract is trivial to check at compile time, but it’s not part of the language, making these cases highly error prone. OCLint to the rescue! We added a check for methods that subclasses are required to implement. Furthermore, you can use the magic of Objective-C categories to mark up existing system libraries.

To mark declarations, oclint uses clang’s __attribute__((annotate(“”))) feature to pass information from your code to the checker.
To make these marks on a system method like the -activityType method in UIActivity, you would stick the following in a header somewhere:

@interface UIActivity (StaticChecks)
- (NSString *)activityType
__attribute__((annotate(“oclint:enforce[subclass must implement]”)));

That __attribute__ stuff is ugly and hard to remember so we #defined it away.

__attribute__((annotate(“oclint:enforce[subclass must implement]”)))

Now we can just do:

@interface UIActivity (StaticChecks)

We’ve contributed back a header file with these sorts of declarations culled from the documentation in UIKit that anyone using oclint can import into their project. We added this file into our project's .pch file so it’s included in every one one of our classes automatically.

Some other checks we’ve added:

Protected Methods

This is a common feature in OO languages - methods that only a subclass and its children can call. Once again, this is usually indicated in Objective-C by comments or sometimes by sticking the declarations in a category in separate header. Now we can just tack on OCLINT_PROTECTED_METHOD at the end of the declaration. This makes the intent clear, obvious, and statically checked.

Prohibited Calls

This is another great way to embed institutional knowledge directly into the codebase. You can mark methods as deprecated using clang, but this is an immediate compiler error. We’ll talk more about our workflow later, but doing it through oclint allow us to migrate from old to new methods gradually and easily use things while debugging that we wouldn’t want to commit.

We have categories on NSArray and NSDictionary that we use instead of the built in methods, as discussed here. Marking the original library methods as prohibited lets anyone coming into our code base know that they should be using our versions instead of the built in ones. We also have a marker on NSLog, so that people don’t accidentally check in debug logs. Frequently the replacement for the prohibited call calls the prohibited call itself, but with a bunch of checks and error handling logic. We use oclint’s error suppression mechanism to hide the violation that would be generated by making the original call. This is more syntactically convenient than dealing with clang pragmas like you would have to using the deprecated attribute.

Ivar Assignment Outside Getters

We prefer to use properties whenever possible as opposed to bare ivar accesses. Among other things, this is more syntactically and semantically regular and makes it much easier to set breakpoints on changes to a given property when debugging.  This rule will emit an error when it sees an ivar assigned outside its getter, setter, or the owning class’s init method.

-isEquals: without -hash

In Cocoa, if you override the -isEquals: method that checks for object equality, it’s important to also override the -hash method. Otherwise you’ll see weird behavior in collections when using the object as a dictionary key. This check finds classes that implement -isEquals: without implementing -hash. This is another great example of getting contracts out of comments and into static checks.


We think that oclint adds a lot of value to our development process, but there were a couple of barriers we had to deal with to make sure our team picked it up. First of all, any codebase written without oclint’s rules strictly enforced for all of development will have tons of minor violations. Sometimes the lower priority things it warns about are actually done deliberately to increase code clarity. To cut down on the noise we went through and disabled a lot of the rules, leaving only the ones we thought added significant value. Even with that, there were still a number of things it complained frequently about - things like not using Objective-C collection literals. We didn’t want to go through and change a huge amount of code all at once to get our violations down to zero, so we needed a way to see only the violations that were relevant to the current change. Thus, we wrote a little script to only run oclint on the changed files. This also allows us to easily mark something as no longer recommended without generating tons of noise, having to remove it entirely from our codebase, or fill up Xcode’s warnings and errors.

Finally, we wanted to make it super easy for our developers to start using it. We didn’t want to require them to run it manually before every commit. That would be just one more thing to forget and one more thing anyone joining our team would have to know about. Plus it’s kind of slow to run all of its checks on a large codebase. Instead, we worked together with our terrific testing and automation team to integrate it into our existing github pull request workflow. Now, whenever we make a pull request, it automatically kicks off a jenkins job that runs oclint on the changed files. When the job is done, it posts a summary a comment right to the pull request along with a link to the full report on jenkins. This ended up feeling very natural and similar to how we interact with the php code sniffer on our web stack.


We think oclint is a great way to add static checks to your Cocoa code. There are some interesting things going on with clang plugins and direct Xcode integration, but for now we’re going to stick with oclint. We like its base of existing rules, the ease of gradually applying its rules to our code base, and its reporting options and jenkins integration.

We also want to thank the maintainer and the other contributors for the hard work they’ve been put into the project. If you use these rules in interesting ways, or even boring ones, we’d love to hear about it. Interested in a working at a place that cares about the quality of its software and about solving its own problems instead of just letting them mount? Our team is hiring!

No responses

Android Staggered Grid

Posted by Deniz Veli | Filed under mobile

While building the new Etsy for Android app, a key goal for our team was delivering a great user experience regardless of device. From phones to tablets, phablets and even tabphones, we wanted all Android users to have a great shopping experience without compromise.

A common element throughout our mobile apps is the “Etsy listing card”. Listing cards are usually the first point of contact users have with our sellers’ unique items, whether they’re casually browsing through categories or searching for something specific. On these screens, when a listing card is shown, we think our users should see the images without cropping.

Android Lists and Grids 

A simple enough requirement but on Android things aren’t always that simple. We wanted these cards in a multi-column grid, with a column count that changes with device orientation while keeping grid position. We needed header and footer support, and scroll listeners for neat tricks like endless loading and quick return items. This wasn’t achievable using a regular old ListView or GridView.

Furthermore, both the Android ListView and GridView are not extendable in any meaningful way. A search for existing open libraries didn’t reveal any that met our requirements, including the unfinished StaggeredGridView available in the AOSP source.

Considering all of these things we committed to building an Android staggered grid view. The result is a UI component that is built on top of the existing Android AbsListView source for stability, but supports multiple columns of varying row sizes and more.

Android Trending - Devices
How it Works

The StaggeredGridView works much like the existing Android ListView or GridView. The example below shows how to add the view to your XML layout, specifying the margin between items and the number of columns in each orientation.

    app:column_count_landscape="3" />

You can of course set the layout margin and padding as you would any other view.

To show items in the grid create any regular old ListAdapter and assign it to the grid. Then there’s one last step. You need to ensure that the ListAdapter’s views maintain their aspect ratio. When column widths adjust on rotation, each item’s height should respond.

How do you do this? The AndroidStaggeredGrid includes a couple of utility classes including the DynamicHeightImageView which you can use in your adapter. This custom ImageView overrides onMeasure() and ensures the measured height is relative to the width based on the set ratio. Alternatively, you can implement any similar custom view or layout with the same measurement logic.

public void setHeightRatio(double ratio) {
    if (ratio != mHeightRatio) {
        mHeightRatio = ratio;

protected void onMeasure(int widthMeasureSpec, int heightMeasureSpec) {
    if (mHeightRatio > 0.0) {
        int width = MeasureSpec.getSize(widthMeasureSpec);
        int height = (int) (width * mHeightRatio);
        setMeasuredDimension(width, height);
    else {
        super.onMeasure(widthMeasureSpec, heightMeasureSpec);

And that’s it. The DynamicHeightImageView will maintain the aspect ratio of your items and the grid will take care of recycling views in the same manner as a ListView. You can check out the GitHub project for more details on how it’s used including a sample project.

But There’s More

Unlike the GridView, you can add header and footer views to the StaggeredGridView. You can also apply internal padding to the grid that doesn’t affect the header and footer views. An example view using these options is shown below. On our search results screen we use a full width header and add a little extra horizontal grid padding for 10 inch tablets.

Android Search

Into the Real World

During the development process we fine-tuned the grid’s performance using a variety of real world Android devices available in the Etsy Device Lab. When we released the new Etsy for Android app at the end of November, the AndroidStaggeredGrid was used throughout. Post launch we monitored and fixed some lingering bugs found with the aid of the awesome crash reporting tool Crashlytics.

We decided to open source the AndroidStaggeredGrid: a robust, well tested and real world UI component for the Android community to use. It’s available on GitHub or via Maven, and we are accepting pull requests.

Finally, a friendly reminder that the bright folks at Etsy mobile are hiring.

You can follow Deniz on Twitter at @denizmveli.

1 response

Migrating to Chef 11

Posted by Ryan Frantz | Filed under operations

Configuration management is critical to maintaining a stable infrastructure. It helps ensure systems and software are configured in a consistent and deterministic way. For configuration management we use Chef.  Keeping Chef up-to-date means we can take advantage of new features and improvements. Several months ago, we upgraded from Chef 10 to Chef 11 and we wanted to share our experiences.


We started by setting up a new Chef server running version 11.6.0.  This was used to validate our Chef backups and perform testing across our nodes.  The general plan was to upgrade the nodes to Chef 11, then point them at the new Chef 11 server when we were confident that we had addressed any issues.  The first order of business: testing backups.  We’ve written our own backup and restore scripts and we wanted to be sure they’d still work under Chef 11.  Also, these scripts would come in handy to help us quickly iterate during break/fix cycles and keep the Chef 10 and Chef 11 servers in sync.  Given that we can have up to 70 Chef developers hacking on cookbooks, staying in sync during testing was crucial to avoiding time lost to troubleshooting issues related to cookbook drift.

Once the backup and restore scripts were validated, we reviewed the known breaking changes present in Chef 11.  We didn’t need much in the way of fixes other than a few attribute precedence issues and updating our knife-lastrun handler to use run_context.loaded_recipes instead of node.run_state().

Unforeseen Breaking Changes

After addressing the known breaking changes, we moved on to testing classes of nodes one at a time.  For example, we upgraded a single API node to Chef 11, validated Chef ran cleanly against the Chef 10 server, then proceeded to upgrade the entire API cluster and monitor it before moving on to another cluster.  In the case of the API cluster, we found an unknown breaking change that prevented those nodes from forwarding their logs to our log aggregation hosts.  This episode initially presented a bit of a boondoggle and warrants a little attention as it may help others during their upgrade.

The recipe we use to configure syslog-ng sets several node attributes, for various bits and bobs.  The following line in our cookbook is where all the fun started:

if !node.default[:syslog][:items].empty?

That statement evaluated to false on the API nodes running Chef 11 and resulted in a vanilla syslog-ng.conf file that didn’t direct the service to forward any logs.  Thinking that we could reference those nested attributes via the :default symbol, we updated the cookbook.  The Chef 11 nodes were content but all of the Chef 10 nodes were failing to converge because of the change.  It turns out that accessing default attributes via the node.default() method and node[:default] symbol are not equivalent.  To work around this, we updated the recipe to check for Chef 11 or Chef 10 behavior and assign our variables accordingly.  See below for an example illustrating this:

if node[:syslog].respond_to?(:has_key?)
    # Chef 11
    group = node[:syslog][:group] || raise("Missing group!")
    items = node[:syslog][:items]
    # Chef 10
    group = node.default[:syslog][:group] || raise("Missing group!")
    items = node.default[:syslog][:items]

In Chef 11, the :syslog symbol points to the key in the attribute namespace (it’s an ImmutableHash object) we need and responds to the .has_key?() method; in that case, we pull in the needed attributes Chef 11-style.  If the client is Chef 10, that test fails and we pull in the attributes using the .default() method.


Once we had upgraded all of our nodes and addressed any issues, it was time to migrate to the Chef 11 server.  To be certain that we could recreate the build and that our Chef 11 server cookbooks were in good shape, we rebuilt the Chef 11 server before proceeding.  Since we use a CNAME record to refer to our Chef server in the nodes’ client.rb config file, we thought that we could simply update our internal DNS systems and break for an early beer.  To be certain, however, we ran a few tests by pointing a node at the FQDN of the new Chef server.  It failed its Chef run.

Chef 10, by default, communicates to the server via HTTP; Chef 11 uses HTTPS.  In general, Chef 11 Server redirects the Chef 11 clients attempting to use HTTP to HTTPS.  However, this breaks down when the client requests cookbook versions from the server.  The client receives an HTTP 405 response.  The reason for this is that the client sends a POST to the following API endpoint to determine which versions of the cookbooks from its run_list need to be downloaded:


If Chef is communicating via HTTP, the POST request is redirected to use HTTPS.  No big deal, right?  Well, RFC 2616 is pretty clear that when a request is redirected, “[t]he action required MAY be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET…”  When the Chef 11 client attempts to hit the /environments/cookbook_versions endpoint via GET, Chef 11 Server will respond with an HTTP 405 as it only allows POST requests to that resource.

The fix was to update all of our nodes’ client configuration files to use HTTPS to communicate with the Chef Server.  dsh (distributed shell) made this step easy.

Just before we finalized the configuration update, we put a freeze on all Chef development and used our backup and restore scripts to populate the new Chef 11 server with all the Chef objects (nodes, clients, cookbooks, data bags, etc) from the Chef 10 server.  After validating the restore operation, we completed the client configuration updates and shut down all Chef-related services on the Chef 10 server.  Our nodes happily picked up where they’d left off and continued to converge on subsequent Chef runs.


Following the migration, we found two issues with chef-client that required deep dives to understand, and correct, what was happening.  First, we had a few nodes whose chef-client processes were exhausting all available memory.  Initially, we switched to running chef-client in forking mode.  Doing so mitigated this issue to an extent (as the forked child released its allocated memory when it completed and was reaped) but we were still seeing an unusual memory utilization pattern.  Those nodes were running a recipe that included nested searches for nodes.  Instead of returning the node names and searching on those, we were returning whole node objects.  For a long-running chef-client process, this continued to consume available memory.  Once we corrected that issue, memory utilization fell down to acceptable levels.

See the following screenshot illustrating the memory consumption for one of these nodes immediately following the migration and after we updated the recipe to return references to the objects instead:


Here’s an example of the code in the recipe that created our memory monster:

# find nodes by role, the naughty, memory hungry way
roles = search(:role, '*:*')    # NAUGHTY
roles.each do |r|
  nodes_dev = search(:node, "role:#{r.name} AND fqdn:*dev.*")    # HUNGRY
  template "/etc/xanadu/#{r.name.downcase}.cfg" do
    :nodes => nodes_dev

Here’s the same code example, returning object references instead:

# find nodes by role, the community-friendly, energy-conscious way
search(:role, '*:*') do |r|
  fqdns = []
  search(:node, "role:#{r.name} AND fqdn:*dev.*") do |n|
    fqdns << n.fqdn
  template "/etc/xanadu/#{r.name.downcase}.cfg" do
      :nodes => fqdns

Second, we found an issue where, in cases where chef-client would fail to connect to the server, it would leave behind its PID file, preventing future instances of chef-client from starting.  This has been fixed in version 11.6.0 of chef-client.


Despite running into a few issues following the upgrade, thorough testing and Opscode’s documented breaking changes helped make our migration fairly smooth. Further, the improvements made in Chef 11 have helped us improve our cookbooks. Finally, because our configuration management system is updated, we can confidently focus our attention on other issues.

8 responses

September 2013 Site Performance Report

Posted by Jonathan Klein | Filed under performance

As we enter the fourth quarter of 2013, it’s time for another site performance report about how we did in Q3. Our last report highlighted the big performance boost we saw from upgrading to PHP 5.4, and this report will examine a general front-end slowdown that we saw over the last few months.

Server Side Performance

Here are the median and 95th percentile load times for signed in users on our core pages on Wednesday, September 18th:

On the server side we saw a modest decrease on most pages, with some pages (e.g. the profile page) seeing a slight increase in load time. As we have mentioned in past reports, we are not overly worried about the performance of our core pages, so the main thing we are looking for here is to avoid a regression. We managed to achieve this goal, and bought ourselves a little extra time on a few pages through some minor code changes. This section isn’t very exciting, but in this case no news is good news.

Synthetic Front-end Performance

The news here is a mixed bag. As usual, we are using our private instance of WebPagetest to get synthetic measurements of front-end load time. We use a DSL connection and test with IE8, IE9, Firefox, and Chrome. Here is the data, and all of the numbers are medians over a 24 hour period:

On the plus side, we saw a pretty significant decrease in start render time almost across the board, but document complete time increased everywhere, and increased dramatically on the listing page. Both of these global effects can be explained by rolling out deferred JavaScript everywhere – something we mentioned back in our June report. At that time we had only done it on the shop page, and since then we have put it on all pages by default. This explains the decrease in start render time on all pages except for the shop page. The profile page also had a slight uptick in start render time, and we are planning on investigating that.

One of the side effects of deferring JavaScript is that document complete time tends to rise, especially in IE8. This is an acceptable tradeoff for us, since we care more about optimizing start render, and getting content in front of the user as quickly as possible. We’re also not convinced that a rise in document complete time will have a negative impact on business metrics, and we are running tests to figure that out now.

The massive increase in document complete time on the listing page is due to the rollout of a page redesign, which is much heavier and includes a large number of web fonts. We are currently setting up a test to measure the impact of web fonts on customer engagement, and looking for ways to reduce page weight on the listing page. While document complete isn’t a perfect metric, 8 seconds is extremely high, so this bears looking into. That said, we A/B tested engagement on the old page and the new, and all of the business metrics we monitor are dramatically better with the new version of the listing page. This puts further doubt on the impact of document complete on customer behavior, and illustrates that performance is not the only thing influencing engagement – design and usability obviously play a big role.

Real User Front-end Performance

These numbers come from mPulse, and are measured via JavaScript running in real users’ browsers:

The effect here mirrors what we saw on the synthetic side – a general upward trend, with a larger spike on the listing page. These numbers are for the “page load” event in mPulse, which is effectively the onload event. As Steve Souders and others have pointed out, onload is not a great metric, so we are looking for better numbers to measure on the real user side of things. Unfortunately there isn’t a clear replacement at this point, so we are stuck with onload for now.


Things continue to look good on the server side, but we are slipping on the front-end. Partly this has to do with imperfect measurement tools, and partly it has to do with an upward trend in page weight that is occurring all across the web – and Etsy is no exception. Retina images, web fonts, responsive CSS, new JavaScript libraries, and every other byte of content that we serve continue to provide challenges for the performance team. As we continue to get hard data on how much page weight impacts performance (at least on mobile), we can make a more compelling case for justifying every byte that we serve.

You can follow Jonathan on Twitter at @jonathanklein

2 responses

Nagios, Sleep Data, and You

Posted by Ryan Frantz | Filed under monitoring, operations

Gettin’ Shuteye

Ian Malpass once commented that “[i]f Engineering at Etsy has a religion, it’s the Church of Graphs.”  And I believe!  Before I lay me down to sleep during an on-call shift, I say a little prayer that should something break, there’s a graph somewhere I can reference.  Lately, a few of us in Operations have begun tracking our sleep data via Jawbone UPs.  After a few months of this we got to wondering how this information could be useful, in the context of Operations.  Sleep is important.  And being on call can lead to interrupted sleep.  Even worse, after being woken up, the amount of time it takes to return to sleep varies by person and situation.  So, we thought, “why not graph the effect of being on call against our sleep data?”

Gathering and Visualizing Data

We already visualize code deploys against the myriad graphs we generate, to lend context to whatever we’re measuring.  We use Nagios to alert us to system and service issues.  Since Nagios writes consistent entries to a log file, it was a simple matter to write a Logster parser to ship metrics to Graphite when a host or service event pages out to an operations engineer.  Those data points can then be displayed as “deploy lines” against our sleep data.

For the sleep data we used, and extended, Aaron Parecki’s ‘jawbone-up‘ gem to gather sleep data (summary and detail information) via Jon Cowie’s handy ‘jawboneup_to_graphite‘ script on a daily basis.  Those data are then displayed on personal dashboards (using Etsy’s Dashboard project).


So far, we’ve only just begun to collect and display this information.  As we learn more, we’ll be certain to share our findings.  In the meantime, here are examples from recent on-call shifts.


This engineer appeared to get some sleep!


Here, the engineer was alerted to a service in the critical state in the wee hours of the morning. From this graph we can tell that he was able to address the issue fairly quickly, and most importantly, get back to sleep fast.

NOTE:  Jawbone recently opened up their API.  Join the party and help build awesome apps and tooling around this device!

14 responses

older posts »

Recent Posts


Etsy At Etsy, our mission is to enable people to make a living making things, and to reconnect makers with buyers. The engineers who make Etsy make our living with a craft we love: software. This is where we'll write about our craft and our collective experience building and running the world's most vibrant handmade marketplace.

Code as Craft is proudly powered by WordPress.com VIP and the SubtleFlux theme.

© Copyright 2014 Etsy