Etsy Icon>

Code as Craft

Reducing Domain Sharding main image

Reducing Domain Sharding

  image

This post originally appeared on the Perf Planet Performance Calendar on December 7th, 2013.
Domain sharding has long been considered a best practice for pages with lots of images.  The number of domains that you should shard across depends on how many HTTP requests the page makes, how many connections the client makes to each domain, and the available bandwidth.  Since it can be challenging to change this dynamically (and can cause browser caching issues), people typically settle on a fixed number of shards - usually two.
An article published earlier this year by Chromium contributor William Chan outlined the risks of sharding across too many domains, and Etsy was called out as an example of a site that was doing this wrong.  To quote the article: “Etsy’s sharding causes so much congestion related spurious retransmissions that it dramatically impacts page load time.”  At Etsy we're pretty open with our performance work, and we’re always happy to serve as an example.  That said, getting publicly shamed in this manner definitely motivated us to bump the priority of reinvestigating our sharding strategy.

Making The Change

The code changes to support fewer domains were fairly simple, since we have abstracted away the process that adds a hostname to an image path in our codebase.  Additionally, we had the foresight to exclude the hostname from the cache key at our CDNs, so there was no risk of a massive cache purge as we switched which domain our images were served on.  We were aware that this would expire the cache in browsers, since they do include hostname in their cache key, but this was not a blocker for us because of the improved end result.  To ensure that we ended up with the right final number, we created variants for two, three, and four domains.  We were able to rule out the option to remove domain sharding entirely through synthetic tests.  We activated the experiment in June using our A/B framework, and ran it for about a month.

Results

After looking at all of the data, the variant that sharded across two domains was the clear winner.  Given how easy this change was to make, the results were impressive:- 50-80ms faster page load times for image heavy pages (e.g. search), 30-50ms faster overall.

  • Up to 500ms faster load times on mobile.
  • 0.27% increase in pages per visit.

As it turns out, William’s article was spot on - we were sharding across too many domains, and network congestion was hurting page load times.  The new CloudShark graph supported this conclusion as well, showing a peak throughput improvement of 33% and radically reduced spurious retransmissions:

Before - Four Shards

before

After - Two Shards

after

Lessons Learned

This story had a happy ending, even though in the beginning it was a little embarrassing.  We had a few takeaways from the experience:- The recommendation to shard across two domains still holds for us (YMMV depending on page composition).

  • Make sure that your CDN is configured to leave hostname out of the cache key.  This was integral to making this change painless.
  • Abstract away the code that adds a hostname to an image URI in your code.
  • Measure everything, and question assumptions about existing decisions.
  • Tie performance improvements to business metrics - we were able to tell a great story about the win we had with this change, and feel confident that we made the right call.
  • Segment your data across desktop and mobile, and ideally international if you can.  The dramatic impact on mobile was a huge improvement which would have been lost in an aggregate number.

Until SPDY/HTTP 2.0 comes along, domain sharding can still be a win for your site, as long as you test and optimize the number of domains to shard across for your content.