Q3 2015 Site Performance Report
Sadly, the summer has come to an end here in Brooklyn, but the changing of the leaves signifies one thing—it’s time to release our Q3 site performance report! For this report, we’ve collected data from a full week in September that we will be comparing to a full week of data from May. Similar to last quarter’s report, we will be using box plots to better visualize the data and the changes we’ve seen.
While we love to share stories of our wins, we find it equally important to report on the challenges we face. The prevailing pattern you will notice across all sections of this report is increased latency. Kristyn Reith will provide an update on backend server-side performance and Mike Adler, one of the newest members of the Performance team, will report on the synthetic frontend and real user monitoring sections.
The server-side data below reflects the time seen by real users, both signed-in and signed-out. As a reminder, we are randomly sampling our data for all pages during the specified weeks in each quarter.
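As a rough illustration of the kind of summary behind these numbers, the sketch below draws a random sample of page timings and reports the median and 95th percentile. The function name, sample size, and seed are hypothetical and chosen only for this example; this is not our actual reporting pipeline.

```python
import random
import statistics


def summarize(timings_ms, sample_size, seed=0):
    """Randomly sample timings and return (median, p95) in milliseconds.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Sample without replacement, capped at the number of timings available.
    sample = rng.sample(timings_ms, min(sample_size, len(timings_ms)))
    median = statistics.median(sample)
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    p95 = statistics.quantiles(sample, n=100, method="inclusive")[94]
    return median, p95


if __name__ == "__main__":
    # Synthetic timings, not real measurements.
    timings = [float(ms) for ms in range(1, 101)]
    print(summarize(timings, sample_size=100))
```

Reporting both values matters because a change can leave the median nearly untouched while moving the 95th percentile considerably.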
You can see that with the exception of the homepage, all of our pages have gotten slower on the backend. The performance team kicked off this quarter by hosting a post mortem for a site-wide performance degradation that occurred at the end of Q2. At that time, we had migrated a portion of our web servers to new, faster hardware. However, the way the workload was initially distributed overworked the old hardware, leading to poor performance for the 95th percentile. Increasing the weighting of the new hardware in the load balancer helped mitigate this. While medians did not see a significant impact over the course of the hardware change, it caused higher highs and lower lows for the 95th percentile. As a heavier page, the signed-in homepage saw the greatest improvement once the weights were adjusted, which contributed to its overall improvement this quarter. Other significant causes of the changes seen on the server side can be attributed to two new initiatives that were launched this quarter, Project Arizona and Category Navigation.
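To make the weighting change concrete, here is a minimal sketch of how load balancer weights translate into per-server load. The pool names, server counts, weights, and request rate are all made up for illustration; they are not our production values or our actual load balancer configuration.

```python
def request_share(weights):
    """Fraction of traffic each pool receives under weighted balancing."""
    total = sum(weights.values())
    return {pool: weight / total for pool, weight in weights.items()}


def per_server_load(requests_per_sec, weights, servers):
    """Requests per second landing on a single server in each pool."""
    share = request_share(weights)
    return {
        pool: requests_per_sec * share[pool] / servers[pool]
        for pool in weights
    }


if __name__ == "__main__":
    servers = {"old": 10, "new": 10}  # hypothetical pool sizes
    # Equal weights: old and new servers each take the same load.
    print(per_server_load(1000, {"old": 1, "new": 1}, servers))
    # Heavier weight on new hardware: the old servers are relieved.
    print(per_server_load(1000, {"old": 1, "new": 3}, servers))
```

With equal weights, slower machines receive as many requests as faster ones, which is where tail latency tends to suffer first.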
Arizona is a read-only key / value system to serve product recommendations and other generated datasets on a massive scale. It replaces a previous system that we had outgrown that stored all data in-memory; Arizona instead uses SSDs to allow for more and varied datasets. This quarter we launched the first phase of the project that resulted in some expected performance regressions compared with the previous memory-backed system. The first phase focused on correctness, ensuring data remained consistent between the two systems. Future phases will focus on optimizing speed of lookups to be comparable to the previous system while offering much greater scalability and availability.
At the beginning of August, our checkout team noticed two separate regressions on the cart page that had occurred over the course of the prior month. We had not been alerted to these slowdowns because, at the end of Q2, the checkout team had launched cart pagination, which improved the performance of the cart page by limiting the number of items loaded, and we had not adjusted the alerting thresholds to match this new normal. Luckily, the checkout team noticed the change in performance and we were able to trace the cause back to testing for Arizona.
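The gap described above can be sketched as the difference between a fixed threshold and one derived from a recent baseline. Everything in this example is an assumption made for illustration: the function name, the 10% tolerance, and the timings are hypothetical and do not describe our alerting system.

```python
import statistics


def regression_alert(baseline_ms, current_ms, tolerance=0.10):
    """Return True if the current median exceeds the baseline median
    by more than the given tolerance (hypothetical default of 10%)."""
    baseline = statistics.median(baseline_ms)
    current = statistics.median(current_ms)
    return current > baseline * (1 + tolerance)


if __name__ == "__main__":
    static_threshold_ms = 1000           # set before the page got faster
    after_pagination = [590, 600, 610]   # the new, faster normal
    regressed = [790, 800, 810]          # a later slowdown

    # The stale fixed threshold never fires: 800 ms is still under 1000 ms.
    print(statistics.median(regressed) > static_threshold_ms)
    # A threshold relative to the new baseline does fire.
    print(regression_alert(after_pagination, regressed))
```

A threshold tied to the current baseline tightens automatically when a page gets faster, so a later slowdown is not hidden by the earlier improvement.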
While in the midst of testing for Arizona, we also launched a new site navigation bar that is included under the search bar on every page and features eight of the main shopping categories. Not only does the navigation bar make it easier for shoppers to find items on the site, but we also believe that the new navigation will positively affect Search Engine Optimization, driving more traffic to shops. We noticed some performance impact while testing the feature, so when it launched at the end of August we were watching closely and expected a performance degradation due to the amount of HTML being generated. The performance impact was felt across the majority of our pages, though it was more noticeable on some pages than others depending on the weight of the page. For example, lighter pages such as baseline appear harder hit because the navigation bar accounts for a significant amount of the page’s overall weight.
In an awesome win, in response to the anticipated performance hit, the buyer experience engineering team ramped up client-side rendering for this new feature, which cut down the rendering time on buyer-side pages by caching the HTML output and shipping less to the client.
In addition to the hardware change, Project Arizona, and the new site navigation feature, we have also been investigating a slow, gradual regression across several pages that began in the first half of Q3. Extensive investigation and testing revealed that the regression was the result of limited CPU resources. We are currently adding CPU capacity and anticipate the affected pages will get faster this quarter.
Synthetic Start Render
Let’s move on to our synthetic tests where we have instrumented browsers load pages automatically every 10 minutes from several locations. This expands the scope of analysis to include browser-side measurements along with server-side. The strength of synthetic measurements is that we can get consistent, highly-detailed metrics about typical browser scenarios. We can look at “start render” to estimate when most people first see our pages loading.
The predominant observation is that our median render-start times across most pages have increased by about 300ms compared to last quarter. You might expect a performance team to feel bummed out about a distinctly slower result, but we actually care more about the overall user experience than about page speed measurements in any given week. The goal of our Performance team is not just to make a fast site, but to encourage discussions that accurately consider performance as one important concern among several.
This particular slowdown was caused by broader use of our new CSS toolkit, which adds 35k of CSS to every page. We expect the toolkit to be a net win eventually, but we have to pay a temporary penalty while we work on eliminating non-standard styles. Several teams gathered together to discuss the impact of this change, which gave us confidence that Etsy’s culture of performance is continuing to mature, despite this particular batch of measurements.
The median render-start time for our search page appears to have increased by 800ms, following a similar degradation in the last quarter, but we found this to be misleading. We isolated this problem to IE versions 10 and older, which actually represent a tiny fraction of Etsy users. The search page renders much faster (around 1100ms) in Chrome, which is far more popular, and that is consistent with all our other pages across IE and Chrome.
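Breaking a metric down by browser is what exposes this kind of skew. The sketch below groups start-render samples by browser and takes the median of each group. The browser labels and timings are invented for the example and are not taken from our test data.

```python
from collections import defaultdict
import statistics


def median_by_browser(samples):
    """Group (browser, start_render_ms) pairs and return per-browser medians."""
    grouped = defaultdict(list)
    for browser, start_render_ms in samples:
        grouped[browser].append(start_render_ms)
    return {
        browser: statistics.median(values)
        for browser, values in grouped.items()
    }


if __name__ == "__main__":
    # Invented timings: one older browser is much slower than the other.
    samples = [
        ("IE9", 1900), ("IE9", 2100),
        ("Chrome", 1000), ("Chrome", 1100), ("Chrome", 1200),
    ]
    print(median_by_browser(samples))
```

A single blended median hides which browser is responsible, while the per-browser view shows whether a slowdown affects most users or only a small segment.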
Synthetic checks are vulnerable to this type of misleading measurement because it’s really difficult to build comprehensive labs that match the true diversity of browsers in the wild. RUM measurements are better suited to that task. We are currently discussing how to improve the browsers we use in our synthetic tests.
What was once a convenient metric for estimating experience may eventually become less meaningful as one fundamentally changes the way a site is loaded. We feel it is important to adapt our monitoring to the new realities of our product. We always want to be aligned with our product teams, helping them build the best experience, rather than spending precious time optimizing for metrics that were more useful in the past.
As it happens, we recently made a few product improvements around site navigation (mentioned in the server-side section above). As we optimized the new version, we focused on end-user experience and it became clear that ‘Webpage Response’ (WR) was becoming less and less connected to that experience. WR includes the time for all assets loaded on the page, even if these requests are hidden from the end-user, such as deferred beacons.
We are evaluating alternative ways to estimate end-user experience in the future.
Real User Page Load Time
Real user monitoring gives us insight into actual page loads experienced by end-users. Notably, it accounts for real-world diversity of network conditions, browser versions, and internationalization.
We can see across-the-board increases, which is in line with our other types of measurements. By looking at the daily summaries of these numbers, we confirmed that the RUM metrics regressed when we launched our revamped site navigation (first mentioned in the server-side section). Engineers at Etsy worked to optimize this feature over the next couple of weeks and made progress, though one optimization ended up causing a regression on some browsers. This regression was visible only in our RUM data. We have a plan to speed this up during the fourth quarter.
In the third quarter, we had our ups and downs with site performance, due to both product and infrastructure changes. It is important to remember that performance cannot be reduced merely to page speed; it is a balancing act of many factors. Performance is a piece of the overall user experience and we are constantly improving our ability to evaluate performance and make wiser trade-offs to build the best experience. The slowdowns we saw this quarter have only reinforced our commitment to helping our engineering teams monitor and understand the impact of the new features and infrastructure changes they implement. We have several great optimizations and tools in the pipeline and we look forward to sharing the impact of these in the next report.