Q1 2015 Site Performance Report
Spring has finally arrived, which means it’s time to share our Site Performance Report for the first quarter of 2015. Like last quarter, in this report, we’ve taken data from across an entire week in March and are comparing it with data from an entire week in December. Since we are constantly trying to improve our data reporting, we will be shaking things up with our methodology for the Q2 report. For backend performance, we plan to randomly sample the data throughout the quarter so it is more statistically sound and a more accurate representation of the full period of time.
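As a rough illustration of what sampling throughout the quarter could look like, here is a minimal sketch of reservoir sampling, which draws a uniform random sample from a stream of requests without knowing its length in advance. This is an assumption about one possible approach, not a description of the planned Q2 methodology; the function name and the stand-in data are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return k items chosen uniformly at random from an iterable of
    unknown length (Algorithm R). Returns fewer if the stream is shorter."""
    rng = rng or random.Random()
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


# Hypothetical stand-in for a quarter's worth of backend timings (ms).
timings = range(100_000)
print(len(reservoir_sample(timings, 1000, random.Random(2015))))
```

Because every request has the same chance of ending up in the sample, a busy week and a quiet week are both represented in proportion to their traffic.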
We’ve split up the sections of this report among the Performance team and different members will be authoring each section. Allison McKnight is updating us on the server-side performance section, Natalya Hoota is covering the real user monitoring portion, and as a performance bootcamper, I will have the honor of reporting on the synthetic front-end monitoring. Over the last three months, front-end and backend performance have remained relatively stable, with some variations on specific pages such as baseline, home, and profile. Now, without further ado, let’s dive into the numbers.
Server-Side Performance – from Allison McKnight
Let’s take a look at the server-side performance for the quarter. These are times seen by real users (signed-in and signed-out). The baseline page includes code that is used by all of our pages but has no additional content.
Check that out! None of our pages got significantly slower, where we count a page as significantly slower if it slowed down by at least 10% of its value from last quarter. We do see a 50 ms speedup in the homepage median and a 30 ms speedup in the baseline 95th percentile. Let’s take a look at what happened.
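The 10% threshold can be written down as a small check. This is only a sketch of the rule as stated above, not our actual reporting code; the function name and the sample timings are hypothetical.

```python
def is_significantly_slower(previous_ms, current_ms, threshold=0.10):
    """Return True if current_ms is slower than previous_ms by at least
    `threshold` (10% by default) of the previous value."""
    if previous_ms <= 0:
        raise ValueError("previous_ms must be positive")
    return (current_ms - previous_ms) / previous_ms >= threshold


# Hypothetical timings in milliseconds, for illustration only.
print(is_significantly_slower(200, 215))  # 7.5% slower
print(is_significantly_slower(200, 230))  # 15% slower
print(is_significantly_slower(200, 150))  # faster than before
```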
First, we see about a 25 ms speedup in the 95th percentile of the baseline backend time. The baseline page has three components: loading bootstrap/web.php, a bootstrap file for all web requests; creating a controller to render the page; and rendering the baseline page from the controller. We use StatsD, a tool that aggregates data and records it in Graphite, to graph each step we take to load the baseline page. Since we have a timer for each step, I was able to drill down to see that the upper bound for the bootstrap/web step dropped significantly at the end of January.
We haven’t been able to pin down the change that caused this speedup. The bootstrap file performs a number of tasks to set up infrastructure – for example, setting up logging and security checks – and it seems likely that an optimization to one of these processes resulted in a faster bootstrap.
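For readers who haven’t used StatsD, the per-step timers described above boil down to sending small UDP datagrams of the form `name:value|ms`. The sketch below shows the idea in Python rather than in our PHP codebase; the metric names, host, and port are assumptions for illustration, and this is not the instrumentation we actually run.

```python
import socket
import time
from contextlib import contextmanager


def format_timer(name, elapsed_ms):
    """Build a StatsD timer datagram: '<name>:<value>|ms'."""
    return f"{name}:{elapsed_ms:.0f}|ms"


@contextmanager
def statsd_timer(name, host="127.0.0.1", port=8125):
    """Time the enclosed block and send the duration to StatsD over UDP."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        payload = format_timer(name, elapsed_ms).encode("ascii")
        try:
            with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
                sock.sendto(payload, (host, port))
        except OSError:
            pass  # metrics are best effort; never fail the request


def load_baseline_page():
    # Hypothetical metric names, one timer per step of the baseline page.
    with statsd_timer("baseline.bootstrap_web"):
        pass  # load the bootstrap file
    with statsd_timer("baseline.create_controller"):
        pass  # create the controller
    with statsd_timer("baseline.render"):
        pass  # render the page


if __name__ == "__main__":
    load_baseline_page()
```

With one timer per step, Graphite can plot each step’s upper bound on its own, which is what makes it possible to attribute a change to a single step.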
We also see a 50 ms drop in the homepage median backend time. This improvement is from rolling out HHVM for our internal API traffic.
HHVM is a virtual machine developed at Facebook to run PHP and Hack, a PHP-like language developed by Facebook and designed to give PHP programmers access to language features that are unavailable in PHP. HHVM compiles PHP and Hack into an intermediate bytecode and then uses just-in-time (JIT) compilation to translate that bytecode into machine code at runtime, allowing for optimizations such as code caching. Both HHVM and Hack are open-source.
This quarter we started sending all of our internal API v3 requests to six servers running HHVM and saw some pretty sweet performance wins. Overall CPU usage in our API cluster dropped by about 20% as the majority of our API v3 traffic was directed to the HHVM boxes; we expect we’ll see an even larger speedup when we move the rest of our API traffic to HHVM.
Time spent in API endpoints dropped. Most notably, we saw a speedup in the search listings endpoint (200 ms faster on the median and 100 ms faster on the 90th percentile) and the fetch listing endpoint (100 ms faster on the median and 50 ms faster on the 90th percentile).
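To make the median and 90th percentile comparisons concrete, here is a minimal sketch of how such a before/after speedup can be computed from raw timings. The sample data is made up for illustration and the helper names are hypothetical; our real numbers come from production traffic.

```python
from statistics import median, quantiles


def p90(samples):
    """90th percentile: the last of the nine cut points that split the
    data into ten equal-probability groups."""
    return quantiles(samples, n=10)[-1]


def speedup(before, after):
    """Milliseconds saved on the median and on the 90th percentile."""
    return {
        "median_ms": median(before) - median(after),
        "p90_ms": p90(before) - p90(after),
    }


# Made-up endpoint timings in milliseconds.
after = list(range(100, 300, 10))
before = [t + 100 for t in after]
print(speedup(before, after))
```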
Since these endpoints are used mainly in our native apps, mobile users will have seen a speed boost when searching and viewing listings. Desktop users also saw some benefits: namely, the median homepage backend time for signed-in users, whose homepages we personalize with listings that they might like, dropped by 95 ms. This is what caused the 50 ms drop in the median backend time for all homepage views this quarter.
The transition to using HHVM for our internal API requests was headed by Dan Miller on our Core Platform team. At Etsy, we like to celebrate the work done on different teams to improve Performance by naming a Performance Hero when exciting improvements are made. Dan was named the first Performance Hero of 2015 for his work on HHVM. Go, Dan!
To learn more about how we use HHVM at Etsy and the benefits that it’s brought us, you can see the slides from his talk HHVM at Etsy, which he gave at the PHP UK Conference 2015. He will also be writing a Code as Craft post about HHVM at Etsy in the future, so keep checking back!
Synthetic Front-End Performance – from Kristyn Reith
Below is the synthetic front-end performance data for Q1. For synthetic testing, a third party simulates actions taken by a user and then continuously monitors these actions to generate performance metrics. For this report, the data was collected by Catchpoint, which runs tests every ten minutes on IE9 in New York, London, Chicago, Seattle and Miami. Catchpoint defines the webpage response metric as the time from when the request is issued until the last byte of the final element of the page is received. These numbers are all medians, and here is the data for the week of March 8-15, 2015 compared to the week of December 15-22, 2014.
To calculate error ranges for our median values, we use Catchpoint’s standard deviation. Based on these standard deviations, the only statistically significant performance regression we saw was for the homepage, in both the start render and webpage response times. Looking further into this, we dug into the homepage’s waterfall charts and discovered that Catchpoint’s “response time” metrics include page elements that load asynchronously. The webpage response time should not account for elements loaded after the document is considered complete. Therefore, this regression is a measurement tooling problem and not representative of a real slowdown.
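One simple way to read the error ranges described above is to treat each median as a range of plus or minus one standard deviation and to flag a change only when the two ranges do not overlap. The sketch below assumes exactly that rule, which is a simplification and may not match how the ranges were compared for this report; the numbers are hypothetical.

```python
def error_range(median, std_dev):
    """Range of median plus or minus one standard deviation."""
    return (median - std_dev, median + std_dev)


def ranges_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def is_significant_change(prev_median, prev_std, curr_median, curr_std):
    """Flag a change only when the two error ranges do not overlap."""
    return not ranges_overlap(
        error_range(prev_median, prev_std),
        error_range(curr_median, curr_std),
    )


# Hypothetical medians and standard deviations, in seconds.
print(is_significant_change(1.0, 0.1, 1.5, 0.1))  # ranges are disjoint
print(is_significant_change(1.0, 0.3, 1.2, 0.3))  # ranges overlap
```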
Based on these standard deviations, we saw several improvements. The most noteworthy of these are the start render and webpage response times for the listing page. After investigating potential causes for this performance win, we discovered that this was no more than an error in data collection on our end. Before we pulled the Q1 data, the Etsy shop that owns the listing page that we use to collect data in Catchpoint had been put on vacation mode, which temporarily puts the shop “on hold” and hides listings. While the shop was on vacation mode, the listing in question expired on March 7th. As a result, the data pulled for the week we measured in March does not represent the same version of the listing page that was measured in our previous report, since the expired listing page includes additional suggested items. To avoid an error like this in the future, the performance team will be creating a new shop with a collection of listings, specifically designated for performance testing.
Although the synthetic data for this quarter may seem to suggest that there were major changes, it turned out that the biggest of these were merely errors in our data collection. As we note in the conclusion, we’re going to be overhauling a number of ways we gather data for these reports.
Real User Front-End Performance – from Natalya Hoota
As in our past reports, we are using real user monitoring (RUM) data from mPulse. Real user data, as opposed to synthetic measurements, is sent from users’ browsers in real time.
It does look like the overall trend is a global increase in page load time. After examining the data more closely, it appears that most of the slowdown is coming from the front end. One thing to note here: the difference is not significant (less than 10%), with the exception of the homepage and profile page.
Homepage load time was affected slightly more than the rest due to two experiments with real time recommendations and page content grouping, both of which are currently ramped down. The profile page showed no notable increase in time for the median values; for the long tail (95th percentile), however, there was a greater change for the worse.
Another interesting nugget that we found was that devices send a different set of metrics to mPulse based on whether their browsers support navigation timing. The Navigation Timing API was proposed by the W3C in 2012, leading major browsers to gradually roll out support for it. Notably, Apple added it to Safari last July, giving RUM vendors better insight into users’ experience. For our data analysis, this means that we should examine navigation timing and resource timing metrics separately, since the underlying data sets are not identical.
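In practice, examining the data sets separately means partitioning beacons by browser capability before computing any statistics. Here is a minimal sketch of that idea; the field names are hypothetical and do not reflect mPulse’s actual beacon format, and the timings are made up.

```python
from statistics import median


def split_by_navtiming(beacons):
    """Partition beacons by whether the browser supports navigation timing."""
    with_nt = [b for b in beacons if b.get("nav_timing_supported")]
    without_nt = [b for b in beacons if not b.get("nav_timing_supported")]
    return with_nt, without_nt


def median_load_time(beacons):
    """Median page load time, in milliseconds, for a group of beacons."""
    return median(b["load_time_ms"] for b in beacons)


# Made-up beacons for illustration only.
beacons = [
    {"nav_timing_supported": True, "load_time_ms": 1200},
    {"nav_timing_supported": True, "load_time_ms": 1500},
    {"nav_timing_supported": False, "load_time_ms": 2100},
    {"nav_timing_supported": True, "load_time_ms": 1800},
    {"nav_timing_supported": False, "load_time_ms": 2500},
]
with_nt, without_nt = split_by_navtiming(beacons)
print(median_load_time(with_nt), median_load_time(without_nt))
```

Reporting the two groups side by side avoids mixing measurements that were taken in different ways into a single median.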
In order to reach a definitive conclusion, we would need to test the statistical validity of that data. In the next quarter we are hoping to incorporate changes that will bring better precision to our data collection, analysis and visualization.
Conclusion – from Kristyn Reith
The first quarter of 2015 has included some exciting infrastructure changes. We’ve already begun to see the benefits that have resulted from the introduction of HHVM and we are looking forward to seeing how this continues to impact performance as we transition the rest of our API traffic over.
Keeping with the spirit of exciting changes, and acknowledging the data collection issues we’ve discovered, we will be rolling out a whole new approach to this report next quarter. We will partner with our data engineering team to revamp the way we collect our backend data for better statistical analysis. We will also experiment with different methods of evaluation and visualization to better represent the speed findings in the data. We’ve also submitted a feature request to Catchpoint to add an alert that’s only triggered if bytes *before* document complete have regressed. With these changes, we look forward to bringing you a more accurate representation of the data across the quarter, so please check back with us in Q2.