Etsy Icon>

Code as Craft

Experimenting with HHVM at Etsy main image
Experimenting with HHVM at Etsy

Experimenting with HHVM at Etsy

  image

In 2014 Etsy’s infrastructure group took on a big challenge: scale Etsy’s API traffic capacity 20X. We launched many efforts simultaneously to meet the challenge, including a migration to HHVM after it showed a promising increase in throughput. Getting our code to run on HHVM was relatively easy, but we encountered many surprises as we gained confidence in the new architecture.

What is HHVM?

Etsy Engineering loves performance, so when Facebook announced the availability of the HipHop Virtual Machine for PHP, its reported leap in performance over current PHP implementations got us really excited.

HipHop Virtual Machine (HHVM) is an open-source virtual machine designed for executing programs written in PHP. HHVM uses a just-in-time (JIT) compilation approach to achieve superior performance while maintaining the development flexibility that PHP provides.

This post focuses on why we became interested in HHVM, how we gained confidence in it as a platform, the problems we encountered and the additional tools that HHVM provides. For more details on HHVM, including information on the JIT compiler, watch Sara Golemon and Paul Tarjan’s presentation from OSCON 2014.

Why HHVM?

In 2014 engineers at Etsy noticed two major problems with how we were building mobile products. First, we found ourselves having to rewrite logic that was designed for being executed in a web context to be executed in an API context. This led to feature drift between the mobile and web platforms as the amount of shared code decreased.

The second problem was how tempting it became for engineers to build lots of general API endpoints that could be called from many different mobile views. If you use too many of these endpoints to generate a single view on mobile you end up degrading that view’s performance. Ilya Grigorik’s “Breaking the 1000ms Time to Glass Mobile Barrier” presentation explains the pitfalls of this approach for mobile devices. To improve performance on mobile, we decided to create API endpoints that were custom to their view. Making one large API request is much more efficient than making many smaller requests. This efficiency cost us some reusability, though. Endpoints designed for Android listing views may not have all the data needed for a new design in iOS. The two platforms necessitate different designs in order to create a product that feels native to the platform. We needed to reconcile performance and reusability.

To do this, we developed “bespoke endpoints”. Bespoke endpoints aggregate smaller, reusable, cacheable REST endpoints. One request from the client triggers many requests on the server side for the reusable components. Each bespoke endpoint is specific to a view.

Consider this example listing view. The client makes one single request to a bespoke endpoint. That bespoke endpoint then makes many requests on behalf of the client. It aggregates the smaller REST endpoints and returns all of the data in one response to the client.

Bespoke Endpoint

Bespoke endpoints don’t just fetch data on behalf of the client, they can also do it concurrently. In the example above, the bespoke endpoint for the web view of a listing will fetch the listing, its overview, and the related listings simultaneously. It can do this thanks to curl_multi. Matt Graham’s talk “Concurrent PHP in the Etsy API” from phpDay 2014 goes into more detail on how we use curl_multi. In a future post we’ll share more details about bespoke endpoints and how they’ve changed both our native app and web development.

This method of building views became popular internally. Unfortunately, it also came with some drawbacks.

API traffic growth compared to Web traffic growth

Now that web pages had the potential to hit dozens of API endpoints, traffic on our API cluster grew more quickly than we anticipated. But that wasn’t the only problem.

Bootstrap Time Visualized

This graph represents all the concurrent requests that take place when loading the Etsy homepage. Between the red bars is work that is duplicated across all of the fanned out requests. This duplicate work is necessary because of the shared-nothing process architecture of PHP. For every request, we need to build the world: fetch the signed-in user, their settings, sanitize globals and so on. Although much of this duplicated work is carried out in parallel, the fan-out model still causes unnecessary work for our API cluster. But it does improve the observed response time for the user.

After considering many potential solutions to this problem, we concluded that trying to share state between processes in a shared-nothing architecture would inevitably end in tears. Instead, we decided to try speeding up all of our requests significantly, including the duplicated bootstrap work. HHVM seemed well-suited to the task. If this worked, we&39;d increase throughput on our API cluster and be able to scale much more efficiently.

Following months of iterations, improvements and bug fixes, HHVM now serves all of the fan-out requests for our bespoke endpoints. We used a variety of experiments to gain confidence in HHVM and to discover any bugs prior to deploying it in production.

The Experiments

Minimum Viable Product

The first experiment was simple: how many lines of PHP code do we have to comment out before HHVM will execute an Etsy API endpoint? The results surprised us. We only encountered one language incompatibility. All of the other problems we ran into were with HHVM extensions. There were several incompatibilities with the HHVM memcached extension, all of which we have since submitted pull requests for.

Does it solve our problem?

We then installed both PHP 5.4 and HHVM on a physical server and ran a synthetic benchmark. This benchmark randomly splayed requests across three API endpoints that were verified to work in HHVM, beginning at a rate of 10 requests per second and ramping up to 280 requests per second. The throughput results were promising.

The little green line at the bottom is HHVM response time The little green line at the bottom is HHVM response time

Our PHP 5.4 configuration began to experience degraded performance at about 190 requests per second, while the same didn’t happen to HHVM until about 270 requests per second. This validated our assumption that HHVM could lead to higher throughput which would go a long way towards alleviating the load we had placed on our API cluster.

Gaining Confidence

So far we had validated that HHVM could run the Etsy API (at least with a certain amount of work) and that doing so would likely lead to increase in throughput. Now we had to become confident that HHVM could run etsy.com correctly. We wanted to verify that responses returned from HHVM were identical to those returned by PHP. In addition our API’s full automated test suite and good old-fashioned manual testing we also turned to another technique: teeing traffic.

You can think of “tee” in this sense like tee on the command line. We wrote an iRule on our f5 load balancer to clone HTTP traffic destined for one pool and send it to another. This allowed us to take production traffic that was being sent to our API cluster and also send it onto our experimental HHVM cluster, as well as an isolated PHP cluster for comparison.

This proved to be a powerful tool. It allowed us to compare performance between two different configurations on the exact same traffic profile.

Note that this is on powerful hardware. 140 rps peak. Note that this is on powerful hardware.

On the same traffic profile HHVM required about half as much CPU as PHP did. While this wasn’t the reduction seen by the HHVM team, who claimed a third as much CPU should be expected, we were happy with it. Different applications will perform differently on HHVM. We suspect the reason we didn’t see a bigger win is that our internal API was designed to be as lightweight as possible. Internal API endpoints are primarily responsible for fetching data, and as a result tend to be more IO bound than others. HHVM optimizes CPU time, not IO time.

While teeing boosted our confidence in HHVM there were a couple hacks we had to put in place to get it to work. We didn’t want teed HTTP requests generating writes in our backend services. To that end we wrote read-only mysql, memcached and redis interfaces to prevent writes. As a result however, we weren’t yet confident that HHVM would write data correctly, or write the correct data.

Employee Only Traffic

In order to gain confidence in that area we configured our bespoke endpoints to send all requests to the HHVM cluster if the user requesting the page was an employee. This put almost no load on the cluster, but allowed us to ensure that HHVM could communicate with backend services correctly.

At this point we encountered some more incompatibilities with the memcached extension. We noticed that our API rate limiter was never able to find keys to decrement. This was caused by the decrement function being implemented incorrectly in the HHVM extension. In the process of debugging this we noticed that memcached was always returning false for every request HHVM made to it. This turned out to be a bug in the client-side hashing function present in HHVM. What we learned from this is that while the HHVM runtime is rock-solid, a lot of the included extensions aren’t. Facebook thoughtfully wrote a lot of the extensions specifically for the open source release of HHVM. However, many of them are not used internally because Facebook has their own clients for memcached and MySQL, and as a result have not seen nearly as much production traffic as the rest of the runtime. This is important to keep in mind when working with HHVM. We expect this situation will improve as more and more teams test it out and contribute patches back to the project, as we at Etsy will continue to do.

After resolving these issues it came time to slowly move production traffic from the PHP API cluster to the HHVM API cluster.

Slow Ramp Up

As we began the slow ramp in production we noticed some strange timestamps in the logs:

[23/janv./2015:22:40:32 +0000]

We even saw timestamps that looked like this:

[23/ 1月/2015:23:37:56]

At first we thought we had encountered a bug with HHVM’s logging system. As we investigated we realized the problem was more fundamental than that.

At Etsy we use the PHP function setlocale() to assist in localization. During a request, after we load a user we call setlocale() to set their locale preferences accordingly. The PHP function setlocale() is implemented using the system call setlocale(3). This system call is process-wide, affecting all the threads in a process. Most PHP SAPIs are implemented such that each request is handled by exactly one process, with many processes simultaneously handling many requests.

HHVM is a threaded SAPI. HHVM runs as a single process with multiple threads where each thread is only handling exactly one request. When you call setlocale(3) in this context it affects the locale for all threads in that process. As a result, requests can come in and trample the locales set by other requests as illustrated in this animation.

locale overwriting

We have submitted a pull request re-implementing the PHP setlocale() function using thread-local equivalents. When migrating to HHVM it’s important to remember that HHVM is threaded, and different from most other SAPIs in common use. Do an audit of extensions you’re including and ensure that none of them cause side effects that could affect the state of other threads.

Release!

After rolling HHVM out to just the internal API cluster we saw a noticeable improvement in performance across several endpoints.

public_search_listings

HHVM vs PHP on Etsy Internal API

It’s Not Just Speed

In the process of experimenting with HHVM we discovered a few under-documented features that are useful when running large PHP deployments.

Warming up HHVM

The HHVM team recommends that you warm up your HHVM process before having it serve production traffic:

“The cache locality of the JITted code is very important, and you want all your important endpoints code to be located close to each other in memory. The best way to accomplish this is to pick your most important requests (say 5) and cycle through them all serially until you've done them all 12 times.

They show this being accomplished with a simple bash script paired with curl. There is a more robust method in the form of “warmup documents”.

You specify a warmup document in an HDF file like this:

cmd = 1
url = /var/etsy/current/bin/hhvm/warmup.php // script to execute
remote_host = 127.0.0.1
remote_port = 35100
headers { // headers to pass into HHVM
    0 {
        name = Accept
        value = */*
    } 1 {
        name = Host
        value = www.etsy.com
    } 2 {
        name = User-Agent
        value = Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.46 Safari/535.11
    }
}

To tell HHVM to execute that warmup document on startup, simply reference it like so:

Server {
    WarmupRequests {
       * = /var/etsy/current/bin/hhvm/warmup.hdf
    }
}

This will execute /var/etsy/current/bin/hhvm/warmup.php between when the HHVM binary is executed and when the process accepts connections. It will only execute it once however, and HHVM will not JIT any code until after the twelfth request. To execute a warmup document 12 times simply reference it 12 times from the config file, like so:

Server {
    WarmupRequests {
        * = /var/etsy/current/bin/hhvm/warmup.hdf
        * = /var/etsy/current/bin/hhvm/warmup.hdf
        * = /var/etsy/current/bin/hhvm/warmup.hdf
        * = /var/etsy/current/bin/hhvm/warmup.hdf
        * = /var/etsy/current/bin/hhvm/warmup.hdf
        * = /var/etsy/current/bin/hhvm/warmup.hdf
        * = /var/etsy/current/bin/hhvm/warmup.hdf
        * = /var/etsy/current/bin/hhvm/warmup.hdf
        * = /var/etsy/current/bin/hhvm/warmup.hdf
        * = /var/etsy/current/bin/hhvm/warmup.hdf
        * = /var/etsy/current/bin/hhvm/warmup.hdf
        * = /var/etsy/current/bin/hhvm/warmup.hdf
    }
}

Profiling HHVM with perf(1)

HHVM makes it really easy to profile PHP code. One of the most interesting ways is with Linux’s perf tool.

HHVM is a JIT that converts PHP code into machine instructions. Because these instructions, or symbols, are not in the HHVM binary itself, perf cannot automatically translate these symbols into functions names. HHVM creates an interface to aid in this translation. It takes the form of a file in /tmp/ named according to this template:

/tmp/perf-<pid of process>.map

The first column in the file is the address of the start of that function in memory. The second column is the length of the function in memory. And the third column is the function to print in perf.

Perf looks up processes it has recorded by their pid in /tmp to find and load these files. (The pid map file needs to be owned by the user running perf report, regardless of the permissions set on the file.)

If you run

sudo perf record -p <pid> -ag -e instructions -o /tmp/perf.data -- sleep 20

perf will record all of the symbols being executed for the given pid and the amount of CPU time that symbol was responsible for on the CPU over a period of 20 seconds. It stores that data in /tmp/perf.data. Once you have gathered data from perf with a command such as the above, you can display that data interactively in the terminal using perf report.

perf report Click to embiggen

This show us a list of the most expensive functions (in terms of instructions executed on the CPU) being executed. Functions prefixed with HPHP:: are functions built into the language runtime. For example HPHP::f_sort accounts for all calls the PHP code makes to sort(). Functions prefixed with PHP:: are programmer-defined PHP functions. Here we can see that 36% of all CPU time occurred in Api_Handler::respond(), for example. Using perf() to profile PHP code is powerful on its own, but having the ability to jump from a PHP function into an HPHP function allows you to see what parts of your codebase HHVM doesn’t handle efficiently. Using this process we were able to determine that sort() calls were slow when enable_zend_sorting was enabled. After patching it to be more efficient, we realized a significant CPU and performance win:

CPU drop

Median perf drop

This change resulted in an additional increase in throughput across our API cluster as well as improved response times.

HHVM Interactive Debugger

HHVM provides an interactive debugger called “hphpd”. hphpd works similarly to gdb: it is a command line based interactive debugger

$ hhvm -c /etc/php.d/etsy.ini -m debug bin/test.php
Welcome to HipHop Debugger!
Type "help" or "?" for a complete list of commands.

Program bin/test.php loaded. Type &39;[r]un&39; or &39;[c]ontinue&#39; to go.

Here we set a breakpoint on a function:

hphpd> b Shutdown::registerApiFunctions()
Breakpoint 1 set upon entering Shutdown::registerApiFunctions()
But wont break until class Shutdown has been loaded.
Commence execution until we encounter a breakpoint:
hphpd> continue
Breakpoint 1 reached at Shutdown::registerApiFunctions() on line 101 of /home/dmiller/development/Etsyweb/phplib/Shutdown.php
100     public static function registerApiFunctions() {
101*        self::registerFunction([&#39;Shutdown&#39;, &#39;apiShutdown&#39;]);
102     }

Step into that function:

hphpd> step
Break at Shutdown::registerFunction() on line 74 of /home/dmiller/development/Etsyweb/phplib/Shutdown.php
73     public static function registerFunction() {
74*        $callback = func_get_args();
75

Step over that function:

hphpd> next
Break at Shutdown::registerFunction() on line 76 of /home/dmiller/development/Etsyweb/phplib/Shutdown.php
75
76*        if (empty($callback)) {
77             $bt = new Dev_Backtrace();

hphpd> next
Break at Shutdown::registerFunction() on line 82 of /home/dmiller/development/Etsyweb/phplib/Shutdown.php
81         }
82*        if (!is_callable($callback[0])) {
83             $bt = new Dev_Backtrace();

After adding a few lines to your configuration file you can use this debugger on any code that executes in HHVM.

Lessons Learned from the Experiment

The process of migrating our API cluster to HHVM taught us a lot about HHVM as well as how to better perform such migrations in the future. The ability to clone HTTP traffic and tee it to a read-only test cluster allowed us to gain confidence in HHVM much more quickly than we could have otherwise. While HHVM proved to be rock-solid as a language runtime, extensions proved to be less battle-tested. We frequently encountered bugs and missing features in the MySQL, Memcached and OAuth extensions, among others. Finally it’s important to remember that HHVM is threaded, which can result in a weird interplay between the runtime and system calls. The resulting behavior can be very surprising.

HHVM met our expectations. We were able to realize a greater throughput on our API cluster, as well as improved performance. Buying fewer servers also means less waste and less power consumption in our data centers, which is important to Etsy as a Benefit Corporation.

You can follow Dan on Twitter @jazzdan.

Special thanks Sara Golemon, Paul Tarjan and Josh Watzman at Facebook. Extra special thanks to Keyur Govande and Adam Saponara at Etsy.