API First Transformation at Etsy – Operations
This is the second post in a series of three about Etsy’s API, the abstract interface to our logic and data. The previous post is about concurrency in Etsy’s API infrastructure. This post covers the operational side of the API infrastructure.
Operations: Architecture Implications
How do the decisions for developing Etsy’s API that we discussed in the first post relate to Etsy’s general architecture? We’re all about Do It Yourself at Etsy. A cloud is just other people’s computers, and not in the spirit of DIY; that’s why we rather run our own datacenter with our own hardware.
Also, Etsy kicks it old school and runs on a LAMP stack. Linux, Apache, MySQL, PHP. We’ve already talked about PHP being a strictly sequential, single-threaded, shared-nothing environment, leading to our choice of parallel cURL. In the PHP world, everything runs through a front controller, for example index.php. In that file, we have to include other PHP files if we need them, and to make that easier, we usually use an autoloader to help with dependencies.
Every web request gets a new PHP environment in its own instance of the PHP interpreter. The process of setting up that environment is called bootstrap. This bootstrapping process is a fixed cost in terms of CPU work, regardless of the actual work required by the request. By enabling multiple, concurrent HTTP sub-requests to fetch data for a single client request, this fixed cost was multiplied. Additionally, this concurrency encouraged more work to be done within the same wall clock time. Developers built more diverse features and experiences, but at the cost of using more back-end resources. We had a problem.
Problem: PHP time to request + racking servers D:
As more teams adopted the new approach to build features in our apps and on the web, more and more backend resources were being consumed, primarily in terms of CPU usage from PHP. In response, we added more compute capacity, over time growing the API to four times the number of servers prior to API v3. Continuing down this path we would have exhausted space and power in our datacenters. This was not a long term solution.
To solve this, we tried several strategies at once. First, we skipped some work by allowing to mark some subrequests as optional. This approach was abandoned because people used it as a graceful error recovery mechanism, triggering an alternate subrequest, rather than for optional data fetches. This didn’t help us reduce the overall work required for a given client request.
Also, we spent some time optimizing the bootstrap process. The bootstrap tax is paid by all requests and subrequests, making it a good place to focus our efforts. This initially showed benefit with some low hanging fruit, but it was a moving target in a changing codebase, requiring constant work to maintain a low bootstrap tax. We needed other ways of doing less work.
A big step forward was the introduction of HTTP response caching. We had to add caching quickly, and first tried the same cache we use for image serving, Apache Traffic Server. While being great for caching large image files, it didn’t work as well for smaller, latency sensitive API responses. We settled on Varnish, which is fast and easy to configure for our needs. Not all endpoints are being cached, but for cached ones, Varnish will serve the same response many times. We accept staleness for a small 10 – 15 minute period, drastically reducing the amount of work required for these requests. For the cacheable case, Varnish handles thousands of requests per second with a 80% hit rate. Because the API framework requires input parameters to be explicit in the HTTP request, this meshed well with introducing the caching tier. The framework also transparently handles locale, passing the user’s language, currency and region with every subrequest, which Varnish uses to manage variants.
The biggest step forward came from a courageous experiment. Dan from the core team looked at bigger organizations that faced the same problem, and tried out facebook’s hhvm on our API cluster. And got a rocketship. We could do the same work, but faster, solving this issue for us entirely. The performance gain from hhvm was a catalyst for performance improvements that made it into PHP7. We are now completely switched over to PHP7 everywhere, but it’s unclear what we would have done without hhvm back in the day.
In conclusion, concurrency proved to be great for logical aggregation of components, and not so great for performance optimization. Better database access would be better for that.
Problem: Balancing the load
If we have a tree-like request with sub-requests, a simple solution would be to route this initial request via a load balancer into a pool, and then run all subrequests on the same machine. This leads to a lumpy distribution of work. The next step from here is one uniform pool, and routing the subrequests back into that pool again, to have a good balance across the cluster. Over time (and because we experimented with hhvm), we created three pools that correspond to the three logical tiers of endpoints. In this way, we can monitor and scale each class of endpoints separately, even though each node in all three clusters works the same way.
Where would this not work?
If we sit back and think about this for a bit – how is this architecture specific to Etsy’s ecosystem? Where wouldn’t it work? What are the known problems?
The most obvious gaping hole is that we have no API versioning. How do we even get away with that? We solve this by keeping our public API small and our internal API very very fluid. Since we control both ends of the internal API via client generation and meta-endpoints, the intermediate language of domain objects is free to evolve. It’s tied into our continuous deployment system, moving along with up to 60 deploys per day for etsy.com. And the client is constantly in flux for the internal API.
And as long as it’s internal at Etsy, even the outside layer of bespoke AJAX endpoints is very malleable and matures over time.
Of course this is different for the Apps and the third party API, but those branch off after maturing on the internal API service over several years. Software development companies who focus on an extensive public API or even have that as the main service could not work in this way. They would need an internal place to let the API endpoints mature, which we do on the internal API service that is powering Etsy.
Let’s talk about tooling that we built to learn more about the behavior of our code in practice. Most of the tools that we developed for API v3 are around monitoring the new distributed system.
CrossStitch: Distributed tracing
As we know, with API v2, we had the problem that almost an arbitrary amount of single threaded work could be generated based on the query parameters, and this was really hard to monitor. Moving from the single-threaded execution model to a concurrent model triggering multiple API requests was even more challenging to monitor. You can still profile individual requests with the usual logging and metrics, but it’s hard to get the entire picture. Child requests need to be tied to back to the original request that triggered them, but they themselves might be executed elsewhere on the cluster.
To visualize this, we built a tool for distributed tracing of requests, called CrossStitch. It’s a waterfall diagram of the time spent on different tasks when loading a page, such as HTTP requests, cache queries, database queries, and so on. In darker purple, you can see different HTTP requests being kicked off for a shop’s homepage, for example you see the request for the shop’s about page is running in parallel with requests for other API components.
Fanout Manager: Feedback on fanout limit exhaustion for developers
Bespoke API calls can create HTTP request fanout to concurrent components, which in turn can create fanout to atomic component endpoints. This can create a strain on the API and database servers that is not easy for an endpoint developer to be aware of when building something in the development environment or rolling out a change to a small percentage of users.
The fanout manager aims to put a ceiling on the overall resource requests that are in flight, and we are doing this in the scheduler part of the curl callback orchestrator by keeping track of sub-requests in memcached. When a new request hits the API server, a key based on the unique user ID of that root request is put into memcached. This key works as a counter of parallel in-flight requests for that specific root request. The key is being passed on to the concurrent and component endpoint subrequests. When the scheduler runs a subrequest, it increments the counter for that key. When the request got a response and it’s slot is freed in the scheduler, the counter for the key is decremented. So we always know how many total subrequests are in-flight for one root request at the same time.
In a distributed system like this, multiple requests can be competing for the same slot. We have a problem that requires a lock.
To avoid the lock overhead, we circumvent the distributed locking problem by relying on memcached’s atomic increment and decrement operation. We optimistically first increment the memcached key counter, and then check whether the operation was valid and we actually got the slot. Sometimes we have to decrement again because this optimistic assumption is wrong, but in that case we are waiting for other requests to finish anyway and the extra operation makes no difference.
If an endpoint has too many sub-requests in flight, it just waits before being able to make the next request. This provides a good feedback for our developers about the complexity of the work before the endpoint goes into production. Also, the fanout limit can be hand-tweaked for specific cases in production, where we absolutely need to fetch a lot of data, and a higher number of parallel requests speeds up that fetching process.
Automated documentation of endpoints: datafindr
We also have a tool for automated documentation of new endpoints. It is called datafindr. It shows endpoints and typed resources, and example calls to them, based on a nightly snapshot of the API landscape.
Wanted: Endpoint decommission tool
Writing new endpoints is easy in our framework, but decommissioning existing endpoints is hard. How can we find out whether an existing endpoint is still being used?
Right now we don’t have such a tool, and to decommission an existing endpoint, we have to explicitly log whether that specific endpoint is called, and wait for an uncertain period of time, until we feel confident enough to decide that no one is using it any more. However, in theory it should be possible to develop a tool that monitors which endpoints become inactive, and how long we have to wait to gain a statistically significant confidence of it being out of use and safe to remove.
This is the second post in a series of three about Etsy’s API, the abstract interface to our logic and data. The next post will be published in two weeks, and will cover the adoption process of the new API platform among Etsy’s developers. How do you make an organization switch to a new technology and how did this work in the case of Etsy’s API transformation?
Posted by Stefanie Schirmer on 06 Sep, 2016
Posted by Liz Wald on 21 Sep, 2010