Executing a Sunset

Posted by on February 1, 2019

We all know how exciting it is to build new products, the thrill of a pile of new ideas waiting to be tested, new customers to reach, knotty problems to solve, and dreams of upward-sloping graphs.  But what happens when it is no longer aligned with the trajectory of the company. Often, the product, code, and infrastructure become a lower priority, while the team moves on to the next exciting new venture. In 2018, Etsy sunset Etsy Wholesale, Etsy Studio, and Etsy Manufacturing, three customer-facing products.

In this blog post, we will explore how we sunset these products at Etsy. This process involves a host of stakeholders including marketing, product, customer support, finance and many other teams, but the focus of this blog post is on engineering and the actual execution of the sunset.

ProcessPre-code deletion

Use Feature Flags and Turn off Traffic

Once the communication had been done through emails, in-product announcements, and posts in the user forums, we started focusing on the execution. Prior to the day of each sunset, we used our feature flagging infrastructure to build a switch to disable access to the interface for Wholesale and Manufacturing. Feature flags are an integral part of the continuous deployment process at Etsy. Feature flags reinforce the benefits of small changes and continuous delivery.

On the day of the sunset, all we had to do was deploy a one line configuration change and the product was shut off since there was a feature flag that controlled access to these products.

A softer transition is often preferable to a hard turn off. For example, we disabled the ability for buyers to create new orders one month before shutting Etsy Wholesale off. That gave sellers a chance to service the orders that remained on-platform, avoiding a mad-dash at the end.

Export Data for Users

Once the Etsy Wholesale platform was turned off, we created data export files for each seller and buyer with information about every order they received or placed during the five years that the platform was active. Generating and storing these files in one shot allowed us to clean up the wholesale codebase without fear that parts of it would be needed later for exporting data.

Set Up Redirects

We highly recommend redirects through feature flags,  but a hard DNS redirect might be required in some circumstances. The sunset of Etsy Studio was complicated by the fact that in the middle of this project, etsy.com was being migrated from on-premise hosting to the cloud. To reduce complexity and risk for the massive cloud migration project, Etsy Studio had to be shut off before the migration began. On the day before the cloud migration, a DNS redirect was made to forward any request on etsystudio.com to a special page on etsy.com that explained that Etsy Studio was being shut down. Once the DNS change went live, it effectively shut off Etsy Studio completely.

Code Deletion Methodology:

Once we confirmed that all three products were no longer receiving traffic, we kicked off the toughest part of the engineering process: deleting all the code. We tried to phase it in two parts, as tightly and loosely integrated products. Integrations in the most sensitive/dangerous spots were prioritized, and safer deletions were done later as we were heading into the holiday season (our busiest time of the year).

For Etsy Wholesale and Etsy Manufacturing, we had to remove the code piece-by-piece because it was tightly integrated with other features on the site. For Etsy Studio, we thought we would be able to delete the code in one massive commit. One benefit of our continuous integration system is that we can try things out, fail, and revert without negatively affecting our users. This proved valuable as when we tried deleting the code one massive commit, some unit tests for the Etsy codebase started failing. We realized that small dependencies between the code had formed over time. We decided to delete the code in smaller, easier to test, chunks.
 

a small example of dependencies creeping in where you least expect them.

Challenges: Planning (or lack of it) for slowdowns

Interdependencies

During the process of sunsetting, we didn’t consider how busy other teams would be heading into the holiday season. This slowed down the process of getting code reviews approved. This became especially crucial for us since we were removing and modifying big chunks of code maintained by other teams.

There were also several other big projects in flight while we were trying to delete code across our code base and that slowed us down. One example that I already mentioned was cloud migration: we couldn’t shut off Etsy Studio using a config flag and we had to work around it.

Commit Size and Deploys

To reduce risk, our intention was to keep our commits small, but when trying to delete so much code at once, it’s hard to keep all your commits small. Testing and deploying was a least 50% of our team’s time. Our team made about 413 commits over five months, deleting 275,000 lines of code. That averages out to 630 lines of code deleted per commit, which were frequently deployed one at a time.

Compliance

We actively think of compliance when building new things, but it is also important to keep in mind compliance requirements when you delete code. Etsy’s SOX compliance system requires that certain files in our codebase are subject to extra controls. When we deploy changes to such files, we need additional reviews and signoffs. We had to do 44 SOX reviews since we did multiple small commits. Each review requires approvals by multiple people and this added on average a few hours to each bit of deletion we did.  Similarly, we considered user privacy and data protection in how to make retention decisions about sunsetted products, how to make data available for export, and how it impacts our terms and policies.

Deleting so much code can be a difficult process. We had to revert changes from production at least five times, which, for the most part was simple. One of these five reverts was complicated by a data corruption issue affecting a small population of sellers, which required several days of work to write, test, and run a script to fix the problem.

The Outcome

We measured success using the following metrics:

From 1000s of error logs a day for wholesale, to less than 100 (eventually we got this to zero)

The roots that three products had in our systems demonstrated the challenges in building and maintaining a standalone product alongside our core marketplace. The many branching pieces of logic that snuck in made it difficult to reuse lots of existing code. By deleting 275,000 lines of code, we were able to reduce tech debt and remove roadblocks for other engineers.   

Posted by on February 1, 2019
Category: api, data, databases, engineering, events, people

3 Comments

Thank you for sharing. I would love to read more about the product side of this decision.

Good tips 🙂

Hey,
This is really an excellent blog as well as its content.