Open Source Spring Cleaning
At Etsy, we are big fans of Open Source. Etsy as it is wouldn’t exist without the myriad of people who have solved a problem and published their code under an open source license. We serve etsy.com through the Apache web server running on Linux, our server-side code is mostly written in PHP, we store our data in MySQL, we track metrics with graphs from Ganglia and Graphite, and we use Nagios to monitor the stability of our systems. And these are only the big examples. In every nook and cranny of our technology stack you can find Open Source code.
Part of everyone’s job at Etsy is what we call “Generosity of Spirit”, which means giving back to the industry. For engineers that means that we strive to give a talk at a conference, write a blog post on this blog or contribute to open source at least once every year. We love to give back to the Open Source community when we’ve created a solution to a problem that we think others might benefit from.
Maintenance and Divergence
This has led to many open sourced projects on our GitHub page and a continuing flow of contributions from our engineers and the Open Source community. We are not shy about open sourcing core parts of our technology stack. We are publicly developing our deployment system, metrics collector, team on-call management tool and our code search tool. We even open sourced the crucial parts of our atomic deployment system. And it has been very rewarding to receive bug fixes and features from the wider community that make our software more mature and stable.
As we open sourced more projects, it became tempting to run an internal fork of a project when we wanted to add new features quickly. These internal forks quickly diverged from the open sourced versions, which doubled the work of maintaining the project. Anything fixed or changed internally had to be fixed or changed externally, and vice versa. In a busy engineering organization, the internal version usually took priority over the public one. Looking at our GitHub page, it wasn’t clear, even to an Etsy engineer, whether or not we were actively maintaining a given project.
We ended up with public projects that hadn’t been committed to in years. Open sourcers who took the time to file a bug report didn’t get an answer on the issue, sometimes for years, which didn’t instill confidence in potential users. No one could tell whether a project was a maintained piece of software or a proof of concept that wouldn’t get any updates.
We want to do better by the Open Source community, since we’ve benefited so much from existing Open Source Software. We did a bit of Open Source spring cleaning to bring more clarity to the state of our open source projects. Going forward our projects will be clearly labeled as either maintained, not maintained, or archived.
Maintained projects are the default and are not specifically labeled as such. For maintained projects, we’re either running the open source version internally or currently working on getting our internal version back in sync with the public version. We already did this for our deployment tool in the past. We are actively working on any maintained projects: merging or commenting on pull requests, answering bug reports, and adding new features.
We also have a few projects that haven’t seen public updates in years. Usually this is because we haven’t found a way to make the project configurable enough that we can run the public version internally without slowing down our development cycles. However, the code as it is available serves as a great proof of concept and illustrates how we approach the problem. Or it might have been a research project that we abandoned because it turned out not to solve our problem in the long run, but we still wanted to share what we tried. Those projects will stay the way they are and will rarely receive any updates. We will turn off issues and pull requests on them and make it very clear in the README that they are a proof of concept only.
We also have a number of projects that we have open sourced because we were using them at one time but have since abandoned altogether. We have likely found that there exists a better solution to the problem or that the solution hasn’t proven useful in the long run. In those cases we will push a commit to the master branch that removes all code and only leaves the README with a description of the project and its status. The README will link to the last commit containing actual code. This way the code doesn’t just vanish, but the project is clearly not active. Those projects will also have issues and pull requests turned off.
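The three states described above can be summarized as a small policy table. The following Python snippet is only an illustrative sketch, not tooling we actually run: the names `Status`, `Policy`, `POLICIES`, and `archived_readme`, the notice wording, and the README layout are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """The three project states named in this post."""
    MAINTAINED = "maintained"
    NOT_MAINTAINED = "not maintained"
    ARCHIVED = "archived"


@dataclass(frozen=True)
class Policy:
    """Hypothetical summary of what each state implies for a repository."""
    issues_enabled: bool
    pull_requests_enabled: bool
    code_on_master: bool
    readme_notice: str  # empty string means no special label


# Maintained is the default and carries no label; the notice texts
# below are placeholders, not the wording used in real READMEs.
POLICIES = {
    Status.MAINTAINED: Policy(True, True, True, ""),
    Status.NOT_MAINTAINED: Policy(
        False, False, True,
        "This project is a proof of concept only and is not maintained.",
    ),
    Status.ARCHIVED: Policy(
        False, False, False,
        "This project is archived and no longer active.",
    ),
}


def archived_readme(name, description, last_code_commit_url):
    """Build the README left on master after the code has been removed.

    It keeps a description, states the status, and links to the last
    commit that still contained code, so the code does not just vanish.
    """
    notice = POLICIES[Status.ARCHIVED].readme_notice
    return "\n".join([
        f"# {name}",
        "",
        description,
        "",
        f"Status: {notice}",
        "",
        f"Last commit containing code: {last_code_commit_url}",
        "",
    ])
```

Calling `archived_readme("example-tool", "An example.", "https://example.com/commit/abc123")` with placeholder values yields a README that names the project, states that it is archived, and points readers at the final commit with code.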
In addition to the archival of those projects we will also start to delete forks of other Open Source projects that we’ve made at some point, but aren’t actively maintaining.
We have learned a lot about maintaining Open Source projects over the last couple of years. The main lesson we want to share is that it’s essential to use the Open Source version internally to provide a good experience for other Open Source developers who want to use our software. We strive to always learn and get better at everything we do. If you’ve been waiting for us to respond to an issue or merge a pull request, hopefully this gives you more insight into what has been going on and why it took us so long to respond. We also hope that our new project labeling system will give you more clarity about the state of our open source projects. In order to be good open source citizens we want to always do our best to give back in a way that is helpful for everyone. And a little spring cleaning is always a good thing. Even if it’s technically summer already.