At Etsy we have one hard and fast rule for new Engineers on their first day: deploy to production. We’ve talked a lot in the past about our deployment, metrics, and testing processes. But how does the development environment facilitate someone coming in on day one and contributing something that takes them through the steps of committing code, running it through our tests, and deploying it with deployinator?
A new engineer’s first task is to snap a photo using our in-house photo booth (handmade, of course) and upload it to the about page. Everyone gets a shiny new virtual machine with a working development version of the site, along with their LDAP credentials, github write access, and laptop. We use an internal cloud system for the VMs, mostly because it was the most fun thing to build, but also because it gives us the advantage of our fast internal network and dedicated hardware. The goal is a consistent environment that mirrors production as closely as possible. So what is the simplest way to build something like this in house?
We went with a KVM/QEMU based solution, which allows for native virtualization. As an example of how you might go about building an internal cloud, here’s a little bit about our hardware setup. The hypervisor runs on HP DL380 G7 servers that provide us with a total of 72G RAM and 24 cores per machine. We provision 11 guests per server, which allows each VM 2 CPU cores, 5G RAM, and a 40G hard drive. Libvirt supports live migrations across non-shared storage (in QEMU 0.12.2+) with zero downtime, which makes it easy to allocate and balance VMs across hosts if adjustments need to be made throughout the pool.
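A quick back-of-the-envelope check shows that this allocation deliberately leaves headroom on each host for the hypervisor itself. The numbers come from the setup above; the headroom framing is our own illustration:

```python
# Per-host figures from the setup described above.
HOST_RAM_GB = 72
HOST_CORES = 24
GUESTS_PER_HOST = 11
RAM_PER_GUEST_GB = 5
CORES_PER_GUEST = 2

# What the guests consume in aggregate.
ram_used = GUESTS_PER_HOST * RAM_PER_GUEST_GB   # 55 GB
cores_used = GUESTS_PER_HOST * CORES_PER_GUEST  # 22 cores

# What remains for the hypervisor and overhead.
ram_headroom = HOST_RAM_GB - ram_used    # 17 GB
core_headroom = HOST_CORES - cores_used  # 2 cores
print(ram_headroom, core_headroom)
```
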
We create CentOS based VMs from a disk template that is maintained via OpenStack Glance, a tool that provides services for discovering, registering, and retrieving virtual machine images. The most recent versions of the disk images are kept in sync via Glance, and exist locally on each server for use in the creation of a new VM. This is faster than trying to pull the image over the network on creation, or building it from scratch using Kickstart like we do in production. The image itself may have been kickstarted to match our production baseline, and we template a few key files such as the network and hosts information, which are substituted on creation, but in the end the template is just a disk image file that we copy and reuse.
The VM creation process involves pushing a button on an internal web page that executes a series of steps. Similar to our one button deployment system, this allows us to iterate on the underlying system without disruption to the overall process. The web form only requires a username, which must be valid in LDAP so that the user can later log in. From there the process is logged in a way that provides realtime feedback to the browser via websockets. The first thing that happens is we find a valid IP in the subnet range, and we use nsupdate to add the DNS information about the VM. We then make a copy of the disk template, which serves as the new VM image, and use virt-install to provision the new machine. Knife bootstrap is then kicked off, which does the rest of the VM initialization using chef. Chef is responsible for getting the machine into a working state, configuring it so that it is running the same versions of libraries and services as the other VMs, and getting a checkout of the running website.
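The steps above can be sketched as a small pipeline. The tools named (nsupdate, virt-install, knife bootstrap) are the real ones; every argument, path, and key file below is an illustrative guess, and the helper only assembles the commands so the flow can be inspected or dry-run without touching real infrastructure:

```python
def build_steps(username, ip, hostname):
    """Assemble the shell commands behind the one-button VM creation flow.

    Paths and flags are hypothetical stand-ins, not Etsy's actual values.
    """
    image = f"/var/lib/libvirt/images/{hostname}.img"
    return [
        # 1. Add the VM's A record via a dynamic DNS update.
        ["nsupdate", "-k", "/etc/dns.key"],
        # 2. Copy the local disk template into place as the new image.
        ["cp", "/var/lib/libvirt/images/template.img", image],
        # 3. Define and boot the guest (2 cores, 5G RAM, per the post).
        ["virt-install", "--name", hostname, "--vcpus", "2",
         "--ram", "5120", "--disk", image, "--import", "--noautoconsole"],
        # 4. Hand off to Chef: knife bootstrap does the rest.
        ["knife", "bootstrap", ip, "--ssh-user", username],
    ]

for cmd in build_steps("jdoe", "10.0.1.42", "dev-jdoe"):
    print(" ".join(cmd))
```
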
Chef is a really important part of managing all of the systems at Etsy, and we use chef environments to maintain similar cookbooks between development and production. It is extremely important that development does not drift from production in its configuration. It also makes it much easier to roll out new module dependencies or software version updates. The environment automatically stays in sync with the code, which is a prime way to avoid strange bugs when moving changes from development to production. It strikes a good balance between keeping things centralized, controlled, and in a known state, and giving developers flexibility over what they need to do.
At this point the virtual machine is functional, and the website on it can be loaded using the DNS hostname we just created. Our various tools can immediately be run from the new VM, such as the try server, which is a cluster of around 60 LXC based instances that spawn tests in parallel on your upcoming patch. Given this ability to modify and test the code easily, the only thing left is to overcome any fear of deployment by hopping in line and releasing those changes to the world. Engineers can be productive from day one due to our ability to quickly create a consistent environment to write code in.
In a past post, Divide and Concur, we told you how we approached dividing our large test suite into smaller test suites by keeping similar tests together rather than arbitrarily dividing.
Dividing tests by common points of error made triaging systemic failures quick, and enticed everyone to write faster, more deterministic tests, but not all was perfect. Our Jenkins dashboard was quite verbose.
The numerous jobs on our dashboard were great for pinpointing where the failures were, but it was difficult to determine at which stage of the deploy pipeline the failures existed. Some tests were executed on every commit. Some tests were executed when the QA button was pushed. Some tests were executed against a freshly pushed Princess or Production build. We were using the Jenkins IRC plugin, and the number of messages per hour was drowning out necessary communication in the channel.
We needed some way to communicate the test status at each stage of the deployment pipeline.
We considered using Downstream Jobs, but fingerprinting was awkward and difficult to set up, and all-in-all it wasn’t quite what we were looking for.
We also considered Matrix Jobs, but a Matrix Job is designed to execute several jobs with parameter(s) varied along configuration vector(s), i.e. build node, operating system, browser, arbitrary parameter, etc. This was not a fit for the purpose because our jobs had wildly different configurations that could not be coerced into mere parameter differences.
What we needed was a way to create a Jenkins job type that would execute a selection of arbitrary Jenkins jobs, wait for the jobs to finish, and report a single result while still making it possible to drill down to sub-jobs to determine the sources of failures.
So we wrote a Jenkins plugin to achieve this, the Jenkins Master Project Plugin.
Now our Jenkins dashboard represents the deployment pipeline:
When a stage turns red (or yellow), you can click through to that particular Master Build, see what tests failed and drill through the results (or even rebuild).
The Triggering User and Master Project plugins are both integral to our latest version of Try.
We have also made our Nagios plugin for Jenkins readily available on the Etsy GitHub account. We used this for experimenting with alerting on the health of Jenkins.
All of these plugins are freely available on GitHub under the Etsy organization. Enjoy!
If you’re a female developer or someone who wants to get more female developers involved in your organization, Austin All-Girl Hack Night and Girl Develop It invite you to come have brunch with us, talk about dev, and hopefully make some new friends! We’ll be serving a full breakfast, including coffee and breakfast cocktails, thanks to our gracious sponsors: Etsy, Bocoup, spire.io, Headspring, and TEKsystems.
Also, the RSVP is a shell script, and the invitation is executable, so save yourself the pain of trying to cat it like I did.
Badge optional, RSVP mandatory, Unix skills strongly suggested to get through the RSVP, a willingness to drink breakfast cocktails recommended.
At Etsy, we are constantly evaluating the security and safety of our members as they use the site. One way we do this is by analyzing user generated content (UGC) for possible problems. As part of the process we integrate results from the Google Safe Browsing (GSB) service. Typically this is client-side technology used by web browsers to protect the end-user from visiting dangerous websites that might serve malware or be part of a phishing scam.
The Security and Defensive Systems group here at Etsy have flipped this model around. Rather than warn the user when a malicious link is followed, we block the link (or the whole page) from displaying in the first place.
There are a few ways to use the Google Safe Browsing service. For lower volume queries, there is a very simple REST API. For high volume, high performance systems, the GSB V2 protocol is more appropriate as it mirrors the entire GSB database locally. It’s designed to scale to an extremely large number of clients while minimizing network traffic. To do so, it uses a complicated protocol involving multiple blacklists and whitelists sent as a series of distributed binary diffs.
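The payoff of mirroring the database locally can be sketched in a few lines: hash candidate URL expressions, check the 4-byte prefix against the locally stored lists, and only contact Google for the full 32-byte hash when a prefix matches. A toy illustration of that idea (real GSB canonicalization, host keys, and list handling are far more involved; all names here are ours):

```python
import hashlib

def hash_prefix(expression, nbytes=4):
    """4-byte SHA-256 prefix of a URL expression, as the local lists store."""
    return hashlib.sha256(expression.encode()).digest()[:nbytes]

# A tiny stand-in for the locally mirrored blacklist: a set of prefixes.
local_prefixes = {hash_prefix("malware.example.test/bad/")}

def needs_full_hash_check(expression):
    # Only a prefix hit requires a round trip to the GSB servers for the
    # full 32-byte hash; everything else is cleared locally, no traffic.
    return hash_prefix(expression) in local_prefixes

print(needs_full_hash_check("malware.example.test/bad/"))
print(needs_full_hash_check("innocent.example.test/ok/"))
```

This is why the protocol scales: the overwhelming majority of URLs miss every prefix and never generate a network call.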
While many implementations of the GSB protocols are available, for a variety of reasons they were not appropriate for use in Etsy’s operational environment (e.g. use of autoincrement ids, designed to run under a web server, etc), and so we created our own. We have open sourced our version and made it available in our gsb4ugc git repository. It’s in PHP, but it should be straightforward to port to other languages, as it’s really more of a toolkit than a standalone product.
To use it, you’ll need to assemble a few resources to create your own API. First, set up some boilerplate shared by both the GSB updater and client:
// Set up a db connection.
$dbh = new PDO('mysql:host=127.0.0.1;dbname=gsb', 'user', 'password');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Create storage; works with mysql, sqlite.
// No auto-increment IDs, so it's safe with master-master replication.
// Etsy subclasses this and adds StatsD calls. http://etsy.me/dQwVXi
$storage = new GSB_StoreDB($dbh);

// Create network access. Pass in your GSB API key. Uses PHP curl.
$network = new GSB_Request($api);

// Logger. Subclass to use your logging infrastructure (or not).
$logger = new GSB_Logger(5);
Then set up a cron job that runs every 30 minutes to start mirroring the GSB database:
$updater = new GSB_Updater($storage, $network, $logger);
$updater->downloadData($gsblists, FALSE);
It takes about 24 hours to fully sync up. Finally, you are able to start checking URLs:
$client = new GSB_Client($storage, $network, $logger);
$url = "http://malware.testing.google.test/testing/malware/";
print_r($client->doLookup($url));
should return something similar to:
Array
(
    [list_id] => 1
    [add_chunk_num] => 70219
    [host_key] => b2ae8c6f
    [prefix] => 51864045
    [match] => malware.testing.google.test/testing/malware/
    [hash] => 518640453f8b2a5f0d43bc2251....
    [host] => testing.google.test/
    [url] => http://malware.testing.google.test/testing/malware/
    [listname] => goog-malware-shavar
)
More details are in the bin/samples directory of our repository.
We are currently scanning a few types of user generated content in production. This is done asynchronously from the website so we don’t block the user experience; however, we still care about performance. Almost all performance metrics here at Etsy measure maximum and minimum times, as well as 90th percentile and mean, and this is no exception. The peak times occur when a network call is required; otherwise, it’s typically 5ms.
Since this is security-related code, another goal of gsb4ugc is testability. The protocol-parsing code is separated from the database and networking code, so it’s very easy to write unit tests. This also helps to explain how the code works. As you can see below, we have some more work to do:
In addition to expanding test coverage and improving performance, we’d like to add MAC support, and to use it for more content types on Etsy. We’d also like to add the results from PhishTank for completeness and redundancy. Comments, bug reports, patches and pull requests are all welcome, but if this type of work interests you, consider doing it full time.
Now, go forth and browse and consume content safely!
As you might imagine, we at Etsy get a lot of “can I pick your brain?” requests about how we do things at Etsy, or what we’ll call here The Etsy Way. While we take these requests as huge compliments to the work we do, we have to be somewhat protective of the team’s time. We’re proud of what we’ve been doing and believe in sharing it, so we’ve invested hundreds (if not thousands) of hours into providing public information on this blog and elsewhere. This is the best way to scale our sharing as broadly as possible. (And we’ll still meet with some people — we’ll just ask that you read everything below first since we’ve worked so hard on it!) Consider this post that first friendly conversation over coffee.
The most important component of The Etsy Way is culture and that is as difficult to teach as it is important. To get a sense of how we think about culture, take a look at Optimizing for Developer Happiness, which includes a 24-minute video of a talk I did and a link to the accompanying slides.
Here are a few more links about culture:
- Scaling startups
- How does Etsy manage development and operations?
- Code as Craft: Building a Strong Engineering Culture at Etsy (slides)
With the culture bits explained, below are a few other key posts in the Etsy canon. All of these are inter-related with the culture, of course, and help reinforce it (remember it’s all about culture. Did someone say “culture”?):
Quantum of Deployment (Erik Kastner). We deployed code to production more than 10,000 times in 2011. If you wonder “how did they do that?” this post will tell you all you need to know.
Track every release (Mike Brittain). Here, we write about the methods we use to track the success of every code deploy with application metrics. This is part of the not-so-secret sauce.
Measure Anything, Measure Everything (Ian Malpass). We introduce you to StatsD, the open source software we built at Etsy to enable obsessive tracking of application metrics and just about anything else in your environment. The best part is you can download StatsD yourself and try it out.
Divide and Concur (Noah Sussman and Laura Beth Denker). By reading this post, you’ll learn about all the inner workings of our automated testing setup: what software we use (with plenty of links), how we set it up, and the philosophy behind it all.
We also have tons of slides from talks we have done, all available in the Code as Craft group on Slideshare.
And last but not least, we have an Etsy Github repository with lots of goodies.
Pretty much everything we write about above is open source (even the culture), so the motivated reader will find links to tips and actual software along the way to set things up on their own. If there’s anything more you’d like to know about The Etsy Way, just let us know in the comments. We’ll add it if we have it, and probably write it if we don’t.
As you can tell, a really important part of The Etsy Way is encouraging people on the team to contribute to open source, write informative and entertaining blog posts, and put together killer presentations. If you want to join the fun, we’re always hiring.
A few places to look for us in the next few months:
Michelle D’Netto and Lindsey Baron, February 23, Selenium 101 Workshop. Brooklyn, NY.
Laura Beth Denker, February 24, Scaling Communication via Continuous Deployment. London.
We’re sponsoring Devopsdays Austin, April 2nd and 3rd. Austin, TX. Look for us.
John Goulah, April 11th, Starts with S and Ends With Hard: The Etsy Shard Architecture. Santa Clara, CA
Michelle D’Netto, Stephen Hardisty and Noah Sussman, April 16-18, Handmade Etsy Tests and Selenium In the Enterprise: What Went Right, What Went Wrong (So Far). London.
Laura Beth Denker, May 22nd, Developer Testing 201: When to Mock and When to Integrate and It’s More Than Just Style. Chicago, IL
A look at the state of PHP in 2012. Where are we, how did we get here and how does PHP fit into the current infrastructure ecosystem of the Web? Plus, a quick tour of what is new and cool in PHP 5.4. Reserve your free ticket, and subscribe to our list to find out about upcoming speakers.
Many of you probably use BitTorrent to download your favorite ebooks, MP3s, and movies. At Etsy, we use BitTorrent in our production systems for search replication.
Search at Etsy
Search at Etsy has grown significantly over the years. In January of 2009 we started using Solr for search. We used the standard master-slave configuration for our search servers with replication.
All of the changes to the search index are written to the master server. The slaves are read-only copies of master which serve production traffic. The search index is replicated by copying files from the master server to the slave servers. The slave servers poll the master server for updates, and when there are changes to the search index the slave servers will download the changes via HTTP. Our search indexes have grown from 2 GB to over 28 GB over the past 2 years, and copying the index from the master to the slave nodes became a problem.
The Index Replication Issue
To keep all of the searches on our site working fast we optimize our indexes nightly. Index optimization creates a completely new copy of the index. As we added new boxes we started to notice a disturbing trend: Solr’s HTTP replication was taking longer and longer to replicate after our nightly index optimization.
After some benchmarking we determined that Solr’s HTTP replication was only allowing us to transfer between 2 MB and 8 MB of data per second. We tried various tweaks to HTTP replication, adjusting compression and chunk size, but nothing helped. This problem was only going to get worse as we scaled search. When deploying a new slave server we experienced similar issues: pulling all of our indexes at once managed only 8 MB per second and could take over 4 hours, with our 3 large indexes consuming most of the transfer time.
Our 4 GB optimized listings index was taking over an hour to replicate to 11 search slaves. Even if we made HTTP replication go faster, we were still bound by our server’s network interface card. We tested netcat from master to a slave server and the results were as expected, the network interface was flooded. The problem had to be related to Solr’s HTTP replication.
The fundamental limitation with HTTP replication is that replication time increases linearly with the number of slaves. The master must talk to each slave separately, instead of all at once. If 10 boxes take 4 hours, scaling to 40 boxes would take over half a day!
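The linear cost is easy to see with quick arithmetic, using the 8 MB/s ceiling and the 4 GB listings index from above (the helper and its name are just our illustration):

```python
INDEX_MB = 4 * 1024  # the 4 GB optimized listings index
RATE_MBPS = 8        # best-case observed HTTP replication throughput

def serial_replication_minutes(num_slaves):
    # The master streams the full index to each slave one after another,
    # so total time grows linearly with the slave count.
    return num_slaves * INDEX_MB / RATE_MBPS / 60

print(round(serial_replication_minutes(11)))  # ~94 minutes for 11 slaves
```

That ~94-minute figure matches the "over an hour" we were seeing for 11 slaves, and doubling the fleet doubles it again.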
We started looking around for a better way to get bits across our network.
If we need to get the same bits to all of the boxes, why not send the index via multicast to the slaves? It sure would be nice to only send the data once. We found an implementation of rsync which used multicast UDP to transfer the bits. The mrsync tool looked very promising: we could transfer the entire index in our development environment in under 3 minutes. So we thought we would give it a shot in production.
[15:25] <gio> patrick: i'm gonna test multi-rsyncing some indexes from host1 to host2 and host3 in prod. I'll be watching the graphs and what not, but let me know if you see anything funky with the network
[15:26] <patrick> ok ....
[15:31] <keyur> is the site down?
Multicast rsync caused an epic failure for our network, killing the entire site for several minutes. The multicast traffic saturated the CPU on our core switches causing all of Etsy to be unreachable.
For those folks who have never heard of BitTorrent, it’s a peer-to-peer file sharing protocol used for transferring data across the Internet. BitTorrent is a very popular protocol for transferring large files. It’s been estimated that 43% to 70% of all Internet traffic is BitTorrent peer-to-peer sharing.
Our Ops team started experimenting with a BitTorrent package herd, which sits on top of BitTornado. Using herd they transferred our largest search index in 15 minutes. They spent 8 hours tweaking all the variables and making the transfer faster and faster. Using pigz for compression and herd for transfer, they cut the replication time for the biggest index from 60 minutes to just 6 minutes!
Our Ops experiments were great for the one time each day when we need to get the index out to all the slave servers, but an external transfer would also require coordination with Solr’s HTTP replication: we would need to stop replication, stop indexing, and run a separate process to push the index out to the boxes.
BitTorrent and Solr Together
By integrating the BitTorrent protocol into Solr we could replace HTTP replication. BitTorrent supports updating and continuation of downloads, which works well for incremental index updates. When we use BitTorrent for replication, all of the slave servers seed the index files, allowing us to bring up new slaves (or update stale slaves) very quickly.
Selecting a BitTorrent Library
We looked into various Java implementations of the BitTorrent protocol and unfortunately none of these fit our needs:
- The BitTorrent component of Vuze was very hard to extract from their code base
- torrent4j was largely incomplete and not usable
- Snark is old, and unfortunately unstable
- bitext was also unstable, and extremely slow
Eventually we came upon ttorrent which fit most of the requirements that we had for integrating BitTorrent into the Solr stack.
We needed to make a few changes to ttorrent to handle Solr indexes. We added support for multi-file torrents, which allowed us to hash and replicate the index files in place. We also fixed some issues with large file (> 2 GB) support. All of these changes can be found in our fork of the ttorrent code; most of them have already been merged back to the main project.
How it Works
BitTorrent replication relies on Lucene to give us the names of the files that need to be replicated.
When a commit occurs the steps taken on the master server are as follows:
- All index files are hashed, a Torrent file is created and written to disk.
- The Torrent is loaded into the BitTorrent tracker on the master Solr server.
- Any other Torrents being tracked are stopped to ensure that we only replicate the latest version of the index.
- All of the slaves are then notified that a new version of the index is available.
- The master server then launches a BitTorrent client locally which seeds the index.
Once a slave server has been notified of a new version of the index, or the slave polls the master server and finds a newer version of the index, the steps taken on the slave servers are as follows:
- The slave server requests the latest version number from the master server.
- The Torrent file for the latest index is downloaded from master over HTTP.
- All of the current index files are hash verified based on the contents of the Torrent file.
- The missing parts of the index are downloaded using the BitTorrent protocol.
- The slave server then issues a commit to bring the new index online.
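Steps 3 and 4 above amount to piece-level hash verification: every piece listed in the Torrent file is checked against the bytes on disk, and only the pieces that fail (or are missing) get fetched from the swarm. A stripped-down, in-memory sketch of that idea (ignoring multi-file layout and .part bookkeeping; our implementation is in Java, not Python):

```python
import hashlib

PIECE_SIZE = 4  # tiny pieces for illustration; real torrents use e.g. 256 KB

def piece_hashes(data, piece_size=PIECE_SIZE):
    """The per-piece SHA1 digests that a .torrent file carries."""
    return [hashlib.sha1(data[i:i + piece_size]).digest()
            for i in range(0, len(data), piece_size)]

def pieces_to_fetch(local_data, torrent_hashes, piece_size=PIECE_SIZE):
    # Compare what's on disk against the torrent's hashes; any piece that
    # is absent or corrupt must be downloaded via BitTorrent.
    local = piece_hashes(local_data, piece_size)
    return [i for i, h in enumerate(torrent_hashes)
            if i >= len(local) or local[i] != h]

master = b"new-index-bytes!"   # the index as it exists on the master
stale = b"new-indexXbytes"     # a slave copy: one corrupt, one short piece
print(pieces_to_fetch(stale, piece_hashes(master)))  # [2, 3]
```

An up-to-date slave verifies every piece and fetches nothing, which is exactly why hash verification makes incremental updates cheap.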
When new files need to be downloaded, partial (“.part”) files are created. This allows for us to continue downloading if replication gets interrupted. After downloading is completed the slave server continues to seed the index via BitTorrent. This is great for bringing on new servers, or updating servers that have been offline for a period of time.
HTTP replication doesn’t allow for the transfer of older versions of a given index. This causes issues with some of our static indexes. When we bring up new slaves, Solr creates a blank index whose version is greater than the static index. We either have to optimize the static indexes or force a commit before replication will take place.
With BitTorrent replication all index files are hash verified ensuring slave indexes are consistent with the master index. It also ensures the index version on the slave servers match the master server, fixing the static index issue.
The HTTP replication UI is very clunky: you must visit each slave to understand which version of the index it has. Its transfer progress display is pretty simplistic, and towards the end of a transfer it is misleading, because the index is actually being warmed while the transfer rate keeps changing. Wouldn’t it be nice to look in one place and understand what’s happening with replication?
With BitTorrent replication the master server keeps a list of slaves in memory. The list of slaves is populated by the slaves polling master for the index version. By keeping this list we can create an overview of replication across all of the slaves. Not to mention the juicy BitTorrent transfer details and a fancy progress bar to keep you occupied while waiting for bits to flow through the network.
Pictures are worth a few thousand words. Let’s look again at the picture from the start of this post, where we had 11 slave servers pull 4 GB of index.
Today we have 23 slave servers pulling 9 GB of indexes.
You can see it no longer takes over an hour to get the index out to the slaves despite more than doubling the number of slaves and the index size. The second largest triangle on the graph represents our incremental indexer playing catch up after the index optimization.
This shows the slaves are helping to share the index as well. The last few red blobs are indexes that haven’t been switched to BitTorrent replication.
One of the BitTorrent features is hash verification of the bits on disk. This creates a side effect when dealing with large indexes. The master server must hash all of the index files to generate the Torrent file. Once the Torrent file is generated, all of the slave servers must compare its hashes to the current set of index files. When hashing 9 GB of index it can take roughly 60 seconds to perform the SHA1 calculations. Java’s SHA1 implementation is not thread safe, making it impossible to do this process in parallel. This means there is a 2 minute lag (master hashing plus slave verification) before the BitTorrent transfer begins.
To get around this issue we created a thread safe version of SHA1 and a DigestPool interface to allow for parallel hashing. This allows us to tune the lag time before the transfer begins, at the expense of increased CPU usage. It’s possible to hash the entire 9 GB in 16 seconds when running in parallel, making the lag to transfer around 32 seconds total.
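Since torrent pieces are hashed independently, the work parallelizes cleanly once each worker has its own digest object. Our fix was a thread-safe SHA1 plus a DigestPool interface in Java; the sketch below just illustrates the piece-parallel structure of the approach:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

PIECE_SIZE = 256 * 1024  # BitTorrent hashes the payload in fixed-size pieces

def hash_pieces_serial(data):
    return [hashlib.sha1(data[i:i + PIECE_SIZE]).hexdigest()
            for i in range(0, len(data), PIECE_SIZE)]

def hash_pieces_parallel(data, workers=4):
    pieces = [data[i:i + PIECE_SIZE] for i in range(0, len(data), PIECE_SIZE)]
    # Each task gets its own SHA1 instance, so there is no shared digest
    # state to fight over; map() keeps results in piece order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: hashlib.sha1(p).hexdigest(), pieces))

blob = bytes(range(256)) * 4096  # ~1 MB of sample "index" data
assert hash_pieces_parallel(blob) == hash_pieces_serial(blob)
print(len(hash_pieces_parallel(blob)))  # 4 pieces
```

The parallel and serial results are identical; only wall-clock time changes, traded against CPU, which is exactly the tuning knob described above.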
To better deal with the transfer lag we are looking at creating a Torrent file per index segment. Lucene indexes are made up of various segments. Each commit creates an index segment. By creating a new Torrent file per segment we can reduce the lag before transfer to milliseconds, because new segments are generally small.
We are also going to be adding support for transfer of arbitrary files via replication. We use external file fields and custom index time stamp files for keeping track of incremental indexing. It makes sense to have Solr manage replication of these files. We will follow HTTP replication’s lead on confFiles, adding dataFiles and indexFiles to handle the rest of the index related files.
Our search infrastructure is mission critical at Etsy. Integrating BitTorrent into Solr allows us to scale search without adding lag, keeping our sellers happy!
Most product ideas are shitty, yet we spend the majority of our lives working on them.
As a product hacker, you’ll be working on a constant stream of ideas that excite you to the point of obsession: staying up late writing code, thinking about it every waking and non-waking minute. We’ve all admitted that a minority of our ideas will turn into something that will have the impact we dream of, but we don’t let that truth prevent us from being excited that this next thing might be the one. Some have admitted this and accepted that they’re a junkie who’s only going to get that fix from a great feature once in a long while. Although I admit that I’m a junkie, I haven’t yet become a fatalist.
Web Operations people speak about measuring their work by the Mean Time Between Failures (MTBF). For product hackers, we should be thinking in terms of minimizing Mean Time Between Wins (MTBW). Because it’s difficult to know which ideas are going to blossom into that great feature, a nice proxy for MTBW is Mean Time to Bad Idea Detection (MTTBID).
By building out an ecosystem for you and your team that allows bad ideas to be detected quickly, you can spend your time iterating on the great ideas and shipping your wins quickly while the shitty ideas die a meaningless death somewhere in a pile of other shitty ideas.
The best hackers I know are impatient. As soon as you get an exciting result, you’re going to be talking about it with whoever will listen. An ecosystem of tools that are just there, providing a source of truth that everyone can understand and agree with, is like having a posse of hardened thugs at your back at all times. Instead of excitement going sour when people who haven’t seen the light are doubting you, you can all agree on what’s actually going on. If the numbers you care about are getting better, then great. If your product isn’t something that can be measured easily, or is a long term bet, you can show that the numbers you care about aren’t getting worse and that it’s safe to push on into the wilderness.
Here are some things we’ve learned about how to build that ecosystem.
Make Tools for Failing Fast
Ideas can fail at any level of scrutiny. Some ideas don’t pan out when looked at under a microscope. Others don’t work out when talking about it over a drink. If it survives to the point of being shown to users, it can fail when you’re looking at it through a telescope and you’re just not seeing the response you hoped for. We spent some time trying to improve the quality and performance of our relevance sorting algorithm for search results before we made relevance-ordering the site-wide default. During the four month period where we did this work, we were able to get thirty experiments completed. Of those, eleven were real wins that made it into the final product.
At Etsy, the birth of every idea is the simplest possible implementation that permits experimentation. To give ourselves immediate feedback on the effects of search algorithm changes we created a tool that let us see the new ranking and all of the information we need to understand why a listing is ranked the way it is. The tool let us see this new ranking the moment our search server finished compiling, allowing for rapid iteration on tricky edge-cases, and the ability to quickly detect and kill bad components.
We created a tool that runs a sample of popular and long-tail queries through a new algorithm and displays as much information as can be determined without real people being involved: an estimated percent of changed search results over the universe of all queries, a list of the most strongly affected queries, a list of the most strongly affected Etsy shops, etc.
We created tooling for running side-by-side studies where real users were asked to rate which set of search results they preferred for a given query. When a feature was ready to be launched as an A-B test, we were able to see a set of visualizations explaining how our change was performing relative to the standard algorithm.
What a Search AB Test Looks Like | What a site-wide AB test looks like
The best part is that we don’t think about these tools while building new products and running experiments. We come up with ideas, implement them, and if they do well we ship them. Our conversations are about the product, the code we write is for the product and our shitty ideas are executed on the spot and sloppily buried in shallow graves, as they deserve and as is our wont.
Make Tools that Make Process Disappear
Edward Tufte introduced the concept of “chart junk”: the distracting stuff on a visualization that isn’t saying anything about the data. Marshall McLuhan made a compelling case that “the medium is the message,” implying that the vehicle through which you perceive something impacts your understanding of it. Just because your paying clients won’t see your internal tooling doesn’t give you license to slap together an ill-considered tool. The medium is the message, and your tools are your medium. Working memory is limited and people are busy. Decisions are worse when getting the answer to a question about your product requires that you lose track of what you asked or why it’s important. Decisions are even worse if you never get a chance to ask questions and get answers. Products designed with fewer poor decisions are less shitty than products designed with more poor decisions. GNU wouldn’t exist without GDB.
Our Non-Shitty Search Query Analysis Tool | Solr’s Shitty Query Analysis Tool
It’s really important to our business that we return great results when people are doing searches on Etsy. It turns out we’re super lazy and if there are any barriers in the way of us asking “why is this item showing up for this query”, we’re just not going to ask the question and it’s not going to get fixed. Our query analysis tool (pictured on the left) helps reduce that barrier to getting an answer.
The best information about your product is going to come from real users. Unfortunately, it’s often painful to get your products out into the real world. Having completed an iteration of a product, you’re filled with excitement and fear. You’re hoping you got it all right, but if you didn’t, you’re ready to fix it because you know every intimate detail of your new creation. This state of excitement and readiness is the last thing you want to let go of. Continuous deployment, the practice of pushing your code live the moment it’s ready, is absolutely essential for product hackers.
If you need to wait any non-trivial amount of time between completing something and seeing how well it’s performing, you’re not going to be working on that project by the time you get your answer. When you do get your answer, you’re not only going to have to refresh your memory on what you had been working on, but you’re going to have to do the same on whatever else you had started working on. Asking your team to work with patience and discipline has never worked and never will work. Build an ecosystem where doing the right thing is the easiest thing. Build an ecosystem where making great decisions is the easiest thing. Build an ecosystem where the lazy, excitable and impatient really shine.