Etsy’s experiment with immutable documentation

Posted by on October 10, 2018

Introduction

Writing documentation is like trying to hit a moving target. The way a system works changes constantly, so as soon as you write a piece of documentation for it, it starts to get stale. And the systems that need docs the most are the ones being actively used and worked on, which are changing the fastest. So the most important docs go stale the fastest! 1

Etsy has been experimenting with a radical new approach: immutable documentation.

Woah, you just got finished talking about how documentation goes stale! So doesn’t that mean you have to update it all the time? How could you make documentation read-only?

How docs go stale

Let’s back up for a sec. When part of a documentation page becomes outdated or incorrect, it typically doesn’t invalidate the entire doc (unless the system itself is deprecated). Usually only one piece is affected: a code snippet, say, that uses outdated syntax for an API.

For example, we have a command-line tool called dbconnect that lets us query the dev and prod databases from our VMs. Our internal wiki has a doc page that discusses the various tools we use to query the dbs. The part that discusses dbconnect goes something like:

 

Querying the database via dbconnect ...

((section 1))
dbconnect is a script to connect to our databases and query them. [...]

((section 2))
The syntax is:

% dbconnect <shard>

 

Section 1 gives context about dbconnect and why it exists, and section 2 gives tactical details of how to use it.

Now say a switch is added so that dbconnect --dev <shard> queries the dev db, and dbconnect --prod <shard> queries the prod db. Section 2 above now needs to be updated, because it’s using outdated syntax for the dbconnect command. But the contextual description in section 1 is still completely valid. So this doc page is now technically stale as a whole because of section 2, but the narrative in section 1 is still very helpful!

In other words, the parts of a doc that are most likely to go stale are the tactical, operational details of the system. How to use the system is constantly changing. But the narrative of why the system exists and the context around it is less likely to change quite so quickly.

 

Docs can be separated into how-docs and why-docs

Put another way: ‘code tells how, docs tell why’ 2. Code is constantly changing, so the more code you put into your docs, the faster they’ll go stale. To codify this further, let’s use the term “how-doc” for operational details like code snippets, and “why-doc” for narrative, contextual descriptions 3. We can mitigate staleness by limiting the amount we mix the how-docs with the why-docs.

 

Documenting a command using Etsy’s FYI system

At Etsy we’ve developed a system for adding how-docs directly from Slack. It’s called “FYI”. The purpose of FYI is to make documenting tactical details — commands to run, syntax details, little helpful tidbits — as frictionless as possible.

 

Here’s how we’d approach documenting dbconnect using FYIs 4:

Kaley was searching the wiki for how to connect to the dbs from her VM, to no avail. So she asks about it in a Slack channel:

hey @here anyone remember how to connect to the dbs in dev? I forget how. It’s something like dbconnect etsy_shard_001A but that’s not working

When she finds the answer, she adds an FYI using the ?fyi command (using our irccat integration in Slack 5):

?fyi connect to dbs with `dbconnect etsy_shard_000_A` (replace `000` with the shard number). `A` or `B` is the side

Jason sees Kaley add the FYI and mentions you can also use dbconnect to list the databases:

you can also do `dbconnect -l` to get a list of all DBs/shards/etc, and it works for dev-proxy on or off

Kaley then adds the :fyi: Slack reaction (reacji) to his comment to save it as an FYI:

you can also do `dbconnect -l` to get a list of all DBs/shards/etc, and it works for dev-proxy on or off

A few weeks later, Paul-Jean uses the FYI query command ?how to search for info on connecting to the databases, and finds Kaley’s FYI 6:

?how database connect

He then looks up FYIs mentioning dbconnect specifically to discover Jason’s follow-up comment:

?how dbconnect

But he notices that the dbconnect command has been changed since Jason’s FYI was added: there is now a switch to specify whether you want dev or prod databases. So he adds another FYI to supplement Jason’s:

?fyi to get a list of all DBs/shards/etc in dev, use `dbconnect --dev`, and to list prod DBs, use `dbconnect --prod` (default)

Now ?how dbconnect returns Paul-Jean’s FYI first, and Jason’s second:

?how dbconnect

FYIs trade completeness for freshness

Whenever you do a ?how query, matching FYIs are always returned most recent first. So you can always update how-docs for dbconnect by adding an FYI with the keyword “dbconnect” in it. This is crucial, because it means the freshest docs always rise to the top of search results.
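
The newest-first behaviour described above can be sketched in a few lines of Python. This is only an illustration, not Etsy’s implementation (which, per the references, is a PHP script backed by SQLite): the Fyi and FyiLog names are hypothetical, and the dates are invented for the example. The point it shows is that the only write operation is an append, and that recency alone decides the order of results.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)  # frozen: an FYI cannot be modified once created
class Fyi:
    author: str
    text: str
    created: datetime


class FyiLog:
    """Append-only collection of FYIs (hypothetical, for illustration only)."""

    def __init__(self):
        self._entries = []

    def add(self, author, text, created):
        # The only write operation is an append; there is no update or delete.
        self._entries.append(Fyi(author, text, created))

    def how(self, query):
        # Keep FYIs whose text contains every query word, then sort newest first.
        words = query.lower().split()
        hits = [f for f in self._entries
                if all(w in f.text.lower() for w in words)]
        return sorted(hits, key=lambda f: f.created, reverse=True)


log = FyiLog()
# Dates below are made up for the example.
log.add("jason", "you can also do `dbconnect -l` to get a list of all DBs/shards/etc",
        datetime(2018, 8, 1))
log.add("paul-jean", "to get a list of all DBs/shards/etc in dev, use `dbconnect --dev`",
        datetime(2018, 9, 1))

results = log.how("dbconnect")
for fyi in results:
    print(fyi.created.date(), fyi.author, fyi.text)
```

Because the newer FYI carries the later timestamp, it is listed ahead of the older one without anyone having to touch the older entry.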

FYIs are immutable, so Paul-Jean doesn’t have to worry about changing any FYIs created by Jason. He just adds them as he thinks of them, and the timestamps determine the priority of the results. How-docs change so quickly, it’s easier to just replace them than try to edit them. So they might as well be immutable.

 

Since every FYI has an explicit timestamp, it’s easy to gauge how current they are relative to API versions, OS updates, and other internal milestones. How-docs are inherently stale, so they might as well have a timestamp showing exactly how stale they are.
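
To make that concrete, here is a small Python sketch of comparing an FYI’s timestamp against a list of milestones. Everything in it is an assumption made for illustration: the MILESTONES table, its names and dates, and the milestones_since and describe helpers are hypothetical and not part of the FYI system described in this post.

```python
from datetime import date

# Hypothetical milestones; the names and dates are invented for this example.
MILESTONES = {
    "dbconnect gained --dev/--prod switches": date(2018, 8, 15),
    "VM image upgrade": date(2018, 6, 1),
}


def milestones_since(fyi_date, milestones=MILESTONES):
    """Return the milestones that happened after an FYI was written, oldest first."""
    later = [(when, name) for name, when in milestones.items() if when > fyi_date]
    return [name for when, name in sorted(later)]


def describe(fyi_text, fyi_date, today):
    """Prefix an FYI with its age and any milestones it predates."""
    age = (today - fyi_date).days
    changed = milestones_since(fyi_date)
    note = f"; predates: {', '.join(changed)}" if changed else ""
    return f"[{age} days old{note}] {fyi_text}"


print(describe("use `dbconnect -l` to list all DBs",
               date(2018, 8, 1), date(2018, 10, 10)))
```

A reader who sees that an FYI predates a relevant change knows to look for a newer FYI before trusting it.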

 

The tradeoff is that FYIs are just short snippets. There’s no room in an FYI to add much context. In other words, FYIs mitigate staleness by trading completeness for freshness.

 

Since FYIs lack context, there’s still a need for why-docs (e.g. a wiki page) about connecting to dev/prod dbs, which mention the dbconnect command along with other relevant resources. But if the how-docs are largely left in FYIs, those why-docs are less likely to go stale.

So FYIs allow us to decouple how-docs from why-docs. The tactical details are probably what you want in a hurry. The narrative around them is something you sit back and read on a wiki page.

 

What FYIs are

To summarize, FYIs are:

What FYIs are NOT

Similarly, FYIs are NOT:

Conclusions

Etsy has recognized that technical documentation is a mixture of two distinct types: a narrative that explains why a system exists (“why-docs”), and operational details that describe how to use the system (“how-docs”). In trying to overcome the problem of staleness, the crucial observation is that how-docs typically change faster than why-docs do. Therefore the more how-docs are mixed in with why-docs in a doc page, the more likely the page is to go stale.

We’ve leveraged this observation by creating an entirely separate system to hold our how-docs. The FYI system simply allows us to save Slack messages to a persistent data store. When someone posts a useful bit of documentation in a Slack channel, we tag it with the :fyi: reacji to save it as a how-doc. We then search our how-docs directly from Slack using a bot command called ?how.
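
For readers who want to experiment, the save-and-search loop can be approximated with Python’s built-in sqlite3 module. This is a minimal sketch under stated assumptions, not our production code: the real system is a PHP script behind an irccat webhook and uses SQLite full-text search, whereas this version matches keywords with plain LIKE to stay dependency-free. The open_store and handle_message functions, the table layout, and the “no matches” replies are all hypothetical; only the “OK! Added your FYI” acknowledgement comes from the system described here.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path=":memory:"):
    """Open (or create) the FYI table. The schema is an assumption for this sketch."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fyi ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " author TEXT NOT NULL,"
        " body TEXT NOT NULL,"
        " created TEXT NOT NULL)"
    )
    return conn


def handle_message(conn, author, text):
    """Dispatch a chat message; return the bot's reply, or None to stay silent."""
    if text.startswith("?fyi "):
        body = text[len("?fyi "):].strip()
        # Rows are only ever inserted, never updated: FYIs are immutable.
        conn.execute(
            "INSERT INTO fyi (author, body, created) VALUES (?, ?, ?)",
            (author, body, datetime.now(timezone.utc).isoformat()),
        )
        conn.commit()
        return "OK! Added your FYI"
    if text.startswith("?how "):
        words = text[len("?how "):].split()
        if not words:
            return "Usage: ?how <keywords>"
        # Every keyword must appear in the body; LIKE stands in for full-text search.
        where = " AND ".join("body LIKE ?" for _ in words)
        rows = conn.execute(
            f"SELECT body FROM fyi WHERE {where} ORDER BY id DESC",  # newest first
            [f"%{w}%" for w in words],
        ).fetchall()
        return "\n".join(row[0] for row in rows) or "No matching FYIs"
    return None


conn = open_store()
print(handle_message(conn, "kaley", "?fyi connect to dbs with `dbconnect etsy_shard_000_A`"))
handle_message(conn, "paul-jean", "?fyi to list dev DBs, use `dbconnect --dev`")
print(handle_message(conn, "paul-jean", "?how dbconnect"))
```

Ordering by the auto-incrementing id is a simple stand-in for ordering by timestamp, since rows are only ever appended.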

FYIs are immutable: to update them, we simply add another FYI that is more timely and correct. Since FYIs don’t need to contain narrative, they’re easy to add, and easy to update. The ?how command always returns more recent FYIs first, so fresher matches always have higher priority. In this way, the FYI system combats documentation staleness by trading completeness for freshness.

We believe the separation of operational details from contextual narrative is a useful idea that can be used for documenting all kinds of systems. We’d love to hear how you feel about it! And we’re excited to hear about what tooling you’ve built to make documentation better in your organization. Please get in touch and share what you’ve learned. Documentation is hard! Let’s make it better!

Acknowledgements

The FYI system was designed and implemented by Etsy’s FYI Working Group: Paul-Jean Letourneau, Brad Greenlee, Eleonora Zorzi, Rachel Hsiung, Keyur Govande, and Alec Malstrom. Special thanks to Mike Lang, Rafe Colburn, Sarah Marx, Doug Hudson, and Allison McKnight for their valuable feedback on this post.

References

  1. From “The Golden Rules of Code Documentation”: “It is almost impossible without an extreme amount of discipline, to keep external documentation in-sync with the actual code and/or API.”
  2. Derived from “code tells what, docs tell why” in this HackerNoon post.
  3. The similarity of the terms “how-doc” and “why-doc” to the term here-doc is intentional. For any given command, a here-doc is used to send data into the command in-place, how-docs are a way to document how to use the command, and why-docs are a description of why the command exists to begin with.
  4. You can replicate the FYI system with any method that allows you to save Slack messages to a predefined, searchable location. For example, you could simply install the Reacji Channeler bot, which lets you choose a Slack reacji that causes any message tagged with it to be copied to a given channel. So you could assign an “fyi” reacji to a new channel called “#fyi”. Then to search your FYIs, you would simply go to the #fyi channel and search the messages there using the Slack search box.
  5. When the :fyi: reacji is added to a Slack message (or the ?fyi irccat command is used), an outgoing webhook sends a POST request to irccat.etsy.com with the message details. This triggers a PHP script to save the message text to a SQLite database, and sends an acknowledgement back to the Slack incoming webhook endpoint. The acknowledgement says “OK! Added your FYI”, so the user knows their FYI has been successfully added to the database.
  6. Searching FYIs using the ?how command uses the same architecture as adding an FYI, except the PHP script queries the SQLite table, which supports full-text search via SQLite’s FTS extension.
Category: engineering, infrastructure, philosophy

9 Comments

Thank you for sharing it!
It’s an interesting idea and the execution with Slack is smart as developers spend a lot of time inside it.
I hope to see some projects that will give this functionality as a service.

I love this idea! I’ve been struggling with where to keep my own personal knowledgebase that would involve zero friction, for snippets like “how to do colored diff output in bash”. I’m going to try setting this up. Thanks so much for sharing this.

Purpose + procedure. Perfection! 🙂 I’d also like to see conciseness implemented as a standard–kind of like snippets. Great approach.

So I’ve been playing around with an implementation of the fyi slackbot for the past few days and I _think_ the search feature in Slack does this nicely already. If I make a new reactji :FYI: I can now add that to a slack search, like: “has:reaction[:FYI:] dbconnect …”.

I think there could be a benefit to using a bot or slash commands, but I’m having a hard time putting a finger on it. The Slack search already sorts by relevance/time and has a pretty good interface now.

    Hi Zach, yes yours is a great approach too! And it has the benefit of using Slack’s excellent native search. The downsides I see are that you have to use the “has:fyi” syntax when searching, which is a tad more friction than “?how” (but granted not by much), and that your FYIs would be deleted after your configured Slack retention period (as for all other Slack messages).

Thanks for the useful idea. I tried to replicate it, and so far all is good, except for adding an outgoing webhook for the reacji (outgoing webhooks are deprecated); Slack suggests using the Events API, which doesn’t work for this idea. How did you make the reacji work without an outgoing webhook? For normal messages everything works as expected. Thanks again for sharing.

    Hi Berdikhan, we use the outgoing webhook with our irccat integration, which also provides a reacji trigger.