Nagios, Sleep Data, and You
Ian Malpass once commented that “[i]f Engineering at Etsy has a religion, it’s the Church of Graphs.” And I believe! Before I lay me down to sleep during an on-call shift, I say a little prayer that should something break, there’s a graph somewhere I can reference. Lately, a few of us in Operations have begun tracking our sleep data via Jawbone UPs. After a few months of this, we got to wondering how this information could be useful in the context of Operations. Sleep is important, and being on call can lead to interrupted sleep. Even worse, after being woken up, the amount of time it takes to return to sleep varies by person and situation. So, we thought, “why not graph the effect of being on call against our sleep data?”
Gathering and Visualizing Data
We already visualize code deploys against the myriad graphs we generate, to lend context to whatever we’re measuring. We use Nagios to alert us to system and service issues. Since Nagios writes consistent entries to a log file, it was a simple matter to write a Logster parser to ship metrics to Graphite when a host or service event pages out to an operations engineer. Those data points can then be displayed as “deploy lines” against our sleep data.
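To make the idea concrete, here is a minimal sketch in Python of the parsing step. It is not the actual Logster parser we run and does not use Logster's parser interface. It only shows how Nagios notification log entries could be turned into Graphite plaintext lines. The function names, the `nagios.pages` metric prefix, and the idea of filtering on a set of pager contact names are all assumptions made for illustration.

```python
import re

# Nagios notification entries in nagios.log look like:
#   [1365000000] SERVICE NOTIFICATION: contact;host;service;STATE;command;output
#   [1365000000] HOST NOTIFICATION: contact;host;STATE;command;output
NOTIFICATION_RE = re.compile(
    r"^\[(?P<ts>\d+)\] (?P<kind>HOST|SERVICE) NOTIFICATION: (?P<contact>[^;]+);"
)


def page_events(lines, pager_contacts):
    """Return (timestamp, kind, contact) for notifications sent to a pager contact.

    pager_contacts is a hypothetical set of Nagios contact names that page a human.
    """
    events = []
    for line in lines:
        match = NOTIFICATION_RE.match(line)
        if match and match.group("contact") in pager_contacts:
            events.append(
                (int(match.group("ts")), match.group("kind").lower(), match.group("contact"))
            )
    return events


def to_graphite(events, prefix="nagios.pages"):
    """Format events as Graphite plaintext lines: '<metric path> <value> <timestamp>'."""
    lines = []
    for ts, kind, contact in events:
        # Dots separate path components in Graphite, so sanitize the contact name.
        safe_contact = re.sub(r"[^A-Za-z0-9_-]", "_", contact)
        lines.append("%s.%s.%s 1 %d" % (prefix, safe_contact, kind, ts))
    return lines
```

Each resulting line records a value of 1 at the moment a page went out, which is the kind of sparse series that can be drawn as vertical lines over another graph.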
For the sleep data, we used and extended Aaron Parecki’s ‘jawbone-up’ gem. Jon Cowie’s handy ‘jawboneup_to_graphite’ script uses it to gather summary and detail information on a daily basis. Those data are then displayed on personal dashboards (using Etsy’s Dashboard project).
So far, we’ve only just begun to collect and display this information. As we learn more, we’ll be certain to share our findings. In the meantime, here are examples from recent on-call shifts.
NOTE: Jawbone recently opened up their API. Join the party and help build awesome apps and tooling around this device!