Etsy Icon>

Code as Craft

Improving Etsy for iOS with server-based logging main image

Improving Etsy for iOS with server-based logging

  image

One of the major challenges the Native Apps group faces at Etsy is maintaining and debugging a large codebase with a small team. Android and iOS are less mature platforms at Etsy and correspondingly lack some of the development tools and safety net that our fellow engineers have built up on the web side. Part of what the apps team does is build our own platform-specific tools. This is a case study for how we applied Etsy’s log-everything ethos to a particular class of bugs in Etsy for iOS.

Collection errors in Cocoa Touch

Crashing is one of the worst things an app can do. Cocoa has a number of patterns that encourage silent failures over crashing, most notably nil-messaging. But there are some parts of Cocoa that have fail-fast behavior instead. The most common of these cases involve the collection classes, which throw exceptions for cases like array out-of-bound errors or setting a nil keys in a dictionary. Server hiccups can cause problems when we use collections if we’re not careful, since we interpret empty fields returned from the Etsy API as nil objects. These issues were especially common when we were first developing our app, as the server API was constantly changing, but they can also be caused by all sorts of other server errors.

We could solve this problem by wrapping every array or dictionary access in an if statement to check for these cases, but that would be so much more verbose it seems likely to cause more bugs than it catches. Instead, we abstracted this pattern out into a series of wrapper methods and added them as a Safe category on each collection class, such as -[NSArray(Safe) safeObjectAtIndex:] and -[NSMutableDictionary(Safe) safeSetObject:forKey:]. These methods handle those exceptional cases, either taking no action or returning nil as appropriate. This way, if we have unexpected missing data, we might do something like display an empty label, but at least the app remains stable. Users can refresh or go back to a screen with good data.

Over time, we started using these methods all over the code base. They supported several convenient patterns, such as lazily loading cached objects from an array:

for (SomeClass * object in myArray) {
    NSUInteger index = [myArray indexOfObject:object];
    //Returns nil when out-of-bounds
    UIView * view = [cachedViews safeObjectAtIndex:index];
    if (!view) {
        view = //...

However, when Apple added subscripted collection indexing in Clang 3.4, we moved away from the Safe methods wherever it looked okay. We wanted our code to be more idiomatic as well as aesthetically pleasing; array[0] is much easier to scan than [array safeObjectAtIndex:0] and more consistent with library code. What we didn’t realize is that our liberal use of these Safe categories had been hiding logic bugs: benign-looking changes started causing app instability. For example, we discovered an NSNumberFormatter that had always returned nil due to bad input. When we switched to subscripting syntax, that minor failure turned into a serious crash.

Safer refactoring with Assertions and Expectations

In a subsequent postmortem, we realized that the intent of our code needed to be explicit. It was impossible to tell which Safe calls were necessary, which were simply convenient, and which were hiding bugs. Removing the category entirely wasn’t possible, since that would expose all the crashes we intended to fix in the first place.

Ultimately we came up with a multi-part plan to rationalize our use of these methods:

  • Split our Safe methods into two variants. These variants have identical user visible behavior, but different intended semantics:
    • A variant where we explicitly expect the out-of-bounds case and handle it by returning nil, ignoring the inputs, or whatever is necessary in that case.
    • A variant where we don’t expect the out-of-bounds case and seeing it means something has gone wrong.
  • Log the cases where something is wrong and upload them to our server-based logging system.
  • Use the aggregated logs to identify spots in our code base where we’re catching potentially unexpected behavior.

The first part is simple. We added a set of methods, such as -[NSArray(Safe) objectOrNilAtIndex:], that explicitly indicate that nil is an acceptable value. This is useful for caching patterns and setting or getting optional values:

- (id) objectOrNilAtIndex:(int)index {
    if (index >= [self count])
        return nil;
    return [self objectAtIndex:index];
}

We kept our original “Safe” methods as the second variant, but modified to do the necessary logging. For that logging, enter our new assertion macros: EtsyAssert and EtsyExpect. The Etsy logging system has two main logging levels, ERROR and INFO, to separate critical error logs (that should be immediately investigated) from information (that we just want archived).  EtsyAssert corresponds to ERROR, while EtsyExpect corresponds to INFO.

//EtsyAssert is an alias for NSAssert in debug builds.
//In release builds, failures will log an error with a stack trace to the server.
#ifdef NS_BLOCK_ASSERTIONS
    #define EtsyAssert(condition, description, ...)
        if (!(condition)) {
            SERVER_LOG_ERROR_WITH_TRACE(@"EtsyAssert", description, ##__VA_ARGS__);
        }
#else
    #define EtsyAssert(condition, description, ...)
        NSAssert(condition, (description), ##__VA_ARGS__)
#endif
//EtsyExpect will complain if an expectation fails, but won't ever throw an exception.
#define EtsyExpect(condition, description, ...)
    if (!(condition)) {
        SERVER_LOG_INFO(@"EtsyExpect", description, ##__VA_ARGS__);
    }
#endif

EtsyAssert is a wrapper for NSAssert. It evaluates an expression and, when assertions are enabled, throws an exception if that expression evaluates to false. When assertions are disabled (typical in Release builds), it logs an error-level message. EtsyExpect is a little different: it indicates that we expect the code to work a particular way, and if it doesn’t, an error may (or may not) have occurred. We added EtsyExpect clauses to all of our old Safe methods. Since we were dealing with a potentially large number of existing minor issues, we used EtsyExpect so as not to interrupt regular developer workflows.

Let’s take a look at EtsyExpect in action:

- (id)safeObjectAtIndex:(int)index {
    EtsyExpect(index < [self count],
      @"Retrieving object at index %i from array of length %i", index, [self count]);
    return [self objectOrNilAtIndex:index];
}

Server-based logging for mobile clients

What’s makes EtsyExpect distinctly useful from the built-in Cocoa logging operations is the server-side logging. SERVER_LOG_INFO and SERVER_LOG_ERROR are macro wrappers around our logging framework:

#define SERVER_LOG_ERROR(ns, description, ...) 
    [EtsyAnalytics logError:[NSString stringWithFormat:description, ##__VA_ARGS__] 
                  namespace:ns 
                        line:__LINE__ 
                    function:__PRETTY_FUNCTION__ 
                  attributes:nil]

The logError:namespace:line:function:attributes method piggybacks on our existing native analytics framework. The EtsyAnalytics class sends zipped analytics data and log messages to an API server. Analytics data is ultimately used by our Big Data stack, but these log messages are printed directly to a file. From there they can be indexed, grepped, splunked, or watched in (almost) real time, like any other log. These logs look like:

api02 : [Fri Aug 09 21:25:43 2013] [iPhone 5 6.1.2/2.7 rv:2.7.0] [redacted UDID] [EtsyExpect] [-[NSArray(Safe) safeObjectAtIndex:]+401] Retrieving object at index 5 from array of length 5 (called by 1 Etsy 0x0029dcc1 Etsy + 2600129)

What makes this powerful is the ability to see issues quickly. In real life conditions, users hit edge cases that are rare or difficult to test. By aggregating errors sampled from 1% of app users, we can see where EtsyExpect and EtsyAssert fail. That increased visibility allows us to pick off, one by one, the bugs that our Safe methods were hiding. By leveraging this data, we can be sure of which cases are safe to change.

As an example: the log message above was a small logic error in the UIViewController subclass we use to display a listing. When the user swiped through a listing’s images, we preloaded the next image – without checking whether or not more images were available. App users were doing this thousands of times a day, but developers may have only encountered the issue a handful of times (if at all). While the bug was ultimately harmless, logging made it visible, so we were able to clean up this code with very little risk.

The future of logging

Moving a few logs out of the Xcode console and onto the Etsy servers turns out to be a very powerful tool. We’re still exploring what else we can do with it: for example, it’s also been useful for diagnosing invalid HTTP requests sent by the app. App logs can be correlated with server logs, so we can track certain errors from the client back to the server. Even watching the logs in splunk helps us see what bugs we’ve fixed in the last version, and what we may have caused. Ultimately, by making Etsy for iOS a little bit like our web stack, we’ve found a way to make it more robust.

By the way, we’re hiring!