Teaching Etsy to Speak a Second Language
By: JM Imbrescia & Corey Losenegger
For an engineer, planning how to translate a website from English into another language can be daunting. We just finished teaching the Etsy website to speak German and want to share some tips.
When preparing to translate the site, there are a few challenges to think through:
- How to annotate English content that needs to be translated
- How to handle translation of English content into other languages
- How to efficiently serve translated content to the end user
In this post we’ll focus on how we solved the first and third challenges, saving the second for a later post. Also note that user-generated content (e.g. shops and item listings) is translated by our sellers through a separate interface.
Although there are many great translation solutions for platforms such as Rails and Java, there aren’t many for PHP and Smarty, which we use at Etsy. We evaluated a variety of existing PHP solutions (SmartyMultiLanguage, Yahoo R3) and looked at how other sites (Twitter, Facebook) are translated. In the end, we decided to create our own translation infrastructure.
Here’s an overview of the workflow:
The key components are the Message Extractor, the Translation Bundles, and the Substitution functions, which we’ll discuss below. First, let’s talk about tagging…
The key to our translation workflow is message tagging, which is a way for developers to annotate English content for translation. Once messages are tagged, they can then be extracted for translation, and later substituted with translated content.
We use a custom <msg> HTML tag to wrap all English phrases:
<p><msg desc="Footer link">Click <a href="/">here</a> to return to Etsy.</msg></p>
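This msg tag is a cue to the Message Extractor that this is an English string which needs to be translated. It is also used later by the substitution functions, which swap translated content into the template. The desc attribute gives human translators some context. For one-off strings in PHP code rather than in templates, the equivalent is a call to translateMsg(), which takes the English string and a description: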
$translator->translateMsg("You must enter your username", "Error message");
At Etsy, it took us 3 months to tag 1,200 templates containing 13,000 strings, plus 4,000 database strings (e.g. item categories). The responsibility for tagging falls on every engineer, product manager, and designer at Etsy who contributes to our codebase. The mandate is that all English strings must be tagged for translation. This took some time to adapt to, but now that we’ve launched in a few languages, it’s easy for everyone to understand why tagging matters. Anyone who inadvertently forgets to tag a string gets a sticker surreptitiously attached to their laptop:
We see around 50-100 new English strings come in each day for our translators to translate. Tagging bite-sized phrases no larger than a few sentences has worked well for our translators. We’ll talk more about some tagging gotchas (e.g. plurals, possessives), as well as our translation workflow (and how it affects continuous deployment) in later blog posts.
The Message Extractor is a script (written in PHP, like most of our tools) which scans the codebase, extracts tagged messages, and stores them for translation. It is a dumb file parser which uses regexes to match messages based on the <msg> and translateMsg() “tags” above. We use an MD5 hash of the content combined with the description to track unique and changed messages. Messages with exactly the same content and description will always receive the same translation.
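To make the extraction step concrete, here is a minimal sketch in Python (not our PHP tool). The two regexes, the function names, and the separator placed between content and description before hashing are all assumptions made for illustration; the post only states that regexes find the tags and that the key is an MD5 of content and description.

```python
import hashlib
import re

# Assumed shape of a template tag: <msg desc="...">content</msg>
MSG_TAG = re.compile(r'<msg\s+desc="([^"]*)"\s*>(.*?)</msg>', re.DOTALL)
# Assumed shape of a PHP call: translateMsg("content", "description")
PHP_CALL = re.compile(
    r'translateMsg\(\s*"((?:[^"\\]|\\.)*)"\s*,\s*"((?:[^"\\]|\\.)*)"\s*\)'
)


def message_key(content, desc):
    """MD5 over content plus description (the NUL separator is an assumption)."""
    return hashlib.md5((content + "\x00" + desc).encode("utf-8")).hexdigest()


def extract_messages(source):
    """Return {key: (content, desc)} for every tagged message in source."""
    found = {}
    for desc, content in MSG_TAG.findall(source):
        found[message_key(content, desc)] = (content, desc)
    for content, desc in PHP_CALL.findall(source):
        found[message_key(content, desc)] = (content, desc)
    return found
```

Because the key depends only on content and description, a message repeated across many files collapses to a single entry, and editing either field produces a new key that shows up as a new string to translate.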
Pre-commit hooks help developers by checking for invalid tags. A nightly cron runs the Message Extractor and writes messages to a database, and sends out an email like this that allows everyone to monitor translation status:
Translators use custom translation tools built in-house to add/edit translations.
Once our translators have translated English strings into the relevant languages, we bundle (dump) these translations into JSON files to be deployed alongside our PHP & Smarty codebase using Deployinator. Using a static translation bundle file removes any dependency on databases, allows for easy versioning and rollbacks, and lets us test translations in our usual QA -> Princess -> Production deployment flow.
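As a rough illustration of the bundling step, the Python sketch below turns translation rows into one JSON document per language. The row shape (key, language, text) and both function names are hypothetical; the post does not describe the actual database schema or bundle layout.

```python
import json


def build_bundle(rows, language):
    """Collect one language's translations into a {message_key: text} dict.

    rows is assumed to be an iterable of (key, language, text) tuples.
    """
    return {key: text for key, lang, text in rows if lang == language}


def dump_bundle(bundle):
    """Serialize with sorted keys so the output is stable between builds."""
    return json.dumps(bundle, ensure_ascii=False, sort_keys=True, indent=2)
```

Sorting the keys is a design choice in this sketch rather than something the post specifies: it keeps the file byte-for-byte identical when nothing has changed, which suits versioning and rollbacks.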
How does a Smarty template littered with tagged (<msg>) English strings get translations swapped in? Fortunately, Smarty provides pre- and post-filters to apply functions to template contents. We make use of a straightforward Smarty prefilter, which runs a regex against Smarty template contents, looking for <msg> tags. For each <msg> tag, it computes an MD5 hash (again, based on content and description), and then checks the translation bundle for a relevant translation to swap in. We use Smarty’s compilation functions to precompile all templates across all supported languages during deployment.
The function it uses to do this MD5 hashing & swapping is a PHP function called translateMsg(), which we mentioned above. This same translateMsg() function is also available throughout the PHP codebase to translate one-off messages that aren’t able to be moved into Smarty templates.
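The substitution step can be sketched as follows, again in Python rather than as the actual Smarty prefilter. The regex, the helper names, and the fallback to the English source when a translation is missing are assumptions for illustration; the post does not say what happens on a bundle miss.

```python
import hashlib
import re

# Assumed shape of a template tag: <msg desc="...">content</msg>
MSG_TAG = re.compile(r'<msg\s+desc="([^"]*)"\s*>(.*?)</msg>', re.DOTALL)


def message_key(content, desc):
    """MD5 over content plus description (the NUL separator is an assumption)."""
    return hashlib.md5((content + "\x00" + desc).encode("utf-8")).hexdigest()


def translate_msg(content, desc, bundle):
    """Look up a translation by hash; fall back to English (assumed behavior)."""
    return bundle.get(message_key(content, desc), content)


def prefilter(template_source, bundle):
    """Replace each <msg> element with its translation before compilation."""
    def swap(match):
        desc, content = match.group(1), match.group(2)
        return translate_msg(content, desc, bundle)
    return MSG_TAG.sub(swap, template_source)
```

Running a filter like this once per language at deploy time is what makes precompilation possible: each compiled template already contains the translated text, so no lookup is needed per request.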
Translation is just one piece of the puzzle
There’s a great deal of other localization that needs to be handled, from language- and region-specific features to little details such as date, currency, and number formatting. For these cases we usually create custom Smarty modifiers and wrappers which take localization logic into account.
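To give a flavor of what such a modifier does, here is a small Python sketch of locale-aware number formatting. The separator table and the format_number name are hypothetical, and a real implementation would more likely draw on proper locale data than a hand-written table.

```python
# Hypothetical per-locale separators, hand-written for illustration only.
SEPARATORS = {
    "en_US": {"group": ",", "decimal": "."},
    "de_DE": {"group": ".", "decimal": ","},
}


def format_number(value, locale, places=2):
    """Format value with the grouping and decimal marks of the given locale."""
    seps = SEPARATORS[locale]
    # Format with US conventions first, then swap separators via a placeholder
    # so that replacing one mark does not clobber the other.
    us = "{:,.{p}f}".format(value, p=places)
    return (
        us.replace(",", "\x00")
        .replace(".", seps["decimal"])
        .replace("\x00", seps["group"])
    )
```

With this table, 1234.5 renders as 1,234.50 for en_US and as 1.234,50 for de_DE.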
That’s our translation stack from top to bottom—please chime in with any questions or comments you’ve got. Stay tuned for additional posts about how we’ve internationalized Etsy.