Teaching Etsy to Speak a Second Language

Posted on October 19, 2011

By: JM Imbrescia & Corey Losenegger

For an engineer, planning how to translate a website from English into another language can be quite daunting. We just finished teaching the Etsy website to speak German, and wanted to share some tips.

When preparing to translate the site, there are a few challenges to think through:

  1. How to annotate English content that needs to be translated
  2. How to handle translation of English content into other languages
  3. How to efficiently serve translated content to the end user

In this post we’ll focus on how we solved #1 and #3, saving #2 for a later post. Note also that user-generated content (e.g. shops and item listings) is translated by our sellers through a separate interface.

Although there are many great translation solutions for platforms such as Rails and Java, there aren’t many for PHP and Smarty, which we use at Etsy. We evaluated a variety of existing PHP solutions (SmartyMultiLanguage, Yahoo R3) and looked at how other sites (Twitter, Facebook) are translated. In the end, we decided to build our own translation infrastructure.

Here’s an overview of the workflow:

(Figure: Etsy translation stack)

The key components are the Message Extractor, the Translation Bundles, and the Substitution functions, which we’ll discuss below.  First, let’s talk about tagging…

Tagging

The key to our translation workflow is message tagging, which is a way for developers to annotate English content for translation.  Once messages are tagged, they can then be extracted for translation, and later substituted with translated content.

We use a custom <msg> HTML tag to wrap all English phrases:

<p><msg desc="Footer link">Click <a href="/">here</a> to return to Etsy.</msg></p>

The msg tag is a cue to the Message Extractor that this is an English string which needs to be translated. It is also used later by the substitution functions, which swap translated content into the template. The desc attribute gives human translators some context.

What about PHP, JavaScript, and images?

We strive to keep all English content in the presentation layer (Smarty templates). For English strings that live in databases (e.g. item category names), we write custom extraction methods. For images, we move stylized text into HTML and CSS (again, in Smarty) wherever possible. For JavaScript, we keep text in Smarty templates, either as inline JavaScript or as hidden DOM nodes that scripts can read. For strings that live in PHP files (e.g. error messages shared by several Smarty templates), we have an additional “tag” that looks like this:

$translator->translateMsg("You must enter your username", "Error message");

Tagging Etsy

At Etsy, it took us 3 months to tag 1,200 templates containing 13,000 strings, plus 4,000 database strings (e.g. item categories). The responsibility for tagging falls on all Engineers, Product Managers and Designers at Etsy who contribute to our codebase. The mandate is that all English strings must be tagged for translation. That took some time to adapt to, but now that we’ve launched in a few languages, everyone understands why tagging matters. Anyone who inadvertently forgets to tag a string gets a sticker surreptitiously attached to their laptop:

(Image: “I broke translations” sticker)

We see around 50-100 new English strings come in each day for our translators to translate.  Tagging bite-sized phrases no larger than a few sentences has worked well for our translators.  We’ll talk more about some tagging gotchas (e.g. plurals, possessives), as well as our translation workflow (and how it affects continuous deployment) in later blog posts.

Extraction

The Message Extractor is a script (written in PHP, like most of our tools) that scans the codebase, extracts tagged messages, and stores them for translation. It is a dumb file parser that uses regexes to match messages based on the <msg> and translateMsg() “tags” above. We use an MD5 hash of each message’s content combined with its description to track unique and changed messages. Messages with exactly the same content and description always receive the same translation.
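To make the extraction and hashing step concrete, here is a minimal sketch in Python (Etsy’s actual extractor is PHP, and its code isn’t shown in this post). It assumes messages appear only in the two forms above: <msg desc="...">...</msg> in templates, and translateMsg("...", "...") in PHP with double-quoted literals. The helper names message_key and extract_messages are hypothetical, and so is the exact way content and description are combined before hashing; we join them with a NUL separator.

```python
import hashlib
import re

# Assumed key scheme: MD5 over the message content joined with its
# description. The post doesn't give the exact combination; the "\x00"
# separator is our choice so that ("ab", "c") and ("a", "bc") differ.
def message_key(content: str, desc: str) -> str:
    return hashlib.md5((content + "\x00" + desc).encode("utf-8")).hexdigest()

# <msg desc="...">...</msg> in Smarty templates; DOTALL lets a message span lines.
MSG_TAG = re.compile(r'<msg\s+desc="([^"]*)">(.*?)</msg>', re.DOTALL)

# translateMsg("content", "description") in PHP. Simplification: only
# double-quoted literals without escaped quotes are recognized.
TRANSLATE_CALL = re.compile(r'translateMsg\(\s*"([^"]*)"\s*,\s*"([^"]*)"\s*\)')

def extract_messages(source: str) -> dict[str, tuple[str, str]]:
    """Return {key: (content, desc)} for every tagged message in source."""
    found: dict[str, tuple[str, str]] = {}
    for m in MSG_TAG.finditer(source):
        desc, content = m.group(1), m.group(2)
        found[message_key(content, desc)] = (content, desc)
    for m in TRANSLATE_CALL.finditer(source):
        content, desc = m.group(1), m.group(2)
        found[message_key(content, desc)] = (content, desc)
    return found
```

Because the key depends on both fields, editing either the English text or its description yields a new key, which is how a changed message shows up as needing translation, while identical (content, desc) pairs found in several files collapse into a single entry.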

Pre-commit hooks help developers by checking for invalid tags. A nightly cron job runs the Message Extractor, writes messages to a database, and sends out an email like this one, which lets everyone monitor translation status:

(Image: nightly translation status email)
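The post doesn’t say which checks the pre-commit hooks perform. Purely as an illustration, here is a Python sketch of a validator a hook could run over changed templates, under three hypothetical rules: every <msg> needs a non-empty desc, <msg> tags may not nest, and every opening tag must be closed. The function name find_tag_errors is ours.

```python
import re

# Matches an opening <msg ...> tag (group 1 empty) or a closing </msg> (group 1 "/").
TAG = re.compile(r'<(/?)msg\b([^>]*)>')
DESC = re.compile(r'desc="[^"]+"')

def find_tag_errors(template: str) -> list[str]:
    """Return human-readable errors for malformed <msg> tags (hypothetical rules)."""
    errors: list[str] = []
    open_line = None  # line where the currently open <msg> started, if any
    for m in TAG.finditer(template):
        line = template.count("\n", 0, m.start()) + 1
        is_closing, attrs = m.group(1) == "/", m.group(2)
        if is_closing:
            if open_line is None:
                errors.append(f"line {line}: </msg> without matching <msg>")
            open_line = None
            continue
        if open_line is not None:
            errors.append(f"line {line}: nested <msg> (outer opened on line {open_line})")
        if not DESC.search(attrs):
            errors.append(f"line {line}: <msg> is missing a desc attribute")
        open_line = line
    if open_line is not None:
        errors.append(f"line {open_line}: <msg> is never closed")
    return errors
```

A hook would call find_tag_errors on each staged template and reject the commit if any returned list is non-empty.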

Translators use custom translation tools built in-house to add/edit translations.

Bundling

Once our translators have translated English strings into the relevant languages, we bundle (dump) these translations into JSON files, which are deployed alongside our PHP and Smarty codebase using Deployinator. Using a static translation bundle file removes any dependency on databases, makes versioning and rollbacks easy, and allows us to test translations in our usual QA -> Princess -> Production deployment flow.
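A minimal sketch of the bundling step, assuming one JSON file per language that maps each message key to its translated text. The {language: {key: text}} input shape, the <language>.json file layout, and the function names write_bundles and load_bundle are hypothetical; the post only says translations are dumped to JSON and shipped with the code.

```python
import json
from pathlib import Path

def write_bundles(translations: dict[str, dict[str, str]], out_dir: Path) -> list[Path]:
    """Write one <language>.json file per language (hypothetical layout)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for language, messages in sorted(translations.items()):
        path = out_dir / f"{language}.json"
        # sort_keys keeps output stable, so diffs between dumps stay small
        text = json.dumps(messages, ensure_ascii=False, sort_keys=True, indent=2)
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written

def load_bundle(out_dir: Path, language: str) -> dict[str, str]:
    """Load a language's bundle; an empty dict means callers fall back to English."""
    path = out_dir / f"{language}.json"
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))
```

Writing keys in sorted order keeps the files stable between dumps, so a deploy’s diff shows only the translations that actually changed, which suits the versioning and rollback benefits of a static file.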

Substitution

How does a Smarty template littered with tagged (<msg>) English strings get translations swapped in?  Fortunately, Smarty provides pre- and post-filters to apply functions to template contents.  We make use of a straightforward Smarty prefilter, which runs a regex against Smarty template contents, looking for <msg> tags.  For each <msg> tag, it computes an MD5 hash (again, based on content and description), and then checks the translation bundle for a relevant translation to swap in.  We use Smarty’s compilation functions to precompile all templates across all supported languages during deployment.

The function it uses to do this MD5 hashing & swapping is a PHP function called translateMsg(), which we mentioned above.  This same translateMsg() function is also available throughout the PHP codebase to translate one-off messages that aren’t able to be moved into Smarty templates.
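The prefilter and translateMsg() working together can be sketched in Python as follows. This is an illustration under stated assumptions, not Etsy’s code: the key is assumed to be the MD5 of the content joined with the description by a NUL separator (the post doesn’t give the exact combination), a missing translation falls back to the English text, and the <msg> wrapper is dropped from the output. In Smarty this logic would be registered as a prefilter; here translate_msg, prefilter, and precompile are plain functions with made-up names.

```python
import hashlib
import re

# Assumed key scheme: MD5 of content + "\x00" + desc.
def message_key(content: str, desc: str) -> str:
    return hashlib.md5((content + "\x00" + desc).encode("utf-8")).hexdigest()

def translate_msg(content: str, desc: str, bundle: dict[str, str]) -> str:
    """Look up a translation by key; fall back to the English text if none exists yet."""
    return bundle.get(message_key(content, desc), content)

MSG_TAG = re.compile(r'<msg\s+desc="([^"]*)">(.*?)</msg>', re.DOTALL)

def prefilter(template: str, bundle: dict[str, str]) -> str:
    """Swap every <msg> element for its translated content (the wrapper tag is dropped)."""
    return MSG_TAG.sub(lambda m: translate_msg(m.group(2), m.group(1), bundle), template)

def precompile(template: str, bundles: dict[str, dict[str, str]]) -> dict[str, str]:
    """Produce one fully translated copy of a template per language, as at deploy time."""
    return {lang: prefilter(template, bundle) for lang, bundle in bundles.items()}
```

Because substitution happens once per language at deploy time, serving a page needs no hashing or lookups; the same translate_msg call remains available for the one-off PHP strings.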

Translation is just one piece of the puzzle

There’s a great deal of other localization to handle, from language- and region-specific features to small details such as date, currency, and number formatting. For these cases we usually create custom Smarty modifiers and wrappers that take localization logic into account.
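As one example of what such a modifier has to do, here is a Python sketch of locale-aware number formatting: English groups thousands with a comma and uses a period for decimals (1,234.50), while German swaps them (1.234,50). The two-entry SEPARATORS table and the function name format_number are hypothetical; a production system would typically draw on CLDR locale data (for example through ICU) rather than a hand-written table.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical (grouping, decimal) separators per locale.
SEPARATORS = {
    "en-US": (",", "."),
    "de-DE": (".", ","),
}

def format_number(value: Decimal, locale: str, places: int = 2) -> str:
    """Format value with locale-specific separators; unknown locales use en-US."""
    group_sep, dec_sep = SEPARATORS.get(locale, SEPARATORS["en-US"])
    quantized = value.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_UP)
    sign = "-" if quantized < 0 else ""  # avoids printing "-0.00"
    integer, _, fraction = f"{abs(quantized):f}".partition(".")
    # Group the integer digits in threes, starting from the right.
    groups = []
    while len(integer) > 3:
        groups.insert(0, integer[-3:])
        integer = integer[:-3]
    groups.insert(0, integer)
    result = sign + group_sep.join(groups)
    if places > 0:
        result += dec_sep + fraction
    return result
```

Using Decimal rather than float avoids binary rounding surprises when formatting prices.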

That’s our translation stack from top to bottom—please chime in with any questions or comments you’ve got.  Stay tuned for additional posts about how we’ve internationalized Etsy.

Category: engineering, internationalization


Comments

Did you find string sizing to be a major issue? I.e., a layout that depends on string length, where longer translations don’t fit the available space without dropping the font size significantly.

Good info. I would like to know more about how you are handling search in other languages (how compound words are broken down into single words to provide accurate results, capitalization/lowercase, and so on).

Also, what was the approach from the UI design perspective, and what changes were made to account for a longer-than-usual average word length? Moving forward, will this same strategy support all other languages?

Most importantly, I’m very interested in learning your thoughts on untranslated user content. I was just looking at ‘Treasuries’ in the German version of the site, and all user-defined tags are of course in English. Are you expecting users to get in the habit of translating these terms and tagging common areas such as the treasuries in multiple languages, or can this be handled in the search code somehow?

We did have to do a bit of re-CSS’ing to get everything to look correct. The biggest change was our form layout: in English, input labels sit to the left of form fields, while in German they sit on top. This gave us plenty more room to work with without sacrificing font size.

Wallaroo, you’ve got a slew of good questions, and we’ve actually got full blog posts coming down the pipe for a couple of them. How we handle international search (along with all the complexities you mention) will be covered in the next couple of weeks. We’re also working on another post about how we allow multilingual users to translate content they create.

Great article, as always :). We are currently using HTML_Translation2 from PEAR, but it’s too slow.
