Decoupling translation from source code

Ursula Huang

February 28, 2013

Internationalization infrastructure is critical to the operation and development of LinkedIn, which is now available in 19 languages. To speed up iteration on localized text, the International Engineering team built a system that dynamically deploys translated content strings to services in production.

The new system allows us to rapidly modify content, making quick corrections and changes without involving developers or the deployment team, and without rebuilding or redeploying a service.

Internationalization primer

All content is initially written in English by our developers and product managers. Typically, text that needs translation goes into a properties file:

https://gist.github.com/brikis98/5053721.js

Our in-house localization team then translates the content into different languages. For example, here's the properties file above translated into Italian:

https://gist.github.com/brikis98/5053874.js

To get the text to show up on a page, we use i18n functions in our templates that look up the given key in the properties file for the user's locale:

https://gist.github.com/brikis98/5053964.js
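To make the lookup step concrete, here is a minimal sketch in Python rather than LinkedIn's actual template function. It assumes a simplified `key=value` properties format and `{0}`-style placeholders. The keys, strings, and the `parse_properties` and `i18n` helper names are hypothetical.

```python
def parse_properties(text):
    """Parse a simplified key=value properties format into a dict.

    Assumption: one entry per line, '#' comments, no escapes or continuations.
    """
    entries = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip malformed lines that have no '='
        entries[key.strip()] = value.strip()
    return entries


# Hypothetical resources, one parsed properties file per locale.
RESOURCES = {
    "en_US": parse_properties("greeting=Hello, {0}!\nsign_out=Sign out"),
    "it_IT": parse_properties("greeting=Ciao, {0}!\nsign_out=Esci"),
}


def i18n(key, locale, *args):
    """Look up `key` for `locale` and fill in positional placeholders."""
    template = RESOURCES[locale][key]
    # str.format stands in for a real message formatter here.
    return template.format(*args)
```

With these assumed resources, `i18n("greeting", "it_IT", "Maria")` returns the Italian greeting with the name filled in, while the template itself only ever refers to the key.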

Deploying the old way

Prior to the dynamic language loading project, the build system bundled all the properties files into the same artifact as the application code (a WAR). This led to a few problems:

Adding translations meant rebuilding and redeploying the entire service.
If there was a bug in a translation, you couldn't just roll back the text; you had to roll back the application code to an older version as well.
Translations may be shared by multiple services, compounding the number of services to be rebuilt and redeployed.

Deploying the new way

The new system builds and deploys the properties files separately from application code. We introduced the concept of a “language pack,” which is a JAR file containing all of the translated content for a particular language. Updated versions of these language packs can be deployed to web servers at any time. They can also be rolled back at any time if there are bugs.

We added a new resource loading library that detects the availability of new language packs and starts using the updated translations, all without redeploying the service. Any time the resource loading library cannot find a translation, it falls back to the English string.
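The behavior described above can be sketched roughly as follows. This is an illustration in Python, not the actual resource loading library. It assumes packs carry a version identifier and that the loader is told where to fetch the latest pack from. The `ResourceLoader` class, the `fetch_latest` callback, and the locale codes are all hypothetical.

```python
class ResourceLoader:
    """Serves translations from the most recently deployed pack per locale."""

    def __init__(self, fetch_latest, default_locale="en_US"):
        # fetch_latest(locale) returns (version, {key: text}),
        # or None when no pack exists for that locale.
        self._fetch_latest = fetch_latest
        self._default_locale = default_locale
        self._packs = {}  # locale -> (version, strings)

    def refresh(self, locale):
        """Swap in the deployed pack for `locale` if its version changed."""
        latest = self._fetch_latest(locale)
        if latest is None:
            return False
        current = self._packs.get(locale)
        if current is not None and current[0] == latest[0]:
            return False  # already serving this version
        # Comparing for inequality (not "newer") also picks up rollbacks.
        self._packs[locale] = latest
        return True

    def get(self, key, locale):
        """Return the translation, falling back to the default-locale string."""
        for candidate in (locale, self._default_locale):
            _, strings = self._packs.get(candidate, (None, {}))
            if key in strings:
                return strings[key]
        raise KeyError(key)
```

In this sketch, `refresh` would be called periodically, for example from a background thread. A key missing from the Italian pack resolves to the English string until a pack containing it is deployed.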

Translation workflow

Deploying new translations is only part of the picture: we also had to build a way to efficiently find new or modified strings, deliver them to translators, and package up the result. Here's what the full workflow looks like:

An engineer checks a new or updated English string into source control.
A translation server scans the source control system once a day for changes and sends all new or changed strings out for translation.
The translation server also picks up completed translations once an hour. It validates the new content, and then publishes a full language pack containing all of the translations for a particular language to Artifactory, LinkedIn’s repository manager.
The LinkedIn deployment system pushes updated language packs to staging once an hour for localization testing, and to production twice a day.
If the localization team needs to change a translation in a “hot fix” mode, we can also push translations manually at any time with the click of a button.
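The change-detection step in this workflow boils down to comparing two snapshots of the English resources. A minimal Python sketch, assuming each snapshot is available as a dictionary of key to English text (the `strings_to_translate` function name and the sample keys are hypothetical):

```python
def strings_to_translate(previous, current):
    """Return entries that are new, or whose English text has changed.

    `previous` and `current` map resource keys to English strings.
    """
    return {
        key: text
        for key, text in current.items()
        if previous.get(key) != text  # a missing key yields None, so new keys match
    }


# Hypothetical snapshots from two consecutive scans.
yesterday = {"greeting": "Hello, {0}!", "sign_out": "Sign out"}
today = {"greeting": "Hi, {0}!", "sign_out": "Sign out", "help": "Help"}

changed = strings_to_translate(yesterday, today)
```

Here `changed` contains the reworded `greeting` and the new `help` entry, while the untouched `sign_out` string is not sent out again.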

Backwards compatibility

Since the dynamic language pack system decouples translated text from code, it's now possible for new strings to show up in production before the application code that uses them. Therefore, we now require that all internationalization resources be backwards compatible with previous versions, so that nothing breaks as the new strings get deployed.

In this context, backwards compatibility means that it's always safe to add totally new strings, but if you modify an existing one, you must maintain the same number and type of placeholders. For example, let's say we originally had the following string:

https://gist.github.com/brikis98/5054157.js

Changing some of the wording is safe:

https://gist.github.com/brikis98/5054169.js

However, removing, adding, or changing the type of placeholders is not backwards compatible, since the application code will only be providing values for the old placeholders:

https://gist.github.com/brikis98/5054174.js

We enforce backwards compatibility with a pre-commit hook that prevents resources from being deleted, and also checks that updates to an existing text resource are consistent with regard to placeholders. The code snippet below shows some of our validation logic:

https://gist.github.com/anonymous/5035518.js
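For readers who want to experiment with the idea, here is an independent Python sketch of this kind of check. It is not the validation logic from the gist above. It assumes placeholders look like `{0}` or `{1,number}`, in the style of `java.text.MessageFormat`, and it ignores quoting and nested formats. All function names are hypothetical.

```python
import re

# Assumption: placeholders are {index} or {index,type[,style]}.
PLACEHOLDER = re.compile(r"\{(\d+)(?:,\s*(\w+))?[^{}]*\}")


def placeholders(text):
    """Map each placeholder index to its type ('' when no type is given)."""
    return {int(index): kind for index, kind in PLACEHOLDER.findall(text)}


def is_backwards_compatible(old_text, new_text):
    """True if new_text keeps the same placeholder indexes and types."""
    return placeholders(old_text) == placeholders(new_text)


def check_commit(old_resources, new_resources):
    """Return a list of violations; an empty list means the commit passes."""
    problems = []
    for key, old_text in old_resources.items():
        if key not in new_resources:
            problems.append(f"{key}: resource was deleted")
        elif not is_backwards_compatible(old_text, new_resources[key]):
            problems.append(f"{key}: placeholders changed")
    return problems
```

A hook built on this sketch would reject the commit whenever `check_commit` returns a non-empty list. Keys that exist only in the new resources are never flagged, which matches the rule that adding new strings is always safe.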

19 languages down... lots more to go!

The new system has streamlined the translation process at LinkedIn: getting new strings into production is significantly faster and easier, and we can now gradually roll out translations and roll them back if anything is broken. Most importantly, our internationalization infrastructure can now scale to an ever-growing number of applications, languages, and members.

Check out "What's in a name?" to learn more about LinkedIn internationalization. We also invite you to learn about our other infrastructure projects, such as The Play Framework, Databus, and Rest.li.