LinkedIn Translation: What's in a Name?

Georg Puchta

December 7, 2011

LinkedIn recently launched Japanese as a new interface language; before that, we had also launched Russian, Romanian, Turkish, Italian, Portuguese, and many others. As a business-oriented social network, we have a high bar for translation quality. As it turns out, keeping this bar high can be as much of a technical challenge as a linguistic one.

In this post, I'll tell you about some of the subtleties of handling names across many languages and some of the solutions we've developed at LinkedIn.

What's in a name?

Displaying a name correctly can be surprisingly complicated. For example, in English, names are formatted as:

<first-name> <last-name>

Japanese, on the other hand, uses the opposite order:

<last-name> <first-name>

Other languages, such as Korean, do not use a space between the two name elements:

<last-name><first-name>

Complicating this even further, the maiden name (if present) is added in different ways depending on language. For example, in German, the maiden name is appended as:

<first-name> <last-name> geb. <maiden-name>

In English and several other languages, the maiden name is inserted in parentheses before the last name:

<first-name> (<maiden-name>) <last-name>
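As a rough illustration of the patterns above, the per-language rules can be kept as templates keyed by language. This is a minimal Python sketch, not LinkedIn's implementation: the function name format_full_name, the two template tables, and the language codes are assumptions made for this example, and only the patterns quoted in this post are included.

```python
from typing import Optional

# Patterns copied from the examples above. The language codes used as
# keys are illustrative, not an actual LinkedIn configuration.
BASE_PATTERNS = {
    "en": "{first} {last}",
    "de": "{first} {last}",
    "ja": "{last} {first}",
    "ko": "{last}{first}",
}

# Maiden-name patterns are only described for English and German.
MAIDEN_PATTERNS = {
    "en": "{first} ({maiden}) {last}",
    "de": "{first} {last} geb. {maiden}",
}


def format_full_name(first: str, last: str, lang: str,
                     maiden: Optional[str] = None) -> str:
    """Assemble a full name using the pattern registered for `lang`."""
    if maiden and lang in MAIDEN_PATTERNS:
        return MAIDEN_PATTERNS[lang].format(
            first=first, last=last, maiden=maiden)
    # No maiden name given, or no maiden-name rule known for this language.
    return BASE_PATTERNS[lang].format(first=first, last=last)


print(format_full_name("Brian", "Geffon", "en", "Madeup"))
print(format_full_name("Brian", "Geffon", "de", "Madeup"))
```

Keeping the rules as data rather than as branches in code means a new language only needs new template entries.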

What about context?

Name formatting can also vary depending on the context:

Full Name: the full name format concatenates all the name elements according to the rules described above. Maiden name is optional. This format is used in most formal or official contexts, such as a profile page.
Familiar Name: this format is usually used when formatting names of a member's connections. In English, it is common to display only the first name. In German and CJK (Chinese, Japanese, Korean), it is customary to always show the first and last name.
List Name: this format is used in lists such as the address book. Non-CJK names are displayed as <last-name>, <first-name>. CJK names omit the comma: <last-name> <first-name>.
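The Familiar and List rules above could be sketched as a single function that switches on the requested style. Everything named here (NameStyle, format_short_name, the CJK set of language codes) is hypothetical. The sketch also assumes that languages other than German and CJK fall back to the English first-name-only convention, and that CJK familiar names use the same ordering as the full name.

```python
from enum import Enum


class NameStyle(Enum):
    FAMILIAR = "familiar"
    LIST = "list"


# Illustrative language codes for Chinese, Japanese, and Korean.
CJK = {"zh", "ja", "ko"}


def format_short_name(first: str, last: str, lang: str,
                      style: NameStyle) -> str:
    """Format a name in the Familiar or List style for `lang`."""
    if style is NameStyle.LIST:
        # CJK list names omit the comma; all others use "last, first".
        return f"{last} {first}" if lang in CJK else f"{last}, {first}"

    # Familiar style.
    if lang in CJK:
        # Assumption: last name first, with no space for Korean.
        separator = "" if lang == "ko" else " "
        return f"{last}{separator}{first}"
    if lang == "de":
        return f"{first} {last}"
    # Assumption: every other language follows the English convention.
    return first


print(format_short_name("Brian", "Geffon", "en", NameStyle.FAMILIAR))
print(format_short_name("Brian", "Geffon", "en", NameStyle.LIST))
```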

Which format do we pick?

Finally, we have to consider that multiple names can appear on the same page. For example, when looking at 10 names on a search results page, which format do we use for each name?

Should we arrange first and last names depending on interface language only? This approach wouldn't be acceptable for most Asian languages: an English viewer looking at a profile created in Japanese would see the first and last name in the wrong order.

Should we use a member's profile locale in conjunction with the interface language to determine how to format the name? This would display all names correctly, but then, for every single name on every single page, we'd have to fetch extra profile data. This would require an enormous amount of engineering work and dramatically increase the load on our backend servers.

The Solution

To pick the appropriate format, we came up with a solution that accommodates all use cases without requiring backend data: we use the character set of the name itself. If the character set of the name does not match the interface language, we treat it as an override and display the name using the format appropriate for that character set.

For example, an English viewer looking at a profile created in Japanese would get the Japanese name formatting; an English viewer looking at an English or German profile would see the English name formatting; and finally, a German viewer looking at an English or German profile would see the German formatting.
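The decision in these examples can be written as a small lookup. The sketch below is an assumption-laden illustration, not the production logic: pick_format, the script labels, and the OVERRIDING_SCRIPTS set are invented for this example, and it models only the cases shown in this post. Combinations the post does not illustrate, such as a Latin-script name viewed in a Japanese interface, simply fall through to the interface language here.

```python
# Scripts whose own conventions take precedence over the viewer's language.
# Assumption: these labels and this set are illustrative only.
OVERRIDING_SCRIPTS = {"japanese", "korean", "chinese"}


def pick_format(interface_lang: str, name_script: str) -> str:
    """Return the key of the formatting rules to apply to one name."""
    if name_script in OVERRIDING_SCRIPTS:
        # The name's character set overrides the interface language.
        return name_script
    # Otherwise the viewer's interface language decides the format.
    return interface_lang


print(pick_format("en", "japanese"))
print(pick_format("de", "latin"))
```

Because the decision depends only on the viewer's language and the characters of the name, it can be made per name on a page with many names, with no extra profile lookups.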

From a code perspective, the name assembly is handled by a NameFormatter class, which takes the name elements as arguments. In addition, the caller can specify which name style to generate depending on the context.

The NameFormatter looks at the character set of the name elements and uses Unicode character ranges to identify the language and ordering to use. For example, the Unicode character range for Korean is U+AC00..U+D7AF, and the ranges for Japanese are U+4E00..U+9FBF (Kanji), U+30A0..U+30FF (Katakana), U+FF61..U+FF9F (Katakana: Half Width), and U+3040..U+309F (Hiragana).
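A range check like this takes only a few lines. The following Python sketch uses just the ranges quoted above; the names SCRIPT_RANGES and detect_script are hypothetical, and the actual NameFormatter is not shown in this post. Note that the Kanji range is shared with Chinese characters, so this simplified version would label a Chinese name as Japanese. How the real class tells them apart is not described here.

```python
from typing import Optional

# Inclusive code point ranges quoted in the text above.
SCRIPT_RANGES = {
    "korean": [(0xAC00, 0xD7AF)],
    "japanese": [
        (0x4E00, 0x9FBF),  # Kanji
        (0x30A0, 0x30FF),  # Katakana
        (0xFF61, 0xFF9F),  # Katakana: half width
        (0x3040, 0x309F),  # Hiragana
    ],
}


def detect_script(name: str) -> Optional[str]:
    """Return the script of the first character that falls in a known range.

    Returns None when no character matches, for example for Latin or
    Cyrillic names.
    """
    for ch in name:
        code = ord(ch)
        for script, ranges in SCRIPT_RANGES.items():
            if any(low <= code <= high for low, high in ranges):
                return script
    return None


print(detect_script("임혜영"))
print(detect_script("Brian"))
```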

Putting it all together

To see how these rules play together, the following two tables show every combination of interface language, display style, and character set.

Interface Language: English

Character Set	Full Name	Familiar Name	List Name

English	Brian (Madeup) Geffon	Brian	Geffon, Brian
Japanese	山口エース (利枝)	山口エース	山口エース
Korean	임혜영 (산)	임혜영	임혜영
Chinese	小米如兰 (絴)	小米如兰	小米如兰
Russian	Денис (Петрович) Баранов	Денис	Баранов, Денис

Interface Language: German

Character Set	Full Name	Familiar Name	List Name

English	Brian Geffon geb. Madeup	Brian Geffon	Geffon, Brian
Japanese	山口エース (利枝)	山口エース	山口エース
Korean	임혜영 (산)	임혜영	임혜영
Chinese	小米如兰 (絴)	小米如兰	小米如兰
Russian	Денис Баранов geb. Петрович	Денис Баранов	Баранов, Денис

What's Next?

Names are just one of the many challenges with translations. In an upcoming blog post, I'll discuss another one: dynamic sentence construction. I'll go over how to assemble sentences that include dynamic arguments - such as formatted names, numbers, and links - that still make sense despite the very different word ordering and sentence constructions used across a variety of languages.
