Browsemap: Collaborative Filtering At LinkedIn

Lili Wu

October 23, 2014

LinkedIn makes extensive use of item-based collaborative filtering, which surfaces relationships between pairs of items based on the wisdom of the crowd. For example, each member's profile on LinkedIn has a "People Who Viewed This Profile Also Viewed" recommendation module. Known as a profile browsemap, this module is a navigational aid that displays other profiles frequently co-viewed with the current profile. Similarly, each job posting page has a job browsemap ("People Who Viewed This Job Also Viewed") that showcases related jobs discovered by other people. In fact, collaborative filtering datasets, or browsemaps, exist for many entity types on LinkedIn, such as members, jobs, companies, and groups. These navigational aids are key contributors to engagement on the site.
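The core idea behind "People Who Viewed This Profile Also Viewed" is counting co-views. The following is a minimal sketch, assuming each input session is a list of item IDs viewed together and that raw co-view counts are used for ranking. The function `build_browsemap` and the `top_k` parameter are hypothetical and do not come from the Browsemap platform.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_browsemap(sessions, top_k=3):
    """Count item co-views across sessions and return each item's top-k co-viewed items."""
    co_views = defaultdict(Counter)
    for session in sessions:
        # Deduplicate within a session so repeated views of one item count once.
        items = sorted(set(session))
        for a, b in combinations(items, 2):
            co_views[a][b] += 1
            co_views[b][a] += 1
    browsemap = {}
    for item, counter in co_views.items():
        # Sort by descending co-view count, then by item id for deterministic ties.
        ranked = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        browsemap[item] = [other for other, _ in ranked[:top_k]]
    return browsemap

sessions = [
    ["alice", "bob", "carol"],
    ["alice", "bob"],
    ["alice", "dave"],
    ["bob", "carol"],
]
bm = build_browsemap(sessions)
print(bm["alice"])  # ['bob', 'carol', 'dave']
```

Raw counts tend to favor popular items. A production system would likely normalize or filter these scores, but this post does not specify which scoring Browsemap uses.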

People Who Viewed This Profile Also Viewed

People Who Viewed This Job Also Viewed

All of the browsemaps are powered by a horizontal collaborative filtering platform called Browsemap. We recently presented a paper on Browsemap at the 6th Workshop on Recommender Systems and the Social Web (RSWeb), co-located with ACM RecSys.

The Browsemap: Collaborative Filtering at LinkedIn

The Browsemap platform is a hybrid offline/online system, as illustrated in the architecture diagram below. The offline system uses Hadoop as its batch computation engine because of its high throughput, fault tolerance, and horizontal scalability. Computed browsemaps are bulk-loaded into a distributed key-value store that serves low-latency queries.
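The offline/online split can be illustrated with a small in-process simulation. It assumes a MapReduce-style batch job that emits co-viewed pairs, sums them, and ranks neighbors per item, and it uses a read-only dictionary as a stand-in for the distributed key-value store. `map_phase`, `reduce_phase`, and `ReadOnlyStore` are hypothetical names. This is not real Hadoop code, not a real key-value store client, and not the Browsemap implementation.

```python
from collections import Counter, defaultdict
from itertools import combinations
from types import MappingProxyType

def map_phase(member_id, viewed_items):
    """Mapper: emit ((item, other), 1) for every ordered pair co-viewed by one member."""
    items = sorted(set(viewed_items))
    for a, b in combinations(items, 2):
        yield (a, b), 1
        yield (b, a), 1

def reduce_phase(mapped):
    """Reducer: sum counts per pair, then group into item -> ranked (neighbor, count) list."""
    totals = Counter()
    for key, count in mapped:
        totals[key] += count
    grouped = defaultdict(list)
    for (a, b), count in totals.items():
        grouped[a].append((b, count))
    return {a: sorted(nbrs, key=lambda kv: (-kv[1], kv[0])) for a, nbrs in grouped.items()}

class ReadOnlyStore:
    """Hypothetical stand-in for the key-value store: bulk-loaded once, then only read."""
    def __init__(self, data):
        self._data = MappingProxyType(dict(data))  # keys cannot be added or replaced

    def get(self, key, default=None):
        return self._data.get(key, default)

# Offline: batch computation over (hypothetical) view logs, then a bulk load.
view_logs = {"m1": ["j1", "j2", "j3"], "m2": ["j1", "j2"], "m3": ["j2", "j3"]}
mapped = (pair for member, items in view_logs.items() for pair in map_phase(member, items))
store = ReadOnlyStore(reduce_phase(mapped))

# Online: serving a page view is a single key lookup.
print(store.get("j1"))  # [('j2', 2), ('j3', 1)]
```

Separating the expensive batch computation from a simple key lookup is what keeps online latency low. Freshness then depends on how often the batch job reruns and reloads the store.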

Browsemap Architecture

Browsemap is a generic platform composed mostly of shared components, such as co-view counting, plus some vertical-specific logic, such as filtering out expired jobs because job postings are ephemeral. To support all of the various entity types, we developed an in-house domain-specific language (Browsemap DSL) that describes how to build each browsemap. The platform enables rapid development, deployment, and computation of collaborative filtering recommendations for almost any use case on LinkedIn. It also centralizes scaling, monitoring, and other operational tasks for online serving. With this offline/online architecture, Browsemap handles LinkedIn's traffic well: it can process hundreds of millions of entities and serve billions of page views each month.
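The syntax of the Browsemap DSL is not described here. The following Python sketch only illustrates the general pattern of a declarative spec in which shared logic stays fixed and each vertical plugs in its own filters, using the expired-jobs example. `BrowsemapSpec`, `apply_spec`, `not_expired`, and the expiry dates are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Dict, List

# A candidate filter decides whether a recommended item may be shown.
CandidateFilter = Callable[[str], bool]

@dataclass
class BrowsemapSpec:
    """Hypothetical declarative spec: shared post-processing, vertical-specific filters."""
    entity_type: str
    top_k: int = 3
    filters: List[CandidateFilter] = field(default_factory=list)

def apply_spec(spec: BrowsemapSpec, ranked: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Shared step: drop candidates rejected by any filter, then keep the top_k."""
    result = {}
    for item, candidates in ranked.items():
        kept = [c for c in candidates if all(f(c) for f in spec.filters)]
        result[item] = kept[: spec.top_k]
    return result

# Vertical-specific logic for jobs: hide postings whose expiry date has passed.
job_expiry = {"j1": date(2014, 12, 1), "j2": date(2014, 9, 30),
              "j3": date(2015, 1, 15), "j4": date(2014, 11, 1)}
today = date(2014, 10, 23)

def not_expired(job_id: str) -> bool:
    return job_expiry[job_id] >= today

job_spec = BrowsemapSpec(entity_type="job", top_k=2, filters=[not_expired])
ranked_jobs = {"j1": ["j2", "j3", "j4"]}
print(apply_spec(job_spec, ranked_jobs))  # {'j1': ['j3', 'j4']}
```

A member browsemap would reuse `apply_spec` unchanged with an empty filter list. Only the spec differs between verticals.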

Besides the navigational aids described above, these collaborative filtering datasets also enable many hybrid recommender systems. Examples include recommending similar profiles, suggesting companies you may want to follow, and predicting where a member resides, among others. The datasets surface implicit connections between entities that are driven by members' preferences and cannot be found by studying content alone. Altogether, Browsemap powers about a dozen recommendation products on LinkedIn.
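As an example of a hybrid use, consider location prediction. The post does not say how location prediction uses browsemaps. One plausible approach, assumed here purely for illustration, is a weighted vote: each co-viewed profile with a known location votes for that location with its co-view count. `predict_location` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

def predict_location(
    neighbors: List[Tuple[str, int]],
    known_locations: Dict[str, str],
) -> Optional[str]:
    """Weighted vote: each co-viewed profile with a known location votes with its co-view count."""
    votes = Counter()
    for neighbor, co_view_count in neighbors:
        location = known_locations.get(neighbor)
        if location is not None:
            votes[location] += co_view_count
    if not votes:
        return None  # no neighbor has a known location
    # Highest total weight wins; ties are broken alphabetically for determinism.
    return min(votes.items(), key=lambda kv: (-kv[1], kv[0]))[0]

# Browsemap neighbors of member m1 as (profile, co-view count); m5 has no known location.
neighbors_of_m1 = [("m2", 5), ("m3", 3), ("m4", 4), ("m5", 2)]
locations = {"m2": "Bay Area", "m3": "New York", "m4": "New York"}
print(predict_location(neighbors_of_m1, locations))  # New York
```

The browsemap supplies the preference-driven neighbors, and a content attribute (location) supplies the labels. That combination is what makes such a system hybrid.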

We learned some valuable lessons while running this system in production over the last few years. One lesson is that "tall oaks grow from little acorns." As data and content on LinkedIn grow, there is an ever-increasing need for recommendation products that help our members discover new content. As a generic horizontal recommender system, Browsemap enables fast product development: a developer can bootstrap a new browsemap and put it into production, typically in just a day or two. Most of the developer's time is usually spent understanding the nature of the product, preprocessing input data, and implementing any vertical-specific requirements. In fact, browsemaps are frequently the first recommendation product built for any new entity on LinkedIn.

If you're interested in more detail about how Browsemap works, the different applications it powers, and the rest of the lessons learned, please see the full paper: Browsemap: Collaborative Filtering At LinkedIn
