A Look Behind the AI that Powers LinkedIn’s Feed: Sifting through Billions of Conversations to Create Personalized News Feeds for Hundreds of Millions of Members
March 29, 2018
At LinkedIn, our mission is to connect the world’s professionals to make them more productive and successful. The LinkedIn Feed stands at the center of this global professional community: a place for our members to discover and join the conversations that are happening among their connections, taking place within their groups, and ignited by the Influencers and companies they’re following. Our members post ideas, career news, questions, jobs, and suggestions in an array of formats, including video, photo, short text, and long-form articles. Every one of these starts a conversation.
Every time a member visits LinkedIn, a machine learning algorithm starts by identifying the best conversations for her. In a fraction of a second, the algorithm sorts tens of thousands of posts and ranks the most relevant at the top of the feed.
Flowing into this algorithm are thousands of signals that help us understand a member’s preferences and enable us to personalize the feed for a specific member. These signals fall into three broad categories:
Identity: Who are you? Where do you work? What are your skills? Who are you connected with?
Content: How many times was the update viewed? How many times was it “liked”? What is the update about? How old is it? What language is it written in? What companies, people, or topics are mentioned in the update?
Behavior: What have you liked and shared in the past? Who do you interact with most frequently? Where do you spend the most time in your news feed?
By inputting these signals into our algorithms, we are able to generate personalized news feeds for every member and ensure they are having the conversations they need to become more productive and successful.
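To make the three signal categories above more concrete, here is a minimal sketch of how identity, content, and behavior signals might be flattened into one feature dictionary for a single (member, update) pair. Every class, field, and feature name below is hypothetical and chosen only for illustration; the post does not describe the actual feature schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class IdentitySignals:
    # Hypothetical fields: who the member is.
    company: str
    skills: Set[str]
    connection_ids: Set[int]


@dataclass
class ContentSignals:
    # Hypothetical fields: what the update is.
    author_id: int
    views: int
    likes: int
    age_hours: float
    language: str
    topics: Set[str]


@dataclass
class BehaviorSignals:
    # Hypothetical fields: what the member did in the past.
    liked_topics: Set[str]
    interactions_by_member: Dict[int, int] = field(default_factory=dict)


def build_features(identity, content, behavior):
    """Flatten the three signal groups into one feature dict."""
    return {
        "author_is_connection": float(content.author_id in identity.connection_ids),
        "skill_topic_overlap": float(len(identity.skills & content.topics)),
        "like_rate": content.likes / content.views if content.views else 0.0,
        "age_hours": content.age_hours,
        "liked_topic_overlap": float(len(behavior.liked_topics & content.topics)),
        "past_interactions_with_author": float(
            behavior.interactions_by_member.get(content.author_id, 0)
        ),
    }


member = IdentitySignals("Acme", {"python", "statistics"}, {7, 42})
update = ContentSignals(author_id=42, views=200, likes=30, age_hours=3.5,
                        language="en", topics={"python", "hiring"})
history = BehaviorSignals({"hiring"}, {42: 5})
features = build_features(member, update, history)
```

A production system would carry thousands of such signals; the point here is only that each category contributes its own inputs to one shared representation.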
LinkedIn’s Feed AI: Objectives and insights
Our mission on the Feed AI team is to help LinkedIn’s members discover the most relevant conversations that will aid them in becoming more productive and successful. That mission is easily understood by any person who reads it, but our AI requires a bit more guidance to break down the problem. So what is it, exactly, that we program the Feed AI to learn?
In a nutshell, we ask it to understand the value that the feed is providing for every single member. On LinkedIn, we have professionals from various walks of life and at different stages in their careers, all operating within the LinkedIn Knowledge Graph, a mapping of billions of possible connections between entities such as companies, members, and skills. Therefore, we have created a multifaceted ecosystem in the feed to cater to this diversity. This ecosystem includes products to help you find your next opportunity, connect you with brands and products, or suggest courses you might take to improve your skills. We also invest in follower relevance to ensure you are plugged into professionally-relevant conversations by following the right topics, people, and companies.
Traditionally, AI algorithms will learn which of these product offerings you care most about by measuring your click-through rate (CTR), or the fraction of times you clicked on one of these offerings when it was shown to you. However, an objective like CTR is very simplistic when considering a product like Jobs You May Be Interested In, where we care more about whether a member successfully applied for a particular job than whether he simply clicked on the listing. Similarly, clicking on a trending news article by itself is not a good indication of whether or not the member found the article valuable, so we use a variety of signals for each use case to train our algorithms. These range from well-known metrics, like time spent reading, to insights from your social graph. We also incorporate a variety of findings from other sources in our models, such as user experience research.
In order for our model to capture these diverse use cases, we create machine learning algorithms for many different objectives and combine them together to personalize the feed. By having our AI learn different aspects of the problem, we can accurately capture the value our members derive from their feeds—whether it be trending news, jobs, courses, or updates from their connections. These algorithms are always changing as we learn new or promising models that can be applied to feed personalization.
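As a rough illustration of combining several objectives, the sketch below ranks candidate updates by a weighted sum of per-objective predictions. The weighted sum, the weights, and the objective names (p_click, p_job_apply, p_long_read) are all assumptions made for this example; the post does not say how the individual models are actually combined.

```python
def click_through_rate(clicks, impressions):
    """CTR = clicks / impressions; 0.0 when the item was never shown."""
    return clicks / impressions if impressions else 0.0


# Hypothetical weights; the real combination rule is not described in the post.
OBJECTIVE_WEIGHTS = {"p_click": 0.2, "p_job_apply": 0.5, "p_long_read": 0.3}


def combined_score(predictions, weights=OBJECTIVE_WEIGHTS):
    """Weighted sum of per-objective predicted probabilities (assumed rule)."""
    return sum(weights[name] * predictions.get(name, 0.0) for name in weights)


def rank_feed(candidates):
    """Return update ids sorted by combined score, highest first."""
    return sorted(candidates,
                  key=lambda uid: combined_score(candidates[uid]),
                  reverse=True)


# Each candidate carries one predicted probability per objective.
candidates = {
    "clickbait_article": {"p_click": 0.9, "p_job_apply": 0.0, "p_long_read": 0.1},
    "relevant_job": {"p_click": 0.3, "p_job_apply": 0.6, "p_long_read": 0.0},
    "in_depth_post": {"p_click": 0.4, "p_job_apply": 0.0, "p_long_read": 0.7},
}
ranking = rank_feed(candidates)
```

With these made-up numbers, the update with the highest click probability ranks last, because the other objectives outweigh a click that leads nowhere.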
What techniques do we use?
In addition to industry-standard techniques, like logistic regression, gradient-boosted decision trees, and neural networks, we have created some of our own extensions and frameworks. Some techniques, like boosted decision tables, have been made publicly available via conferences like KDD, but we are working on many more novel modeling techniques that will be published on this blog and at conferences over the coming year.
One of the key components to our success has been the development of the Feed AI Platform, which allows us to run AI experiments very quickly. By automating the process by which we train and deploy our machine-learned statistical models, we are able to run thousands of experiments every year and deploy our models to production continuously.
What’s next for the LinkedIn Feed?
While the way we classify and recommend text is always changing, there are several upcoming projects that we are particularly excited to bring to our members’ feeds:
Building an “AI for AI”: As mentioned earlier, using the Feed AI Platform we already run experiments and A/B tests at scale on LinkedIn for everything from evaluating new product features to improving site speed. We are now looking at building a machine learning platform that can autonomously run experiments, train models, and highlight promising new models to our engineers for further investigation, essentially automating part of the process of building the AI systems themselves. This will assist our Relevance team experts by handling 50-60% of their existing workload, so that they can do more interesting tasks while these experiments run autonomously in the background.
Per-member models: Depending on the use case in question, a single machine-learned model is typically applied to the feeds of certain cohorts of members, or to all members’ feeds. We are working on creating per-member models when we have the data, meaning we train a machine-learned model for each individual member as part of our continued work to expand the use of generalized linear mixed models (GLMix).
Interest Graph: We already have a strong set of explicit and implicit signals that provide context on what content a member may find interesting based on their social connections and the Knowledge Graph (e.g., a company that they follow or news widely shared within their company). To augment these existing signals, we are creating an interest graph that represents relationships between members and a taxonomy of topics. This graph allows us to measure member-to-topic affinity (e.g., how interested are you in scuba diving?), topic-to-topic relatedness (e.g., snorkeling is related to scuba diving), as well as which of your connections share your interests.
Creator-side optimization: Many recommendation systems focus on optimizing for discrete, short-term gains in member activities (click, like, share) as a proxy for member engagement and value. Going forward, we're investing in models that specifically optimize for members who create high-quality content on LinkedIn over time.
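To illustrate the per-member models item above, here is a minimal sketch of the general idea behind a generalized linear mixed model: a global set of coefficients shared by all members, plus an optional per-member correction that is learned only when that member has enough data. The function names, feature names, and coefficient values are hypothetical, and this is not a description of the production GLMix implementation.

```python
import math


def dot(weights, features):
    """Sparse dot product over named features."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


def glmix_score(features, member_id, global_weights, per_member_weights):
    """P(engage) = sigmoid(global term + per-member term).

    Members without a per-member model contribute a zero per-member term,
    so they fall back to the global model alone.
    """
    logit = dot(global_weights, features)
    logit += dot(per_member_weights.get(member_id, {}), features)
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical coefficients for illustration only.
global_weights = {"bias": -1.0, "author_is_connection": 1.5, "is_video": 0.2}
# Only member_a has enough data for a personal model; it strongly favors video.
per_member_weights = {"member_a": {"is_video": 2.0}}

video_post = {"bias": 1.0, "author_is_connection": 0.0, "is_video": 1.0}
score_a = glmix_score(video_post, "member_a", global_weights, per_member_weights)
score_b = glmix_score(video_post, "member_b", global_weights, per_member_weights)
```

In this toy setup the same video post scores higher for member_a than for member_b, whose score comes from the global coefficients only.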
It takes a lot of partners to build and serve the LinkedIn Feed at such massive scale, so we’d like to thank the Feed Data Platform team (Parin Shah, Hassan Khan, and Ali Mohamed), the Ranking and Federation Infrastructure team (Joel Young and Fei Chen), the Content Relevance team (Shakti Sinha, Alex Patry, and Ankan Saha), the Content Understanding team (Mary Hearne and Juan Bottaro), the Content Quality team (Rushi Bhatt and Alpan Raval), the Editorial Team (Daniel Roth), and our partners in Feed Consumer Engineering and Product (Aarthi Jayaram, Prachi Gupta, Vivek Tripathi, Linda Leung, Heidi Wang, Zack Hendlin, and Peter Roybal).