Bringing Storylines to Your Feed
March 22, 2017
One of the responsibilities of the LinkedIn Feed team is to deliver news and information to today’s professionals as efficiently as possible. Doing this for LinkedIn’s more than 467 million members, in a noisy, information-saturated environment, poses unique challenges in search, scale, and relevance science.
Today we’re introducing Trending Storylines as part of the new feed experience at LinkedIn. Trending Storylines is a feature that helps members discover and discuss news, ideas, and diverse perspectives from the largest group of professionals, publishers, and editorial voices ever assembled. In this blog post, we’ll discuss our hybrid human-machine approach to creating a personalized news experience for our members.
Creating and updating storylines
Our main challenge is to create professional storylines and then keep them populated with new content as our members react to them. Humans are good at curating high-quality storylines, but keeping each one up to date with everything our members are saying about it would take them a significant amount of effort. Algorithmic approaches have the opposite profile: they keep up easily but struggle to curate at the same quality. To solve this problem, we decided to start with a hybrid approach. Our editorial team creates new Trending Storylines, initializes them with the best articles already available, and adds metadata about each one to our system so that our algorithms can subsequently keep the Storylines up to date automatically.
We thus needed a system that would:
Make it easier for our editors to find the best new content available on LinkedIn and quickly assemble it into collections of related articles (our Trending Storylines);
Use our feed, relevance and search systems to enrich those storylines with new content automatically and in real time as our members react to them;
Rank the results to produce a Storyline that is relevant, fresh, professional, and personalized.
Assembling a Storyline from LinkedIn Content
There are many ways that news stories can be aggregated. Throughout history, most of these have relied on manual labor. Traditional forms of news aggregation range from websites and email newsletters to podcasts; in most cases, these formats provide very high-quality content but offer little personalization to the end user. Even contemporary forums like Reddit and Hacker News rely on the free time of thousands of users to crowdsource news curation. Our goal here is to scale that operation, combining our systems with the expertise of the LinkedIn editorial team to create relevant news recommendations for all 467 million of our members.
Our editorial team uses the following workflow when creating Storylines:
- Add a new entry for the storyline in our Storylines database. Specify a title, an image and a description.
- Search for relevant LinkedIn content and tag it into the storyline.
- Fill in additional information that will allow our algorithms to update the storyline in real time as new articles and user-generated content are published.
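To make the three workflow steps concrete, here is a minimal sketch of what a Storyline record and the editor actions on it might look like. The field names (`keywords`, `excluded_terms`), the functions, and the dict standing in for the Storylines database are all hypothetical; the post does not describe the actual schema.

```python
from collections.abc import Sequence
from dataclasses import dataclass, field


@dataclass
class Storyline:
    """Hypothetical Storyline record; field names are illustrative only."""
    storyline_id: str
    title: str
    image_url: str
    description: str
    # Step 2: IDs of LinkedIn content that editors tagged into the storyline.
    tagged_content: list[str] = field(default_factory=list)
    # Step 3: extra information the algorithms use to find new content
    # (assumed here to be keywords plus terms to exclude).
    keywords: list[str] = field(default_factory=list)
    excluded_terms: list[str] = field(default_factory=list)


def create_storyline(db: dict, storyline_id: str, title: str,
                     image_url: str, description: str) -> Storyline:
    """Step 1: add a new entry to an in-memory stand-in for the database."""
    if storyline_id in db:
        raise ValueError(f"storyline {storyline_id!r} already exists")
    storyline = Storyline(storyline_id, title, image_url, description)
    db[storyline_id] = storyline
    return storyline


def tag_content(storyline: Storyline, content_id: str) -> None:
    """Step 2: tag a piece of content into the storyline, ignoring duplicates."""
    if content_id not in storyline.tagged_content:
        storyline.tagged_content.append(content_id)


def add_query_metadata(storyline: Storyline, keywords: Sequence[str],
                       excluded_terms: Sequence[str] = ()) -> None:
    """Step 3: record the information the algorithms use for real-time updates."""
    storyline.keywords.extend(keywords)
    storyline.excluded_terms.extend(excluded_terms)
```

Keeping the editor-facing fields (title, image, description) and the algorithm-facing fields in one record mirrors the idea that a single curation pass produces everything the automated updates need.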
We implemented tooling to support this workflow in Bowtie, the UI through which our editors curate content.
When an editor creates a new Storyline, its information is saved in an Espresso store. This store contains the Storyline’s name, image, and description, along with other metadata that our algorithms use to stay on top of new developments automatically as the Storyline’s narrative unfolds, without additional input from our editors.
There is a lot of news created and shared on LinkedIn every day, which can make searching for fresh and relevant content a tedious task. Luckily for our team, we were able to leverage and enhance our existing Content Search infrastructure to power the Trending Storyline creation workflow. On top of leveraging our existing systems, we are exploring different ways to navigate the data and have already built new experimental features to filter results in greater detail, further enhancing our search capabilities.
When our editors find relevant content, they tag it as part of the Storyline. These tags are stored in another Espresso store and are propagated automatically to our different search indices using Databus. Finally, after all this curation, the storylines are made available to our members.
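Databus is LinkedIn’s change-data-capture system: consumers read an ordered stream of changes from a source store and apply them downstream. The sketch below does not use Databus’s API. It only simulates that pattern with hypothetical classes (`TagStore`, `SearchIndexConsumer`) so the flow of a tag from the Espresso store to several search indices is easy to follow. Each index tracks a checkpoint, so it can fall behind and catch up without missing or reapplying events.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeEvent:
    seq: int            # monotonically increasing sequence number, starting at 1
    storyline_id: str
    content_id: str
    op: str             # "tag" or "untag"


@dataclass
class TagStore:
    """Source-of-truth store; every write is also appended to a change log."""
    tags: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def write(self, storyline_id: str, content_id: str, op: str = "tag") -> None:
        key = (storyline_id, content_id)
        if op == "tag":
            self.tags.add(key)
        elif op == "untag":
            self.tags.discard(key)
        else:
            raise ValueError(f"unknown op {op!r}")
        self.log.append(ChangeEvent(len(self.log) + 1, storyline_id, content_id, op))


@dataclass
class SearchIndexConsumer:
    """Keeps one search index in sync by replaying change events in order."""
    name: str
    checkpoint: int = 0                          # last sequence number applied
    index: dict = field(default_factory=dict)    # content_id -> set of storyline IDs

    def poll(self, store: TagStore) -> int:
        """Apply every event after the checkpoint; return how many were applied."""
        applied = 0
        # seq == position + 1, so the checkpoint is also the next log position.
        for event in store.log[self.checkpoint:]:
            storylines = self.index.setdefault(event.content_id, set())
            if event.op == "tag":
                storylines.add(event.storyline_id)
            else:
                storylines.discard(event.storyline_id)
            self.checkpoint = event.seq
            applied += 1
        return applied
```

Because each index consumes the same ordered log independently, adding another index only means starting a new consumer from checkpoint 0.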
Serving a Trending Storyline to our Members
Storylines are built on top of our search infrastructure, so it is no surprise that rendering a storyline is very similar to rendering a Content Search query.
When the mobile or web application wants to display a Storyline, it first sends a request to the Voyager API (step 1), which forwards it to our Search Federator (step 2). For Storylines, this federator serves two main purposes: routing the Storyline query to Content Search (step 3), and emitting the tracking events needed to collect metrics and training data for our machine-learning ranking models.
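The federator’s two roles, routing and tracking, can be sketched as follows. This is a toy model under stated assumptions: `SearchFederator`, the `vertical` routing key, and the tracking-event fields are hypothetical, and the list of events stands in for a real tracking pipeline.

```python
import time
import uuid
from typing import Callable

# Hypothetical backend signature: takes a query dict, returns a list of result IDs.
Backend = Callable[[dict], list]


class SearchFederator:
    """Toy federator: routes a request by vertical and records a tracking event."""

    def __init__(self) -> None:
        self.backends: dict[str, Backend] = {}
        self.tracking_events: list[dict] = []   # stand-in for a tracking pipeline

    def register(self, vertical: str, backend: Backend) -> None:
        self.backends[vertical] = backend

    def handle(self, vertical: str, query: dict, member_id: str) -> list:
        if vertical not in self.backends:
            raise KeyError(f"no backend registered for {vertical!r}")
        results = self.backends[vertical](query)
        # Record what was shown, so that later clicks can be joined back to it
        # to produce metrics and training labels for ranking models.
        self.tracking_events.append({
            "request_id": str(uuid.uuid4()),
            "member_id": member_id,
            "vertical": vertical,
            "query": query,
            "result_ids": list(results),
            "timestamp": time.time(),
        })
        return results
```

Emitting the event at the federator, rather than in each backend, means every vertical gets the same impression logging for free.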
When materializing a Storyline, the Content Mixer must first fetch the Storyline’s metadata to build a structured query in a domain-specific language understood by all content providers. It then sends this query to all content providers in parallel (step 4). Each content provider adapts the query to its own infrastructure and returns its best results.
These results are returned to the Content Mixer (step 5), which blends them into a unified list. That list is then sent back to the client, together with the Storyline header and image, through the Search Federator and the Voyager API (steps 6 to 8).
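The Content Mixer’s job (build a query from metadata, fan it out in parallel, blend the answers) can be sketched with the standard library. Everything here is an assumption for illustration: the query is a plain dict rather than LinkedIn’s actual DSL, providers are callables returning `(item_id, score)` pairs, and blending is a simple merge by score with de-duplication, which presumes scores are comparable across providers. The post does not describe the real blending logic.

```python
from concurrent.futures import ThreadPoolExecutor


def build_query(metadata: dict) -> dict:
    """Turn Storyline metadata into a structured query (hypothetical DSL as a dict)."""
    return {
        "storyline_id": metadata["storyline_id"],
        "must_match_any": list(metadata.get("keywords", [])),
        "must_not_match": list(metadata.get("excluded_terms", [])),
    }


def materialize(metadata: dict, providers: dict, limit: int = 10) -> list:
    """Fan the query out to every provider in parallel, then blend the results.

    Each provider is a callable returning (item_id, score) pairs.
    """
    query = build_query(metadata)
    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in providers.items()}
        per_provider = {name: fut.result() for name, fut in futures.items()}

    # Merge all pairs by descending score, keeping the first (best) copy of each item.
    seen: set = set()
    blended = []
    all_pairs = (pair for results in per_provider.values() for pair in results)
    for item_id, _score in sorted(all_pairs, key=lambda pair: pair[1], reverse=True):
        if item_id not in seen:
            seen.add(item_id)
            blended.append(item_id)
    return blended[:limit]
```

Querying providers in parallel keeps the total latency close to that of the slowest provider instead of the sum of all of them.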
We onboarded three content providers for this first release:
An article provider powered by a Galene index to return trending articles and articles published on our blogging platform.
An activities provider powered by another Galene index to return shares and status updates.
A “must read” provider maintaining a local cache of content that our editorial team marked as important for the Storyline.
The system is built to make it easy to onboard new content providers as we iterate on the product.
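One way to make onboarding providers cheap is a small shared interface plus a registry. In the sketch below, `ContentProvider`, `MustReadProvider`, and `ProviderRegistry` are hypothetical names, and the must-read cache is a plain dict keyed by Storyline ID. It is meant to show the shape of a pluggable design, not LinkedIn’s actual provider contract.

```python
from typing import Protocol


class ContentProvider(Protocol):
    """Hypothetical provider contract: adapt a structured query, return item IDs."""
    name: str

    def search(self, query: dict) -> list[str]: ...


class MustReadProvider:
    """Serves editor-pinned content from a local in-memory cache."""
    name = "must_read"

    def __init__(self) -> None:
        self._cache: dict[str, list[str]] = {}   # storyline_id -> pinned item IDs

    def pin(self, storyline_id: str, item_id: str) -> None:
        items = self._cache.setdefault(storyline_id, [])
        if item_id not in items:
            items.append(item_id)

    def search(self, query: dict) -> list[str]:
        # No index lookup: return whatever editors marked as important.
        return list(self._cache.get(query["storyline_id"], []))


class ProviderRegistry:
    """Onboarding a new provider is a single register() call."""

    def __init__(self) -> None:
        self._providers: list[ContentProvider] = []

    def register(self, provider: ContentProvider) -> None:
        self._providers.append(provider)

    def query_all(self, query: dict) -> dict[str, list[str]]:
        return {p.name: p.search(query) for p in self._providers}
```

Since `Protocol` uses structural typing, an article or activities provider only needs a `name` and a `search` method to plug in; it does not have to inherit from anything.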
What makes a good story?
Intuitively, it is easy for most members to tell the difference between very good and very bad results in a news recommendation system such as this one. However, news articles have many nuances that are hard to capture with a simple “good” or “bad” label; relevance can sometimes be ambiguous, even for professional content curators. For instance, one of the challenges of serving storylines to a professional audience is that interest in news can be driven by cultural trends or by current events that affect an industry. Additionally, many of our members already get their news from user-generated content in their feed, from media organizations’ posts on LinkedIn, and from many other sources; for them, relevance means seeing fresh news stories.
After much discussion and reflection, and after evaluations by early testers of the Storylines feature, we distilled the quality of a result into these four attributes:
Storyline Relevance: Is this result relevant to the storyline? Can it provide additional, useful context about an ongoing news story?
Freshness: Is this result fresh or old? We want our content to be new, novel, and deliver up-to-date insights for our members.
Professionalism: Is this article appropriate for the workplace? Does it add to or subtract from the overall quality of a member’s feed experience?
Personalization: Does this result matter to this particular member? A good example of personalization in storylines is that we favor content from your network over content from people outside it.
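As an illustration of how the four attributes could feed a single ranking, here is a hand-weighted sketch. It is not the production relevance model: the weights, the 24-hour freshness half-life, the professionalism threshold, and the `Candidate` fields are all assumptions. Freshness decays exponentially with age, professionalism acts as a hard filter, and personalization is reduced to an in-network boost, following the example above.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    item_id: str
    relevance: float           # assumed in [0, 1], e.g. from a text-match model
    age_hours: float
    professional_prob: float   # assumed in [0, 1]: P(appropriate for work)
    author_in_network: bool


# Illustrative values only; not taken from the production system.
WEIGHTS = {"relevance": 0.5, "freshness": 0.3, "personalization": 0.2}
HALF_LIFE_HOURS = 24.0
MIN_PROFESSIONAL_PROB = 0.5


def freshness(age_hours: float) -> float:
    """Exponential decay: 1.0 for brand-new content, 0.5 after one half-life."""
    return math.pow(0.5, age_hours / HALF_LIFE_HOURS)


def score(c: Candidate) -> float:
    return (WEIGHTS["relevance"] * c.relevance
            + WEIGHTS["freshness"] * freshness(c.age_hours)
            + WEIGHTS["personalization"] * (1.0 if c.author_in_network else 0.0))


def rank(candidates: list[Candidate]) -> list[str]:
    # Professionalism is applied as a filter here rather than as a weighted term.
    eligible = [c for c in candidates if c.professional_prob >= MIN_PROFESSIONAL_PROB]
    return [c.item_id for c in sorted(eligible, key=score, reverse=True)]
```

A learned model would replace the fixed weights, but keeping each attribute as a separate, inspectable term makes it easy to see which one is driving a given ordering.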
Using these fine-grained attributes allowed us to evaluate the quality of storylines more objectively and to focus work on our relevance model’s features in the areas we most wanted to improve. Beyond these four attributes, we have already identified several other signals that we plan to test and deploy to make storylines as useful as possible to the world’s professionals.
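Per-attribute judgments make it straightforward to find where to invest next. The sketch below assumes raters give each result a pass/fail label per attribute (possibly skipping some) and picks the attribute with the lowest pass rate. The judgment format and function names are hypothetical.

```python
from collections import defaultdict

ATTRIBUTES = ("relevance", "freshness", "professionalism", "personalization")


def attribute_pass_rates(judgments: list[dict]) -> dict[str, float]:
    """Each judgment maps an attribute name to a bool (did the result pass?)."""
    passed: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for judgment in judgments:
        for attr in ATTRIBUTES:
            if attr in judgment:              # raters may skip an attribute
                total[attr] += 1
                passed[attr] += bool(judgment[attr])
    # Only report attributes that received at least one label.
    return {attr: passed[attr] / total[attr] for attr in ATTRIBUTES if total[attr]}


def weakest_attribute(judgments: list[dict]) -> str:
    """The attribute with the lowest pass rate is the first candidate for work."""
    rates = attribute_pass_rates(judgments)
    return min(rates, key=rates.get)
```

A single overall “good/bad” rate would hide which of the four attributes is failing; splitting it out is what turns an evaluation into a prioritized to-do list.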
Trending Storylines is a system that combines the best of humans and machines to create high-quality storylines, keep them up to date, and offer a personalized experience to our members. The main engineering theme when building it was repurposing existing infrastructure to achieve a new goal at scale: we took systems that had never worked together before and built a completely new product with them. The main benefit of this approach is that it has allowed us to move fast and rely on a mature stack. The main drawback is that an engineer working on Trending Storylines needs to be onboarded onto many technology stacks, which adds some friction to the process.
So far, the benefits have massively outweighed the costs. This architecture will allow us to grow the breadth of Trending Storylines with new kinds of content, and increase the depth of the product with more personalization.
Storylines would not have been possible without the significant engineering contributions of people from more than nine teams: Benjamin Poyet, David Golland, Eric Huang, Harry Bui, Kiryl Yesipau, Lei Qu, Lingbing Wang, Meling Wu, Milad AlemZadeh, Rajeev Kumar, Samish Chandra Kolli, Shubham Gupta, Siddharth Pal, Subhash Gali, Vardharaj Lakshminarasimhan, Viet Ha-Thuc, Weiwei Guo, Yesu Feng, and Ying Xuan.
Trending Storylines is also a testament to the great engineering work done by many of our colleagues over the last few years. Without them, we would have never been able to ship Storylines at this pace.