Serving Top Comments in Professional Social Networks
A case study of scalable and feature-rich machine learning systems
September 20, 2017
As a professional social network serving more than 500 million members worldwide, LinkedIn is the premier destination for professional conversations. We have a wide variety of posts that attract significant engagement, and some of these posts go viral, attracting likes and comments in large numbers. In many cases, comments on these popular threads become so numerous that their value gets lost in the noise.
To surface comments valuable to LinkedIn members, we’ve built a scalable comment ranking system that uses machine learning (ML) to provide a personalized conversational experience to each member visiting the LinkedIn content ecosystem. This blog post details our design, the scalability challenges that we faced and overcame, and the tight latency budgets that the system operates under.
The content on the LinkedIn feed is rich, diverse, and widely consumed. Some of it is generated organically by our members (e.g., the post shown above, authored by LinkedIn founder Reid Hoffman in response to a news article), and some of it is from third-party websites that regularly post high-quality content for distribution. These articles draw significant engagement from our members via comments and likes. Until recently, however, we lacked the ability to convert this engagement into meaningful conversations between people on the platform.
The default mode for ranking comments on the feed was rank by recency: if you were the last person to post a comment on a popular thread, your comment would show up first. We had no understanding of the comment’s content, no notion of personalization, and no knowledge of the engagement that these comments were drawing.
This problem garnered our attention in mid-2016. We set up a simple minimum-viable-product (MVP) that tried to rank comments by the number of aggregated likes they gathered (using that number as a simple proxy for comment quality). The MVP was successful in that it demonstrated that ranked comments had value. However, it also demonstrated the weaknesses of relying on a single, non-personalized, after-the-fact feature: comments would be highly-ranked only after they had garnered enough social proof. Ideally, we wanted to pick out comments that were engaging ahead of time. In addition, latency and scaling concerns prevented us from further productionizing the system.
Learning from the MVP, we’ve built a scalable machine learning serving system that provides personalized ranking of comments using a rich feature set. Comment features, such as a comment’s actual content, the level of engagement it has received, and information about who posted it, are computed ahead of time by Samza stream processors. These features are then pre-materialized and associated with comments in a fast inverted-index system (FollowFeed). When a request comes in asking for relevant comments for a given viewer, the precomputed features are joined with viewer features and run through an ML model to provide a personalized ranking of the comments on the thread. Over the past few months, we have proven that this system can service thousands of queries per second (QPS) with a tight latency profile: 60ms at the mean and 190ms at the 99th percentile.
The rest of this blog post provides details on how we built the system.
Understanding conversations at LinkedIn
On the feed, every comment usually generates a Comment Viral Update that is distributed to the commenter’s connections. Comment Viral Updates is the term we use for the most engaging set of updates on LinkedIn’s news feed. They look like this:
Comment Viral Updates show up on your feed when one of your connections comments on an article or status update that has also generated a lot of engagement from other members in the form of likes or comments. Among all feed updates, the Comment Viral Update receives the greatest engagement. The number of interactions generated per comment update impression is 2.5 times that of connection updates (updates indicating your professional network has new connections), and 1.8 times that of viral updates from likes from your connections. In 2017, we've seen record levels of engagement. Social actions on the feed (likes, shares, comments, etc.) have grown by +60% year-over-year. Members are interacting on the feed more than ever, and the value of conversations on these threads is immense and growing. The sheer volume of comments produced on some threads, though, is a challenge.
Comment threads, like most things on the internet, have long tail effects. From an inventory perspective, a small number of posts with long discussions disproportionately dominate the time members spend on the site. Specifically, 1% of these long threads attract more than 40% of member visits. Visiting a comment thread with hundreds of comments in reverse chronological order is not a pleasant experience. We needed a way to personalize comment ranking so that each member can derive value from comments on the long threads they care about.
So, how did we end up achieving our goal of providing relevant comments to our members? Before we get there, it might be helpful to look at the structure of our MVP.
The architecture of the MVP was pretty straightforward:
The iOS and Android mobile apps talk to Voyager-API (a REST-based application layer) to request relevant comments for an activity on the feed. Voyager-API forwards that request to Feed Mixer (our ranking and blending layer) for producing a list of relevant comments.
Feed Mixer implements the MVP comment relevance algorithm by fanning out requests to member information stores to get a member’s first-degree connections, and to the comment thread store to get the entire list of comments on a thread. The comment thread store is the source of truth, backed by LinkedIn’s NoSQL database, Espresso. For each comment on the thread, a single feature (the number of aggregated likes) is stored. The comments are ranked by this feature and sent back to Voyager-API to be shown to the viewer.
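Conceptually, the MVP’s ranking step boils down to a single sort. Here is a minimal sketch with hypothetical field names (the post does not show Feed Mixer’s actual code):

```python
# Sketch of the MVP ranking: sort a thread's comments by aggregated like
# count, breaking ties by recency. Field names are illustrative.
from typing import Dict, List


def rank_comments_mvp(comments: List[Dict]) -> List[Dict]:
    """Rank comments by like count (descending), then by recency."""
    return sorted(
        comments,
        key=lambda c: (c["like_count"], c["created_at"]),
        reverse=True,
    )


thread = [
    {"id": "c1", "like_count": 3, "created_at": 100},
    {"id": "c2", "like_count": 7, "created_at": 90},
    {"id": "c3", "like_count": 3, "created_at": 120},
]
ranked = rank_comments_mvp(thread)
```

The sort itself is trivial; as the next paragraph explains, the expensive part was fetching each comment’s like count online while the viewer waited.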
As is evident from the architecture diagram, there’s a problematic online feature join taking place when the viewer is waiting for a list of relevant comments. As a result, this system is restricted by the number of comments that can be ranked due to latency concerns. The high computation and latency overhead of retrieving and processing comment features from the Espresso store drove tail latency over 800ms and caused noticeable site-speed degradation. It was clearly not a long-term solution.
Despite this, the MVP was successful. It demonstrated the value of ranking comments instead of simply displaying them in chronological order. It also demonstrated the weaknesses of relying on such simple metrics: good comments would be buried beneath not-so-good ones because they didn’t yet have enough likes, and early comments on a thread had an unfair advantage because they had more time to accumulate likes and replies. With this experience under our belts, we set out to productionize the system.
Scalable production architecture
In August 2016, we decided to move forward with an architecture that did not have this problematic online feature generation. We also used this opportunity to try to significantly improve comment ranking.
For starters, we picked out a list of features that would describe comments well and the data sources that we would produce these features from. This was a much larger set than the single-feature MVP. We started with three classes of features:
Features about the commenter;
Features about the comment itself;
Features about engagement with the comment.
All of these features were produced by querying individual data stores in near-realtime. The features were then pushed through a feature join pipeline driven by comment creation and interaction events. The feature join pipeline was built on Apache Samza (LinkedIn’s nearline stream processing system).
The end result was a list of joined features that was available with a single SSD/memory lookup in our serving systems for each comment. In addition, all of these features were persisted to HDFS for offline analysis and model training.
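The join logic can be illustrated with a simplified, in-memory sketch. The real pipeline runs as Samza stream processors over partitioned Kafka streams; the event shapes and class names below are invented for illustration:

```python
# Toy version of the nearline feature join: three kinds of events keyed by
# comment id are merged into one joined feature record per comment.
from collections import defaultdict


class CommentFeatureJoiner:
    def __init__(self):
        # comment_id -> partially joined feature record
        self.features = defaultdict(dict)

    def on_comment_created(self, event):
        self.features[event["comment_id"]].update(
            commenter_id=event["commenter_id"],
            text_length=len(event["text"]),
        )

    def on_comment_liked(self, event):
        rec = self.features[event["comment_id"]]
        rec["like_count"] = rec.get("like_count", 0) + 1

    def joined(self, comment_id):
        # In production, this joined record is pushed to the serving index
        # (FollowFeed) and ETLed to HDFS for model training.
        return dict(self.features[comment_id])


joiner = CommentFeatureJoiner()
joiner.on_comment_created(
    {"comment_id": "c1", "commenter_id": "m9", "text": "Great post!"}
)
joiner.on_comment_liked({"comment_id": "c1"})
joiner.on_comment_liked({"comment_id": "c1"})
record = joiner.joined("c1")
```

The key property is that the join happens as events arrive, not while a viewer waits, which is exactly what the MVP lacked.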
For the purposes of comment relevance, we needed a serving subsystem that could satisfy the following requirements:
1. An index that is able to quickly retrieve all comments on a given comment thread.
2. Fast access to the list of joined features for each comment on the thread.
3. The ability to service thousands of QPS at scale and produce relevant comments for each request on the feed.
While (1) and (2) are easy tasks, it’s (3) that drives the design. Being able to service thousands of QPS at scale narrowed the scope down to only two subsystems: Galene (our search stack, document sharded) or FollowFeed (our feed stack, term sharded). Both are robust systems with strong monitoring, deployability, and ops characteristics. Both systems do their respective jobs pretty well: Galene powers LinkedIn’s search traffic and several site features (e.g., job recommendations, people search), while FollowFeed powers all the user-generated content in the feed. After deliberation and some benchmarking, we went with FollowFeed because it was already well-integrated with the feed ecosystem. This led to some interesting design choices, however.
For starters, FollowFeed is a term-sharded system (each leaf node stores documents associated with a dominant term). In FollowFeed’s case, terms are built around the concept of an actor (e.g., a member, a company, etc.) and a list of social actions being done by that actor (comments, likes, shares, etc.). To make FollowFeed return relevant comments, we needed to restructure some basics.
1. Retool the system to accept the wide diversity of items that can be commented upon.
Unlike the handful of actors that can produce comments inside of the LinkedIn ecosystem (e.g., members, schools, companies), we have a much larger variety of things that members can comment upon (e.g., articles, long form posts, shares, anniversaries, videos, etc.). FollowFeed was built largely around a data structure that associates each MemberID with a list of activities. We were turning this data structure on its head and producing a data structure that associates a post id with a list of comment activities. Conceptually, it’s a small change, but accomplishing this in a large production system took us quite a bit longer than expected.
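Stripped of all production concerns, the inverted data structure looks something like the sketch below. The class and method names are illustrative, not FollowFeed’s actual API:

```python
# Toy version of the restructured index: instead of MemberID -> activities,
# a post id maps to the list of comment activities (with their joined
# features) on that post.
from collections import defaultdict


class CommentIndex:
    def __init__(self):
        self.by_post = defaultdict(list)  # post_id -> [comment activity]

    def add_comment(self, post_id, comment_id, features):
        self.by_post[post_id].append({"comment_id": comment_id, **features})

    def comments_for(self, post_id):
        # One lookup returns every comment on the thread, features included.
        return self.by_post[post_id]


idx = CommentIndex()
idx.add_comment("post1", "c1", {"like_count": 4})
idx.add_comment("post1", "c2", {"like_count": 1})
comments = idx.comments_for("post1")
```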
2. Fix the notion of Top N and Fanout.
In the feed world, the challenge is to determine the Top N posts that we should show to a given user. That is, given a single member and their connection list, produce the Top N posts that are applicable to that member. This boils down to taking the connection set of a member and fanning out a request asking for the top N posts from each of the member’s connections.
In the comment relevance case, though, there’s an expansion from the 1:N case to the M:NM case. Given M post ids, produce the Top N comments for each of these posts.
While it would have been tempting to simply fan out the requests at the FeedMixer layer and query FollowFeed once per post, this would not have been the best solution.
The Variance Sum Law indicates that the variance of a sum of N independent latency measurements is additive. That is,

Var(X1 + X2 + … + XN) = Var(X1) + Var(X2) + … + Var(XN)
Fanout at the FeedMixer layer would impact latency adversely. The final tail latency would include not just the variance produced by FollowFeed-Storage (our lowest-level serving infrastructure layer), but also the variance at our FollowFeed-Query layer. However, by performing a single batch request to FollowFeed-Query and then doing a fanout to FollowFeed-Storage, the impact on the tail latency was contained.
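A quick numerical check of the Variance Sum Law the design relies on: when a request’s end-to-end latency is the sum of independent per-layer latencies, the variances add, so every extra hop widens the tail. The per-layer latency distributions below are invented for illustration:

```python
# Simulate two independent serving layers and verify that the variance of
# the end-to-end latency is (approximately) the sum of per-layer variances.
import random
import statistics

random.seed(42)
N = 100_000
query_layer = [random.gauss(20, 5) for _ in range(N)]    # ms, illustrative
storage_layer = [random.gauss(30, 8) for _ in range(N)]  # ms, illustrative
end_to_end = [q + s for q, s in zip(query_layer, storage_layer)]

var_sum = statistics.pvariance(query_layer) + statistics.pvariance(storage_layer)
var_e2e = statistics.pvariance(end_to_end)
# For independent layers, var_e2e ≈ var_sum (≈ 5² + 8² = 89 here).
```

This is why collapsing the FeedMixer-level fanout into a single batch request to FollowFeed-Query removed one layer’s variance contribution from the tail.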
3. Provide, distribute and access new features and use them to rank comments.
The system was in the early stages of being able to handle features gracefully. We set up and leveraged feature stores on each serving node, distributing the data appropriately among the nodes and sharding it according to the given keys. This was straightforward infrastructure engineering, and once the data was present, we leveraged the excellent model execution capabilities of FollowFeed to rank relevant comments from a machine-learned model.
The machine-learned model
For the personalized comment ranking, we trained a logistic regression model to predict the viewer’s engagement with respect to individual comments. The model is trained by feeding a large variety of features into LinkedIn’s Photon ML library.
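The shape of the problem can be illustrated with a small stand-in. Production uses LinkedIn’s Photon ML; here scikit-learn and made-up toy data are used purely to show the structure of the training and scoring steps:

```python
# Illustrative stand-in for the ranking model: a logistic regression that
# predicts viewer engagement from joined viewer/commenter/comment features.
# Feature columns and data are invented toy examples.
import numpy as np
from sklearn.linear_model import LogisticRegression

# columns: [commenter_viewer_affinity, comment_like_count, comment_length]
X = np.array([
    [0.9, 12, 40],
    [0.1,  0,  5],
    [0.8,  7, 60],
    [0.2,  1, 10],
])
y = np.array([1, 0, 1, 0])  # 1 = viewer engaged with the comment

model = LogisticRegression().fit(X, y)
scores = model.predict_proba(X)[:, 1]  # per-comment engagement probability
ranking = np.argsort(-scores)          # rank comments by predicted engagement
```

At serving time, the same scoring runs inside FollowFeed’s model execution layer over the pre-materialized features, so no model call leaves the leaf node.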
The machine-learned model takes features from the viewer, the commenter, and the comment itself. Any viewer-specific features (e.g., the affinity between commenter and viewer) are generated in offline Hadoop workflows on a daily basis. These features are pushed from HDFS into an online Voldemort store for querying. Comment-related features (e.g., the language of a comment) are generated in a nearline Samza processor at the time of comment creation. As described in the previous section, these features are indexed in FollowFeed to be served online, and ETLed to HDFS for offline model training.
We have commenter features that characterize each commenter’s reputation and popularity (i.e., their profile view counts, influencer status, etc.). We also match the commenter and the viewers on the basis of industry, location, and other shared attributes. At LinkedIn, we have access to various mature machine-learned signals to represent the engagement affinity between two members. We take into account their connection/follow relationships, their profile similarity, and past interactions on the feed. These signals are crucial inputs that help us select high quality comments that are personalized to each viewer.
As far as the actual comment content is concerned, we leverage our in-house Natural Language Processing (NLP) library to characterize the language, the comment length, the grammatical structure, the presence/absence of hashtags, and other content features. We also attempt to infer whether the comment includes mentions of LinkedIn members or other entities.
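The real system uses our in-house NLP library; the regex-based stand-ins below only illustrate which kinds of content signals get computed per comment:

```python
# Toy sketch of content-feature extraction for a single comment.
# The patterns here are simplistic placeholders, not the production NLP.
import re


def comment_content_features(text: str) -> dict:
    return {
        "length": len(text),
        "num_hashtags": len(re.findall(r"#\w+", text)),
        "num_mentions": len(re.findall(r"@\w+", text)),
        "has_question": "?" in text,
    }


feats = comment_content_features("Great insight @reid! Thoughts on #AI hiring?")
```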
Social engagement features generated on the feed are segmented by industries for the machine learning model to capture when a particular comment might only be popular for a segment of members.
Comment freshness features capture recent actions on the comment. We capture the timestamp of comment creation, the last reply, and the last like. Viewers have a tendency to read fresh comments or recently discussed topics.
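One simple way to turn those timestamps into a model feature is exponential time decay over the most recent action. The post does not specify the exact transform, so the decay function and half-life below are assumptions for illustration:

```python
# Hedged sketch of a comment-freshness feature: decay toward 0 as the most
# recent action on the comment (creation, last reply, last like) ages.
import math
import time


def freshness_score(created_at, last_reply_at=None, last_like_at=None,
                    now=None, half_life_s=6 * 3600):
    """Score 1.0 for a just-touched comment, 0.5 after one half-life."""
    now = time.time() if now is None else now
    latest = max(t for t in (created_at, last_reply_at, last_like_at)
                 if t is not None)
    age = max(0.0, now - latest)
    return math.exp(-math.log(2) * age / half_life_s)


now = 1_000_000.0
fresh = freshness_score(created_at=now - 6 * 3600, now=now)  # one half-life old
```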
We’ve only scratched the surface here. There are close to 100 features that we capture and use in online ranking. These features are used to train an ML model that predicts a member’s engagement with each comment. We also depend on other ML models to classify and detect spam and low-quality content in comments, so that we only show the viewer comments we know are good.
The system has been stable, showing a production tail latency of 15ms at the 50th percentile and 65ms at the 99th percentile. When taking member features into account, the entire system’s median latency is 60ms, and its end-to-end 99th percentile latency is 190ms. The production system ranks twice as many comments in a quarter of the time of the original MVP, and it has been an unqualified success in meeting its architectural goals.
Long infrastructure projects like comment relevance have an interesting lifecycle: first a period of gestation, then a period of evaluation, then a period of execution, and then finally, stability in production. It is refreshing to look back on all the work that was done and see how value was produced for LinkedIn’s more than 500 million members. We’ve seen improvements in the amount of time members spend reading comments, their engagement with the feed, and the interactions and connections they form with other members in the LinkedIn ecosystem. The personalization and discovery aspects of the system are valued by members, and the reception of the system has been positive.
Likes on Replies moved significantly in response to our changes. We saw a 22% lift in the number of comments viewed on the feed on iOS, and an equivalent lift of about 14% on Android.
We hope you’ll enjoy the more relevant comments on LinkedIn’s feed. Do get back to us with your comments!
This system has been truly a team endeavor. Without the hard work, cooperation, and leadership of engineers, product managers, engineering managers, and directors across the stack, this would not have been possible. Big thanks to Naseer Thurkintavida, Chao Zhang, Ying Xuan, Qian Su, Heidi Wang, Marco Alvarado, Parin Shah, Simon Tao, Shubham Gupta, Brett Konold, Joshua Hartman, Souvik Ghosh, Tim Jurka, Tim Chao, Aarthi Jayaram, Tim Converse, and Josh Walker, without whom this would have been a fruitless endeavor.