Scaling Contextual Conversation Suggestions Over 500 Million Members
August 14, 2017
LinkedIn Messaging is our conversation platform that enables meaningful, professional conversations between our members. We recently launched a new messaging experience that allows members to start and continue a conversation from any page on LinkedIn desktop, and then carry over the conversation as they navigate to other pages on the site.
Consistent with LinkedIn’s vision to create economic opportunity for our members, our engineering teams looked at ways that the conversation and messaging windows on our platform could be used to help members advance their careers by building meaningful relationships., That’s why we introduced contextual suggestions in conversation windows. When a member is on a company or a job page, we recommend people from their network that best connect them to that company or job.
In the rest of this post, we’ll be explaining the process we went through to develop this recommendation system, explain how we use the LinkedIn Economic Graph to generate recommendations, and some of the interesting engineering hurdles our team had to overcome before we could roll out this feature. Our hope is that other engineering teams will be able to benefit from our lessons learned.
Leveraging LinkedIn’s data
Resolving relevant suggestions for messaging very naturally fits as a graph search problem over LinkedIn’s Economic Graph. The Economic Graph consists of entities (members and companies) represented as nodes and their relationships to each other represented as weighted edges. The weight, or affinity, between nodes is based on a combination of connection strength and engagement between members.
Fortunately, LinkedIn has great tools and data sets already available to leverage this representation, allowing us to easily apply the Economic Graph to our messaging suggestion project.
We want to find all the paths of maximum length 3 from a starting node (a viewing member) to a destination node (company), and rank them based on the edges’ weights. Note that we only recommend the first degree connections. The first degree connection can be:
Directly connected to company: currently or previously working at the company
Indirectly connected to company: connected to a person who is directly connected to the company.
Since we only recommend first degree connections to our members, these length 3 paths are concatenated into length 2 paths representing the member and their connections.
Mathematically, given a viewer, a first degree member (let’s call him fdMember), and a company, the score of the path can be defined as:
score(viewer, fdMember, company) = f (affm_m(viewer, fdMember), affm_c (fdMember, company))
where affm_m is the affinity score between two members, calculated based on a variety of factors like profile similarity, engagement between members, etc., and affm_c is the affinity score between a member and a company.
Let’s say sdMember denotes a second degree connection of the viewer who is directly connected to fdMember as well as to the company. So, affm_c can be further broken down into:
= directAffinity(# of years of employment, job role, etc.) if fdMember is directly connected to company
= indirectAffinity( fover all j (fdMember, sdMemberj, company)) if fdMember is indirectly connected to company
Function indirectAffinity is responsible for concatenating a length 3 path from a viewer to a company to a length 2 path. Each of the functions f, directAffinity, and indirectAffinity corresponds to either a machine-learned model or a heuristic-based function in our implementation. We can experiment with different functions for f, directAffinity, and indirectAffinity to achieve the highest clickthrough rate (CTR); however, that discussion is out of the scope of this post.
We faced the following challenges in implementing this system, and as you will see throughout the post, these issues are not easy to address at the same time.
Liquidity: To ensure that we could have recommendations for most members, we used historical page view data to calculate the percentage of page views for which we would have at least one recommendation. We defined this percentage as “liquidity.” We found that using only direct connections, we would get around 30% liquidity, but if we extended to indirect connections as well, our liquidity bumps up to 70%.
Cost to Serve (C2S): We have 500 million members and more than 9 million companies on LinkedIn. With this data, we have the candidate sets to provide recommendations for about 500B <viewer, company> tuples (growing each day!). Generating all such recommendations in a batch (Hadoop) job on commodity hardware would take days, especially for computing indirectAffinity. Such a job would also negatively impact other jobs running on the same cluster. If we calculated a maximum of 10 recommendations for all the possible <viewer, company> tuples, it would require around 80TB of storage space in a fast (SSD-based) key, value lookup store. That introduces massive C2S when most of the records in that store will be cold.
Latency: We need to return the recommendations as quickly as possible when a member visits a company or a job page so as to not negatively impact their experience.
Our goal was to build an architecture that provided 70% liquidity, kept storage and computation costs sustainable, and had a 99th percentile latency of less than 500ms. We chose this latency number because it is less than the retrieval time for other items shown on the conversation window. Therefore, loading contextual suggestions in parallel with other items should not increase the overall load time of the conversation windows.
The rest of this post describes how we followed an iterative approach to reach our goals.
Our first attempt at building the system was guided by the principles of Data Jujitsu: before we invested too much in building a perfect system, we wanted to investigate if users liked our product in the first place. Since all the data that we needed for computing recommendations is available offline in our Hadoop cluster, we simply computed the recommendation offline for each start node (viewing member) and the end node (company) pair and pushed the result into our online key-value data store ready to serve. This way we could quickly launch our initial product and learn how it would perform.
But this strategy yielded a massive number of <viewer, company> tuple recommendations. In order to optimize our efforts, we prioritized generating tuples for our most active members and companies, managing to reduce the above result set by 60%. We then decided to generate indirect connection recommendations for only 10% of these members, which further reduced the result set to an acceptable value that required only 1TB of online key, value storage (as compared to 80TB for all the possible tuples).
We also added several other optimizations, described below.
Eliminate massive joins
In our first attempt, our Hadoop jobs were still running for days and not converging. We found out that we were doing a massive join between the member’s connection table (> 1 trillion entries if we include second degrees) and the company employee table (> 1 billion entries) in order to compute the function indirectAffinity. Data skewness leading to long tails in MapReduce jobs made the situation worse. We decided to optimize this join in the following way.
Instead of getting all the first degree and second degree connections for a member, and joining them with the company-employee table directly, we first calculated all possible values of affm_m (on the order of tens of billions of entries) and all possible values of affm_c when a member is directly connected to a company (directAffinity, on the order of billions of entries). Then, we do a join over these two (relatively less massive) tables to compute indirectAffinity.
By doing this join, we were able to compute the recommendations within 16-18 hours.
Reduce number of keys
We still have tens of billions of <viewerId, companyId> tuples. Having such a massive number of keys puts a lot of unnecessary pressure our online storage system given than most of the keys are still cold. One way to reduce the number of keys is to store all the recommendations for a member in a single record by using only viewerId as the key. However, this leads to individual records becoming too large. Therefore, we decided to use a hybrid approach (motivated by Apache Cassandra) to use <viewer, (companyId, %batch_size)> as our key. Using this approach, we can store all the recommendations for a viewer using only batch_size records, and each individual record doesn’t become too large. Our final number of keys was reduced to just over half a billion by using this methodology.
Our initial launch showed a very positive response. We were also able to validate that second degree (indirect) recommendations achieve almost the same CTR as first degree (direct) recommendations.
In terms of our architecture design goals:
Liquidity: We were able to achieve a liquidity of only 20%, which is not acceptable.
C2S: The storage and computation cost were high, but still acceptable.
Latency: Since we pre-computed all the recommendations, our 99th percentile latency was less than 200ms, which is acceptable.
Scale it up
After we validated the product and realized that enabling indirect connection recommendations for everyone was a necessary next step, we started looking for ways to scale up our recommendation system.
Our next attempt was to use an online prediction solution that computes these recommendations on the fly after the viewer lands on a specific page. Our engineering team built a service, called the Graph Service, that can be used to run graph search queries over the Economic Graph. We can query this service to give us all the direct and indirect connection paths between a viewer and company. Then, we query our pre-generated feature stores to get features like member to member similarity, length of employment, etc. to rank these paths using different variants of f, directAffinity, and indirectAffinity. This increased the liquidity to our target 70%!
However, better liquidity came with a price: speed. Since we were relying on other services and computing the recommendations on the fly, we were bound by latencies introduced by external systems. Here is our call graph for each recommendation computation.
We make two parallel calls to our Graph Service to get direct connection paths and indirect connection paths (steps 1 and 2 in the figure above)
Step 2 has two arrows because we need to make two separate calls to the Graph Service to get all the second degree connections:
The first call is to get all the paths that only include the first degree connecting members in each of the indirect connection paths.
The second call is a batch call to get all the second degree connections for connections for those first degree connecting members.
These two calls happen in serial order, as the second call depends on the results of the first one.
The Graph Service was designed in this fashion in order to avoid a huge response size, as often times the second degree network is too large for one single call.
As you can see, the second call to the Graph Service adds ~700 (=250 + 430) milliseconds and is the leading cause of long latencies for generating the recommendations online.
Liquidity: 70%, which is great.
C2S: Since we compute everything online, we use zero additional storage! Computation is only initiated when necessary. This required no additional hardware provisioning and we increased the QPS to the Graph Service by only 1.9%.
Latency: We ran a performance test, and our 99th percentile was around 1.3 seconds, which is not acceptable.
Make it fast
Now that we had solved two of the three challenges, it was time to try to solve the final one: latency. In our UI, we don’t need to show the complete list of second degree connections for indirect paths. We only show the indirect first degree connections of the viewer and the number of second degree connections that they can introduce. The additional batch call to the Graph Service to get second degree network connections is just to compute the member-to-company affinity (affm_c) between indirect first degree connections and the target company, i.e., function indirectAffinity. What if we computed it offline?
We decided to pre-compute affm_c between <member, company> tuples for both direct (function directAffinity) and indirect (function indirectAffinity) connections in Hadoop and push them to an online key-value store for lookup. Then, in our online service, all we needed to do was combine these scores with member-to-member affinity (affm_m) to compute the final scores f for a path.
This is what our final call graph looks like:
Note that we did introduce additional key-value storage for storing affm_c. However, the amount of storage needed is quite small (~620 GB) because:
We are just storing a single score for <member, company> tuples rather than storing all the recommendations for that tuple.
The number of tuples is also much smaller, since we are only storing a <member, company> tuple if the member either works (or worked) at that company (directAffinity) or has a direct connection to that company (indirectAffinity). This number (billions of tuples) is much smaller compared to the number of tuples if we stored all the recommendations (hundreds of billions).
Computing affm_c offline also provides us the flexibility to use complicated models for directAffinity and indirectAffinity, with more features, with relative ease compared to online computation.
Liquidity: 70%, which is great.
C2S: Only using 600 GB of storage, and computation on the cluster takes only about 6 hours, which is acceptable.
Latency: We ran a performance test and our 99th percentile latency is around 460ms, while 95th percentile latency is 300ms and the average latency is 50ms. This is acceptable.
This project was a huge undertaking and a great learning experience for all involved. After going through this process, the team identified three primary lessons learned that we wanted to highlight for other engineers who may be undertaking similar projects:
Data jiujitsu: When building a data product, start small and verify that the users like the idea before investing too much to get a perfect product. This is reiterated in “Rules of Machine Learning: Best Practices for ML Engineering” by Martin Zinkevich.
Hadoop joins: If your offline flow is taking a long time to converge, it might be that you are doing massive joins.
Use hybrid: When building an online recommendation service, consider using a hybrid solution to precompute some parts of the computation in order to speed up your service. We have other successful systems at LinkedIn that follow a similar approach, including our Ads ML system.
This project was a massive undertaking that built on the work of several teams at LinkedIn. We’d especially like to acknowledge the work of these individuals: Sammy Shreibati, Iris Tu, Prachi Gupta, Romer Rosales, Viet Nguyen, Chris Szeto, Haoyang (Phil) Li, Yogesh Mandawewala, Bryan Levay, Chris Langbort, Henry Majoros, Vivian Uratra, Wren Turkal, and Adam Leon.