Rethinking Endorsements Infrastructure, Part 1
September 9, 2016
If you’ve used LinkedIn, you likely know about Endorsements. Tens of millions of professionals around the world have pressed the “Endorse” button, sharing more than 10 billion skill endorsements with their connections. When LinkedIn first launched Endorsements, the goal was simple—to create a way to allow others to recognize people’s skills.
While many people on LinkedIn use Endorsements, we realized that there was still more to do to achieve the goal of the product. We noticed that some endorsements were perceived to be more valuable than others, so we set out to build a better product that would help our members understand the value of endorsements and use them to their full potential.
Realizing our goal of delivering endorsements that provide even more value to our members required a blend of research, new machine learning models, and a rearchitecting of the backend infrastructure that both serves and recommends new endorsements. Our solution not only serves endorsements faster, but also allows us to deliver more insights to members based on their connections and skills. We believe that our new Endorsements infrastructure lays the groundwork for eventually meeting the goals mentioned above.
Historically, the Endorsement experience on LinkedIn has relied on two pillars: one for suggested endorsements, and another for serving endorsements. The suggested endorsement pipeline is a part of our backend infrastructure that produces a set of suggestions that are presented to our members to help guide their endorsements for people in their network. These suggestions offer a convenient way for members to reward their co-workers for their skills.
The second pillar in the Endorsement stack simply allows members to endorse their connections when they are viewing their profile, and then serves those endorsements.
At LinkedIn, we use a micro-service infrastructure: large features (e.g., Endorsements) are encapsulated in a single service that exposes all the operations that can be done. This allows the service to evolve at its own pace as long as it remains backward-compatible.
Since its inception, the Endorsements product has relied on a straightforward architecture where all the endorsements are served from a single SQL instance. In order to scale to 10 billion endorsements at thousands of queries per second (QPS), we had to heavily optimize the indices that are used in the database.
For example, because the most common use case is to serve endorsements given to a member, we built a structured index with the following hierarchy:
Recipient Id (1) → Endorsed Item (2) → Endorser Id (3)
Using these indices, it becomes very simple to perform the following operations:
Retrieve all endorsements for member “Joey”: We use index (1), then retrieve all rows in there.
Retrieve all endorsements for member “Joey” for skill “Java”: We use a combination of index (1) and (2) and we list all rows.
Retrieve all endorsements for member “Joey” for skill “Java” given by “Yolanda” (assuming “Yolanda” is the endorser): This time we use a combination of indices (1), (2), and (3) and retrieve the rows that are found.
However, this approach has introduced some limitations. For example, if we wanted to find all the endorsements given by member “Jie,” we would have to do a full traversal of the 10 billion rows in the database.
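The trade-off described above can be sketched with a toy in-memory model. This is only an illustration, assuming nested dictionaries as a stand-in for the database index; the class and method names (EndorsementIndex, for_recipient, given_by, and so on) are hypothetical and are not part of the production system.

```python
from collections import defaultdict


class EndorsementIndex:
    """Toy in-memory model of a Recipient -> Skill -> Endorser index."""

    def __init__(self):
        # recipient -> skill -> set of endorsers
        self._tree = defaultdict(lambda: defaultdict(set))

    def add(self, recipient, skill, endorser):
        self._tree[recipient][skill].add(endorser)

    def for_recipient(self, recipient):
        # Level (1) only: one lookup, then list everything beneath it.
        skills = self._tree.get(recipient, {})
        return sorted((recipient, s, e) for s, es in skills.items() for e in es)

    def for_recipient_skill(self, recipient, skill):
        # Levels (1) and (2): two lookups, then list the endorsers.
        endorsers = self._tree.get(recipient, {}).get(skill, set())
        return sorted((recipient, skill, e) for e in endorsers)

    def has(self, recipient, skill, endorser):
        # Levels (1), (2), and (3): a direct membership check.
        return endorser in self._tree.get(recipient, {}).get(skill, set())

    def given_by(self, endorser):
        # The endorser is the last level, so nothing narrows the search:
        # every recipient and every skill has to be visited.
        return sorted(
            (r, s, endorser)
            for r, skills in self._tree.items()
            for s, es in skills.items()
            if endorser in es
        )


index = EndorsementIndex()
index.add("Joey", "Java", "Yolanda")
index.add("Joey", "Java", "Jie")
index.add("Joey", "Hadoop", "Jie")
index.add("Yolanda", "SQL", "Jie")

print(index.for_recipient_skill("Joey", "Java"))
print(index.has("Joey", "Java", "Yolanda"))
print(index.given_by("Jie"))
```

The first three methods only follow the hierarchy from the top, which is why they stay cheap as the data grows. The last one cannot start from the top, so its cost grows with the total number of endorsements stored.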
There are two suggested endorsements pipelines:
Endorsable Skills: This flow predicts a list of skills that a member might have or might want to promote.
Suggested Endorsements: This flow provides a list of (recipient, skill) tuples each member can endorse.
The two pipelines generate suggestions offline in Hadoop workflows that use machine learning, incorporating features such as mutual connections, profile information, and previous endorsements. The “endorsable skills” and “suggested endorsements” data are pushed to two different Voldemort stores. The key-value stores are queried online by the endorsement service to fetch potential endorsements, which are then delivered to the user.
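The split between offline generation and online lookup can be sketched roughly as follows. This is a simplified illustration, not the actual workflow: a plain dictionary stands in for the key-value store, and the score function, its weights, and the feature names are all hypothetical.

```python
def score(candidate):
    # Hypothetical linear score over the kinds of features the text mentions;
    # the weights are invented for illustration only.
    return (
        0.5 * candidate["mutual_connections"]
        + 2.0 * candidate["skill_on_profile"]
        + 1.0 * candidate["previous_endorsements"]
    )


def build_suggestion_store(candidates_by_member, top_k=2):
    """Offline step: rank candidates per member and keep the top_k."""
    store = {}
    for member, candidates in candidates_by_member.items():
        ranked = sorted(candidates, key=score, reverse=True)
        store[member] = [(c["recipient"], c["skill"]) for c in ranked[:top_k]]
    return store


def fetch_suggestions(store, member):
    """Online step: a single key lookup, with no scoring at request time."""
    return store.get(member, [])


# Toy candidate data for one member (entirely made up).
candidates_by_member = {
    "Yolanda": [
        {"recipient": "Joey", "skill": "Java", "mutual_connections": 10,
         "skill_on_profile": 1, "previous_endorsements": 3},
        {"recipient": "Joey", "skill": "Hadoop", "mutual_connections": 10,
         "skill_on_profile": 0, "previous_endorsements": 0},
        {"recipient": "Jie", "skill": "SQL", "mutual_connections": 4,
         "skill_on_profile": 1, "previous_endorsements": 5},
    ]
}

store = build_suggestion_store(candidates_by_member)
print(fetch_suggestions(store, "Yolanda"))
```

Doing the ranking offline keeps the online path down to one read per member, at the cost of suggestions that are only as fresh as the last batch run.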
Over the years, the machine learning algorithms powering the suggestions have been fine-tuned to optimize for virality and skill diversity. Endorsements have generated a lot of engagement from our members and have powered many other actions on LinkedIn. The sheer number of endorsements given to date and the function’s reach among our members reflect the effectiveness of that model.
As endorsements have gained in maturity in more countries, we decided it was time to redo our architecture to support the ever-growing engagement with the product. Moreover, redoing our architecture opened doors to surfacing more relevant content for our members. We embarked on a quest to rebuild the vision for Endorsements.
Optimizing for the right target metric
Moving away from a world where the success of Endorsements is measured by the total number of endorsements given by our members, we wanted to define a new metric that would better represent the goal of the Endorsements product. In this article, we’ll refer to this new metric as a “Highly-Rated Endorsement.”
Getting feedback from our members
At LinkedIn, we are a “members first” organization, meaning that we believe that the voice of our members should guide our product decisions. In order to retrain our algorithms to deliver the best experience possible, we wanted to understand which endorsements our members value most.
Using feedback to develop a target metric
Our goal in developing a target metric for Endorsements was to capture endorsements that offer meaningful validation of a member’s skill. From member feedback we gathered, we observed that validation comes from endorsers who know both the recipient and the skill.
To build a more specific target metric, we started with over 80 different candidate signals about the endorser, recipient, and their relationship that could be useful for the definition.
These features, combined with the responses, allowed us to correlate different features with the positive signals we received through in-product feedback. Using various machine learning algorithms, we eventually identified the 12 most useful signals.
Signal flow to identify top features for highly-rated endorsements
In order to develop a useful metric for measuring the health of the Endorsement ecosystem and guiding product development, we wanted the definition to be accurate, intuitive, and comprehensive:
Accurate means that it strongly correlates with the member survey results.
Intuitive means that it aligns with our intuition of the product and is easy to communicate to members.
Comprehensive means that it manages to capture the wide range of endorsements that are high-quality.
We did not directly use the machine-learned model as our definition, because although it is accurate and comprehensive, it sacrifices intuition. Instead, we sought a simpler definition that remains aligned with the survey data. From the 12 top features, we started to build candidate definitions for a highly-rated endorsement. For example, we could consider highly-rated endorsements to be those given by a coworker who is an expert in the skill area. For each candidate definition, we looked at the recall rate and precision alongside human understandability.
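Comparing candidate definitions on precision and recall can be sketched as below. This is a minimal illustration on made-up data: the field names (is_coworker, endorser_is_expert, rated_valuable) and the two candidate rules are assumptions, not the actual signals or survey data.

```python
def precision_recall(endorsements, rule):
    """Score a candidate definition against member feedback labels."""
    predicted = [rule(e) for e in endorsements]
    actual = [e["rated_valuable"] for e in endorsements]
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labeled endorsements (entirely made up for illustration).
endorsements = [
    {"is_coworker": True, "endorser_is_expert": True, "rated_valuable": True},
    {"is_coworker": True, "endorser_is_expert": True, "rated_valuable": False},
    {"is_coworker": True, "endorser_is_expert": False, "rated_valuable": True},
    {"is_coworker": False, "endorser_is_expert": True, "rated_valuable": False},
    {"is_coworker": True, "endorser_is_expert": True, "rated_valuable": True},
    {"is_coworker": False, "endorser_is_expert": False, "rated_valuable": False},
]


# Two hypothetical candidate definitions of a highly-rated endorsement.
def expert_coworker(e):
    return e["is_coworker"] and e["endorser_is_expert"]


def any_coworker(e):
    return e["is_coworker"]


for name, rule in [("expert coworker", expert_coworker), ("any coworker", any_coworker)]:
    precision, recall = precision_recall(endorsements, rule)
    print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
```

Precision asks how many of the endorsements a definition selects were actually rated valuable, and recall asks how many of the valuable ones it manages to select. Human understandability is the third criterion, and it has to be judged outside of code.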
At the end of the process, we developed a target Endorsement metric that can be described as follows:
A highly-rated endorsement is one made by a connection that knows the person and the skill.
For each component of the definition (knowing the skill and knowing the person), we identified thresholds for their respective top signals based on intuitive cut-offs backed by machine learning results.
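A two-part definition with per-component thresholds could look roughly like the sketch below. The signal names and cut-off values here are hypothetical placeholders chosen for illustration; the actual signals and thresholds are not given in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    shared_connections: int      # hypothetical "knows the person" signal
    endorser_skill_score: float  # hypothetical "knows the skill" signal, 0..1


# Hypothetical cut-offs, not the real thresholds.
MIN_SHARED_CONNECTIONS = 5
MIN_SKILL_SCORE = 0.7


def knows_person(s: Signals) -> bool:
    return s.shared_connections >= MIN_SHARED_CONNECTIONS


def knows_skill(s: Signals) -> bool:
    return s.endorser_skill_score >= MIN_SKILL_SCORE


def is_highly_rated(s: Signals) -> bool:
    # Both components of the definition must hold.
    return knows_person(s) and knows_skill(s)


print(is_highly_rated(Signals(shared_connections=12, endorser_skill_score=0.9)))
print(is_highly_rated(Signals(shared_connections=12, endorser_skill_score=0.2)))
```

Keeping each component as its own named check preserves the property the definition was chosen for: anyone can read off why a given endorsement did or did not qualify.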
Join us again next week for Part 2 in this series, which will discuss the new backend infrastructure for serving endorsements at LinkedIn.