Building Representative Talent Search at LinkedIn
October 10, 2018
Introduction: Building products that create opportunity
Motivated by LinkedIn’s goal of creating economic opportunity for every member of the global workforce and by a keen interest from our customers in making sure that they are able to source diverse talent, we have adopted a “diversity by design” approach for Talent Search in our LinkedIn Recruiter product.
Focusing on representative results
While “bias” has a widely-accepted definition in the field of statistics, the real-world manifestation of societal bias is often less clearly defined. Ranking algorithms form the core of search and recommendation systems for several key applications such as hiring, lending, and college admissions. Recent studies have shown that ranked lists produced by a machine learning model can inadvertently result in unintended bias and reduced visibility for an already disadvantaged group.
Our approach is based on the notion of equal opportunity introduced in prior research on the unintended consequences of machine learning. For LinkedIn Talent Search, we also took the position that it is desirable for the set of qualified candidates and the set of top-ranked results for a search request to have about the same distribution on the attribute of interest—in other words, the top search results should be representative of the broader qualified candidate set. Although LinkedIn Talent Search utilizes billions of data points, hundreds of features, and is constantly updated to provide more customer value, we previously blocked gender signals from use in Recruiter products specifically to avoid unintended bias in our models. As discussed previously, being unaware of the possibility of biases is not the same as taking proactive steps to mitigate them.
In this blog post, we will describe the technical architecture of our representative talent-search system that has been deployed to all users of the LinkedIn Recruiter product worldwide, and present the effect of representative ranking on the proposed operational and business metrics.
Talent Search at LinkedIn
The goal of LinkedIn Talent Search systems is to provide a tool that recruiters and hiring managers can use to source suitable talent for their needs. This is mainly accomplished through the LinkedIn Recruiter product, which recruiters use to identify “talent pools” that are optimized for the likelihood of making a successful hire. The Recruiter product provides a ranked list of candidates corresponding to a search request in the form of a query, a job posting, or an ideal candidate. Given a search request, candidates that match the request are first selected and then ranked based on a variety of factors (such as the similarity of their work experience/skills with the search criteria, job posting location, and the likelihood of a response from an interested candidate) using machine-learned models in multiple passes. A screenshot from the application is presented in Figure 1.
Figure 1: A screenshot from the LinkedIn Recruiter product, some details redacted
Our focus as part of representative results is on the set of potential candidates (as well as their relative ordering) shown to a recruiter corresponding to a given search query. In this way, Recruiter will still return the same set of qualified candidates in response to any particular search; no one is added or removed. Recruiter ranking already varies depending on a number of factors, including candidates' interest in a particular employer and their preferred location to work, to optimize for our customers' success. Many of the technical details about the LinkedIn search stack have been described in prior blog posts.
Understanding representativeness of a ranked list
We next explain our attempts to measure the representativeness of the ranked lists generated by our talent search and recommendation systems.
Intuition underlying representativeness measurement
Our measurement and mitigation approach assumes that in the ideal setting, the set of qualified candidates and the set of top-ranked results for a search request should have the same distribution on the attribute of interest; i.e., the ranked/recommended list should be representative of the qualified list. This assumption can be viewed as an application of the notion of equal opportunity introduced in prior machine learning research. In mathematical terms, a predictor function is said to satisfy equal opportunity with respect to an attribute and a true outcome if the predictor and the attribute are independent conditional on the true outcome being 1 (favorable).
In our setting, we assume that the LinkedIn members that match the criteria specified by recruiters in a search request are “qualified” for that search request. We can roughly map to the above definition as follows. The predictor function corresponds to whether a candidate is presented in the top-ranked results for the search request, while the true outcome corresponds to whether a candidate matches the search request criteria (or equivalently, is “qualified” for the search request). Satisfying the above definition means that whether or not a member is included in the top-ranked results does not depend on the attribute of interest, or, equivalently, that the proportion of members belonging to a given value of the attribute does not vary between the set of qualified candidates and the set of top-ranked results. This requirement can also be thought of as seeking statistical parity between the set of qualified candidates and the set of top-ranked results.
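In symbols, with $\hat{Y}$ denoting the predictor (inclusion in the top-ranked results), $A$ the attribute of interest, and $Y$ the true outcome (whether the candidate is qualified for the search request), the equal opportunity condition described above reads:

```latex
P(\hat{Y} = 1 \mid A = a, \, Y = 1) = P(\hat{Y} = 1 \mid Y = 1) \quad \text{for every value } a \text{ of } A
```

That is, among qualified candidates, the chance of appearing in the top-ranked results does not depend on the attribute value.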
In other words, the operating definition of “equal” or “fair” here is towards achieving ranked results that are representative (in terms of the attribute of interest, and in the scope of this work, inferred gender) of the qualified population. We define qualified population to be the set of candidates (LinkedIn members) that match the criteria set forth in the recruiter’s query (e.g., if the query is “Skill:Java,” then the qualified population is the set of members that are familiar with computer programming using the Java programming language).
As mentioned above, our measures for evaluating representativeness are based on the assumption that the distribution of the attribute of interest for the top-ranked candidates in a search query should ideally reflect the corresponding distribution over the set of qualified candidates. Our main measure computes the skew for the attribute in question by comparing the proportion of candidates having an attribute value among the set of highest-ranked candidates to the corresponding proportion among the set of qualified candidates.
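For illustration, the skew measure just described can be sketched as a log-ratio of the two proportions. The function below is a simplified sketch of the idea, not LinkedIn's production metric; its name and exact formulation are our own.

```python
import math
from collections import Counter

def skew_at_k(ranked, qualified, attr_value, k):
    """Log-ratio of an attribute value's share among the top-k ranked
    candidates to its share in the full qualified set. A value of 0
    means the top-k results are perfectly representative; negative
    values mean the attribute value is under-represented."""
    top_share = Counter(ranked[:k])[attr_value] / k
    base_share = Counter(qualified)[attr_value] / len(qualified)
    return math.log(top_share / base_share)
```

For example, if 40% of the qualified candidates have the attribute value but only 20% of the top 10 results do, the skew is log(0.5) ≈ -0.69, signaling under-representation.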
Next, we will present our approach for representative ranking which is designed to ensure that the top search results reflect the distribution of the attribute of interest (inferred gender, in our context) in the underlying talent pool, while also taking into account the scores assigned by our machine learned models to the potential candidates.
Representative Talent Search ranking
The main idea behind our approach for generating representative results in Talent Search is to perform a post-processing step, wherein we re-rank the set of candidates retrieved by the machine-learned model. A high-level overview of the representative ranking algorithm is as follows:
Partition the set of potential candidates into different gender buckets.
Rank the candidates in each gender bucket according to the scores assigned by the machine-learned model.
Merge the gender buckets, while obeying representation constraints based on the gender proportions computed from the set of qualified candidates. The merging is designed to keep the gender proportions in the ranked list similar to the corresponding proportions in the set of qualified candidates for every index of recommendation (e.g., for the top result, for the top two results, for the top five results, and so on). Note that the merging preserves the score-based order of candidates within each gender bucket.
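The merge step above can be sketched in code. The following is a simplified greedy interleaving, not the production algorithm, and all names are our own; per-gender lists are assumed to already be sorted by model score. At each position it emits the next candidate from the bucket that is furthest below its target share in the current prefix, which keeps every prefix close to the qualified-set proportions while preserving within-bucket order.

```python
def representative_merge(buckets, targets):
    """Interleave per-attribute ranked lists so that every prefix of
    the merged list approximately matches the target proportions.

    buckets: dict mapping attribute value -> list of candidates, each
             list already sorted by the machine-learned model's score.
    targets: dict mapping attribute value -> its proportion among the
             qualified candidates (should sum to 1).
    """
    merged = []
    counts = {a: 0 for a in buckets}   # candidates emitted per bucket
    cursors = {a: 0 for a in buckets}  # next unread index per bucket
    total = sum(len(b) for b in buckets.values())
    for i in range(1, total + 1):
        # Among non-exhausted buckets, pick the one with the largest
        # deficit: expected count at prefix length i minus emitted count.
        available = [a for a in buckets if cursors[a] < len(buckets[a])]
        pick = max(available, key=lambda a: targets[a] * i - counts[a])
        merged.append(buckets[pick][cursors[pick]])
        cursors[pick] += 1
        counts[pick] += 1
    return merged
```

With buckets `{"f": ["f1", "f2"], "m": ["m1", "m2", "m3"]}` and targets `{"f": 0.4, "m": 0.6}`, this yields `["m1", "f1", "m2", "f2", "m3"]`: within-bucket order is preserved, and every prefix tracks the 40/60 split as closely as possible.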
We adopted the post-processing approach due to the following practical considerations. First, this approach is agnostic to the specifics of each model and therefore scalable across different model choices for our application. Second, this approach is easier to incorporate as part of existing systems, since we can build a stand-alone service/component for post-processing without significant modifications to the existing components. Finally, our approach aims to ensure that the search results presented to the users of LinkedIn Recruiter are representative of the underlying talent pool.
We compute the gender proportions in the set of qualified candidates as follows. First, we use LinkedIn’s Galene search engine to obtain the set of qualified candidates that match the criteria specified as part of the search query by the user. We then compute the empirical distribution over the genders, and use this distribution both to derive the representation constraints applied during re-ranking (at recommendation time) and to compute our representation metrics (after the fact, for evaluation). Next, we present the architecture for our ranking system.
Figure 2 details our two-tiered ranking architecture for achieving gender-representative ranking for LinkedIn Talent Search systems. While the details of our machine learning models optimized for two-way (Recruiter-Candidate) interest are out of the scope of this post, we direct the interested reader to some of our team’s research papers (BigData’15, WWWCompanion’16, SIGIR’18, CIKM’18-1, CIKM’18-2) describing the methodology on how we train our candidate ranking models.
Figure 2: Architecture for Representative Ranker. The numbers in circles denote the chronological step in the computational workflow.
There are three primary components through which we achieve representative ranking:
Computation of the gender distribution on the qualified set of candidates, alongside our first-level ranking.
Re-ranking of the candidate list utilizing our first-level, machine-learned ranking model’s scores and the desired gender distributions. The top-k’ candidates, ranked in a representative manner, are then sent to the second-level ranking.
Re-ranking of the candidate list utilizing our second-level, machine-learned ranking model’s scores and the desired gender distributions. The top-k’’ candidates, ranked in a representative manner, are then presented to the recruiter.
In the figure, each Searcher has access to a subset of our members, so search and retrieval are applied in parallel on these sub-partitions. Within each searcher, we also count the number of members falling into each inferred gender (among those that pass the search criteria as pure filtering conditions), and then combine these counts to obtain the overall distribution of members across gender values. Since the recruiter is presented candidates page by page, only a top subset (k → k’ → k’’) of the candidates (after ranking and representative re-ranking) is sent to the next stage (first-level → second-level → recruiter).
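The count aggregation described above can be sketched minimally as follows; the function name is illustrative (the production system performs this inside the search stack).

```python
from collections import Counter

def combine_partition_counts(partition_counts):
    """Sum per-searcher gender counts (each a Counter over inferred
    gender values for members passing the search filters) and
    normalize into the overall gender distribution used as the
    representation target."""
    total = sum(partition_counts, Counter())
    n = sum(total.values())
    return {g: c / n for g, c in total.items()}
```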
Validating our approach
Our experiments with the gender-representative ranking approach outlined in this blog post have shown that such a ranking scheme makes more than 95% of all searches representative, with respect to every inferred gender, of the qualified population for the search. This shows that the re-ranking approach works well, and we have ongoing efforts at LinkedIn to further improve the representativeness of our ranking algorithms.
While the representativeness results demonstrated the effectiveness of our approach, we also ran an A/B test over our recruiter user set for two weeks to measure the effect of representative ranking on our standard business metrics. During this evaluation, 50% of the users were assigned to a baseline algorithm that directly ranks candidates according to the scores generated by the machine-learned model, and 50% were assigned to the representative ranking algorithm. In total, hundreds of thousands of recruiters were included in the A/B testing and validation work that went into this change. We found that the representative ranking approach caused no significant change in our business metrics, such as the number of InMails sent or accepted, meaning that ensuring representation does not negatively impact our customers’ success metrics.
Based on the results of our validation of the representative ranking approach, we decided to ramp representative ranking to an even wider audience. Currently, 100% of LinkedIn Recruiter users worldwide are presented candidates that go through the representative ranking process, helping provide equal economic opportunity for all members of the workforce.
We have taken an important step towards the vision of creating economic opportunity for every member of the global workforce, but realize this is an ongoing effort.
There are many workstreams at LinkedIn that are focused on providing others with tools and insights to help realize this vision, including self-service online learning, bringing greater salary transparency, working with the World Economic Forum to bring greater attention to the global gender gap, and working with outside researchers via the LinkedIn Economic Graph Research Program. We are also engaged in a variety of efforts to ensure that our own software engineers, product managers, and company leaders better understand machine learning and its potential applications.
Work of this nature is only possible with a multidisciplinary team of folks across multiple teams and in different roles. Sahin Cem Geyik, Stuart Ambler, and Krishnaram Kenthapadi were the AI and applied machine learning researchers for this project. We would like to thank the many LinkedIn software engineers that were involved in the infrastructure components of this project, especially Gurwinder Gulati, Chenhui Zhai, and Yani Zhang for their direct contributions. We are also grateful to Patrick Driscoll and Divyakumar Menghani for their contributions from an analytics perspective. We would like to thank our product manager, Rachel Kumar, for guiding the product vision for representative talent search at LinkedIn.
We would also like to thank (in alphabetical order) Deepak Agarwal, Erik Buchanan, Patrick Cheung, Gil Cottle, Nadia Fawaz, Joshua Hartman, Heloise Logan, Lei Ni, Ram Swaminathan, Ketan Thakkar, Janardhanan Vembunarayanan, Hinkmond Wong, Lin Yang, and Liang Zhang for their help and fruitful discussions during the development of the product.