Quality matches via personalized AI for hirer and seeker preferences
July 30, 2020
Our vision at LinkedIn is to create economic opportunity for every member of the global workforce. Key to achieving this is making the marketplace between job seekers and hirers more efficient. Active job seekers apply for many jobs, and hear back from only a few. At the same time, hirers (recruiters, hiring managers, business owners looking to recruit a new employee, etc.) are flooded with applications with limited time to screen each, and as a result, they often overlook qualified candidates. This inefficiency frustrates both sides.
To address these problems, we built the Qualified Applicant (QA) AI model, which learns the kinds of applicant skills and experience that a hirer is looking for based on the hirer’s engagement with past candidates. We use the model to help our members find jobs from which they have the best chance of hearing back, and to reduce the likelihood of our hirers overlooking promising applicants by highlighting those who are a great fit.
Several challenges exist in solving this problem. One is creating a model that is effective for all job seekers and hirers, which we address with model personalization. Another is that individual job searches and job postings are transient, so personalized models will go stale if not updated regularly. A third challenge is the scale at which personalized models must be trained: the QA model has billions of coefficients, so scalability in training is paramount.
In this blog, we discuss how we have overcome these challenges with an in-depth look into training data, personalization, training scalability, serving infrastructure, and automated retraining. We also discuss key findings about what’s important to making a system like this successful. Finally, we showcase applications of the model in three LinkedIn products—Premium, Job Search, and Recruiter—and note metric wins that illustrate the efficiency gains over the simpler baseline system.
High-level overview of the Qualified Applicant model
Qualified Applicant is an AI model that aims to predict how likely a member is to hear back if he or she applies for a particular job. Formally, we try to predict the probability of a positive recruiter action, conditional on a given member applying to a given job.
What constitutes a positive recruiter action depends on the specific context. This can include viewing an applicant’s profile, messaging them, inviting them to an interview, or ultimately, sending them a job offer.
Per-seeker and per-job level personalization
Personalization is the attempt to predict actions and create product experiences tailored to individual users. While personalization can be rule-based, machine learning approaches are more effective when sufficient data is available. However, since the LinkedIn job market is large and varied, a single (global) machine-learned model can be sub-optimal at capturing its unique idiosyncrasies.
In such cases, we individually adjust this single global model for each member and job, with per-member and per-job models, each trained on data specific to the member and the job:

logit P(positive response | member m applies to job j) = fglobal(xm, xj) + fm(xj) + fj(xm)

where xm and xj are member and job feature vectors, fglobal is a global model, fm(xj) is a per-member model trained on the jobs that the member applied to, and similarly, fj(xm) is a per-job model trained on members who interacted with this job. The model belongs to the class of generalized additive mixed models.
We used linear models for fm and fj, but one can use any model in the above formulation as long as the produced scores are calibrated to output log-odds (for example, a neural net). Usually, linear models are sufficient as per-member and per-job components, as individual members and individual jobs do not have enough interactions to train more complex non-linear models.
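As a concrete sketch of the additive scoring with linear components, assuming illustrative weight vectors (not the production feature set):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def qa_score(x_member, x_job, w_global, w_member, w_job):
    """Additive mixed-model score. Each component emits log-odds, so the
    sum is also a log-odds, which we map back to a probability.
    The linear weight vectors here are hypothetical stand-ins."""
    logit = (
        w_global @ np.concatenate([x_member, x_job])  # global model fglobal(xm, xj)
        + w_member @ x_job                            # per-member model fm(xj)
        + w_job @ x_member                            # per-job model fj(xm)
    )
    return sigmoid(logit)
```

With all personalized weights at zero, the score reduces to the global model’s prediction—which is also the natural cold-start behavior for brand-new members and jobs.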
These three components—global, per-member, and per-job—are trained in a loop (using Photon ML and the following algorithm). Within a single training iteration, each of the many per-member and per-job models is independent of the others, conditional on the scores produced by the other components; this makes each iteration embarrassingly parallel and hence easy to distribute.
While the global model is trained on all data, each per-member model is trained using only that member’s recent job applications, and each per-job model on that job’s recent applicants. For our approach to work, it is essential to have enough density of data across members and jobs.
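To make the alternating scheme concrete, here is a small, self-contained sketch in plain NumPy (the production system uses Photon ML and distributes the per-member and per-job fits; the helper names and gradient-descent fit below are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, offset, steps=200, lr=0.5, l2=0.01):
    """Fit w for P(y=1) = sigmoid(X @ w + offset) by gradient descent.
    The offset carries the (fixed) log-odds of the other components."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w + offset)
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
    return w

def train_game(rows, d_m, d_j, n_iters=3):
    """rows: list of (member, job, x_m, x_j, label).
    Alternates between the global fit and the per-member / per-job fits,
    each against the summed log-odds of the other components as an offset.
    The per-member (and per-job) fits are mutually independent, so in a
    distributed setting they can all run in parallel."""
    members = {m for m, *_ in rows}
    jobs = {j for _, j, *_ in rows}
    w_g = np.zeros(d_m + d_j)
    w_m = {m: np.zeros(d_j) for m in members}
    w_j = {j: np.zeros(d_m) for j in jobs}

    def parts(r):
        m, j, xm, xj, _ = r
        return (w_g @ np.concatenate([xm, xj]), w_m[m] @ xj, w_j[j] @ xm)

    for _ in range(n_iters):
        # Global pass: offset = per-member + per-job log-odds.
        X = np.array([np.concatenate([xm, xj]) for _, _, xm, xj, _ in rows])
        y = np.array([lab for *_, lab in rows])
        off = np.array([sum(parts(r)[1:]) for r in rows])
        w_g = fit_logistic(X, y, off)
        # Per-member pass: each member's fit sees only that member's rows.
        for m in members:
            rs = [r for r in rows if r[0] == m]
            Xm = np.array([xj for _, _, _, xj, _ in rs])
            ym = np.array([lab for *_, lab in rs])
            offm = np.array([parts(r)[0] + parts(r)[2] for r in rs])
            w_m[m] = fit_logistic(Xm, ym, offm)
        # Per-job pass, symmetrically, on that job's recent applicants.
        for j in jobs:
            rs = [r for r in rows if r[1] == j]
            Xj = np.array([xm for _, _, xm, _, _ in rs])
            yj = np.array([lab for *_, lab in rs])
            offj = np.array([parts(r)[0] + parts(r)[1] for r in rs])
            w_j[j] = fit_logistic(Xj, yj, offj)
    return w_g, w_m, w_j
```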
Our analysis demonstrated that the majority of job applicants apply to at least 5 jobs, while the majority of job postings receive at least 10 applicants. This results in enough data to train the personalization models.
The personalized QA model improved the offline evaluation metric, the area under the ROC curve (AUC), by +27%. We observed similar improvements in NDCG (normalized discounted cumulative gain) metrics when learning to rank. Our baseline was the previously deployed gradient boosting tree model that was trained on the same dataset and the same features. We also estimated that the contribution of per-member and per-job models varies greatly with the use case (see the “Applications and impact” section below). That is, for some datasets, per-member models drive most of the gain, while for others, per-job models are more important.
Continuous learning via automated daily updates
The problem of model freshness
Models trained on high-velocity data can go stale quickly, and may require frequent retraining. This is the case with the QA model’s personalization components, which are trained on hirer engagement labels. In fact, the advantage over the baseline model for a per-member personalized model halves after only three weeks without updates. For per-job models, this decay is even faster. Frequent updating is necessary to maintain the highest possible performance gain over the baseline.
Decay of AUC metric in a personalized model (teal) compared to a baseline (red), when the personalized model is not updated.
This is expected since the baseline learns global data patterns and does not degrade with time. However, individual applicants may only spend several weeks actively job seeking, and new positions may be filled within weeks or even days. We have a very short window of opportunity to learn and leverage per-member and per-job patterns while they are still active.
Automated retraining on up-to-date job interaction labels
In order to combat this decay, we need to frequently update the per-member and per-job model components. The global model does not need frequent retraining. Every day, we generate fresh training labels, automatically retrain the per-member and per-job components, and deploy them to production.
Every day, hirers engage with new candidates. Our pipeline picks up these events to create fresh training labels. Ideally, these events are turned into training labels quickly to reduce the latency with which new information is incorporated into the model. However, in practice, we must wait until either a hirer engages with a candidate (a positive label), or wait some arbitrary length of time before inferring an implicit negative label. A common approach is to wait some time after an application (say two weeks), and then label it positive or negative based on the presence of engagement. While this is simple, such a long wait negatively affects the performance of the QA model due to the speed at which this data goes out of date. However, based on our analysis of a typical recruiter response, we are able to identify many of the positive labels much sooner—30% become known within a day.
The share of applications that received their first positive engagement on a given day upon being submitted—the likelihood of receiving a positive engagement is highest on the first day and drops soon after
To solve this problem, we implemented a fast approximate label collection pipeline. It uses all explicit positive and negative feedback as soon as it becomes available, while heuristically inferring implicit negatives based on the context of the job posting. For example, if a recruiter responds to other applications submitted later, we may infer the negative label for the application with no engagement. We make the negatives conclusive if no engagement is seen after 14 days. This means we can train the QA model on an ever-improving approximation of the dataset.
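The heuristic can be sketched as follows (field and function names are illustrative, not the production schema):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

FINAL_NEGATIVE_AFTER = timedelta(days=14)

@dataclass
class Application:
    applied_at: datetime
    engaged_at: Optional[datetime] = None  # first positive recruiter engagement, if any

def infer_label(app, siblings, now):
    """Approximate label for one application to a job posting.
    `siblings` are the other applications to the same posting.
    Returns 1 (positive), 0 (negative), or None (not yet known)."""
    if app.engaged_at is not None:
        return 1                                    # explicit positive: usable immediately
    if now - app.applied_at >= FINAL_NEGATIVE_AFTER:
        return 0                                    # conclusive negative after 14 days
    # Heuristic implicit negative: the recruiter engaged with applications
    # submitted *after* this one, but skipped this one.
    if any(s.engaged_at is not None and s.applied_at > app.applied_at for s in siblings):
        return 0
    return None                                     # wait for more evidence
```

Applications whose label is still `None` are simply excluded from that day’s training set and revisited in later runs, so the training data becomes an ever-improving approximation.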
Near real-time learning
The pipeline described above operates offline with batch training. It takes up to a day to generate new labels, and retrain and redeploy the personalized model components.
Our technical vision is to reduce this from hours to minutes. We are developing a near real-time data collection and training pipeline built with stream processing technologies including Apache Samza and Apache Kafka. This system aggregates recent interactions to asynchronously train and update individual personalized model components. Instead of a latency of hours to update the QA-like models, this will allow efficient data-driven updates within minutes.
System design and infrastructure
We have described previously how the personalized model components are retrained daily in a highly parallel manner, resulting in billions of coefficients stored on Hadoop HDFS. In order to serve the model online, we deploy the model to an online service. While the global model is small enough to fit in memory on a single server, the accumulated total of personalized components is far too large for this. Fortunately, we can leverage the model structure to parallelize storage and retrieve only the subset of coefficients that are necessary for scoring a given request.
We store per-member and per-job model coefficients in Venice, LinkedIn’s distributed online key-value store. In a typical request, we compute QA scores for a single member and a set of jobs or for one job and several members. Relevant personalization coefficients can be retrieved with a single batch query to the key-value store and are cached by the online service. Together with the global model kept permanently in the service’s memory, these are used for scoring.
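A sketch of the serving-side lookup, assuming a Venice-like store exposing a batch read (`kv.batch_get` and all names here are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def score_member_against_jobs(member_id, job_ids, kv, x_member, job_features, w_global):
    """Score one member against a set of jobs, fetching only the
    personalization coefficients needed for this request in a single
    multi-key read. Missing coefficients fall back to the global model."""
    keys = [f"member:{member_id}"] + [f"job:{j}" for j in job_ids]
    coeffs = kv.batch_get(keys)  # one round trip for all personalized weights
    w_member = coeffs.get(f"member:{member_id}")
    scores = {}
    for j in job_ids:
        x_job = job_features[j]
        w_job = coeffs.get(f"job:{j}")
        logit = w_global @ np.concatenate([x_member, x_job])
        if w_member is not None:   # cold-start members get global-only scores
            logit += w_member @ x_job
        if w_job is not None:      # likewise for brand-new jobs
            logit += w_job @ x_member
        scores[j] = sigmoid(logit)
    return scores
```

Symmetrically, a request for one job and several members fetches the job’s coefficients once and batch-reads the per-member coefficients.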
To keep the QA model fresh, we retrain it daily with newly collected engagement data. If the retrained model passes automated quality checks (e.g., better evaluation metrics on the validation set when compared to the model from the previous retraining), we update the coefficients in the key-value store. To reduce training time, we initialize the coefficients with their current values and update them selectively. For example, since the global model does not change as fast as per-member models of active job seekers, we only retrain the global coefficients once every few weeks.
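The automated quality gate can be as simple as comparing validation AUC between the candidate and deployed models; a minimal sketch using the rank-sum formulation of AUC (function names are illustrative):

```python
def auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) identity: the probability that
    a random positive is scored above a random negative, ties counting 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def should_deploy(labels, new_scores, prev_scores):
    """Gate sketch: push the retrained coefficients to the key-value store
    only when validation AUC does not regress versus the deployed model."""
    return auc(labels, new_scores) >= auc(labels, prev_scores)
```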
The system components are summarized in the diagram:
Qualified Applicant system design and infrastructure, summarized with offline training, daily retraining, and online serving components
Applications and impact
We have deployed this model across three LinkedIn business lines—Job Seekers, Premium, and Recruiter—and observed significant metric gains in all three. On the seeker side, we highlight job search results if a member’s profile is a good match for the job (Quality Match product). For Premium members, we additionally showcase jobs for which the member would be more competitive than the other applicants (Top Applicant product). Finally, hirers using LinkedIn Recruiter benefit from a smarter ranking of applicants, as well as receive notifications when members with a very high match score apply for their jobs.
In the Recruiter use case, we use the personalized QA model to rank applicants in the candidate management system. The new personalized model provided double-digit gains in hirer interaction rates, such as resume downloads and the rate of positive recruiter actions. We also use the new QA model to selectively send recruiters a notification when a qualified applicant applies, providing a double-digit increase in click-through rate (CTR) for impressed notifications.
For the Job Seekers use case, we use the QA model to highlight jobs for which a member is strongly qualified. This resulted in a site-wide lift in confirmed hires, demonstrating match quality that accounts for both the hirer and seeker sides of the marketplace.
Finally, in the Premium use case, we provide Premium subscribers with insights on how they rank against other applicants based on the QA prediction score. In this use case, we improved premium job seeker CTR.
We are grateful to our partners for helping us with model deployment and for valuable discussions. In no particular order: Sanjay Sachdav, Sean Miao, Jinqi Huang, Joseph Wang, Nadeem Anjum, Rohan Punamia, Raveesh Bhalla, and Deepti Patibandla.