Job Flavors at LinkedIn: Part I
August 25, 2016
We believe LinkedIn can offer personalized insights into job opportunities in a way that no other website can. On our new Job Details pages, we can show members valuable information, like who they might know at the company, that could give them an edge over other applicants or make one particular job stand out over others. The biggest problem we face, however, is getting members to that Job Details page; the job may be a perfect fit for a candidate, but if they never see the page, they miss out on a tremendous opportunity.
What we needed was a way to pique a job seeker’s interest and answer the question, "What one thing is most interesting about this job to this person?" Initially, LinkedIn had plain vanilla lists of jobs, like a list of search results or a list of job recommendations. There was nothing to make a job stand out besides the job title or company name. In the newer version of this experience, we wanted to surface all the “flavors” of what makes a job special (for instance, whether the member knows employees there from previous jobs or school, or whether the job represents a jump in salary) in order to help job seekers narrow down the field and find the best job for them. In this post, we will talk about the infrastructure we built for showing these Job Flavors to our members.
When first designing this system, we had three requirements:
- Low latency: Many parts of LinkedIn where we would like to show Job Flavors have strict latency requirements, like the feed or job search. After talking with stakeholders, we determined that we could go no higher than 50ms at the 95th percentile (i.e., 95 percent of calls should take under 50ms).
- Scalable: We needed to build the system in a way that would scale both in the size of the data (unique insights for our more than 450 million members multiplied by more than 15 million jobs) and in the number of flavors it can support. In order for us to show flavors everywhere we show jobs, we would have to support at least 5,000 queries per second (QPS), with 10,000 QPS at peak traffic times. We also needed plenty of room to grow beyond this, as we’ve seen a 75 percent increase in views for all job pages since last year.
- Extensible: Adding new flavors (or removing old ones) should be easy to do. We wanted this to be a core part of the job seeker experience going forward at LinkedIn, and for it to be simple for new developers to pick up and maintain. Adding new flavors should not require knowledge of every other flavor that came before, nor knowledge of all of the moving pieces (tracking, building the models, ranking, etc.).
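To make the latency requirement concrete, here is a minimal sketch of how a 95th-percentile budget could be checked against a set of measured call latencies. The function names, the nearest-rank percentile method, and the sample numbers are assumptions made for illustration; they are not taken from our monitoring stack.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest-rank: the smallest value such that at least pct percent
    # of the samples are less than or equal to it (1-based rank).
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_latency_budget(latencies_ms, budget_ms=50, pct=95):
    """True if the pct-th percentile latency is no higher than the budget."""
    return percentile_nearest_rank(latencies_ms, pct) <= budget_ms


# Hypothetical measurements: one slow outlier in 20 calls is tolerated,
# two are not, because the 95th percentile then lands on an outlier.
one_outlier = [12] * 19 + [180]
two_outliers = [12] * 18 + [180, 180]
print(meets_latency_budget(one_outlier))
print(meets_latency_budget(two_outliers))
```

The point of budgeting at a percentile rather than at the mean is that a handful of slow calls cannot hide behind many fast ones.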
The job flavors ecosystem is made up of two main components: a stateless Rest.li layer, and a set of datastores accessed by the Rest.li layer on each call. By using Rest.li + D2, we can infinitely scale this middle tier while keeping fault tolerance and load balancing (okay, maybe not infinitely, but pretty much for all practical purposes). Frontend clients make requests to our middle tier with a list of jobs, a member, and a set of Job Flavors to check against. Clients must explicitly request the flavors they want; otherwise, whenever a new flavor is added, we might suddenly start sending it back to a client that doesn’t know how to display it.
Next, for each flavor requested, the list of jobs is sent to a corresponding “flavor delegate” that fetches necessary flavor metadata for that member. Each delegate implements a simple interface with two functions: 1) return information about what flavor it handles, and 2) given a request, return whether its flavor is valid and return any additional metadata related to the flavor.
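The two-function delegate contract described above could be sketched roughly as follows. This is an illustrative Python sketch, not the actual Rest.li code: the names `FlavorDelegate`, `FlavorResult`, `FewApplicantsDelegate`, `fetch_flavors`, the `FEW_APPLICANTS` flavor name, and the in-memory applicant counts are all hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class FlavorResult:
    """A valid flavor for one job, plus any metadata needed to render it."""
    flavor: str
    metadata: Dict[str, object] = field(default_factory=dict)


class FlavorDelegate(ABC):
    @abstractmethod
    def flavor(self) -> str:
        """1) Report which flavor this delegate handles."""

    @abstractmethod
    def evaluate(self, member_id: int, job_ids: List[int]) -> Dict[int, FlavorResult]:
        """2) Return a result for each job where the flavor is valid."""


class FewApplicantsDelegate(FlavorDelegate):
    """Hypothetical delegate backed by an in-memory map of applicant counts."""

    def __init__(self, applicant_counts: Dict[int, int], threshold: int = 10):
        self._applicant_counts = applicant_counts
        self._threshold = threshold

    def flavor(self) -> str:
        return "FEW_APPLICANTS"

    def evaluate(self, member_id, job_ids):
        valid = {}
        for job_id in job_ids:
            count = self._applicant_counts.get(job_id)
            if count is not None and count < self._threshold:
                valid[job_id] = FlavorResult(self.flavor(), {"applicantCount": count})
        return valid


def fetch_flavors(delegates: Iterable[FlavorDelegate], member_id: int,
                  job_ids: List[int], requested_flavors: Iterable[str]):
    """Run only the delegates whose flavors the client explicitly requested."""
    by_name = {d.flavor(): d for d in delegates}
    results: Dict[int, List[FlavorResult]] = {job_id: [] for job_id in job_ids}
    for name in requested_flavors:
        delegate = by_name.get(name)
        if delegate is None:
            continue  # assumption: unknown flavor names are silently skipped
        for job_id, result in delegate.evaluate(member_id, job_ids).items():
            results[job_id].append(result)
    return results
```

Because the dispatcher only sees the interface, a new flavor in this sketch is one more class in the `delegates` list, and a client that never requests it never receives it.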
This leaves the actual implementation of fetching the flavor information totally open. As different teams at LinkedIn come up with new ideas for flavors, we don’t want to constrain the technologies or information systems used. As a consequence, this forces the system to be highly decoupled, as delegates must be able to operate independently of any others. Each delegate knows its own dependencies and doesn't need to rely on implementation details of other delegates. That being said, some delegates have common dependencies, and it would be nice to avoid making duplicate calls when we could make one batch call to a downstream service and then split the information back to each delegate that requested it. For example, one flavor might need the member's current school information, while another needs their current company information, both of which could be fetched via a call to the profile service. Luckily, we can simply embed parseq-batching when making requests to the downstream services, and it will be smart enough to merge overlapping requests for us. Delegates don't have to do anything to make this work. They merely use the client as they normally would and they get efficient batching with the rest of the delegates for free.
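The effect of merging overlapping requests can be illustrated with a simplified stand-in. The sketch below is not ParSeq (which is a Java library and handles this inside its own task execution); it is a hypothetical `BatchingProfileClient` that collects what each delegate asks for, makes one downstream call per member, and hands each delegate back only the fields it requested. The `fetch_profile` callable and the field names are assumptions.

```python
class BatchingProfileClient:
    """Collects per-delegate requests and merges them into one call per member."""

    def __init__(self, fetch_profile):
        # fetch_profile(member_id, fields) -> dict; stands in for the profile service.
        self._fetch_profile = fetch_profile
        self._pending = {}  # member_id -> list of (fields, callback)

    def request(self, member_id, fields, callback):
        """Called by a delegate exactly as if it were a normal client call."""
        self._pending.setdefault(member_id, []).append((frozenset(fields), callback))

    def flush(self):
        """Issue one merged downstream call per member, then split the response."""
        pending, self._pending = self._pending, {}
        for member_id, waiters in pending.items():
            merged_fields = set().union(*(fields for fields, _ in waiters))
            profile = self._fetch_profile(member_id, merged_fields)
            for fields, callback in waiters:
                callback({name: profile.get(name) for name in fields})


# Usage: two delegates need different fields of the same member's profile.
calls = []

def fake_profile_service(member_id, fields):
    calls.append((member_id, set(fields)))
    return {"school": "Arizona State University", "company": "IBM"}

school_view, company_view = {}, {}
client = BatchingProfileClient(fake_profile_service)
client.request(7, {"school"}, school_view.update)
client.request(7, {"company"}, company_view.update)
client.flush()
print(len(calls))  # one downstream call served both delegates
```

Neither delegate in this sketch knows the other exists; the merging happens entirely inside the client, which is the property the real batching gives us.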
Once all the dependencies for a delegate have been obtained, the delegate will make a call to a datastore to fetch information about its Job Flavor. As each delegate finishes (or is timed out as a safety precaution for misbehaving delegates), the valid Job Flavors are fed into a ranking and selection engine. We use two heuristics to score and rank flavors for a given job: member affinity and job flavor affinity. Member affinity is how this member has historically reacted to each flavor. Flavors that a member has shown interest in before will be ranked higher than ones they have ignored or dismissed. Job Flavor affinity is how strong the flavor is for that job: a company that hires dozens of people from a member's current company, for example, may be more compelling than one where the member has a single connection. Once each tuple of (member, job, flavor) has been scored and ranked, the highest-ranked flavor is selected and returned in the response. Both clients and the server emit tracking events that are used to continue building these models. When new flavors get added, they will automatically merge into this data life-cycle and require no changes by the developer who added the new flavor delegate.
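As a rough sketch of the selection step, the two heuristics can be combined into a single score per valid flavor, with the top-scoring flavor returned for the job. How the two affinities are actually combined is not described in this post; multiplying them, the 0-to-1 scale, the default affinity for unseen flavors, and all names below are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Candidate:
    flavor: str
    job_flavor_affinity: float  # how strong this flavor is for this job, in [0, 1]


def select_flavor(candidates: List[Candidate],
                  member_affinity: Dict[str, float],
                  default_affinity: float = 0.5) -> Optional[str]:
    """Pick the highest-scoring valid flavor for one (member, job) pair."""
    if not candidates:
        return None

    def score(candidate: Candidate) -> float:
        # Assumption: score = member affinity x job flavor affinity.
        # Flavors the member has no history with get a neutral default.
        return (member_affinity.get(candidate.flavor, default_affinity)
                * candidate.job_flavor_affinity)

    return max(candidates, key=score).flavor


# Hypothetical example: the connection flavor is stronger for this job,
# but this member has historically ignored it and engaged with school flavors.
candidates = [
    Candidate("CONNECTIONS_AT_COMPANY", job_flavor_affinity=0.9),
    Candidate("SCHOOL_ALUMNI", job_flavor_affinity=0.5),
]
member_affinity = {"CONNECTIONS_AT_COMPANY": 0.2, "SCHOOL_ALUMNI": 0.8}
print(select_flavor(candidates, member_affinity))
```

Here the school flavor wins (0.8 x 0.5 = 0.4 versus 0.2 x 0.9 = 0.18), which shows how a member's history can outweigh a flavor that looks stronger for the job on its own.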
Once the framework was constructed, all that was left was to figure out the optimal way for us to fetch these insights. We first turned to the graph team: this team manages the API for accessing the entire social graph for LinkedIn. The graph is updated in real time and enables clients to make incredibly complex queries to target exactly what they need. This comes with a price, however: while our initial requirements for scalability and extensibility are met, the latency requirement is not. These computationally intensive operations take time, especially for large subgraph intersections, which can stretch into the high hundreds of milliseconds. For example, determining everyone who has ever gone to school at Arizona State University who now works at IBM requires an intersection of two graphs, each with ~400,000 members, that results in only ~500 people.
What we realized is that, while the real-time aspect is nice, we don't really need it for most flavors, as people aren't changing schools and companies every day. It's also much cheaper to verify a known result than to discover it on the fly. If we precompute the insights that require massive intersections via a daily Hadoop job and store them for quick lookup, we can then confirm at request time that the stored information is still correct with an inexpensive check. For example, in Voldemort (our distributed key-value datastore), we can store that a member’s connection works at some company, and we can verify this information isn’t outdated with a quick member lookup. This is orders of magnitude faster than doing the graph intersections on the fly, typically requiring only a few milliseconds in total. Some insights (like jobs that have a small number of applicants) inherently require real-time data backing them, and for these we use Pinot, which excels at analytics-related insights and has a relatively low latency (tens of milliseconds).
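The precompute-then-verify pattern can be sketched as follows. A plain dictionary stands in for the Voldemort store populated by the daily Hadoop job, and a callable stands in for the member lookup; the function name, key layout, and IDs are hypothetical and chosen only for illustration.

```python
def verified_connections_at_company(member_id, company_id,
                                    precomputed, current_company_of):
    """Return precomputed connections that still work at the company.

    precomputed: dict mapping (member_id, company_id) -> list of connection IDs,
        standing in for the key-value store filled by the offline job.
    current_company_of: callable(connection_id) -> company ID or None,
        standing in for a fast lookup of a member's current company.
    """
    candidates = precomputed.get((member_id, company_id), [])
    # Verification is one cheap lookup per candidate, instead of
    # recomputing the graph intersection at request time.
    return [c for c in candidates if current_company_of(c) == company_id]


# Hypothetical data: yesterday's job found connections 2 and 3 at company 900,
# but connection 3 has changed jobs since the last run.
precomputed = {(1, 900): [2, 3]}
current_companies = {2: 900, 3: 555}
print(verified_connections_at_company(1, 900, precomputed, current_companies.get))
```

In this sketch a stale entry costs nothing more than being filtered out; a connection who joined the company after the last offline run would simply not appear until the next run, which is the trade-off accepted by giving up real-time discovery.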
Results and next steps
We are currently using this system to decorate jobs in a variety of places, like our job emails, the job search page, the Jobs home page, sponsored company updates, and recommendations in the feed. In the near future, we will integrate this into other advertising offerings, recruiter InMails, and more. After ramping up this feature, we saw incredibly encouraging results, including a 15 percent increase in views from feed and emails, a 5 percent increase in job applications and messages sent between members when coming from job search, and even a revenue increase from sponsored company updates.
Currently, the Job Flavors platform supports four flavors:
- How many people a company has hired from your school
- How many people a company has hired from your company
- How many connections you have at this company
- Whether the job currently has fewer than ten applicants
While we do have many more flavors in the works (including flavors applicable to guests and others unique to members with premium subscriptions), our primary future work is integrating job flavors more deeply into both search and recommendations. We are currently working to allow job searches to be filtered by flavors and to fully categorize our job recommendations “Netflix-style.” We also plan to make the system more robust by adding automated real-time alerts that detect abnormal drops or spikes in how often flavors are displayed or clicked through.
A huge thanks to Kunal Cholera, Jiuling Wang, and Kaushik Rangadurai for their help in planning and initial architecture design, as well as all our partners in Feed, Companies, Jobs, Messaging, and Search for bringing this great feature to the forefront of LinkedIn. If you’re interested in building great products like this and impacting people’s lives every day, we are always looking to hire talented engineers and data scientists. Check out our careers page to learn more.