Building Enterprise Software on LinkedIn's Consumer Stack: Behind the Scenes of LinkedIn Talent Hub
October 11, 2018
The Economic Graph is a digital representation of the global economy based on 575+ million members, 50,000 skills, 26 million companies, 15 million open jobs, and 60,000 schools. In short, it’s all the data on LinkedIn, and we mine this data for critical information that connects members with economic opportunity. LinkedIn has had wide success leveraging this data to build products like “Job Postings” and “Recruiter Search,” which have helped members find jobs and recruiters find potential candidates, respectively. And today, we have another exciting update.
In order to provide an end-to-end hiring experience, LinkedIn is pleased to launch Talent Hub. Talent Hub is a candidate sourcing and management tool designed to meet the needs of mid-sized companies. It uses the data in LinkedIn’s Economic Graph, along with a rich set of candidate management features, to give hiring teams complete visibility into every stage of the hiring process while providing utilities to aid that process. This includes the ability to post jobs, source candidates through Recruiter Search, and a variety of applicant management capabilities (like hiring pipeline management, interview scheduling and management, unifying candidate communication from different platforms, sharing candidate profiles for reviews, etc.).
An example view of the Talent Hub “pipeline” view for sourcing new candidates
In this blog post, we will discuss the foundational building blocks that were either built from scratch, or created by modifying existing platforms, to power LinkedIn’s new Talent Hub product.
Foundational building blocks
The sections below provide an overview of the different platforms we built and leveraged when creating Talent Hub.
Talent Hub is a highly flexible and scalable web app built on top of several different foundational systems. For example, it uses: our in-house Galene stack to power search in different parts of the application, a custom rules engine to power automation, core services like Hire Access Control to determine data visibility and access, our in-house Air Traffic Controller system for optimizing communication and notifications to the users, and a wide variety of external services integrations to enable working with third-party services.
Because we expect the product to evolve quickly and want to “future proof” the system for many more outside integrations, Talent Hub is built using a multi-vertical API tier approach. This architecture keeps each vertical microservice small and easy to manage, which makes iteration and development faster.
Data storage and structure
It is very tempting to use a relational database to power Talent Hub, since there are several cases where we need to query a range of rows, like “querying all candidates who are in stage ‘INTERVIEW’.” However, we decided to use Espresso, LinkedIn’s in-house distributed document-oriented data store, for all Talent Hub data.
The following were the key criteria for this decision:
Although there are a few use cases for relational-style queries, the dominant query pattern is fetching the data associated with an entity, which typically manifests as a key lookup. Since the large majority of our queries fall into this pattern, a NoSQL store was the better fit.
Operational conveniences were also a factor, since it is easier to horizontally scale Espresso as our data grows.
Some use cases require querying batches of rows, which is usually difficult with NoSQL databases. However, a key property of Espresso is that it allows querying for several rows using secondary indexes, as long as the rows being queried belong to one partition. This works because a partition is guaranteed to live on a single node backed by a MySQL database, which supports those kinds of queries. We leverage this property of Espresso heavily to support various queries against the database.
Every Talent Hub user belongs to a dashboard that is set up at account-creation time. We use this dashboard (or “hiring context”) as the partition key for all tables. Espresso allows for defining a multi-part key, where the first part is the partition key, and all Espresso tables in the Talent Hub ecosystem use the hiring context as that first key part. See below for two tables with descriptions and their respective keys:
HiringProject: Top-level container associated with a requisition used to source and manage candidates.
HiringProjectCandidate: Table containing the association between candidates (Prospect or Hiring Identity) and hiring projects.
The HiringProjectCandidate table’s document schema also contains an index specification for the “state” field that specifies which state the candidate is in. As a result, queries of the following type can be made: Select all candidates who belong to the hiringContext=123 and hiringProject=568 and state=’INTERVIEW’.
This lets us use a distributed key-value database that scales to our needs while still supporting richer queries within the smaller chunk of data contained in a single partition.
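The partition-scoped query pattern above can be sketched as follows. This is a minimal, hypothetical model of Espresso-style storage, not Espresso’s actual API: every key starts with the partition key (the hiring context), and secondary-index filters are only answered within one partition.

```python
from collections import defaultdict

class PartitionedTable:
    """Toy model of a partitioned document table with secondary indexes."""

    def __init__(self, indexed_fields):
        self.partitions = defaultdict(dict)   # partition key -> {row key: document}
        self.indexed_fields = set(indexed_fields)

    def put(self, hiring_context, row_key, document):
        # The hiring context is always the first key part, so every row
        # lands in exactly one partition.
        self.partitions[hiring_context][row_key] = document

    def query(self, hiring_context, **filters):
        # Secondary-index lookups are only legal on indexed fields, and
        # only within a single partition (one hiring context).
        for field in filters:
            if field not in self.indexed_fields:
                raise ValueError(f"no index on field '{field}'")
        rows = self.partitions[hiring_context].values()
        return [doc for doc in rows
                if all(doc.get(f) == v for f, v in filters.items())]

candidates = PartitionedTable(indexed_fields=["hiringProject", "state"])
candidates.put(123, ("p568", "c1"), {"hiringProject": 568, "state": "INTERVIEW"})
candidates.put(123, ("p568", "c2"), {"hiringProject": 568, "state": "SOURCED"})
candidates.put(456, ("p999", "c3"), {"hiringProject": 999, "state": "INTERVIEW"})

# "Select all candidates in hiringContext=123, hiringProject=568, state=INTERVIEW"
matches = candidates.query(123, hiringProject=568, state="INTERVIEW")
```

Because the query never crosses a partition boundary, it can be served by the single MySQL-backed node that owns that hiring context.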
Breaking the frontend API into verticals
Talent Hub has been the product of hundreds of engineers from various teams within LinkedIn working together. As with any engineering project at this scale, we want to avoid diminishing returns as the team grows. Therefore, every internet-facing API that our Ember and mobile applications call is its own microservice that makes no sideways calls to other microservices at the frontend tier. Instead, the frontend leverages a URN decoration framework that allows any referenced entity to be fully decorated when needed by the frontend (see Figure 1). By decoration, we mean resolving a reference to an entity and replacing it with the data associated with that entity. For example, if an API containing candidates returns a reference to a LinkedIn member (a member URN), decoration entails expanding that reference by calling an API that holds the member’s data, such as firstName and lastName.
Figure 1: All APIs are implemented as independent microservices with no direct dependencies. Decoration of URNs is performed in a separate layer inaccessible by the API application code.
When the frontend, for instance, needs to decorate a job posting URN returned in a search result from the search API, the frontend decoration framework automatically knows to route the decoration to the jobs API.
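The routing idea can be sketched like this. The URN types, routes, and payloads below are invented for illustration; the point is that each URN type maps to exactly one owning vertical API, so no service ever calls a sibling sideways.

```python
# Hypothetical registry: each URN entity type routes to its owning vertical API.
DECORATORS = {
    "member": lambda id_: {"firstName": "Jane", "lastName": "Doe"},
    "jobPosting": lambda id_: {"title": "Staff Engineer"},
}

def decorate(document):
    """Recursively replace every 'urn:li:<type>:<id>' reference with the
    data returned by the owning API for that entity type."""
    if isinstance(document, dict):
        return {k: decorate(v) for k, v in document.items()}
    if isinstance(document, list):
        return [decorate(v) for v in document]
    if isinstance(document, str) and document.startswith("urn:li:"):
        _, _, entity_type, entity_id = document.split(":")
        return DECORATORS[entity_type](entity_id)
    return document

response = {"candidate": "urn:li:member:42", "job": "urn:li:jobPosting:568"}
decorated = decorate(response)
# decorated["candidate"] now carries the member's data instead of the raw URN
```

In the real system this resolution happens in a separate layer that the API application code cannot reach, which is what keeps the verticals independent.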
Finally, we avoid redundancy and code duplication by having a shared library containing infrastructure and common utilities used across the APIs. This has allowed us to scale horizontally and onboard new engineers quickly, because we avoid having a big monolithic code base and application. In the past, a monolithic code base has proven difficult to grow and scale to the size of our engineering organization.
Configurable rules engine
The ability to scale to LinkedIn’s large portfolio of sourcing customers is not limited to scaling services and data stores; it also means letting users self-serve configure customized workflows in Talent Hub. This allows Talent Hub to cater to the many different hiring workflows and processes that vary widely across organizations. After evaluating a host of open-source rule engines, we decided to create our own REST-based, configurable rule engine, which allows the level of customization we intend to offer and integrates seamlessly with LinkedIn’s infrastructure.
A Rule consists of three entities: a Trigger, a set of Conditions, and a set of Actions.
Each of these is hosted by a separate Rest.li service (Rest.li is LinkedIn’s microservices architecture platform; see this blog post for details). So, a Rule is essentially an association between a Trigger, a list of Conditions (which can be joined by AND or OR clauses), and a set of Actions to execute. As a result, we are able to automate hiring workflows by configuring rules of this nature.
Note that each Condition in such a rule is an atomic entity that can be reused across rules. This is possible because Condition is a Rest.li resource, and each condition manifests as a Rest.li object that can be referenced from different Rule objects. So, with a given catalog of system Triggers, Conditions, and Actions, it is possible to define numerous automation workflows based on a client’s needs. It is also possible to create a Condition or Action that is a webhook callback, in case more customization is required than the system natively supports.
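The Trigger/Condition/Action shape can be sketched as below. In the real system these are separate Rest.li resources; here they are plain callables, and the AND/OR combinator is simplified to a single flag for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    """A rule ties a trigger to reusable conditions and a set of actions."""
    trigger: str                        # event name the rule fires on
    conditions: list                    # reusable predicates over the event
    combinator: str = "AND"             # how conditions are joined: AND or OR
    actions: list = field(default_factory=list)

    def evaluate(self, event_name, event):
        if event_name != self.trigger:
            return False
        results = [cond(event) for cond in self.conditions]
        matched = all(results) if self.combinator == "AND" else any(results)
        if matched:
            for action in self.actions:
                action(event)
        return matched

log = []
rule = Rule(
    trigger="CANDIDATE_STAGE_CHANGED",
    conditions=[lambda e: e["newStage"] == "INTERVIEW"],
    actions=[lambda e: log.append(f"schedule interview for {e['candidate']}")],
)
rule.evaluate("CANDIDATE_STAGE_CHANGED",
              {"candidate": "c1", "newStage": "INTERVIEW"})
```

Because the conditions and actions are standalone objects, the same “stage is INTERVIEW” condition could be wired into any number of other rules.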
A Trigger represents a possible state change, which in LinkedIn’s ecosystem largely manifests as a Rest.li service call, a Kafka message (Kafka is a distributed streaming platform; see the Apache project for details), or a Brooklin message (Brooklin is our change-data-capture system; see this blog post for details). Figure 2 below shows the sequence of events involved in executing a rule.
Figure 2: Rules engine architecture
External services integration
One of Talent Hub’s core capabilities is its ability to work with other systems, such as HRIS platforms, background check providers, interview schedulers, and assessment service providers.
LinkedIn has a platform called GaaP (Gateway as a Platform), which is on a mission to turn the entire internet into one big RESTful framework. With a Groovy-based scripting framework, GaaP lets developers wrap an external API in a REST endpoint written in a standardized format. This allows us to internally create a REST API on top of services such as Google G Suite and the Microsoft Graph API with exactly the same models and methods, even though the underlying APIs are radically different from each other.
Using these internal facades over external APIs, we can easily write integrations that power core Talent Hub features such as interview scheduling and automatic synchronization of recruiting-related emails.
We use GaaP heavily in LinkedIn’s Talent Hub for two core features:
Integrating with email providers to support “One Inbox,” where recruiters can aggregate their communication from different platforms into one place.
Supporting automated interview scheduling. This requires sending invites to interviewers and keeping track of their responses. GaaP lets us query these external calendaring services and provide an integrated scheduling experience for our users.
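The unified-model idea behind GaaP can be sketched as follows. The Invite model, provider classes, and method names here are all invented; the point is that product code is written once against one interface, while each provider adapter hides a radically different external API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Invite:
    """One standardized invite model, regardless of the external service."""
    attendee: str
    start: str          # ISO-8601 start time
    response: str = "PENDING"

class CalendarProvider(ABC):
    @abstractmethod
    def send_invite(self, invite: Invite) -> str: ...

class GoogleCalendar(CalendarProvider):
    def send_invite(self, invite):
        return f"google:{invite.attendee}"        # would call the G Suite API

class MicrosoftCalendar(CalendarProvider):
    def send_invite(self, invite):
        return f"msgraph:{invite.attendee}"       # would call Microsoft Graph

def schedule_interview(provider: CalendarProvider, attendees, start):
    """Product code sees one model and one method set for every provider."""
    return [provider.send_invite(Invite(a, start)) for a in attendees]

ids = schedule_interview(GoogleCalendar(), ["alice", "bob"], "2018-11-01T10:00")
```

Swapping `GoogleCalendar()` for `MicrosoftCalendar()` changes nothing in the scheduling code, which is the property GaaP’s standardized endpoints give us.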
In addition, we built the Extensions Framework, which allows third-party apps to integrate with Talent Hub as service providers. An example of this integration is using Skype for interviews. The Extensions Framework leverages LinkedIn’s partner push platform and Rest.li gateway to allow for flexible, webhook-based integrations with custom parameters that Talent Hub can use to call a third-party service. The framework exposes a set of core APIs for the different supported use cases. Consider the “interview assessment” use case: Talent Hub exposes a core “/assessments” API via a Rest.li gateway, which the third-party service writes into after the interview is conducted. The interview assessment process itself is initiated by calling a webhook that the third-party service provider registered when configuring the integration.
It is critical that Talent Hub support a large spectrum of third-party service providers across different candidate management capabilities, and the only way to do this scalably is to build infrastructure that extends easily to future externalization use cases. The Extensions Framework achieves this through its set of generic, use-case-oriented core APIs.
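The round trip for the “interview assessment” use case can be sketched like this. Everything here (the registration shape, the webhook names, the write-back call) is an illustrative assumption, not the framework’s actual API: Talent Hub fires the partner’s registered webhook to start an assessment, and the partner later writes the result back through the core /assessments API.

```python
assessments = {}                      # stand-in for the core /assessments API

def register_partner(webhooks):
    """Partner supplies its webhook callbacks when the integration is configured."""
    return {"webhooks": webhooks}

def initiate_assessment(partner, candidate_id):
    # Talent Hub -> partner: fire the configured webhook with custom parameters.
    partner["webhooks"]["startAssessment"]({"candidate": candidate_id})

def post_assessment(candidate_id, score):
    # Partner -> Talent Hub: write back through the core API after the interview.
    assessments[candidate_id] = score

skype_partner = register_partner({
    "startAssessment": lambda payload: post_assessment(payload["candidate"], "PASS"),
})
initiate_assessment(skype_partner, "c42")
```

Because the core API is generic, a different assessment provider only has to register its own webhook; nothing on the Talent Hub side changes.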
Powering search with Galene
One way we are building delight into the Talent Hub search experience is by leveraging LinkedIn’s existing search infrastructure, Galene. Originally built to power LinkedIn’s member, company, and job search, Galene brings powerful relevance and faceting, returning results over hundreds of millions of indexed records in just a few milliseconds. We use Galene to power several product features, such as:
Recruiter Search, a flagship offering from LinkedIn’s sourcing product ecosystem.
Searching through thousands of job applicants.
Searching through several projects in a single customer’s account with different filters and facets.
Notifications using ATC (Air Traffic Controller)
ATC is a platform built to improve the communication experience for LinkedIn members. It is described in detail in this blog post. One challenge with routing all Talent Hub user communication through ATC was that ATC was intended as a platform for LinkedIn member communication and notifications. A Talent Hub user, however, is represented by an Enterprise Profile: a profile associated with a user of a LinkedIn enterprise product, who may or may not have bound their LinkedIn member profile to it. ATC originally did not support Enterprise Profiles as a recipient type, so as part of building foundational blocks for Talent Hub, we modified ATC to support Enterprise Profile users. Now, all communication to Talent Hub users can be regulated by ATC’s processing features, some of which include:
Channel selection (Push, in-app, email, or SMS)
Following up in different channels
Capping, aggregation, and digesting of notifications
Delivery time optimization
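Two of the processing features above, capping and digesting, can be illustrated with a small sketch. ATC’s real pipeline is far richer; the per-user threshold and digest format here are assumptions for illustration only.

```python
from collections import defaultdict

CAP_PER_USER = 3          # assumed cap per delivery window, for illustration

def process(notifications):
    """Cap each recipient's notifications and fold the overflow into a digest."""
    by_user = defaultdict(list)
    for note in notifications:
        by_user[note["recipient"]].append(note["message"])
    delivered = []
    for user, messages in by_user.items():
        # Deliver up to the cap individually...
        for msg in messages[:CAP_PER_USER]:
            delivered.append((user, msg))
        # ...and aggregate the rest into a single digest notification.
        overflow = messages[CAP_PER_USER:]
        if overflow:
            delivered.append((user, f"digest of {len(overflow)} more updates"))
    return delivered

out = process([{"recipient": "u1", "message": f"update {i}"} for i in range(5)])
```

The recipient still learns about every event, but never receives more than the cap plus one digest per window.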
Core platform services
We built a number of core services that are used throughout the Talent Hub. Two such notable services include:
Hire identity service
Hire access control service
Hire identity service
This service hosts identity information for potential candidates in the hiring pipeline, assimilating profile information from different sources, one of which is LinkedIn member data. Candidates in the hiring pipeline need not be LinkedIn members; such candidates are typically added by Talent Hub users entering their information manually. If we later find a matching member, we bind that profile to the member. The hire identity service is the source of truth for candidate data in Talent Hub, and all of Talent Hub’s candidate features build on this identity.
Hire access control service
Talent Hub often contains highly sensitive data, including candidate reviews, interview feedback, and compensation information. In some cases, an entire hiring project can be confidential because it is associated with a sensitive or strategic requisition. This requires the Talent Hub app to strictly enforce data visibility policies, which are exposed through a centralized access control service. In Rest.li, each microservice surfaces the data owned by its Rest.li model and is expected to be the single source of truth for that data. By integrating all of these Talent Hub microservices with the hire access control service, we enforce visibility restrictions consistently across Talent Hub data.
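The value of centralizing these decisions can be sketched as follows. The roles, data types, and policy rules below are invented for illustration; the point is that every vertical microservice asks one service the same question before surfacing its data, rather than duplicating the rules.

```python
# Hypothetical policy table: which roles may view which class of data.
POLICIES = {
    "interview_feedback": {"HIRING_MANAGER", "INTERVIEWER"},
    "compensation": {"HIRING_MANAGER"},
}

def can_view(role, data_type, project_confidential=False, on_project=True):
    """Single point of truth for visibility decisions (illustrative sketch)."""
    if project_confidential and not on_project:
        return False            # confidential projects hide everything
    return role in POLICIES.get(data_type, set())

# Each microservice gates its own data behind the same check:
allowed = can_view("HIRING_MANAGER", "compensation")
denied = can_view("INTERVIEWER", "compensation")
```

If a policy changes, such as opening compensation data to recruiters, only the central service changes; no vertical microservice needs to be touched.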
Conclusion
LinkedIn’s Talent Hub serves as a one-stop shop for all hiring needs, including sourcing and candidate management. Talent Hub is meant for mid-sized companies and is thus an enterprise offering from LinkedIn. However, unlike a lot of other enterprise software, Talent Hub is built in a way that provides consumer-like, delightful experiences to users.
The pilot program for Talent Hub begins in January, and it will be globally available later next year. We couldn’t be more excited to bring you Talent Hub and can’t wait to hear what you think.
Acknowledgements
This has been a massive initiative for LinkedIn and is the result of several quarters of dedicated effort from engineers in the Hire and Careers organization. It would have been impossible to realize this vision without the leadership of Manish Baldua, Dan Reid, and Shen Shen, along with the dedication of the entire Talent Hub team.