Building LinkedIn Talent Insights to democratize data-driven decision making
June 29, 2020
Co-authors: Timothy Santos and Jeremy Lwanga
LinkedIn is a mission-driven organization, and we take our mission of “connecting the world's professionals to make them more productive and successful” very seriously. One way we’ve worked toward this goal is through LinkedIn Talent Insights, which first launched in 2018. This tool helps organizations understand the external labor market and their internal workforce, and enables the long-term success of their employees. Talent Insights accomplishes this by providing unparalleled access to LinkedIn’s Economic Graph, along with metrics that allow its users to make data-driven decisions in developing their talent strategy.
The Economic Graph consists of 690 million members, 36,000 skills, 50 million companies, and 90,000 schools. Through mapping every member, company, job, and school, Talent Insights can help users answer questions like: What is the fastest growing skill in the software industry? Which schools did company X’s employees graduate from? Which companies is company X losing its talent to? Users can further explore the data by filtering on various dimensions like geographic locations, member skills, member titles, and years of experience. From the engineering perspective, these questions are answered with OLAP operations such as slicing, dicing, drilling down, and rolling up.
Figure 1: Talent Insights Talent Pool Report
When we started developing Talent Insights, we had three priorities in mind:
- Deliver data on demand: give human resources (HR) leaders and talent professionals the ability to answer complex talent questions in minutes.
- Make the insights actionable: ensure that anyone can interpret the data. You don’t have to be a data scientist to get immense value from the product.
- Harness real-time updates on LinkedIn: provide the most accurate view of labor market trends at any given moment.
In this blog post, we will talk about how we solved the challenges we faced in turning Talent Insights from an idea into a full-fledged LinkedIn product. We will highlight the process, decisions, and optimizations that have allowed us to scale and ultimately democratize access to LinkedIn’s unique data.
Given that Talent Insights was the first platform of its kind at LinkedIn, there were many questions and challenges that immediately came to mind:
- How do you build a product with OLAP functionality that’s also fast, accurate, and reflective of near-real-time LinkedIn data?
- What datastore are we going to use?
- Can we precompute these metrics and store them in a key-value database?
We realized it was not scalable to precompute metrics, given the astronomical number of combinations of search dimensions across skills, titles, locations, industries, and functions. To help illustrate, if you wanted to compute a metric for all the possible combinations when picking 10 skills from the LinkedIn skills taxonomy of 36K skills, you would produce a dataset of 1.0062805 x 10^39 records!
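The size of that number (on the order of 10^39) can be checked with a short calculation. This is a minimal sketch, assuming the count is the number of unordered 10-skill subsets of a 36,000-skill taxonomy, that is, a binomial coefficient. The constant names are illustrative.

```python
import math

TAXONOMY_SIZE = 36_000  # skills in the taxonomy, per the text
SKILLS_PICKED = 10      # skills chosen per combination

# Number of distinct unordered 10-skill combinations: C(36000, 10).
combinations = math.comb(TAXONOMY_SIZE, SKILLS_PICKED)

# Print in scientific notation for comparison with the figure in the text.
print(f"{combinations:.7e}")
```

Even at one record per combination, and before adding any other dimension such as title or location, this is far beyond what any key-value store could hold.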
We also decided against leveraging a search infrastructure like Galene (Lucene-based index), as it was better suited for matching entities and ranking based on relevance scores than for aggregating metrics. In addition, Galene approximates counts for queries that match a large number of documents to help performance, which was not suitable for Talent Insights, as approximated counts would lead to inconsistent data within the product.
Leveraging Apache Pinot
With the requirement for serving accurate counts at low latency, we naturally gravitated towards LinkedIn’s very own Apache Pinot. Pinot is a distributed OLAP store designed to answer queries at low latency. Pinot uses Pinot Query Language (PQL), which resembles SQL in that both support selection, projection, aggregation, and grouping aggregation, and this works well for generating the metrics on Talent Insights. For example, if we wanted to find the top 20 companies with software engineers who had Spark as a skill, our query would roughly look like:
SELECT COUNT(member) FROM myTable WHERE title = 'software engineer' AND skills = 'spark' GROUP BY company TOP 20
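To make the semantics of that query concrete, here is a minimal Python sketch of the same filter, group, and top-N steps over a handful of in-memory rows. The rows, field names, and the `top_companies` helper are invented for illustration. Pinot evaluates such a query against indexed, distributed segments rather than a Python list.

```python
from collections import Counter

# Hypothetical member rows; `skills` is multi-valued, as in the query above.
ROWS = [
    {"member": 1, "title": "software engineer", "skills": {"spark", "java"}, "company": "A"},
    {"member": 2, "title": "software engineer", "skills": {"spark"}, "company": "A"},
    {"member": 3, "title": "software engineer", "skills": {"python"}, "company": "B"},
    {"member": 4, "title": "data scientist", "skills": {"spark"}, "company": "B"},
    {"member": 5, "title": "software engineer", "skills": {"spark", "scala"}, "company": "C"},
]

def top_companies(rows, title, skill, top_n=20):
    """Count matching members per company and return the top_n companies."""
    counts = Counter(
        row["company"]
        for row in rows
        if row["title"] == title and skill in row["skills"]
    )
    return counts.most_common(top_n)

result = top_companies(ROWS, "software engineer", "spark")
print(result)
```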
Once the Pinot cluster was built, we created an ETL (Extract, Transform, Load) pipeline that ran daily to populate our Pinot tables (see Figure 2). Specifically, we extracted standardized data (member, company, job) and engagement data from Hadoop, then a Spark job transformed the data with specific business logic, and finally a Pinot build and push job loaded the data into Pinot.
Figure 2: High-level depiction of the ETL pipeline
Once the first dataset was pushed to Pinot, we did some initial capacity testing and quickly discovered that our Pinot cluster could only serve a limited number of concurrent queries before reaching a network timeout. This was nowhere close to what we needed, considering the number of users we were anticipating for this future product. Given a fast-approaching launch timeline, we knew we had to learn quickly and iterate.
Build, observe, optimize, iterate
In order to get to a place where we were comfortable handling production traffic, we had to think holistically about our strategy for optimizations. After observing the performance of our queries, we decided to optimize across three areas:
- Pinot infrastructure optimization
- Query optimization
- User interface optimization
Figure 3: Simplified view of the Talent Insights online architecture
We started off by building a dedicated performance test cluster. By testing and benchmarking on this cluster, we learned about the bottlenecks and limits of our Pinot setup in conjunction with our datasets and queries. From initial experiments, we learned that we were hitting scaling limits with Pinot’s out-of-the-box settings, so we worked closely with the Pinot team to test and enable advanced features to improve scalability. For example, one early observation was that our query throughput would not scale up even as we added more Pinot brokers and servers. At the time, the Pinot cluster was configured to scatter each query to all servers and gather their responses, even if some servers did not host a data segment required for answering the query. To optimize, the Pinot team implemented a replica group segment assignment strategy, which pruned the number of Pinot servers the broker had to communicate with and, as a result, freed up Pinot server resources by reducing the incoming QPS (queries per second) by a factor equal to the number of replica groups. Switching to this configuration provided a better way to tune the broker/server query fanout and allowed our Pinot cluster to scale horizontally: query throughput could then scale in proportion to the amount of hardware in the cluster.
Figure 4: Illustration of querying Pinot servers before/after replica group assignment strategy
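The effect of replica groups on fanout can be illustrated with a toy model. This is a sketch under simplifying assumptions, not Pinot’s routing code: servers are split evenly into replica groups, each group holds a full copy of the data, and the broker picks groups round-robin. All names are hypothetical.

```python
from collections import Counter
from itertools import cycle

def per_server_load(num_queries, servers, replica_groups=None):
    """Return how many query requests each server receives.

    Without replica groups, every query fans out to every server.
    With replica groups, each query goes to the servers of one group only.
    """
    load = Counter({s: 0 for s in servers})
    if replica_groups is None:
        for _ in range(num_queries):
            load.update(servers)
        return load
    group_picker = cycle(replica_groups)  # simple round-robin group selection
    for _ in range(num_queries):
        load.update(next(group_picker))
    return load

servers = ["s1", "s2", "s3", "s4", "s5", "s6"]
groups = [["s1", "s2"], ["s3", "s4"], ["s5", "s6"]]  # 3 replica groups

before = per_server_load(300, servers)
after = per_server_load(300, servers, groups)
print(before["s1"], after["s1"])
```

In this model, each server handles one third of the requests it saw before, which matches the reduction by the number of replica groups described above. Adding a fourth group of servers would lower per-server load further, which is why throughput can grow with hardware.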
Another observation we made when analyzing query performance was that Pinot requests would often be dropped when serving the bursty requests needed for powering the metrics on Talent Insights. We noticed that this occurred when all the Pinot broker connections were occupied communicating with the Pinot servers. When a Pinot broker received a query, it would check out connections from the thread pool for each server it communicated with and hold them until the servers returned their responses. While the broker waited for the query to complete, those connections were unusable by subsequent queries, resulting in an inefficient use of threads. To address this, the Pinot team implemented asynchronous broker routing, in which broker connections to the servers are no longer held while waiting for the response. After this change was rolled out, there were no longer any dropped requests from Pinot, even with high QPS/bursty traffic.
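The idea behind non-blocking scatter-gather can be sketched with Python’s `asyncio`. This is only an analogy under stated assumptions, not Pinot’s broker implementation: the server names, the simulated latency, and the partial result of 1 per server are all made up. The point is that a waiting query yields control instead of tying up a resource, so a burst of queries can be in flight at once.

```python
import asyncio

SERVERS = ["s1", "s2", "s3"]  # hypothetical server names

async def query_server(server, query):
    """Stand-in for a network call; awaiting yields control instead of blocking."""
    await asyncio.sleep(0.01)  # simulated server processing time
    return 1  # pretend each server returns a partial count of 1

async def broker_handle(query):
    """Scatter the query to all servers, then gather and merge partial results."""
    partials = await asyncio.gather(*(query_server(s, query) for s in SERVERS))
    return sum(partials)

async def serve_burst(num_queries):
    """Handle a burst of queries concurrently on a single event loop."""
    queries = [f"query-{i}" for i in range(num_queries)]
    return await asyncio.gather(*(broker_handle(q) for q in queries))

results = asyncio.run(serve_burst(200))
print(len(results))
```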
Several other changes were made to the Pinot infrastructure that ultimately helped support the high read throughput use case for Talent Insights, such as the addition of the VALUEIN transformation function, upgrading the cluster hardware to SSD machines, and supporting tenant-level isolation. By planning and coordinating with the Pinot team, we were able to parallelize these optimization efforts with the ones made at the query and data layer.
It was imperative to be responsible Pinot clients and optimize queries wherever possible. We inspected all the different query patterns for powering the metrics on Talent Insights and identified the ones that were performing particularly slowly. The common trait between the slow queries was that they scanned a high number of records to process the query. In database terms, this is related to selectivity, or the ability of a query to narrow the results using the index. To ensure selectivity, we had to write queries that limited the number of possible documents when using the column index and which would, as a result, lead to faster aggregation in Pinot.
With this in mind, we made significant efforts to make our queries as selective as possible. For example, if we wanted to get the companies that employ the most software engineers in the United States, we added a filter to exclude the long tail of small companies that don’t need to be returned to the frontend. Furthermore, when querying top skills of a talent pool, we removed skills commonly held across many job roles like “leadership” and “Microsoft Word” from the dataset, which significantly reduced the amount of data to be aggregated.
Here’s an example of a less selective query, one with a higher selectivity value because it matches a larger share of records. ~72M members on LinkedIn have “management” as a skill.
SELECT COUNT(member) FROM myTable WHERE skills = 'management'
Here’s an example of a more selective query, one with a lower selectivity value because it matches a much smaller share of records. ~3,000 members on LinkedIn have “quantum field theory” as a skill.
SELECT COUNT(member) FROM myTable WHERE skills = 'quantum field theory'
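The gap between these two filters can be put in numbers. This is a rough sketch that assumes the table holds one record per member and uses the 690 million member count cited earlier as the total. The actual size of the Pinot table is not stated, and `matched_fraction` is an illustrative helper.

```python
TOTAL_MEMBERS = 690_000_000  # member count cited earlier in the post

def matched_fraction(matching_members, total=TOTAL_MEMBERS):
    """Fraction of all records a filter matches; smaller means more selective."""
    return matching_members / total

management = matched_fraction(72_000_000)       # ~72M members
quantum_field_theory = matched_fraction(3_000)  # ~3,000 members

print(f"management: {management:.4f}")
print(f"quantum field theory: {quantum_field_theory:.2e}")
```

Under these assumptions the first filter matches roughly a tenth of all records, while the second matches only a few in a million, so far fewer documents are left to aggregate.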
We also realized there were many duplicated queries when generating the reports on Talent Insights. Queries would sometimes be repeated for metrics that appear across multiple locations within the application. Adding a Pinot query result cache (Couchbase) tremendously helped performance and reduced overall Pinot QPS.
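A query result cache of this kind can be sketched in a few lines. The version below is a minimal in-memory stand-in, not the Couchbase-backed cache described above. The `QueryResultCache` class, the whitespace normalization, the 300-second expiry, and the `fake_pinot` backend are all assumptions made for illustration.

```python
import time

class QueryResultCache:
    """Tiny in-memory cache keyed by normalized query text, with a TTL."""

    def __init__(self, run_query, ttl_seconds=300.0):
        self._run_query = run_query  # callable that actually hits the datastore
        self._ttl = ttl_seconds
        self._entries = {}           # normalized query -> (expires_at, result)

    @staticmethod
    def _normalize(query):
        # Collapse whitespace so trivially different strings share one entry.
        return " ".join(query.split())

    def get(self, query):
        key = self._normalize(query)
        entry = self._entries.get(key)
        now = time.monotonic()
        if entry is not None and entry[0] > now:
            return entry[1]          # cache hit: no backend query
        result = self._run_query(key)
        self._entries[key] = (now + self._ttl, result)
        return result

backend_calls = []

def fake_pinot(query):
    """Stand-in backend that records each call and returns a dummy count."""
    backend_calls.append(query)
    return 42

cache = QueryResultCache(fake_pinot)
first = cache.get("SELECT COUNT(member) FROM myTable WHERE skills = 'spark'")
second = cache.get("SELECT COUNT(member)  FROM myTable  WHERE skills = 'spark'")
print(len(backend_calls))
```

The second lookup is served from the cache, so the backend is queried only once for a metric that appears in two places.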
User interface optimizations
Our optimization strategy of minimizing the number of Pinot queries influenced the design of the Talent Insights user interface. For instance, rather than firing queries with every change to the search filter, we waited for the user to signal completion of their search facets by hitting an “Apply” button before making any query to Pinot. This helped cut down on unnecessary queries for intermediate results that a user might not care about. To further reduce the application’s load on Pinot, we made sure that the rows in the tables in the product were “lazily loaded”. Given that the tables typically showed 10 results on each page, we made sure our APIs implemented pagination so that we did not fire queries for subsequent results until the user clicked on the next page, which greatly reduced the number of queries needed to load the table initially.
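The lazy loading described above can be sketched as a table that fetches a page only when it is first requested. The `LazyTable` class, the `fetch_page` signature, and the sample rows are hypothetical. The page size of 10 comes from the text.

```python
PAGE_SIZE = 10  # the tables showed 10 results per page, per the text

class LazyTable:
    """Fetches one page of rows at a time, only when that page is requested."""

    def __init__(self, fetch_page):
        self._fetch_page = fetch_page  # callable(offset, limit) -> list of rows
        self._pages = {}               # page number -> rows already loaded
        self.queries_fired = 0

    def page(self, number):
        if number not in self._pages:
            self.queries_fired += 1
            self._pages[number] = self._fetch_page(number * PAGE_SIZE, PAGE_SIZE)
        return self._pages[number]

ALL_ROWS = [f"company-{i}" for i in range(35)]  # hypothetical full result set

def fetch_page(offset, limit):
    """Stand-in for a paginated backend call."""
    return ALL_ROWS[offset:offset + limit]

table = LazyTable(fetch_page)
first_page = table.page(0)   # initial load: one query
table.page(0)                # revisiting a loaded page fires no new query
second_page = table.page(1)  # user clicks "next": one more query
print(table.queries_fired)
```

Loading the table costs a single query up front, and pages the user never opens are never fetched.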
Collaboration: “Relationships matter”
The LinkedIn mantra of “relationships matter” once again proved true in building Talent Insights. Many different teams had to come together to bring this tool to life. It was key to establish a strong working relationship, especially when working with teams across LinkedIn’s different campuses. We had frequent syncs with partner teams to make sure expectations were set and people were held accountable for completing their tasks.
Getting engineering teams to support a product that has not yet been built can be tough. Why should I spend my resources building something that currently has no users and generates no revenue? Those times are the most crucial for leveraging and building relationships, which are the true foundation of your tech stack. They are also the foundation for your future collaboration.
Although we were hacking away at optimizations into uncharted territory, we knew we were paving the way for other teams, with hopes they would reap the benefits of our work. There were times of frustration and doubt, but we pushed through. Looking back, we can confidently say our work paid off, as other teams have benefited from our learnings and success. The Pinot optimizations that were motivated by Talent Insights have helped other teams with their own use cases.
Dream big, but appreciate the journey
It is a rare opportunity to get the chance to build out a brand new enterprise product from scratch. With all the chaos that comes with tight deadlines, it is easy to get so caught up in work that you forget to “stop and smell the roses.” It is important to take a step back and remember that you’ll get there one step at a time.
We knew that we weren’t going to immediately support the QPS of our predicted customer usage with one magical optimization. But it is important to dream big and set S.M.A.R.T. (specific, measurable, attainable, realistic, and time-bound) goals along the way. And with each goal or milestone attained, it is important to give kudos and celebrate. Take some time to appreciate your team and all who have supported you.
Talent Insights now
Long gone are the days when we were scrambling for more capacity in our Pinot cluster. Talent Insights has since scaled to serve over 2,000 companies across the world. Not only does Talent Insights serve requests for its own users, but it also now serves company-related metrics across various LinkedIn products, including Recruiter, Sales Navigator, Company Pages, and Premium Insights.
Building out Talent Insights was truly a collaborative effort that required many cross-functional partners. We would like to extend our special thanks to our partners on the Pinot and UMP teams who helped launch this product: Kishore Gopalakrishna, Ravi Aringunram, Prasanna Ravi, Jackie Jiang, Sunitha Beeram, Seunghyun Lee, John Gutmann, Dino Occhialini, Mayank Shrivastava, Shraddha Sahay, and Ameya Kanitkar, under the leadership of Kapil Surlaker. Finally, we would not be where we are today without the hard work of the entire Talent Insights Engineering team who brought this product to life.