From the Economic Graph to Economic Insights: Building the Infrastructure for Delivering Labor Market Insights from LinkedIn Data

June 2, 2023

Authors: Dr. Patrick Driscoll and Akash Kaura

LinkedIn’s vision is to create economic opportunity for every member of the global workforce. Since its inception in 2015, the Economic Graph Research and Insights (EGRI) team has worked to make this vision a reality by generating labor market insights such as:

Real-time economic and workforce intelligence & insights. This takes the form of the monthly LinkedIn Workforce Report, newsletters on LinkedIn.com, working papers, and flagship reports about timely issues such as the green transition to address climate change.
Sharing economic data with the government and multilateral partners. The Data for Impact program (DFI), whose partners include the World Bank, Inter-American Development Bank (IDB), International Monetary Fund (IMF), Organization for Economic Co-operation and Development (OECD), Destatis (Germany’s statistical authority), and the United Nations Development Program (UNDP), provides researchers the opportunity to leverage LinkedIn data to inform cutting-edge research, program design, and investment strategy.
Sharing economic data and commentary with media such as CNBC, The Wall Street Journal, NBC News, Financial Times, etc. so their audiences can stay up to date on timely issues such as remote work, the gender gap, and climate change.

In this post, we'll describe how the EGRI Data Foundations team (Team Asimov) leverages LinkedIn's cutting-edge data infrastructure tools such as Unified Metrics Platform, Pinot, and DataHub to ensure we can deliver data and insights robustly, securely, and at scale to a myriad of partners. We will illustrate this through a case study of how we built the pipeline for our most well-known and oft-cited flagship metric: the LinkedIn Hiring Rate.

Growth and Technical Pain Points

We have seen incredible interest in the insights that the EGRI team can provide. Over the last eight years, we have quadrupled the number of partner teams we work with - going from two major stakeholders in 2015 to eight in 2023 (see image below). As you can imagine, delivering such a wide variety of insights across multiple channels comes with technical challenges.

Graphic showing the growth of stakeholders for the Economic Graph

One of the main challenges is ensuring that our data scientists have reliable data available in a timely manner. Publishing insights based on inaccurate or stale data can result in a loss of trust from our partners, and has the potential to cause further negative impact. For example, if a media outlet uses incorrect data from an Economic Graph report in their reporting, it could result in a loss of trust among their readership.

We currently address over 50 requests for our data and insights per month. These requests require interaction with one or more of our datasets, and the timelines can range from a few days to a couple of weeks. In addition, we conduct proactive research to develop new metrics such as Labor Market Tightness. Our data infrastructure must be able to handle a high volume of requests from a wide range of consumers, and we need to ensure our data are available as often as possible so that we can quickly turn around analysis requests from critical partners. As the popularity of LinkedIn and the demand for insights into the economy and labor market continue to grow, we must ensure that we can scale our output to meet that growth.

We also must ensure that in all of our work, we are appropriately protecting our members' privacy. LinkedIn’s members rely on the platform to keep their data secure, and it is essential that the EGRI team takes appropriate measures to ensure that member privacy is protected at all times. This requires us to carefully manage the data which we collect and use and to leverage secure data infrastructures for storing and processing the data.

Foundational Team Vision and Guiding Principles

To address these challenges, we assembled the EGRI Data Foundations Team (Team Asimov) with a charter of developing and managing our data ecosystem. The team operates with the following guiding principles as our collective north star:

1. Availability: Data must be readily available to the broader team in order to support their research and analysis. This data must be accessible and available in near real-time in order to accurately reflect current trends and conditions.

2. Reliability: Data consumers need to be able to trust that the data they are using is reliable and, as a result, can then be confident in the quality of the analyses they are generating.

3. Discoverability: Consumers must be able to easily discover and access the correct data sources for their needs, whether those sources are stored in a centralized repository or scattered throughout the wider data landscape.

4. Governance: A robust data governance framework must be in place to ensure that the data are being used appropriately and that any potential risks to member privacy are identified and addressed.

5. Accordance: State-of-the-art data infrastructure technologies and tooling are not sufficient to fully realize our vision. It is critical that we secure broader team buy-in through the demonstration of value through mechanisms such as piloting, quarterly reviews, ongoing support process, and so on.

In the next section, we demonstrate how this ecosystem works together to bring our Hiring Rate metric to life.

Case Study: LinkedIn Hiring Rate

The pipeline for serving our LinkedIn Hiring Rate (LHR) metric is a prime example of our use of tools and operating principles to scale our methodology company-wide, and then externally.

For LHR¹, we need to take in data from across the LinkedIn ecosystem including (but not limited to) data on our members’ profiles: their work positions, geographic locations, and the companies they work for. To make sense of this data in a structured manner, we rely on our Knowledge Graph team’s work on the construction of various entity taxonomies (titles, companies, geographies), and understanding of entity relationships to build LinkedIn’s Knowledge Graph which powers all our products and services using state-of-the-art AI systems.
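To make the footnoted definition concrete, here is a minimal sketch of the arithmetic behind an indexed hiring rate. It reads the definition as a simple ratio of hires to members, divided by the average monthly rate in 2016. The function names, the `YYYY-MM` month keys, and the input shapes are assumptions made for illustration; this is not the production UMP logic.

```python
from __future__ import annotations

from statistics import mean


def hiring_rate(hires: int, members: int) -> float:
    """Raw rate: members who added a new employer / total members."""
    if members <= 0:
        raise ValueError("members must be positive")
    return hires / members


def index_to_baseline(
    rates_by_month: dict[str, float], baseline_year: str = "2016"
) -> dict[str, float]:
    """Divide each monthly rate by the mean rate over the baseline year.

    Keys are assumed to be 'YYYY-MM' strings (a hypothetical convention).
    """
    baseline = [
        rate
        for month, rate in rates_by_month.items()
        if month.startswith(baseline_year + "-")
    ]
    if not baseline:
        raise ValueError(f"no months found for baseline year {baseline_year}")
    base = mean(baseline)
    return {month: rate / base for month, rate in rates_by_month.items()}
```

With this sketch, a month whose raw rate is 5% above the 2016 monthly average gets an index of about 1.05, matching the interpretation given in the footnote.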

Making LHR available for company-wide usage

Once these data sources have been processed and prepared for use by our upstream partners, we pass the prepared data into our Unified Metrics Platform (UMP) for metrics computation, leveraging Apache Spark to provide high performance and fault tolerance. Publishing LHR on UMP allows for the metric to be leveraged across the organization, feeding potential use cases such as analyses in partnership with media outlets and policymakers, or as a possible feature in future AI model development. Darwin, our unified “one-stop” data science platform, allows Data Scientists on our team to interact with this data via different query and storage engines, for exploratory data analysis and visualization of LHR metrics.

To ensure the quality of our metrics, we leverage Data Sentinel, which allows us to quickly deploy data assertions for testing input and output data validity as well as for automatic identification and alerting regarding anomalous data. Further, as part of the UMP configuration, we make LHR available on our internal Retina² Pinot cluster to allow for easy charting and dashboarding of this metric. This allows us to communicate monthly updates to the metric alongside month-over-month, year-over-year, etc. comparisons to our partners, often in a self-service manner.
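As a rough illustration of what data assertions and anomaly flagging can look like, the sketch below validates metric rows and flags a new value that sits far from recent history. It does not use the Data Sentinel API; the row fields (`month`, `country`, `hiring_rate`), the function names, and the z-score threshold are all hypothetical choices for this example.

```python
from statistics import mean, stdev


def check_rows(rows):
    """Return human-readable assertion failures for a list of metric rows."""
    failures = []
    seen = set()
    for i, row in enumerate(rows):
        key = (row.get("month"), row.get("country"))
        if None in key:
            failures.append(f"row {i}: missing month or country")
        elif key in seen:
            failures.append(f"row {i}: duplicate key {key}")
        seen.add(key)
        rate = row.get("hiring_rate")
        # A raw rate is a share of members, so it must lie in [0, 1].
        if rate is None or not (0.0 <= rate <= 1.0):
            failures.append(f"row {i}: hiring_rate out of range: {rate!r}")
    return failures


def is_anomalous(history, latest, z_threshold=3.0):
    """Flag `latest` if it is more than z_threshold std devs from the mean."""
    if len(history) < 2:
        return False  # not enough history to estimate spread
    sigma = stdev(history)
    if sigma == 0:
        return latest != history[0]
    return abs(latest - mean(history)) / sigma > z_threshold
```

A pipeline step could run `check_rows` on each new batch and hold publication when the returned list is non-empty, while `is_anomalous` could drive an alert rather than a hard failure, since a real labor market shock may legitimately look like an outlier.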

Graph of the US Hiring Rate from LinkedIn's Economic Graph

LHR is featured in our monthly Workforce Reports
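The month-over-month and year-over-year comparisons mentioned above reduce to percentage changes against the value one month and twelve months earlier. The sketch below assumes an indexed series keyed by `YYYY-MM` strings; the helper names are hypothetical and the series would in practice come from the published metric rather than an in-memory dictionary.

```python
def pct_change(current, previous):
    """Percentage change from `previous` to `current`."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous * 100.0


def shift_month(month, months_back):
    """Return the 'YYYY-MM' string that is `months_back` months earlier."""
    year, mon = map(int, month.split("-"))
    total = year * 12 + (mon - 1) - months_back
    return f"{total // 12:04d}-{total % 12 + 1:02d}"


def comparisons(index_by_month, month):
    """Compute MoM and YoY changes for `month`; None when the prior is missing."""
    current = index_by_month[month]
    result = {}
    for label, back in (("mom", 1), ("yoy", 12)):
        prior = index_by_month.get(shift_month(month, back))
        result[label] = None if prior is None else pct_change(current, prior)
    return result
```

Returning `None` for a missing prior period, instead of raising, lets a report render the comparisons that are available for newly onboarded countries or segments.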

Once the publishing process is complete, our metric and the associated datasets are all discoverable via DataHub, our metadata management platform. Data consumers can discover our dataset via the platform, understand the schema and fields relevant to their use case, get permission to access it, and see who to contact in case they have questions all in one place.

DataHub also provides us with a user-friendly interface to monitor metadata and the overall health of our dataset. This is particularly useful for the Asimov team, who can see dataset health over time at a glance.

Making LHR available for external usage

Earlier, we highlighted the use of LHR by Destatis as part of their dashboard. Thanks to exciting developments in the works, we expect to be able to address that use case, among many others from our DFI partners, via an API (see diagram below). UMP allows us to seamlessly ingest LHR data into Pinot, which in turn provides us with real-time analytics infrastructure.

The insights gained from the hiring rate can be used to identify areas of growth and decline in the job market, as well as to understand the types of skills and experience that employers are looking for in candidates. For example, if the hiring rate for a particular industry is low, it may indicate a lack of demand for workers with certain skills or experience. On the other hand, a high hiring rate in a particular location may indicate a strong job market and a need for workers in that area. This information can be extremely valuable for our partner organizations in policy research, report development, investment allocation, etc. In partnership with our K2 engineering team, we are in the process of developing an API that will allow trusted partners to query LHR data for such use cases.

Diagram of EGRI Hiring Rate Delivery System

The Asimov team has limited resources, and here, the Accordance principle comes into play. Having a clear understanding of the relative prioritization of metrics and datasets, with the buy-in of the full team, allows us to direct resources to the most critical areas. To ensure alignment within our team as well as with other teams, we publish our prioritization principles, curate lists on DataHub, and review these lists quarterly to ensure freshness.

Screenshot of DataHub showing Metrics Options

Finally, we keep the full EGRI team abreast of our journey to robust data foundations through retrospectives and socialization of learnings during quarterly reviews.

Next Steps and Acknowledgments

While we’ve accomplished an incredible amount, there’s so much more work on the horizon. We’ll continue to onboard our metrics onto our foundational ecosystem. There is exciting work ahead on integrating LinkedIn’s cutting-edge differential privacy tools into our data stack. We are also working on developing high-performance data flows to unlock stronger collaboration with government organizations like Destatis, and multilateral organizations like IDB, OECD, and the World Bank through DFI.

We want to thank Cristian Jara-Figueroa and Nikhil Gahlawat for their unwavering support in crafting the Project Asimov strategy and bringing it to life with tireless execution; Casey Weston, Paul Ko, and Rosie Hood for their partnership and actionable feedback through their work on our Data for Impact program and advocacy for more robust data foundations; the K2 team for their critical work to enable our API-driven future plans; the entire Economic Graph Research and Insights team, without whom Project Asimov would be mere words on paper; and our partners in the Policy, Communications, and Editorial teams for always being patient with us throughout this journey.

¹LinkedIn Hiring Rate is the number of members who added a new employer to their profile in the same month the new job began, divided by the total number of members in the United States (or a given country). The number is indexed to the average month in 2016; i.e., an index of 1.05 indicates a hiring rate that is 5% higher than the average month in 2016.

²Retina is an internally developed reporting platform, custom fit to LinkedIn’s data visualization needs for use cases such as LHR.

Topics: Analytics, Economic Graph, Infrastructure, Graph Systems