Extracting skills from content to fuel the LinkedIn Skills Graph

Co-authors: Sofus Macskassy, Lu Sun, Di Zhou, Rui Kou, and Zhuliu Li

Skills are at the heart of every professional's qualifications for a role or new opportunity. At LinkedIn, we see a future where the world of work is centered on a skills-first economy. Adopting a skills-first approach will be especially critical as the requirements for roles, businesses, and industries are rapidly changing amid the current generative AI (GAI) boom. 

That's why at LinkedIn, we want to help our members and customers embrace a skills-first mindset to create a more efficient and equitable world of work. What this looks like practically is an organization putting a candidate's skills at the center of their hiring decisions and creating learning opportunities for employees to develop skills to advance their careers. This will unlock outsized opportunities for both professional and business growth. While having a skills-first mindset won't happen overnight, we are focused on building tools to help our customers and members put skills first as they hire, learn, seek and share knowledge on LinkedIn.

At the heart of how we help companies do this is the LinkedIn Skills Graph. It’s our foundational technology that underpins how we help members find new job opportunities, learn new skills, and determine which skills might be helpful when evaluating opportunities. That means the skill repository we create must be comprehensive, consistent, and accurate. In previous blog posts, we shared how we built the skills taxonomy behind our Skills Graph from the more than 41,000 skills across our platform.

In order to make skills-based experiences as robust as possible, it’s important that we map all of the skills on the platform to our Skills Graph. Doing so is straightforward for skills explicitly listed on member profiles or in job descriptions. It can be far more complex for skills embedded in content, such as LinkedIn Learning courses, resumes, and feed posts.

In this blog, we’ll examine how we use AI to extract skills from various content sources across LinkedIn and map these skills to our Skills Graph. This work helps us build a more powerful Skills Graph that provides better matching and relevance across jobs, courses, feed posts, and more for members and customers. It also helps ensure that we’re using the latest skills insights across LinkedIn, which is essential as the world of work and the skills powering it are rapidly changing.

Skill extraction and our mapping model stack

Throughout LinkedIn’s ecosystem, there are places where it may be difficult to extract relevant skills. For example, many LinkedIn members add skills to their profiles, but not always within the dedicated skills section. Sometimes they’ll include skills in their Summary and Experience sections, or in resumes. We don’t want to miss out on these skills, so extracting them from the text is essential to generate dependable skill insights. 

In addition to skills on profiles, job postings on LinkedIn can sometimes lack a comprehensive list of required skills, especially if they are sourced from online job postings outside of LinkedIn. There are also LinkedIn Learning courses where only a subset of skills are tagged directly, with a number of relevant skills being mentioned solely in the description or in the course itself (i.e., in the dialogue that we can get from the transcription).

Extracting these skills from text is a multi-step process. Where and how a skill is mentioned can provide a significant signal on how relevant the skill is and how we should interpret the mention. Skills can be mentioned explicitly (“expected skills for this job include programming in Java”) or indirectly (“you are expected to know how to apply different techniques to extract information from data and communicate insights through meaningful visualizations”).

AI is integral to this process. By utilizing machine learning models, we can extract and map skills from diverse content sources and collect feedback for continuous model improvement and member value. To do this, we first need to segment large pieces of text (such as job descriptions and resumes) into meaningful parts. We can then extract the mentions of skills from each piece of the text. Once extracted, we normalize the mentions into canonical representations (e.g., “data analytics” and “data analysis” refer to the same skill), which we represent in our skills taxonomy. We also need to pay attention to where a skill sits in a piece of content and what type of content it is. Skills are often represented differently in resumes, member profiles, and job descriptions, so we fine-tuned our models to learn the specifics of each content type.
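To make this concrete, here is a minimal, self-contained Python sketch of the tag-and-normalize step; the alias table and matching logic are toy stand-ins for the production components described in the rest of this post.

```python
# Minimal illustration of the extract -> normalize flow described above.
# The alias table and substring matching are toy stand-ins, not LinkedIn's models.

CANONICAL = {                      # surface form -> canonical taxonomy skill
    "data analysis": "Data Analytics",
    "data analytics": "Data Analytics",
    "java": "Java",
}

def tag_and_normalize(section_text: str) -> set[str]:
    """Return canonical skills mentioned in one segmented piece of text."""
    lowered = section_text.lower()
    return {canon for alias, canon in CANONICAL.items() if alias in lowered}

# "data analytics" and "data analysis" both map to a single canonical skill:
print(tag_and_normalize("Experience with data analysis and Java required."))
# -> {'Data Analytics', 'Java'}
```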

To handle these nuances, we built an architecture and platform that addresses the challenges of extracting skills and mapping them onto the LinkedIn Skills Graph. The following diagram illustrates the AI model workflow that extracts and maps skills from raw text such as job postings.

Figure 1. Workflow for skill extraction and mapping

Skill segmentation

Before extracting any skill, we first parse the raw input into a well-formed structure. A job posting, for example, may have sections for “company description,” “responsibilities,” “benefits,” and “qualifications.” Meanwhile, a resume will usually have sections that describe skills and past experiences. A skill tagged in the qualifications portion of a job posting is more likely to be important than a skill tagged in the company description section, so with a defined structure in our AI model, we can better understand the listed skills.
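As a rough illustration, a header-based splitter captures the idea; the production segmenter is a learned model, and the section names below are just the examples mentioned above.

```python
import re

# Toy section splitter for a job posting. Splitting on known header lines
# stands in for the learned segmentation model described in this section.
SECTION_HEADERS = r"(company description|responsibilities|benefits|qualifications)"

def segment_posting(text: str) -> dict[str, str]:
    parts = re.split(rf"(?im)^[ \t]*{SECTION_HEADERS}[ \t]*:?[ \t]*$", text)
    # re.split yields [preamble, header1, body1, header2, body2, ...]
    return {parts[i].lower(): parts[i + 1].strip()
            for i in range(1, len(parts) - 1, 2)}

sections = segment_posting(
    "Company Description\nWe build tools.\nQualifications\nJava, SQL"
)
# Skills tagged under sections["qualifications"] can then be weighted more heavily.
```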

Skill tagging

Once the unstructured raw input is properly parsed, a skill tagger identifies the mentions of skills in the text. The skill tagger can perform both token-based skill matches and semantic skill matches on short sentences or phrases. So, if the text says, “experience with design of iOS application,” we can infer that the phrase maps to “Mobile Development.”

The token-based skill matches are done with a trie-based tagger, which encodes the skill names from our skills taxonomy into a trie structure and performs a token-based lookup on the raw text input (as seen in Figure 2). The benefit of this tagger is that it scales very well with large volumes of text inputs as it runs extremely fast. One drawback is that it depends on the skills taxonomy capturing every real-world expression of a skill.

Figure 2. Text skill tagger
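A minimal sketch of this approach, with a handful of made-up skill names, might look like the following; production code would also handle punctuation, tokenization, and much larger tries.

```python
# Sketch of a trie-based skill tagger: skill names are encoded as token paths,
# then the input is scanned with a longest-match lookup.

def build_trie(skill_names):
    root = {}
    for name in skill_names:
        node = root
        for token in name.lower().split():
            node = node.setdefault(token, {})
        node["$"] = name                      # end-of-skill marker
    return root

def tag_skills(text, trie):
    tokens, hits, i = text.lower().split(), [], 0
    while i < len(tokens):
        node, j, match = trie, i, None
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if "$" in node:
                match, end = node["$"], j     # remember longest match so far
        if match:
            hits.append(match)
            i = end
        else:
            i += 1
    return hits

trie = build_trie(["machine learning", "java", "data analysis"])
print(tag_skills("Experience with machine learning and Java required", trie))
# -> ['machine learning', 'java']
```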

To complement the trie-based skill tagger, we also developed a more semantic approach, training machine learning models to understand the contextual information of any given text. The AI model behind this work is a two-tower model built on large language model (LLM) text encoders such as Multilingual BERT (see Figure 3). Multilingual BERT builds the contextual embeddings for the source text and the skill name, and the two-tower structure decouples the generation of the sentence and skill embeddings while keeping them comparable under a given similarity function.

Figure 3. Semantic tagger
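The two-tower idea can be sketched with an off-the-shelf multilingual sentence encoder standing in for the fine-tuned towers; the checkpoint named below is an illustrative public model, not the production one.

```python
from sentence_transformers import SentenceTransformer, util

# Two-tower idea in miniature: encode the source text and each candidate skill
# name independently, then compare them with a similarity function.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

text = "experience with design of iOS application"
skills = ["Mobile Development", "Data Analysis", "Accounting"]

text_emb = model.encode(text, convert_to_tensor=True)       # sentence tower
skill_embs = model.encode(skills, convert_to_tensor=True)   # skill tower

scores = util.cos_sim(text_emb, skill_embs)[0]
for skill, score in zip(skills, scores):
    print(f"{skill}: {float(score):.2f}")  # "Mobile Development" scores highest
```

Because the towers are decoupled, skill embeddings can be precomputed once and reused against every new piece of text, which keeps serving cheap.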

Skill expansion

Starting from the set of tagged skills, we expand it with related skills to increase the chance of skill matches. The expansion relies on our Skills Graph to query for skills in the same skill group or skills that share structural relationships, such as parent, child, and sibling skills.
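A toy version of this expansion, with made-up graph edges, shows the traversal:

```python
# Hypothetical skill-graph lookup: for each tagged skill, pull in parents,
# children, and siblings as additional candidates. The edges here are invented.

GRAPH = {
    "Java": {"parent": ["Programming"], "children": ["Spring Framework"]},
    "Python": {"parent": ["Programming"], "children": []},
    "Programming": {"parent": [], "children": ["Java", "Python"]},
}

def expand(skill: str) -> set[str]:
    node = GRAPH.get(skill, {"parent": [], "children": []})
    siblings = {
        c for p in node["parent"] for c in GRAPH[p]["children"] if c != skill
    }
    return set(node["parent"]) | set(node["children"]) | siblings

print(expand("Java"))   # -> {'Programming', 'Spring Framework', 'Python'}
```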

Multitask cross-domain skill scoring

When all of the relevant skill candidates have been discovered, a multitask scoring model runs to score each content piece and skill candidate pair. The model contains two parts, a shared module and a domain-specific module, as shown in Figure 1.

Shared module

In the shared module, we designed a Contextual Text Encoder and a Contextual Entity Encoder. The Contextual Text Encoder incorporates any available text information for each content piece and skill candidate pair. The text information could be a short phrase that mentions the skill, the surrounding sentences or paragraphs, a job title, or a member’s most recent job on their profile. For the Contextual Text Encoder, we use a Transformer at this stage, as it has been shown to outperform other language models on a variety of language understanding tasks and is designed to capture contextual information.

The Contextual Entity Encoder utilizes pre-calculated skill, title, industry, geographical, and other entity embeddings to provide entity level context for each content piece and skill candidate pair. Manual features, such as the co-occurrence rate between entities, are also included. 

Domain-specific module

In the domain-specific module, we have a dedicated model tower for each vertical (job postings, member profiles, feed, etc.). The model towers are designed and developed independently, but they all share the same text- and entity-based contextual information coming from the shared module above. The assumption is that text and entities are available in each skill vertical and affect skill extraction to a similar extent. Any vertical-specific sources of information can also be included at this step. Having a separate, independently owned model tower per vertical gives each skill vertical the flexibility to respect nuanced differences in skill understanding.
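A schematic sketch of this shared-plus-towers structure is below; the framework, dimensions, and layer choices are illustrative assumptions, not the production configuration.

```python
import torch
import torch.nn as nn

# Schematic multitask scorer: a shared module fuses text and entity context,
# and independent per-vertical towers score each (content, skill) pair.

class MultitaskSkillScorer(nn.Module):
    def __init__(self, text_dim=768, entity_dim=128, hidden=256,
                 verticals=("job_posting", "member_profile", "feed")):
        super().__init__()
        # Shared module: fuse contextual text and entity embeddings.
        self.shared = nn.Sequential(
            nn.Linear(text_dim + entity_dim, hidden), nn.ReLU()
        )
        # Domain-specific module: one independently owned tower per vertical.
        self.towers = nn.ModuleDict({
            v: nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                             nn.Linear(hidden, 1))
            for v in verticals
        })

    def forward(self, text_emb, entity_emb, vertical):
        shared = self.shared(torch.cat([text_emb, entity_emb], dim=-1))
        return torch.sigmoid(self.towers[vertical](shared))  # relevance score

scorer = MultitaskSkillScorer()
score = scorer(torch.randn(1, 768), torch.randn(1, 128), "job_posting")
```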

Serving models at scale

Extracting and normalizing skills from content is only beneficial if those extractions can be used by LinkedIn members, products, and AI systems (including search, recommendations, feed ranking, Jobs You May Be Interested In, Job Search, Recruiter Search, and many others). Skills show up in many types of content and need to be consumed by many systems, so we need the right infrastructure to serve our models and data at scale and ensure LinkedIn runs smoothly.

Figure 4. Serving of member skill extraction in production

Model serving in production environments necessitates strict adherence to inference time service level agreements (SLAs) in both online and offline/nearline contexts, while respecting computational resource constraints.

For example, LinkedIn's standardized member profile skills feature requires nearline inference when a member profile is created or updated. With approximately 200 global profile edits per second, each message must be processed in under 100 milliseconds. Serving a full 12-layer BERT model on a platform like LinkedIn while maintaining latency standards is a daunting task even for industry leaders: BERT, though powerful in NLP, has a large parameter count and is computationally demanding.

Serving the model nearline while meeting latency requirements with the original 12-layer BERT model is complex. However, recent research indicates that large models may underutilize their capacity, suggesting that model size and inference time can be reduced without sacrificing performance. Among various model compression techniques, we opted for Knowledge Distillation.

Our goal is to transfer knowledge from a larger teacher network to a smaller student network, training the student network to replicate the teacher network's behavior. To do this, the team had to carefully balance performance against model complexity. For online serving, Knowledge Distillation reduced the model size by 80% without compromising performance, within the existing Samza-Beam CPU serving constraints.
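For reference, the standard distillation objective (Hinton et al.) trains the student on the teacher's softened outputs alongside the hard labels; this generic sketch is not LinkedIn's exact training code.

```python
import torch
import torch.nn.functional as F

# Standard knowledge-distillation loss: KL divergence between softened
# teacher and student distributions, blended with the usual hard-label loss.

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2                     # rescale soft-target gradients
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```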

For full data reprocessing, the team collaborated with the Waterloo and Samza-Beam teams to develop Spark offline scoring capabilities. Simultaneously, we optimized cost-to-serve and devised a groundbreaking hybrid solution by utilizing offline resources for reprocessing and nearline processors for nearline traffic.

Addressing stringent requirements for both online and offline/nearline scenarios while adhering to inference time SLAs necessitates inventive solutions that balance performance and model complexity. Knowledge Distillation and hybrid approaches using offline and nearline processors exemplify such innovations that yield optimal results.

Product-driven model improvement

To further improve the quality of skill extraction and mapping, we build feedback loops directly into online job postings and member profiles to help with AI model iterations:

Recruiter skill feedback

When a recruiter manually posts a job on LinkedIn, a list of skills extracted by our AI model is suggested after they fill in the posting content. The recruiter can then edit this list based on which skills they believe are important.

Figure 5. Skill recommendation from posting jobs online

Seeker skill feedback

When a job seeker opens a job posting on LinkedIn, a feature shows how many skills overlap between their profile and the job. In general, the higher the overlap, the more likely an application is to be successful. We also give seekers the opportunity to review the top 10 skills used in the skill matching calculation; if a certain skill is irrelevant to the job, they can flag it. With these signals, we can identify the job-skill relationship from the job seekers’ perspective and use that information for model improvement.

Figure 6. UI that shows skill match between job seeker and hirer

Member profile skill feedback

LinkedIn Skill Assessments (SAs) are adaptive assessments designed by LinkedIn Learning experts to evaluate and validate skills across a range of domains. These short-form assessments are accessible through the profile skills section, where members can click on the "Take skill quiz" button to access a list of SA recommendations. Upon passing an SA with a score in the 70th percentile or higher, members are awarded a “verified skill” badge that they can display on their profile page and that is visible to recruiters. After taking the SAs, members receive personalized recommendations for next-step actions based on their assessment results and profile information. These recommendations may include LinkedIn Job Postings, Learning Courses, and other SAs. They also help us ensure that a member’s listed skills are accurate, which we can use to further improve the skill extraction models.

Figure 7. Member Skill Assessment

Applications across LinkedIn

Career relevant skills

LinkedIn's skill extraction capabilities are critical for creating a member-skill graph with heterogeneous edges that enables a deeper understanding of our members and customers on the platform. 

After skills are extracted, we collect contextual skill data and job application data to identify the most important and relevant skills for a member's career. This career-relevant skills data helps us recommend relevant job opportunities to members and provide better candidate suggestions to recruiters.

Another notable application of skill extraction is skill proficiency estimation. While it is easy to incentivize members to list their skills on their LinkedIn profile, estimating their expertise in those skills is more challenging. Our approach involves a multitask learning framework with an uncertainty weighting scheme incorporating signals from multiple contexts. By leveraging this approach, we can infer a member's expertise in a certain skill after the skill is extracted for the member and enrich the skills graph with more dimensions.
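One common formulation of such uncertainty weighting comes from Kendall et al. (2018), where each task loss is scaled by a learned uncertainty term; we sketch it below as a plausible reading of the scheme, not necessarily the exact production version.

```python
import torch
import torch.nn as nn

# Uncertainty-weighted multitask loss (Kendall et al., 2018): each task loss
# is scaled by a learned variance, so noisier tasks contribute less, with a
# log-variance penalty to keep the weights from collapsing to zero.

class UncertaintyWeightedLoss(nn.Module):
    def __init__(self, num_tasks: int):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))  # log(sigma^2)

    def forward(self, task_losses):
        total = 0.0
        for loss, log_var in zip(task_losses, self.log_vars):
            total = total + torch.exp(-log_var) * loss + log_var
        return total
```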

Through this approach, we can identify the most important and relevant skills for a member's career, apply skill proficiency estimation to fully leverage that information, create opportunities for every member of the global workforce, and build a strong understanding of the skills relevant to our members.

Job important skills

Building an accurate and comprehensive skill profile for jobs is the foundation of our skills-first initiatives. Not only should skills be extracted correctly, but we also need to identify which skills are more important to the role than others. 

To generate robust importance scores, we need to capture the content-skill relationship from multiple perspectives rather than a single angle, such as whether a skill is mentioned in the content. So in addition to the “mention/valid” relationship, we define the “required” and “core” relationships to further increase the resolution of our content-skill understanding:

  • “Required” relationship: the skill is explicitly mentioned as a requirement in the job description.
  • “Core” relationship: the skill is essential to fulfilling the job's basic function, whether or not it is stated in the description.

A skill importance score is an aggregation of the prediction scores for these relationships.
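As a purely hypothetical illustration (the weights and the weighted-sum form are assumptions, not the production aggregation), this could look like:

```python
# Hypothetical aggregation of per-relationship prediction scores into one
# importance score; the weights below are illustrative only.

RELATIONSHIP_WEIGHTS = {"mention": 0.2, "required": 0.4, "core": 0.4}

def importance_score(relationship_scores: dict[str, float]) -> float:
    return sum(
        RELATIONSHIP_WEIGHTS[rel] * score
        for rel, score in relationship_scores.items()
    )

# A skill that is core to the role but barely mentioned still scores well:
print(importance_score({"mention": 0.1, "required": 0.3, "core": 0.9}))  # 0.5
```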

Developing a separate model to identify each relationship from scratch is inefficient and doesn't scale well as we add further resolution. So we leveraged the modeling workflow mentioned above and learned multiple content-skill relationships simultaneously through a multitask learning framework. The trained multitask model achieves better performance, reflected not only in offline evaluation but also in A/B tests of job recommendations, job-skill matching, and more. The following are some performance improvements realized with this approach:

  • Job Recommendation 
    • Member Job Applicants and Offsite Apply Clickers: +0.1391%
    • Predicted Confirmed Hires: +0.4606%
  • Job Search
    • Job Sessions: +0.1468% 
    • PPC revenue: +0.7577%
    • Engagements: +0.2271%
  • Job Member Skills matching 
    • Qualified Applications: +0.87%
    • Qualified Application Rate: +0.40%
    • Predicted Confirmed Hires: +0.24%
    • Applicants and Apply Click Counts: +0.48%

Forward Thinking

We are investing heavily in several new and exciting directions to continuously improve our skill understanding capabilities. One approach is to leverage LLMs to provide a rich description of every skill in our Skills Graph. We can also fine-tune LLMs to generate high-quality proxy labels at scale, which improves the performance of the skill extraction models. Another direction is to move beyond exact skill text or ID matches and use embeddings as the de facto skill representation, so that downstream models can perform more semantically relevant skill matches.

The skill mapping technologies we built at LinkedIn put our Skills Graph at the center of powering the skills-first transformation across our platform and the world of work. Without a robust tech stack for mapping content to the Skills Graph, it would be just a static list that would grow outdated over time. Instead, we can constantly update and evolve the Skills Graph to keep pace with the always-changing skills landscape.