Completing a member knowledge graph with Graph Neural Networks

Jaewon Yang

Principal Machine Learning Engineer at Pinterest

December 1, 2021

Co-authors: Jaewon Yang, Jiatong Chen, and Yanen Li

Introduction

LinkedIn members can enrich their profiles with information about themselves, such as professional history, education, and skills. From members’ inputs, we use AI models to extract their profile attributes, or profile entities. This process is called Standardization and Knowledge Graph construction, and it produces a knowledge graph of the entities a member is related to. This is a key part of understanding member profiles so that we can find more relevant jobs, news articles, connections, and ads for members on the platform. As part of this process, we also aim to infer “missing” profile entities that are not extracted in the existing knowledge graph. For example, if the member knows machine learning and works at Google, we can infer that the member is skilled in TensorFlow, even if their current profile does not say so.

There are a few reasons why there will always be missing entities. First, most entity extraction technologies primarily rely on textual information. If an entity is not explicitly mentioned in the text, the models are likely to miss the entity. Second, the member may not provide complete information. For example, the member may choose not to list all the skills they have and include only a subset of their skills on the profile. If we infer these missing entities, we can serve better recommendations to the members across LinkedIn products. For example, we can show more relevant jobs, news articles, and people they may know.

Inferring missing entities is challenging because it requires a holistic understanding of the member profile. Current entity extraction methods use text as the primary input and cannot infer entities that are not explicitly mentioned in the text. To address this issue, we aim to leverage the entities that we’ve extracted from member inputs to infer the missing entities. For example, we would like to use the entities “Machine Learning” and “Google” to infer “TensorFlow.” Here, the challenge is to take into account the interaction of multiple entities. There are simple statistical methods, such as pointwise mutual information, for finding entities related to a single entity. However, if we pick related skills from “Google” only, we would likely end up with other inferred skills, like “MapReduce” or “Android,” which are less relevant than “TensorFlow” in this example.
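To make the single-entity limitation concrete, here is a minimal sketch of pointwise mutual information computed over a handful of made-up profiles. The profiles, the function names (pmi_table, related), and the counts are all illustrative assumptions, not LinkedIn data or code.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_table(profiles):
    """Compute PMI for every pair of entities that co-occur on a profile."""
    n = len(profiles)
    entity_count = Counter()
    pair_count = Counter()
    for profile in profiles:
        entities = set(profile)
        entity_count.update(entities)
        pair_count.update(frozenset(p) for p in combinations(sorted(entities), 2))
    pmi = {}
    for pair, c_xy in pair_count.items():
        x, y = tuple(pair)
        # PMI(x, y) = log( p(x, y) / (p(x) * p(y)) )
        pmi[pair] = math.log((c_xy / n) / ((entity_count[x] / n) * (entity_count[y] / n)))
    return pmi

def related(entity, pmi, top_k=3):
    """Rank the entities that co-occur with `entity` by PMI (ties broken by name)."""
    scored = [(next(iter(pair - {entity})), score)
              for pair, score in pmi.items() if entity in pair]
    return sorted(scored, key=lambda t: (-t[1], t[0]))[:top_k]

# Toy profiles, invented for illustration only.
profiles = [
    ["Google", "MapReduce"],
    ["Google", "Android"],
    ["Google", "Machine Learning", "TensorFlow"],
    ["Machine Learning", "TensorFlow"],
    ["Machine Learning", "Statistics"],
]

pmi = pmi_table(profiles)
google_related = related("Google", pmi)
```

In this toy data, ranking by “Google” alone puts “Android” and “MapReduce” ahead of “TensorFlow,” because the score never sees that the member also lists “Machine Learning.”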

In this blog post, we’ll discuss how we created a novel model, leveraging Graph Neural Networks, to overcome this challenge.

Our method

Our approach is to formulate entity inference as an inference problem on a graph. Figure 1 is a visualization of our graph formulation. Solid lines are existing entity neighbors of the given member and dotted lines are potential new neighbors that are not explicitly mentioned on the profile. We aim to predict new neighbors given existing neighbors, which can be understood as a standard link prediction problem in a graph setting.


Figure 1. Member-entity knowledge graph. Entities connected to the member with solid lines (with ID as suffix) are existing entities on their profile. Entities connected to the member with dotted lines (with “unk” as suffix) are “unknown” entities to be inferred from our model.

To solve the link prediction problem, we leverage Graph Neural Networks (GNNs), a class of neural networks designed to extract information from graphs. Given an input graph, a GNN learns a latent representation for each node such that a node’s representation is an aggregation of its neighbors’ representations. Through this process, the representations learned by the GNN capture the connection structure of the input graph. In our setting (shown in Figure 1), our GNN learns a representation for “company_unk” from its neighbors (the member and the member’s entities), and we then use that representation to predict which company it is; i.e., we aggregate the information from existing entities to infer a missing entity.

It’s important to note that existing GNN models are limited in how they aggregate the neighbors (member entities), as they rely on simple aggregation methods such as averaging or weighted averaging. If there are complex interactions among existing entities, these simple aggregation methods fail to capture them.
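As a small illustration of why plain averaging can lose information, the sketch below aggregates two different sets of neighbor embeddings and gets the same result for both. The two-dimensional embeddings and the function name mean_aggregate are made up for this example.

```python
def mean_aggregate(neighbor_embeddings):
    """Average the neighbor embeddings dimension by dimension."""
    n = len(neighbor_embeddings)
    dim = len(neighbor_embeddings[0])
    return [sum(vec[i] for vec in neighbor_embeddings) / n for i in range(dim)]

# Two hypothetical profiles whose entities are clearly different...
profile_a = [[1.0, 0.0], [0.0, 1.0]]
profile_b = [[0.5, 0.5], [0.5, 0.5]]

# ...but whose averaged representations are identical.
agg_a = mean_aggregate(profile_a)
agg_b = mean_aggregate(profile_b)
```

Both profiles collapse to [0.5, 0.5], so any downstream predictor sees them as the same member even though the underlying entities differ.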

To address this issue, we developed a novel GNN model, which we call Entity-BERT. This model uses a multi-layer bidirectional Transformer for aggregation. Given a set of existing entities, each Transformer layer computes the interaction (attention) between every pair of entities and uses it to update each node’s representation. The model stacks 6 to 24 of these layers to capture increasingly complex interactions among entities.
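The pairwise attention idea can be sketched in a few lines of plain Python. This is a deliberately stripped-down illustration, not the Entity-BERT implementation: it assumes a single attention head, omits the learned query/key/value projections, feed-forward sublayers, residual connections, and layer normalization that a real Transformer has, and uses made-up three-dimensional embeddings. The names self_attention and aggregate are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(embeddings):
    """One simplified attention step: every entity attends to every entity."""
    dim = len(embeddings[0])
    scale = math.sqrt(dim)
    updated = []
    for query in embeddings:
        # Scaled dot-product score between this entity and each entity.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in embeddings]
        weights = softmax(scores)
        # New representation: attention-weighted sum of all entity embeddings.
        updated.append([
            sum(w * value[i] for w, value in zip(weights, embeddings))
            for i in range(dim)
        ])
    return updated

def aggregate(embeddings, num_layers=6):
    """Apply the attention step repeatedly, one pass per layer."""
    hidden = embeddings
    for _ in range(num_layers):
        hidden = self_attention(hidden)
    return hidden

profile = [
    [0.0, 0.0, 0.0],  # hypothetical [CLS] slot standing in for the member
    [1.0, 0.0, 1.0],  # e.g. "Machine Learning" (made-up embedding)
    [0.0, 1.0, 1.0],  # e.g. "Google" (made-up embedding)
]
member_representation = aggregate(profile)[0]
```

Unlike a fixed average, the weights here depend on the entities themselves, so the contribution of one entity changes with the other entities on the profile.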

Multi-layer bidirectional Transformers have shown superior performance in sentence understanding in Natural Language Processing (NLP), where the goal is to understand the interactions between words in a given sentence. In particular, Bidirectional Encoder Representations from Transformers (BERT) has outperformed non-Transformer neural networks in various NLP tasks. We believe that BERT can also improve the performance of entity inference. The architecture of our BERT-like aggregator is shown in Figure 2.

Figure 2. Transformer aggregator. Inputs are the entity neighbors of a given member. The output E[CLS] embedding corresponds to the member.

Training and inference

The model is trained with self-supervision. Given a member profile, we mask, or hide, a few attributes from the profile and train the model to predict the masked attributes. For each member profile, we replace 10% of the entities with [MASK] tokens and group them by their types (skill, title, company, school, etc.). An example is shown in Figure 3, where one company and one skill are masked out. Inspired by BERT from NLP, we also provide entity types as an additional input and assign each type a type ID. For example, company → 1, industry → 2, skill → 3, title → 4.


Figure 3. Training with self-supervision.
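The masking step can be sketched as follows. This is a minimal illustration under stated assumptions: the function name mask_profile, the (type, entity) tuple layout, and the rule of always masking at least one entity are choices made for this example, and the grouping of masks by type is not shown. The type IDs and entity names are the ones used as examples in this post.

```python
import random

# Type IDs as given in the example above.
TYPE_IDS = {"company": 1, "industry": 2, "skill": 3, "title": 4}
MASK = "[MASK]"

def mask_profile(entities, mask_rate=0.1, rng=None):
    """Hide a fraction of a profile's entities for self-supervised training.

    entities: list of (entity_type, entity_id) tuples.
    Returns (tokens, type_ids, labels), where labels maps each masked
    position to the entity the model should learn to predict.
    """
    rng = rng or random.Random()
    # Assumption for this sketch: always mask at least one entity.
    num_to_mask = max(1, round(len(entities) * mask_rate))
    masked_positions = set(rng.sample(range(len(entities)), num_to_mask))
    tokens, type_ids, labels = [], [], {}
    for pos, (etype, entity) in enumerate(entities):
        # The type ID is kept even for masked entities.
        type_ids.append(TYPE_IDS[etype])
        if pos in masked_positions:
            tokens.append(MASK)
            labels[pos] = entity
        else:
            tokens.append(entity)
    return tokens, type_ids, labels

profile = [
    ("title", "Title_9"),
    ("company", "Company_1337"),
    ("industry", "Industry_6"),
    ("skill", "Skill_198"),
    ("skill", "Skill_176"),
]
tokens, type_ids, labels = mask_profile(profile, rng=random.Random(0))
```

Because the type ID survives masking, the model knows what kind of entity it is being asked to recover at each [MASK] position.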

During scoring/inference, we attach a few masked entities to a member’s profile and specify the type of each mask. An example is shown in Figure 4. Here, the member already has standardized entities such as Title_9, Company_1337, Industry_6, Skill_198, and Skill_176. To predict one hidden skill for this member, we attach a [MASK] entity with the skill type, and the model outputs a skill at the same position as the [MASK].

Figure 4. Inference pipeline.
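The inference input can be sketched the same way. In the example below, the helper names (build_inference_input, top_predictions) and the score values are hypothetical; the scores stand in for whatever the trained model outputs at the [MASK] position. Filtering out skills already on the profile is an assumption made for this sketch.

```python
TYPE_IDS = {"company": 1, "industry": 2, "skill": 3, "title": 4}

def build_inference_input(profile, mask_type, type_ids=TYPE_IDS):
    """Append one [MASK] of the requested type to the member's entities."""
    tokens = [entity for _, entity in profile] + ["[MASK]"]
    types = [type_ids[etype] for etype, _ in profile] + [type_ids[mask_type]]
    return tokens, types

def top_predictions(scores, existing, top_k=3):
    """Pick the highest-scoring candidates that are not already on the profile.

    scores: dict mapping candidate entity -> score at the [MASK] position.
    """
    candidates = [(c, s) for c, s in scores.items() if c not in existing]
    candidates.sort(key=lambda t: -t[1])
    return [c for c, _ in candidates[:top_k]]

profile = [
    ("title", "Title_9"),
    ("company", "Company_1337"),
    ("industry", "Industry_6"),
    ("skill", "Skill_198"),
    ("skill", "Skill_176"),
]
tokens, types = build_inference_input(profile, "skill")

# Made-up scores standing in for the model's output at the [MASK] position.
fake_scores = {"Skill_198": 0.9, "Skill_42": 0.7, "Skill_7": 0.2, "Skill_300": 0.5}
existing = {entity for _, entity in profile}
suggested = top_predictions(fake_scores, existing, top_k=2)
```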

Results

Application 1: Skills recommender
Our skills recommender system recommends skills that a member might have but did not mention on their profile. It’s triggered when members click “add a new skill” under the skill section of the profile, as shown in Figure 5. It is also presented to new members when they create a new profile (as shown in Figure 6).

Figure 5. Suggested skills when a member clicks the “add a new skill” button on their profile.


Figure 6. Guided edit for suggested skills.

We use Entity-BERT to infer and recommend skills that are not on the member’s profile. When we compared Entity-BERT with a previous method that used a member’s current entities with simpler aggregation, we observed that, with the Entity-BERT-based method, members accepted more suggestions. We also observed that these added skills led to more member engagement overall, such as more sessions.

Application 2: Ads audience expansion
LinkedIn advertisers specify their target audience through profile attributes, e.g., show my ads to AI engineers. In addition, some of them opt in to audience expansion, where we expand the audience to other members with a similar entity, e.g., show my ads to AI engineers AND AI researchers. We use Entity-BERT to expand member profile entities (companies, skills, and titles) and use these expanded entities for audience expansion. In an online A/B test, audience expansion with Entity-BERT showed a statistically significant impact on Ads revenue without hurting user experience metrics (such as Ads Click-Through-Rate), compared to a previous expansion model without Entity-BERT.

Conclusion

In this post, we introduced Entity-BERT, a novel graph neural network for inferring missing member entities from the current member knowledge graph. Entity-BERT’s innovation is to apply multi-layer bidirectional transformers to capture interactions among existing entities. Entity-BERT has shown that it can infer missing entities effectively and deliver significant product impact.

Acknowledgements

Jiatong Chen, Sufeng Niu, Rui Wang, Yanen Li, Jaewon Yang, and Jacob Bollinger contributed to developing Entity-BERT models. The Standardization team developed the AI models and the A/B experiments were conducted with the Standardization team and the Ads AI team at LinkedIn.
