
Pensieve: An embedding feature platform

Co-authors: Benjamin Le, Daniel Gmach, Aman Grover, Roshan Lal, Jerry Lin, Austin Lu, Qingyun Wan, and Leighton Zhang

Feature engineering is foundational for building artificial intelligence (AI) that powers products at LinkedIn. Recently, “representation learning,” or “feature learning,” has started to replace manually engineered features because it delivers state-of-the-art performance. This blog post describes Pensieve, the embedding feature platform developed for the Talent Solutions and Careers products to pre-compute and publish entity embeddings. Pensieve embeddings are trained with supervised deep learning techniques and used by ranking models in latency-sensitive applications across Talent Solutions and Careers.

Introduction

The goal of the LinkedIn Talent Solutions and Careers team is to build an efficient marketplace for job-seeking members and employers by matching members to job postings that can lead to hiring. To accomplish this, we built a diverse set of products, such as the core search and recommendations modules for LinkedIn Jobs and LinkedIn Recruiter shown in Figure 1. Each product uses multiple AI models in tandem to produce the final results on the page. Because so many AI models must be built to support this diverse product ecosystem, it is paramount that we maintain a portfolio of effective features that is continuously improved to lift model performance across the board.


Figure 1: Example job search module (left) and recruiter search module (right)

We create effective features by leveraging supervised deep learning techniques to train models that produce entity embeddings. Although representation learning through deep learning algorithms has produced state-of-the-art performance in academia and industry, forward propagation can be too compute-heavy for latency-sensitive applications. This trend is only accelerating as network architectures such as BERT grow to hundreds of millions of parameters. Therefore, the burden of entity embedding inference must be pushed from request-time computation to nearline or stream pre-computation, where there is no strict SLA.

In light of the above goals, we introduce Pensieve, our embedding feature platform to pre-compute and publish entity embeddings. These embeddings are produced by running forward propagation on models trained using supervised deep learning techniques.

Pensieve platform

Our platform can be divided into three main pillars, illustrated in Figure 2.

  1. Offline Training Pipeline: Training data generation and distributed training are streamlined by our infrastructure, allowing modelers to focus on applying deep learning theory in practice. Here, we focus on agile experimentation when scaling training to hundreds of millions of instances. We can easily join millions of observations with any sparse features from our Frame Feature Marketplace through a few lines of configuration code, while distributed training is enabled through TensorFlow on YARN (TonY).
  2. Pensieve Modeling: Here, we train neural networks that can take sparse features about our entities and effectively encode them to semantic embeddings in low-dimensional space. Most of our iteration cycles are spent on applied research here to improve embedding quality.
  3. Embedding Serving Framework: Once trained, the neural networks are packaged for embedding serving. We set up parallel offline and nearline embedding serving pipelines for the multi-model computation needed for A/B testing. These pipelines publish the pre-computed embeddings to our Feature Marketplace for consumption by other AI models.


Figure 2: Architecture of our platform, divided into pillars

In this post, we highlight the novel subcomponents of this platform, taking a deep dive into the Pensieve Modeling and the Nearline Embedding Serving Framework.

Pensieve model

Model input
The LinkedIn Knowledge Graph is an important source of features across all AI models at LinkedIn, defining relationships between entities such as members, uploaded resumes, job postings, titles, skills, companies, and geolocations. Building the relationship edges using explicit information provided by members and through inference models is known internally as “Feature Standardization.” Titles, skills, companies, and geolocations related to members, uploaded resumes, and job postings are anonymized to ID values and used as input sparse categorical features.
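To make the input representation concrete, below is a minimal sketch of how multi-valued categorical ID features like these might be embedded and pooled before entering a model. The vocabulary sizes, feature names, and pooling choice are illustrative assumptions, not the production configuration.

```python
import tensorflow as tf

# Hypothetical vocabulary sizes and feature names, for illustration only.
VOCAB_SIZES = {"title": 30_000, "skill": 40_000, "company": 50_000, "geo": 20_000}
EMBEDDING_DIM = 32

def build_sparse_feature_encoder() -> tf.keras.Model:
    """Embeds multi-valued categorical ID features (zero-padded) and mean-pools each one."""
    inputs, pooled = [], []
    for name, vocab_size in VOCAB_SIZES.items():
        # Each entity carries a variable number of IDs per feature (e.g., many skills),
        # zero-padded to a common length; ID 0 is reserved for padding.
        ids = tf.keras.Input(shape=(None,), dtype="int32", name=f"{name}_ids")
        emb = tf.keras.layers.Embedding(vocab_size + 1, EMBEDDING_DIM, mask_zero=True)(ids)
        pooled.append(tf.keras.layers.GlobalAveragePooling1D()(emb))  # mask-aware mean pool
        inputs.append(ids)
    return tf.keras.Model(inputs=inputs, outputs=tf.keras.layers.Concatenate()(pooled))

encoder = build_sparse_feature_encoder()  # dense representation fed to the downstream network
```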

However, company and geolocation features have extremely high cardinality, often in the millions. Training on such high-dimensional inputs results in larger models and slower convergence. We combat this by subsetting these features before use, leveraging the observation that many job/member feature value pairs tend to co-occur. For example, members in a particular geolocation often prefer to apply for jobs in nearby locations with lucrative opportunities. We can model these co-occurrences as a weighted bipartite graph G = (U, V, E), where

U := Set of values of a member feature (e.g., member geolocation)

V := Set of values of the corresponding job feature (e.g., job geolocation)

E := {(u, v, w) | u ∈ U, v ∈ V, w = fraction of co-occurrences of (u, v) feature values}

From this bipartite graph G, we choose the subgraph G’ = (U’, V’, E’), with U’ ⊆ U and V’ ⊆ V, by optimizing

maximize Σ w over all edges (u, v, w) ∈ E with u ∈ U’ and v ∈ V’, subject to |U’| ≤ n and |V’| ≤ n

n is chosen to be on the order of tens of thousands to limit the subsetted U’ and V’ features. We approximate the solution using greedy methods since the general case of this problem, densest k-subgraph, is NP-hard.
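As a rough illustration of a greedy approximation, one simple heuristic is to scan edges in decreasing weight order and keep adding endpoints while the size budgets allow. This is a sketch under assumed inputs, not the exact production algorithm.

```python
def greedy_subgraph(edges, n):
    """Greedy approximation of the subgraph selection: scan edges by decreasing
    co-occurrence weight and keep endpoints while |U'| <= n and |V'| <= n.

    edges: iterable of (u, v, w) tuples, w being the co-occurrence fraction.
    Returns the selected member-side (U') and job-side (V') feature values.
    """
    u_selected, v_selected = set(), set()
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        u_fits = u in u_selected or len(u_selected) < n
        v_fits = v in v_selected or len(v_selected) < n
        if u_fits and v_fits:
            u_selected.add(u)
            v_selected.add(v)
    return u_selected, v_selected


# Toy example: geolocation co-occurrence pairs (member_geo, job_geo, weight).
pairs = [("sf", "sf", 0.4), ("sf", "nyc", 0.1), ("ldn", "ldn", 0.3)]
print(greedy_subgraph(pairs, n=2))
```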

Network architecture
The Pensieve model is a deep neural network inspired by Deep Structured Semantic Models (DSSM). The idea is to learn semantic concept vectors, or embeddings, for entities to be matched that are used to compute relevance scores. DSSM achieves this by passing features of each entity through a deep neural network (DNN) to produce corresponding embeddings that are used to compute relevance scores via cosine similarity.

In our case, we want to match job-seeking members to job postings. Adapting DSSM to these entities, we ended up with the architecture shown in Figure 3.


Figure 3: Resulting architecture when adapting DSSM to Pensieve use case and entities

Member and job posting features each flow through a respective multilayer perceptron (MLP) of N layers in order to produce an embedding. Having separate paths for members and job postings is advantageous for serving these embeddings at scale, since we can independently precompute embeddings for new members and job postings.

We found this network to be useful, but it quickly demonstrated limits in scalability and performance as we added more layers. This motivated us to introduce skip connections, propagating all inputs from prior layers to the next layer, as shown in Figure 4.


Figure 4: Example of skip connections created through concatenation of prior input layers

This change resulted in a better performing model that converged faster, due to feature reuse and the creation of shorter paths for gradients during backpropagation, as shown with DenseNets. This architecture is known internally as a “pyramid block” because the MLP networks widen with each hidden layer, resulting in a topology that looks like inverted pyramids.
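A minimal Keras sketch of one such pyramid block follows, with illustrative layer widths; each hidden layer consumes the concatenation of the original inputs and all prior hidden layers, DenseNet-style.

```python
import tensorflow as tf

def pyramid_block(inputs, units=(128, 256, 512), embedding_dim=64):
    """MLP tower with concatenation skip connections: every hidden layer sees the
    concatenation of the original inputs and all previous hidden layers."""
    features = [inputs]
    for width in units:  # widening layers give the "inverted pyramid" shape
        concat = tf.keras.layers.Concatenate()(features) if len(features) > 1 else features[0]
        features.append(tf.keras.layers.Dense(width, activation="relu")(concat))
    # Final projection to the entity embedding.
    return tf.keras.layers.Dense(embedding_dim)(tf.keras.layers.Concatenate()(features))

# Example: a member tower producing a 64-dimensional embedding from pooled sparse features.
member_inputs = tf.keras.Input(shape=(128,), name="member_features")
member_tower = tf.keras.Model(member_inputs, pyramid_block(member_inputs))
```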


Figure 5: Architecture with pyramid blocks for encoding member and job posting embeddings

The final prediction is a logistic regression on the Hadamard product between each seeker and job posting pair being scored. More formally, if we denote the output y with embedding inputs x_m and x_j for a given member and job posting, respectively, then this can be written as y = σ(w ⋅ (x_m ⊙ x_j) + b), where σ denotes the sigmoid function, w is the weight vector, and b is the bias term. We chose the Hadamard product over more common functions, like cosine similarity, to give the model the flexibility to learn its own distance function, while avoiding a fully connected layer to reduce scoring latency in our online recommendation systems.
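In code, the scoring head amounts to an element-wise product of the two embeddings followed by a single logistic unit; the embedding dimension below is an assumption.

```python
import tensorflow as tf

EMBEDDING_DIM = 64  # assumed; must match the output of the pyramid blocks

member_emb = tf.keras.Input(shape=(EMBEDDING_DIM,), name="member_embedding")
job_emb = tf.keras.Input(shape=(EMBEDDING_DIM,), name="job_embedding")

# y = sigmoid(w . (x_m ⊙ x_j) + b): Hadamard product, then a single logistic unit.
hadamard = tf.keras.layers.Multiply()([member_emb, job_emb])
y = tf.keras.layers.Dense(1, activation="sigmoid")(hadamard)

scoring_head = tf.keras.Model(inputs=[member_emb, job_emb], outputs=y)
```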

The resulting evolved architecture from DSSM with pyramid blocks is shown in Figure 5.

Model deployment
The final trained model is broken into two subgraphs: one for the member pyramid and the other for the job pyramid. The subgraphs are versioned, packaged, and distributed into our serving framework to pre-compute embeddings for members and job postings independently.
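Assuming the towers are built as separate Keras models, one way this split could look is simply saving each tower under its own versioned path (paths and names here are illustrative):

```python
import tensorflow as tf

def export_towers(member_tower: tf.keras.Model, job_tower: tf.keras.Model, version: str) -> None:
    """Package the two subgraphs separately so member and job posting embeddings
    can be pre-computed independently by the serving framework."""
    member_tower.save(f"models/pensieve-member/{version}")  # illustrative paths
    job_tower.save(f"models/pensieve-job/{version}")

# At serving time, each pipeline loads only the subgraph it needs, e.g.:
# job_tower = tf.keras.models.load_model("models/pensieve-job/6")
```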

Nearline embedding serving framework

System architecture
The system architecture was designed with two main priorities in mind:

  • Efficient output: Minimizing the number of entity embedding writes to our Feature Marketplace is an important concern for downstream systems. We minimize writes by batching multiple versions of embeddings into one write message. Furthermore, many data updates to entities do not meaningfully change the entity embedding value, so we don’t write embeddings when a data update leaves the embedding value unchanged.
  • Experimentation velocity: The ability to quickly experiment with new embedding models is key for productivity. When an embedding model is ready, it takes only a one-line change to serve the new embedding version, as illustrated in the sketch after this list.
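For example, if the serving pipeline reads its set of models from a registry, ramping a new embedding version could be as small as the hypothetical one-line addition below (names are illustrative).

```python
# Hypothetical registry read by the serving pipeline: each listed version is scored
# in parallel and published under its own key, so an A/B ramp is this one-line addition.
REGISTERED_EMBEDDING_MODELS = [
    "pensieve-job-v5",
    "pensieve-job-v6",  # new version under experiment
]
```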

We use Apache Beam on Samza for our nearline embeddings pipeline because of its simple, expressive API for defining data processing pipelines. Figure 6 shows the end-to-end nearline embedding serving flow for job posting entities. Our flow contains the following stages (a simplified code sketch follows Figure 6).

  1. Feature standardization processors run in separate nearline processes to produce the input sparse features required by our embedding model whenever a job posting is created or updated.
  2. Because each feature standardization processor is independent, we perform a stream-stream join of the processors’ messages together in order to indicate when the entire standardization process is completed.
  3. When the entire standardization process is completed, parallel multi-model embedding computations are done in the ExecuteScoringFn, for all registered models.
  4. Computed embeddings are deduped against the current embedding values in our key-value store for write efficiency in the DedupeEmbeddingsFn.
  5. Embedding versions for the same entity are batched and formatted for output to a Venice key-value store and Kafka topic to be published to the Feature Marketplace.

Figure 6: Beam nearline job posting embedding serving flowchart
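The snippet below is a highly simplified Beam sketch of these stages, reusing the DoFn names from the flow above; the input streams, model versions, key-value store, and sink are in-memory stand-ins rather than the production Samza job.

```python
import apache_beam as beam

# In-memory stand-ins for the standardization streams and the key-value store,
# keyed by job posting ID; the production pipeline runs on Samza over Kafka topics.
TITLES = [("job1", [101, 102])]
SKILLS = [("job1", [7, 8, 9])]
STORED_EMBEDDINGS = {}            # current embedding values in the key-value store
MODEL_VERSIONS = ("v5", "v6")     # registered embedding model versions

class ExecuteScoringFn(beam.DoFn):
    """Computes an embedding per registered model version (forward propagation stub)."""
    def process(self, element):
        job_id, features = element
        for version in MODEL_VERSIONS:
            embedding = [0.1, 0.2]  # placeholder for running the packaged subgraph
            yield job_id, (version, embedding)

class DedupeEmbeddingsFn(beam.DoFn):
    """Drops embeddings whose value has not changed since the last write."""
    def process(self, element):
        job_id, (version, embedding) = element
        if STORED_EMBEDDINGS.get((job_id, version)) != embedding:
            yield job_id, (version, embedding)

with beam.Pipeline() as pipeline:
    titles = pipeline | "ReadTitles" >> beam.Create(TITLES)
    skills = pipeline | "ReadSkills" >> beam.Create(SKILLS)
    (
        {"titles": titles, "skills": skills}
        | beam.CoGroupByKey()                                            # stream-stream join per job
        | beam.Filter(lambda kv: all(list(v) for v in kv[1].values()))   # standardization complete
        | beam.ParDo(ExecuteScoringFn())
        | beam.ParDo(DedupeEmbeddingsFn())
        | beam.GroupByKey()                                              # batch versions per entity
        | beam.Map(print)                                                # stand-in for Venice/Kafka sink
    )
```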

System performance and robustness

System optimization
It is critically important that every nearline system is scaled sufficiently to handle the incoming message rate. If we fall behind, consumers of our entity embedding features will end up computing ranking scores with stale or missing embeddings. This staleness degrades the effectiveness of ranking models and consequently impacts members.

There are two primary factors that could lead to message processing delay. First, the processor might simply not be tuned properly to keep up with the peak incoming message rate. Second, a downstream dependency such as Venice may be in a bad state, causing message processing to become stuck.

For the former, we did the following to maintain high throughput:

  • Increased parallelization of run loop stages across different tasks by increasing the thread pool size of job containers.
  • Increased JVM heap size and disabled heap resizing to reduce JVM pausing by GC and heap expansion, respectively.

For the latter, we designed our multi-data-center strategy to be robust to downstream dependency failures that are out of our control for high availability. This is discussed in more detail in the next section.

Multi-data-center strategy
Initially, our nearline system only processed job postings in the data center where those jobs were created. Consumers accessed output embedding messages for jobs by listening to the aggregate Kafka cluster, as illustrated in Figure 7. Though this approach avoids duplicate computation across data centers, it is fragile when dependent services in a data center are unhealthy. When this happens, consumers in every data center are unable to access the embeddings of jobs created in the problematic data center for the duration of the downtime.


Figure 7: Multi-data-center strategy optimizing for avoiding duplicate computation

To guarantee high availability in the aforementioned scenario, the multi-data-center strategy is changed to the one shown in Figure 8.


Figure 8: Multi-data-center strategy optimizing for high availability

Now, the consumers process embeddings of all the jobs in every data center, regardless of origin. This aligns with how other critical nearline systems at LinkedIn are architected to isolate the impact of a failure within a single data center. If a problematic data center affects embedding serving, member traffic can simply be shifted away from that data center until it recovers.

Impact of Pensieve and what’s next

Initially, we evaluated the feature importance of Pensieve embeddings by training XGBoost models for job recommendations ranking and using the built-in feature importance APIs. When the Pensieve embedding is included, it accounts for a supermajority of the total feature importance previously attributed to sparse title, skill, seniority, and location features, as shown in Figure 9. This gave us confidence in the efficacy of our embeddings and motivated us to continue development.


Figure 9: Pie charts to show the relative feature importance of features with and without Pensieve Pyramid Embedding
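The kind of check we ran can be reproduced with XGBoost’s built-in importance APIs; the data below is a synthetic stand-in for the job recommendations training matrix.

```python
import numpy as np
import xgboost as xgb

# Synthetic stand-in for the training data: the real matrix contains the Pensieve
# embedding alongside sparse title, skill, seniority, and location features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = rng.integers(0, 2, size=1000)

model = xgb.XGBClassifier(n_estimators=50, max_depth=4)
model.fit(X, y)

# Built-in importance APIs, as used for the comparison in Figure 9.
importance = model.get_booster().get_score(importance_type="gain")
print(sorted(importance.items(), key=lambda kv: kv[1], reverse=True))
```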

To date, we have iterated through six versions of Pensieve embeddings published to our feature marketplace. We see encouraging adoption of Pensieve embeddings across Talent Solutions and Careers products. Each iteration of Pensieve embeddings or new product integration has resulted in statistically significant single-digit percentage improvements in each product's key metrics.

In future versions of Pensieve embeddings, we are exploring the integration of state-of-the-art but computationally expensive pre-trained language models such as BERT. This will enable us to better incorporate raw text data as input features during embedding training. Furthermore, we envision these entity embeddings as foundational pieces for representing members’ job-seeking activity in the embedding space. Inspired by techniques Airbnb presented at KDD 2018, we are considering achieving personalization by averaging the embeddings of jobs that a member has interacted with and using the result as a feature in a model.
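A minimal sketch of that idea, assuming pre-computed Pensieve job embeddings are available for the jobs a member interacted with; the embedding dimension is an assumption.

```python
import numpy as np

def member_activity_embedding(interacted_job_embeddings: list[np.ndarray]) -> np.ndarray:
    """Averages the Pensieve embeddings of jobs a member interacted with, yielding a
    single personalization feature in the same embedding space (Airbnb-style idea)."""
    if not interacted_job_embeddings:
        return np.zeros(64)  # assumed embedding dimension; fallback for inactive members
    return np.mean(np.stack(interacted_job_embeddings), axis=0)

# Example usage with two interacted jobs.
jobs = [np.ones(64), np.zeros(64)]
profile_vector = member_activity_embedding(jobs)  # feature for a downstream ranking model
```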

Acknowledgments

The success of this work is due to collective action across the AI, Application, and Horizontal data infrastructure teams at LinkedIn. In particular, we would like to thank the following folks, in alphabetical order, for their direct contributions to Pensieve: Daniel Hewlett, George Pearman, Girish Kathalagiri, Jeffrey Lee, Joshua Hartman, Junrui Xu, Kevin Kao, Kunal Punera, Roshan Lal, Qing Duan, Samaneh Moghaddam, Suju Rajan, Suman Sundaresh, and Yu Gong. We would also like to thank the many other Talent Solutions and Careers engineers who have helped Pensieve through their work and feedback while integrating Pensieve embeddings into their relevance systems.