A closer look at the AI behind course recommendations on LinkedIn Learning, Part 1

Sneha Chaudhari

Machine Learning Engineering Manager @ LinkedIn | CMU | IBM Research | IISc

June 17, 2020

Co-authors: Sneha Chaudhari, Mahesh Joshi, and Gungor Polatkan

LinkedIn Learning is a platform where LinkedIn members have the opportunity to develop the relevant skills needed to achieve their professional goals. The Learning AI team continually advances the large-scale content recommendation engine for LinkedIn Learning, serving over 690M LinkedIn members and customers with relevant courses based on their interests and learning aspirations.

The image below shows the desktop version of the LinkedIn Learning homepage. Learners see course recommendations on this page, grouped into carousels by context annotations (i.e., high-level explanations of why a recommendation was made, such as “Because you’re interested in Decision-Making”).

Over the last few years, the team has built the course recommendation engine from the ground up and evolved it to serve recommendations using hyper-personalized models that learn billions of coefficients for our millions of members (Shivani Rao et al., CIKM 2019; Polatkan et al., blog post). A key goal of this recommendation engine is to surface the most relevant and personalized course recommendations, which can help learners develop new skills and drive engagement on the platform.

In this two-part series, we’ll show how Learning AI is recommending relevant courses to our members and helping drive engagement by using state-of-the-art AI technologies. In part 1, we’ll share an overview of our recommendation engine design and then present a high-level explanation of the three main components of the engine. Later, in part 2, we’ll delve deeper into each of the engine’s components, providing insight into how we generate personalized course recommendations for every learner on the platform.

System overview

The overall architecture of the system is shown in the diagram below. It has two layers: online and offline. The ranked list of personalized recommendations for each member is computed offline by the recommendation engine and stored in an online key-value store that is queried at request time. In the online layer, recommendations are delivered to the frontend through an online endpoint upon request. Whenever available, we also leverage online recommendations generated from contextual information about the learner's current session. The online flow uses the A/B testing framework developed at LinkedIn (Shivani Rao et al., CIKM 2019) to ensure that the right recommendations are delivered to the right learners.


System design with offline and online architecture
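The split between the offline and online layers can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not LinkedIn's implementation: a plain dict stands in for the online key-value store, the function names are hypothetical, and placing session-based (online) recommendations ahead of the precomputed list is an assumed merge policy, since the post only says they are leveraged whenever available.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the online key-value store: member ID -> ranked course IDs.
KeyValueStore = Dict[str, List[str]]


def run_offline_batch(scores: Dict[str, Dict[str, float]], k: int) -> KeyValueStore:
    """Offline layer: rank every scored course per member and keep the top k."""
    store: KeyValueStore = {}
    for member_id, course_scores in scores.items():
        ranked = sorted(course_scores, key=course_scores.get, reverse=True)
        store[member_id] = ranked[:k]
    return store


def serve(member_id: str, store: KeyValueStore,
          online_recs: Optional[List[str]] = None) -> List[str]:
    """Online layer: look up the precomputed list at request time.

    Assumed policy: when session-based recommendations exist, they go first,
    followed by the offline ones that are not already included.
    """
    offline = store.get(member_id, [])
    if not online_recs:
        return list(offline)
    seen = set(online_recs)
    return list(online_recs) + [c for c in offline if c not in seen]


store = run_offline_batch({"m1": {"a": 0.2, "b": 0.9, "c": 0.5}}, k=2)
print(store["m1"])                      # ['b', 'c']
print(serve("m1", store, ["x", "c"]))   # ['x', 'c', 'b']
```

The expensive ranking happens entirely in `run_offline_batch`, so the request path in `serve` is reduced to a key lookup and a cheap merge.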

Recommendation engine

Below is a high-level overview of the recommendation engine. It consists of three major blocks:

Response Prediction: This system predicts member-course relevance using the learner’s profile features (such as skills and industry) and course metadata (such as course difficulty, course category, and course skills). It uses historical explicit engagement (clicks, bookmarks, etc.) as the target response/label to train the model.
Collaborative Filtering: Collaborative filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences from many users (collaborating). In our framework, this model leverages past implicit engagement data of learners (i.e., course watches) to identify relevant courses. We learn a latent representation for each learner and each course and use similarities between these latent representations to predict member-course relevance.
Blending: We blend the recommendations from both of these algorithms to determine a final set of relevant courses for every learner. Blending is an online component of the system: the recommendations from each model are fetched upon request and blended according to a fixed selection probability assigned to each model.
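Blending by fixed selection probability can be illustrated as follows. This is one possible reading, offered as a sketch: for each slot in the final list, a source model is drawn at random according to its selection probability, and that model's highest-ranked course not yet shown is taken. The function name, the deduplication step, and the probability values in the example are assumptions, not details from the post.

```python
import random
from typing import Dict, List


def blend(sources: Dict[str, List[str]], probs: Dict[str, float],
          n: int, rng: random.Random) -> List[str]:
    """Fill up to n slots; each slot draws a source model by its fixed selection
    probability and takes that model's next course not already selected.
    Assumes every probability is positive (random.choices rejects all-zero weights)."""
    cursors = {name: 0 for name in sources}
    seen = set()
    result: List[str] = []
    while len(result) < n:
        # Only models that still have unseen courses remain eligible.
        live = [s for s in sources
                if any(c not in seen for c in sources[s][cursors[s]:])]
        if not live:
            break
        name = rng.choices(live, weights=[probs[s] for s in live], k=1)[0]
        ranked = sources[name]
        while ranked[cursors[name]] in seen:  # skip courses the other model already supplied
            cursors[name] += 1
        course = ranked[cursors[name]]
        cursors[name] += 1
        seen.add(course)
        result.append(course)
    return result


rng = random.Random(7)
rp = ["python-basics", "sql-intro", "excel-tips"]
cf = ["sql-intro", "data-viz"]
print(blend({"response_prediction": rp, "collaborative_filtering": cf},
            {"response_prediction": 0.6, "collaborative_filtering": 0.4}, n=4, rng=rng))
```

Each model's internal ranking order is preserved in the output; the probabilities only control how often each model gets to fill the next slot.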

Essentially, the underlying machine learning problem is to predict explicit/implicit engagement of learners, given contextual information about the learners and courses:


A high-level overview of the Learning AI recommendation framework

Response Prediction and Collaborative Filtering have complementary strengths and capture different aspects of learner information. For this reason, appropriately blending recommendations from both models helps us improve the quality of our personalized recommendations.
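For concreteness, the two prediction tasks can be written down as follows. The post does not specify the model families, so this is a sketch under stated assumptions: the logistic link for Response Prediction and the dot-product similarity for Collaborative Filtering are illustrative choices, and all symbols are introduced here for illustration only (x_m: learner profile features, z_c: course metadata, phi: a feature map, w: learned weights, u_m and v_c: learned latent vectors).

```latex
% Response Prediction: probability of explicit engagement (click, bookmark, ...)
% of learner m with course c; logistic link assumed for illustration.
\hat{p}_{mc} = P\left(y_{mc} = 1 \mid \mathbf{x}_m, \mathbf{z}_c\right)
            = \sigma\!\left(\mathbf{w}^{\top} \phi(\mathbf{x}_m, \mathbf{z}_c)\right),
\qquad \sigma(t) = \frac{1}{1 + e^{-t}}

% Collaborative Filtering: relevance as similarity of learned latent vectors,
% with the dot product assumed as the similarity function.
\hat{s}_{mc} = \mathbf{u}_m^{\top} \mathbf{v}_c
```

Only the first score depends on profile and course features, so it can be computed for a learner with no watch history; the second requires a learned u_m, which in turn requires past engagement.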

Collaborative Filtering
Collaborative Filtering (CF) can provide very powerful recommendations, even when a learner's LinkedIn profile is sparse, by relying on the member’s course-watching behavior for signals. As a result, CF works best with “core” learners—members who are already active on the LinkedIn Learning platform. However, it has relatively poor performance when recommending courses for new learners (popularly known as the “cold start” problem).

Other advantages of CF include:

the ability to capture recent interests by focusing on recent interactions.
diverse recommendations, since they are based on similarity in course-watching behavior rather than on course content.
reliance solely on engagement data, which reduces the need for domain knowledge.
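To make the idea of latent representations concrete, here is a small sketch that factorizes a binary learner-by-course watch matrix with a truncated SVD and scores unwatched courses by the dot product of latent vectors. Truncated SVD is a deliberately simple stand-in chosen for illustration; the post does not say which factorization or similarity the production CF model uses, and the toy matrix and function names are hypothetical.

```python
import numpy as np


def latent_factors(watches: np.ndarray, k: int):
    """Factorize a binary learner x course watch matrix with a truncated SVD.

    Returns learner vectors (num_learners, k) and course vectors (num_courses, k)
    whose dot products approximate the watch matrix.
    """
    u, s, vt = np.linalg.svd(watches.astype(float), full_matrices=False)
    root_s = np.sqrt(s[:k])               # split singular values across both sides
    learners = u[:, :k] * root_s
    courses = vt[:k, :].T * root_s
    return learners, courses


def recommend(learner: int, watches: np.ndarray, learners: np.ndarray,
              courses: np.ndarray, n: int) -> list:
    """Rank courses by latent dot product, excluding courses already watched."""
    scores = courses @ learners[learner]
    scores[watches[learner] > 0] = -np.inf
    order = np.argsort(-scores, kind="stable")
    return [int(c) for c in order[:n] if np.isfinite(scores[c])]


# Toy data: rows are learners, columns are courses, 1 = watched.
watches = np.array([[1, 1, 0, 0],
                    [1, 1, 1, 0],
                    [0, 0, 1, 1]])
learners, courses = latent_factors(watches, k=2)
print(recommend(0, watches, learners, courses, n=2))
```

A learner with an all-zero row contributes nothing to the factorization, so their scores collapse to (near) zero; this is the cold-start weakness of CF in miniature.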

Response Prediction
Response Prediction models are less affected by the interaction sparsity problem because they also rely on signals extracted from learners’ LinkedIn profiles and activity. Hence, this algorithm typically performs better than CF for members with no/little previous engagement on LinkedIn Learning, as well as for new courses with few prior interactions.
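The sketch below illustrates why feature-based models cope with sparse interaction data: a score can be computed from profile and course features alone. It uses a plain logistic regression trained by batch gradient descent, which is an illustrative stand-in rather than the production model (the post does not name the model family). The features (skill overlap and one-hot course difficulty), the function names, and the toy labels are all hypothetical.

```python
import numpy as np


def features(member_skills: set, course_skills: set, difficulty: int) -> np.ndarray:
    """Hypothetical features: fraction of the course's skills the member lists,
    plus one-hot course difficulty (0 = beginner, 1 = intermediate, 2 = advanced).
    The one-hot block doubles as a per-difficulty intercept."""
    overlap = len(member_skills & course_skills) / max(len(course_skills), 1)
    one_hot = np.zeros(3)
    one_hot[difficulty] = 1.0
    return np.concatenate(([overlap], one_hot))


def train_logistic(X: np.ndarray, y: np.ndarray, lr: float = 0.5,
                   epochs: int = 2000) -> np.ndarray:
    """Batch gradient descent on mean log loss; y = 1 for explicit engagement."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w


def predict(w: np.ndarray, x: np.ndarray) -> float:
    """Predicted probability that the member engages with the course."""
    return float(1.0 / (1.0 + np.exp(-(x @ w))))


# Toy training data: (member skills, course skills, difficulty, engaged?)
examples = [({"python", "sql"}, {"python"}, 0, 1),
            ({"python"}, {"sql"}, 0, 0),
            ({"design"}, {"design", "ux"}, 1, 1),
            ({"design"}, {"finance"}, 1, 0)]
X = np.array([features(m, c, d) for m, c, d, _ in examples])
y = np.array([label for *_, label in examples], dtype=float)
w = train_logistic(X, y)

# A brand-new member with no course history can still be scored from the profile.
print(predict(w, features({"sql"}, {"sql", "python"}, 0)))
```

The same features would also let a newly published course be scored before anyone has watched it, as long as its metadata is available.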

Stages of model building
Finally, we take a multi-stage approach to training both of these models and generating their recommendations, summarized below:


Different stages in offline model building

Our growth and what’s next

Our current architecture serves over 16,000 courses to 23M paid subscribers, approximately 12,000 enterprise customers (companies in over 100 countries), and more than half of the top 100 global universities. Learning AI also provides course recommendations for free content to our 690M members. New content is continuously added to the platform (60+ courses per week), and new paid subscriptions have more than doubled since 2017. Consequently, we designed our recommendation engine to be scalable and robust enough to support this traffic.

The two main components of the recommendation engine, the Response Prediction and Collaborative Filtering models, are key to generating the most relevant and personalized recommendations for every learner. In part 2 of this series, we’ll look at both models in detail, along with our next steps. Stay tuned!
