The journey to build an explainable AI-driven recommendation system to help scale sales efficiency across LinkedIn

Jilei Yang

Staff Software Engineer, Machine Learning at LinkedIn | PhD in Statistics

April 6, 2022

Co-authors: Jilei Yang, Parvez Ahammad, Fangfang Tan, Rodrigo Aramayo, Suvendu Jena, Jessica Li

At LinkedIn, we have the opportunity to work with many different types of customers with varying business needs. From multinational corporations to small businesses, technology plays such a critical role in how we enable our sales teams to support our customers. These teams need to be well-versed in their customers’ unique needs and the broader macro-environment in which they operate. During the pandemic, this became particularly important as many of our customers’ business priorities and operating dynamics shifted. Our sales teams needed to invest their time to have a deep understanding of their customers’ goals and be able to create highly personalized product and solutions recommendations, which can be a time-consuming process. These solutions also needed to be reconsidered across many customer accounts. Being able to scale this type of intelligent, value-driven customer outreach presented both a significant business challenge and opportunity.

So, the question became: how can we help our sales team effectively identify the best LinkedIn solutions or products to fit customers’ needs in a scalable and accurate manner?

To meet this challenge, our data teams leveraged machine learning (ML) models to better segment, prioritize, and help target accounts for our sales representatives. We explored which accounts were growing quickly and may need a new product package and which ones were struggling to get full value from their tools and may need more information. We called this work “Project Account Prioritizer,” and it provided a score for each existing customer that was eligible for renewal, and key field products they might be interested in to meet their business needs.

While this ML-based approach was very useful, we found from focus group studies that ML-based model scores alone weren’t the most helpful tool for our sales representatives. Rather, they wanted to understand the underlying reasons behind the scores—such as why the model score was higher for Customer A but lower for Customer B—and they also wanted to be able to double check the reasoning with their domain knowledge.

In this blog post, we showcase how we built an ML-based solution to serve useful recommendations about potential account churn and upsell opportunities to our LinkedIn sales representatives. We further showcase how we expanded this tool to leverage the state-of-the-art, user-facing explainable AI system CrystalCandle (previously named Intellige) to create narrative-driven insights for each account-level recommendation. CrystalCandle plays an important role in Project Account Prioritizer by helping our sales team understand and trust our modeling results because they understand the key facts that influenced the model’s score.

The combination of Project Account Prioritizer and CrystalCandle has deepened our customer value by increasing the information and speed with which our sales teams can reach out to customers who are having a poor experience with the products, or offer additional support to those growing quickly.

Project Account Prioritizer: Predicting upsell and churn for our SaaS products

Before Project Account Prioritizer, sales representatives relied on human intelligence and spent huge amounts of time sifting through offline data to identify which accounts were likely to continue doing business with us and which products they might be interested in at the next contract renewal. Identifying accounts that were likely to churn and proactively addressing them was a similarly large time drain for the sales teams.

With Project Account Prioritizer, we were able to greatly improve efficiency for these teams. We provided predictions that differentiate the accounts likely to upsell or churn, estimate to what degree, and quantify the number of products each account is likely to buy or drop in the upcoming renewal. We did this by training a set of XGBoost regressors on historical purchases and renewals of our customers across the globe. Some of the patterns that the models look for include historical bookings (e.g., field vs. online spend), trends in product engagement and usage (such as utilization of our Recruiter or Jobs products), hiring patterns (such as growth in employees, hiring seniority levels, and talent flows), company firmographics (such as industry and its performance during COVID-19), macro trends, and most importantly, delivered customer value (e.g., hires delivered by LinkedIn).

Two challenges make this modeling exercise complex:

Generating accurate labels for churn/upsell: As with other SaaS products, churn happens at renewal, but upsell can happen throughout the year (e.g., an add-on opportunity mid-cycle). This leads to a waterfall trend of product bookings/quantity throughout the renewal cycle, which makes defining a label for churn/upsell challenging. We solved this by generating labels over overlapping time periods throughout the renewal cycle, instead of assigning one label to a particular renewal. While this increases feature complexity, it gives us more accurate labeling and a larger sample size across customers. Figure 1 shows that time periods 1 and 2 would map to an upsell label, while time period 3 for the same customer would map to a churn label.
Generating predictions in advance of actual renewal: For our sales teams to act on our predictions (mitigate churn or justify upsell), we need to generate them in advance of actual renewal, so the most recent product engagement and usage data would not be available for predictions. We solved this by creating various time series features across historical data (e.g., InMail Response Rate or Number of Job Applications in the last three months, last six months, last nine months, etc.) to capture the evolving trends. We also scored the accounts monthly to provide the most recent predictions to sales teams. In particular, features from the LinkedIn Economic Graph such as talent flows and industry macro trends have been very helpful.
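
As a rough illustration of the trailing-window features described in the second challenge, the sketch below averages a monthly metric over the last three, six, and nine months available at scoring time. The function name, the list layout, and the use of a simple mean are assumptions made for illustration; the post does not describe how the production features are computed.

```python
from statistics import mean


def trailing_window_features(monthly_values, windows=(3, 6, 9)):
    """Average a monthly metric over several trailing windows.

    monthly_values is ordered oldest to newest and ends at the last month
    available at scoring time (hypothetical layout, not LinkedIn's schema).
    """
    features = {}
    for months in windows:
        # Most recent `months` observations; shorter histories use what exists.
        window = monthly_values[-months:]
        # Leave the feature empty when there is no history at all.
        features[f"avg_last_{months}m"] = mean(window) if window else None
    return features


# Hypothetical monthly job-application counts, oldest first.
job_applications = [1, 2, 3, 4, 5, 6, 7, 8, 9]
features = trailing_window_features(job_applications)
```

Comparing the three-month and nine-month averages gives the model a simple view of whether a metric is trending up or down, without needing data from the final months before renewal.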


Figure 1. Label generation across overlapping time periods within the same renewal cycle. Blue square dots represent the beginning of a contract (new business, add-on, renewal), while gray square dots represent the ending of a contract. Green up arrow means an upsell label, while red down arrow means a churn label.
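
The labeling scheme in Figure 1 can be sketched as follows. This is a minimal illustration that assumes a label is derived from the change in booked quantity between the start and end of each period; the post does not state the exact rule, and the function name, month indices, and quantities are hypothetical.

```python
def label_periods(quantity_by_month, periods):
    """Label each (start, end) period by the change in booked quantity.

    quantity_by_month maps a month index to the quantity under contract.
    The upsell/churn/flat rule below is an illustrative assumption.
    """
    labels = []
    for start, end in periods:
        change = quantity_by_month[end] - quantity_by_month[start]
        if change > 0:
            label = "upsell"
        elif change < 0:
            label = "churn"
        else:
            label = "flat"
        labels.append({"period": (start, end), "change": change, "label": label})
    return labels


# Hypothetical account: add-ons in months 3 and 6, reduced quantity at the month-12 renewal.
quantity = {0: 10, 3: 12, 6: 15, 9: 15, 12: 8}
# Three overlapping periods within the same renewal cycle.
periods = [(0, 6), (3, 9), (6, 12)]
labels = label_periods(quantity, periods)
```

With this made-up account, the first two periods receive an upsell label and the third a churn label, so one renewal cycle contributes three training samples instead of one.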

Currently, we’ve trained separate models to provide predictions at the account level and at various product levels, and to identify the likelihood of buying all-in deals, which helps us capture the differentiation in labels and nuances in usage. Precision and recall for these models range from 0.73 to 0.81. Qualitative feedback gathered from our sales teams also showed that the models match closely with their field knowledge and intuition (roughly 80% to 85% accuracy across individual sales books, based on field surveys). All of these scores are shown to sales teams through direct integration with the CRM system (e.g., Microsoft Dynamics), enabling them to decide the best course of action. In the near future, we plan to explore multi-task learning to combine these separate models into a single consolidated framework that provides more unified recommendations and a simpler end-user experience.

CrystalCandle overview

A key thing we learned from a focus group study with sales representatives is that the scores alone may not be the most helpful. For sales representatives to take action, they need to know the underlying reasons behind these scores, and they also want to double check these reasons with their domain knowledge. Even though some state-of-the-art model interpretation approaches (e.g., LIME, SHAP) can help create an important feature list to help users to interpret the ML-model provided scores, the feature names in these lists are often not very intuitive to a non-technical audience. The features also may not be well-organized (e.g., relevant features could be further grouped, redundant features could be removed).

To deal with the above challenges, we have built and implemented a user-facing model explainer called CrystalCandle, which is a key part of developing transparent and explainable AI systems at LinkedIn. The output of CrystalCandle is a list of top narrative insights for each customer account (shown in Figure 2), which reflects the rationale behind the ML-model provided scores. These narrative insights are much more user-friendly, bring important metrics to sales representatives’ attention, and are clear and concise. These narratives give more support for sales teams to trust the prediction results and better extract meaningful insights.

Figure 2. Mocked top narrative insights generated by CrystalCandle for a specific customer in account level upsell prediction.

Figure 3 shows the pipeline of CrystalCandle. As you can see, CrystalCandle serves as a bridge between the machine learning models, such as the upsell propensity model in the Project Account Prioritizer, and the end users (i.e., the sales representatives). The Model Importer consumes the model output from a set of major machine learning platforms (e.g., ProML), and converts it into a standardized machine learning model output. Then in Model Interpreter, we implement model interpretation approaches onto the standardized machine learning model output and generate the important feature list for each sample. We also feed some additional inputs into CrystalCandle at this stage, including the additional feature information and narrative templates. We then conduct narrative template imputation in Narrative Generator and produce top narrative insights for each sample. Finally, we surface these narrative insights onto a variety of end-user platforms via Narrative Exporter. The entire CrystalCandle product is built on Apache Spark to achieve high computational power and high compatibility with upstream platforms such as ProML and downstream platforms such as MyBook in Microsoft Dynamics. Next, we dive deeper into the major components of CrystalCandle.

Figure 3. CrystalCandle pipeline.

Model Interpreter deep dive
The goal of Model Interpreter is to produce sample-level feature importance scores based on machine learning model output. This component is compatible with state-of-the-art model interpretation approaches such as SHAP, LIME, and K-LIME. Some of these approaches, such as SHAP and LIME, need access to the original predictive model when calculating feature importance scores. This access may not be needed for other approaches such as K-LIME, where feature values and model prediction scores are enough. As there exists plenty of literature on the algorithmic details of these approaches, we will not go deep into them in this blog post. We encourage readers to refer to this book for a comprehensive overview of model interpretation approaches.
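
To illustrate how feature values and prediction scores alone can yield sample-level importance scores, the sketch below fits a single linear surrogate to the model's scores with NumPy and reads each sample's per-feature contribution off the fitted coefficients. This is a deliberately simplified, one-cluster stand-in for the K-LIME idea (K-LIME, as commonly described, fits local linear surrogates on clusters of the data); it is not CrystalCandle's Model Interpreter, and the function name and data are hypothetical.

```python
import numpy as np


def linear_surrogate_importance(feature_values, scores):
    """Per-sample feature contributions from a linear surrogate of the scores.

    Only feature values and prediction scores are used; the original
    predictive model is never called.
    """
    X = np.asarray(feature_values, dtype=float)
    y = np.asarray(scores, dtype=float)
    centered = X - X.mean(axis=0)  # measure each feature relative to its mean
    design = np.column_stack([np.ones(len(X)), centered])  # intercept + features
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    # Contribution of feature j for sample i: coefficient_j * centered value.
    return centered * coef[1:]


# Hypothetical data: scores depend on the first feature only.
X = [[0, 0], [1, 0], [0, 1], [1, 1]]
scores = [1, 3, 1, 3]
contributions = linear_surrogate_importance(X, scores)
```

In this toy example the second feature receives a contribution of zero for every sample, because the scores do not vary with it.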

Narrative Generator and Insights Design deep dive
The goal of Narrative Generator is to produce the top narrative insights based on model output and model interpretation results. Some insights that were helpful in designing the Narrative Generator include:

We need to incorporate feature descriptions into narratives to make feature names readable. This requires input from domain experts such as the model builders.
We do not want to overwhelm end users by producing one narrative for each feature, since models can have hundreds of features. Therefore, feature clustering based on semantic meaning is important.
There may exist a set of narratives that are consistent (e.g., XX changes from A to B), and so constructing reusable narrative templates can be helpful.
We need to select top narratives in a scalable way, where we can leverage the feature importance scores from the Model Interpreter.


Figure 4. Feature clustering and template imputation in Narrative Generator

With these insights in mind, we first look at how we build the feature clustering information file and narrative templates in Figure 4. We use the Job Slots Upsell Model within Project Account Prioritizer as an example. We build the four-layer feature hierarchy as the feature clustering information file. For each original feature, we figure out its higher level features, moving from super feature, to ultra feature, and finally to the category. This file is constructed with the help of model builders. We see that the feature descriptions have been naturally incorporated in the super feature names. The functionalities of these feature hierarchies will be discussed shortly.
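
One way to picture the four-layer hierarchy is as a table with one row per original feature. The sketch below is an assumption about how such a file could be represented in code; apart from the super feature “viewers per job,” which appears in this post, all feature, ultra feature, and category names are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class FeatureInfo(NamedTuple):
    """One row of a feature clustering information file (four layers)."""
    original_feature: str
    super_feature: str
    ultra_feature: str
    category: str


# Hypothetical rows for illustration only.
FEATURE_INFO = [
    FeatureInfo("viewers_per_job_prev_month", "viewers per job", "job views", "product engagement"),
    FeatureInfo("viewers_per_job_current_month", "viewers per job", "job views", "product engagement"),
    FeatureInfo("views_per_job_current_month", "views per job", "job views", "product engagement"),
    FeatureInfo("employee_count_current", "employee count", "company size", "hiring patterns"),
]


def group_by_layer(rows, layer):
    """Map each value of the chosen layer to the original features beneath it."""
    groups = defaultdict(list)
    for row in rows:
        groups[getattr(row, layer)].append(row.original_feature)
    return dict(groups)


by_super = group_by_layer(FEATURE_INFO, "super_feature")
by_ultra = group_by_layer(FEATURE_INFO, "ultra_feature")
```

Grouping by super feature gives the set of original features that feed one narrative, while grouping by ultra feature gives the set of narratives that are likely to overlap.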

We also construct a list of narrative templates, where each template is uniquely identified by its insight type. The rule to conduct narrative template imputation is provided by the last table. In that table, each super feature corresponds to one insight type, thus one narrative will be constructed for each super feature. The insight item determines the position to impute the feature values into the narrative templates. For example, to construct the narrative for super feature “viewers per job,” we find out its narrative template “value change,” replace the blanks “prev_value,” “current_value,” and “super_feature” with the feature values and super feature name “viewers per job,” and calculate “percent_change.” We will then have its imputed narrative, e.g., “Viewers per job changed from 200 to 300 (+50%) in the last month.”
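
The “value change” example above can be reproduced with a few lines of string formatting. This is a minimal sketch, assuming templates are stored as Python format strings keyed by insight type; the template wording follows the example in this post, while the dictionary, function name, and zero-value guard are hypothetical.

```python
# Narrative templates keyed by insight type (hypothetical storage format).
TEMPLATES = {
    "value change": (
        "{super_feature} changed from {prev_value} to {current_value} "
        "({percent_change:+.0%}) in the last month."
    ),
}


def impute_narrative(insight_type, super_feature, prev_value, current_value):
    """Fill a narrative template with feature values and the derived percent change."""
    if prev_value == 0:
        raise ValueError("percent change is undefined when the previous value is 0")
    percent_change = (current_value - prev_value) / prev_value
    # Capitalize only the first letter so names like "InMail" keep their casing.
    display_name = super_feature[0].upper() + super_feature[1:]
    return TEMPLATES[insight_type].format(
        super_feature=display_name,
        prev_value=prev_value,
        current_value=current_value,
        percent_change=percent_change,
    )


narrative = impute_narrative("value change", "viewers per job", 200, 300)
# narrative == "Viewers per job changed from 200 to 300 (+50%) in the last month."
```

Keeping the template text separate from the imputation logic means one template can serve every super feature that shares the same insight type.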


Figure 5. Narrative ranking and deduplication in Narrative Generator

We next show how we select top narratives in a scalable way in Figure 5. We first append the feature importance scores from Model Interpreter to the feature information table presented in Figure 4. During the narrative imputation process, we also calculate the narrative importance score as the largest feature importance score among all the original features under the super feature corresponding to the narrative. We then rank all the narratives according to the narrative importance score. At the same time, we deduplicate narratives by keeping only the narrative with the largest narrative importance score within each ultra feature, since narratives under one ultra feature can overlap heavily. Finally, we concatenate narratives within each category; the concatenated top narratives are the final output from the Narrative Generator.
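
The ranking, deduplication, and concatenation steps can be sketched in plain Python as below. This is an illustration under assumptions rather than CrystalCandle's Spark implementation: the row layout, the `top_k` cutoff and where it is applied, and all sample values are hypothetical, and importance scores are assumed to be non-negative magnitudes.

```python
def select_top_narratives(rows, top_k=3):
    """Rank, deduplicate, and concatenate narratives.

    Each row describes one original feature and carries its importance score
    plus the narrative, super feature, ultra feature, and category it maps to.
    """
    # Narrative importance = largest feature importance under each super feature.
    by_super = {}
    for row in rows:
        best = by_super.get(row["super_feature"])
        if best is None or row["importance"] > best["importance"]:
            by_super[row["super_feature"]] = row

    # Rank narratives by narrative importance, highest first.
    ranked = sorted(by_super.values(), key=lambda r: r["importance"], reverse=True)

    # Deduplicate: keep only the top-ranked narrative within each ultra feature.
    seen_ultra, deduped = set(), []
    for row in ranked:
        if row["ultra_feature"] not in seen_ultra:
            seen_ultra.add(row["ultra_feature"])
            deduped.append(row)

    # Concatenate the surviving top narratives within each category.
    by_category = {}
    for row in deduped[:top_k]:
        by_category.setdefault(row["category"], []).append(row["narrative"])
    return [" ".join(narratives) for narratives in by_category.values()]


def make_row(narrative, super_feature, ultra_feature, category, importance):
    return {"narrative": narrative, "super_feature": super_feature,
            "ultra_feature": ultra_feature, "category": category,
            "importance": importance}


VIEWERS = "Viewers per job changed from 200 to 300 (+50%) in the last month."
VIEWS = "Views per job changed from 500 to 600 (+20%) in the last month."
APPS = "Job applications changed from 40 to 50 (+25%) in the last month."
HEADCOUNT = "Employee count changed from 1000 to 1100 (+10%) in the last month."

rows = [
    make_row(VIEWERS, "viewers per job", "job views", "product engagement", 0.30),
    make_row(VIEWERS, "viewers per job", "job views", "product engagement", 0.10),
    make_row(VIEWS, "views per job", "job views", "product engagement", 0.25),
    make_row(APPS, "job applications", "job applications", "product engagement", 0.20),
    make_row(HEADCOUNT, "employee count", "company size", "hiring patterns", 0.15),
]
top_narratives = select_top_narratives(rows)
```

In this made-up input, the “views per job” narrative is dropped because it shares an ultra feature with the higher-ranked “viewers per job” narrative, and the two remaining product engagement narratives are joined into one string.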

CrystalCandle implementation within Project Account Prioritizer

Figure 6. Mocked CrystalCandle output (Customer Propensity Score Insights Illustrative Example) on MyBook

LinkedIn sales teams use multiple internal sales intelligence platforms. One typical platform, MyBook (embedded in Microsoft Dynamics), aims to help sales representatives close deals faster by providing well-organized sales insights and recommendations. For several quarters now, CrystalCandle has assisted LinkedIn data scientists in converting machine intelligence from business predictive models into sales recommendations on these sales intelligence platforms. Figure 6 shows one typical output of CrystalCandle on MyBook in Project Account Prioritizer. When a sales representative logs into MyBook, a list of accounts is displayed on the MyBook homepage. The column “Existing Customer Propensity Score (LTS) Justification” shows the upsell propensity score for each account from the predictive models in Project Account Prioritizer. To learn more about the underlying reasons behind each score, sales representatives can hover over the “i” button, and a small window with more account details will pop up. In this pop-up window, CrystalCandle provides top narrative insights for each account.

After launching Project Account Prioritizer integrated with CrystalCandle explanations, we ran a rigorous A/B test to quantify the performance of these models. The models resulted in a +8% lift in Renewal Incremental Growth (a measure of bookings growth during contract renewal) for the business. Feedback from the sales team has also been highly positive: “The predictive models [are] a game changer! The time saved on researching accounts for growth opportunities has been cut down with the data provided in the report which has allowed me to focus on other areas across MyBook.”

Besides MyBook integration, CrystalCandle-based sales recommendations have also been surfaced onto other sales intelligence platforms for different audiences and use cases with the help of Narrative Exporter. By the end of 2021, eight CrystalCandle-based sales recommendations had been onboarded onto four internal sales intelligence platforms, spanning three lines of LinkedIn business: Talent Solutions (LTS), Marketing Solutions (LMS), and Sales Solutions (LSS).

Conclusion

Scaling intelligent and value-driven customer outreach for the LinkedIn sales team is a crucial business challenge. The LinkedIn Data teams developed state-of-the-art machine learning models in Project Account Prioritizer to provide the sales team with account-level information on churn risk and upsell propensity. They further leveraged the user-facing model explainer CrystalCandle to create top narrative insights for each account-level recommendation. CrystalCandle helps sales teams trust modeling results and extract meaningful insights from them. A/B testing results demonstrate that the launch of Project Account Prioritizer with CrystalCandle interpretation has led to a +8% lift in Renewal Incremental Growth.

We’ll end this blog post by highlighting a couple of future improvements we hope to bring to CrystalCandle:

Today the Insights Design input, including the feature information file and narrative templates, is mostly manually created. We plan to investigate ways to auto-generate parts of Insights Design to further reduce manual efforts from CrystalCandle users.
Translation of narrative templates into code is manually conducted right now. In the future, we will try to automate this process by identifying symbols and characters in narrative templates and converting them into the appropriate code automatically.

Acknowledgments

We would like to acknowledge Sofus Macskássy, Rahul Todkar, Romer Rosales, and Ya Xu for their leadership in this cross-team work and Dan Shapero, Mark Lobosco, Lekha Doshi, and Jordan Levy for their business partnership. We would like to thank all of the contributors and users who assisted with CrystalCandle from the Data Science Applied Research team (Diana Negoescu, Saad Eddin Al Orjany, and Rachit Arora), the Go-to-Market Data team (Harry Shah, Wenrong Zeng, Yu Liu, Liyang Zhao, Jiang Zhu, Jimmy Wong, Jiaxing Huang, Yingxi Yu, and Shubham Vinay), the Insights team (Ying Zhou, William Ernster, Eric Anderson, Nisha Rao, Angel Tramontin, and Zean Ng), the Merlin team (Kunal Chopra, Durgam Vahia, and Ishita Shah), the Data Science Productivity team (Juanyan Li), and our wonderful partners (particularly Tiger Zhang, Wei Di, Sean Huang, and Burcu Baran). We would also like to thank Rupesh Gupta, Jon Adams, Hannah Sills, Kayla Guglielmo, Greg Earl, Darlene Gannon, and Fred Han for their help in reviewing this blog post.

Topics: Artificial intelligence, Machine Learning