Join LinkedIn Engineering @ KDD 2019
LinkedIn operates the world’s largest professional network with more than 645 million members in over 200 countries and territories. Our unique datasets give our AI experts and data scientists the ability to conduct applied research that fuels LinkedIn’s data-driven products (People You May Know, Jobs You May Be Interested In, Feed).
LinkedIn’s team of AI engineers and scientists work with massive datasets, solve real problems for our members around the world, and publish at major conferences. They actively contribute to the open source community and pursue research in areas such as computational advertising, machine learning, scalable AI infrastructure, recommender systems, and more.
Talk: Evolving Data Science at LinkedIn
Ya Xu (LinkedIn)
Data science, as the study of extracting value from data, is constantly evolving. How we apply it at LinkedIn evolves as well.
There are three aspects where we have seen significant shifts. First, the need to broaden the view from a single entity to a community and network. Second, the need to scale through both use cases and compute efficiency. Third, the need to ensure we use data in an ethical way (privacy, fairness, etc.).
In this talk, I will discuss the on-going efforts at LinkedIn in adapting and evolving to these three areas.
Talk: Artificial Intelligence and Product Ecosystem Optimization
Romer Rosales (LinkedIn)
Artificial Intelligence (AI) is behind practically every product experience at LinkedIn. From ranking our members’ feeds to recommending new, relevant jobs, AI is used to fulfill our mission to connect the world’s professionals to make them more productive and successful. While product functionality can be decomposed into separate components, those components are deeply interconnected, creating interesting questions and challenging AI problems that must be solved in a sound and practical manner. In this talk, I will provide an overview of lessons learned and approaches we have developed to address these problems, including scaling to large problem sizes, handling multiple conflicting objective functions, efficient model tuning, and our progress toward using AI to optimize the LinkedIn product ecosystem more holistically.
At LinkedIn, our mission is to build active communities for all of our members so that they can disseminate or seek professional content at the right time on the right channel. We mine a variety of data sources, including LinkedIn's Economic Graph and member activities on the site, and use large-scale machine learning algorithms to recommend people members may know, helping them connect and build active communities. We build real-time recommendations to disseminate information so that members never miss a relevant conversation in any of the communities they are part of. In this talk, we will showcase how we are tackling some of the most challenging problems in internet-scale social network analysis, streaming algorithms, and multi-objective optimization.
Workshop: Issues of Sentiment Discovery and Opinion Mining (WISDOM’19)
Yongzheng Zhang (LinkedIn), Bing Liu (University of Illinois at Chicago), Erik Cambria (Nanyang Technological University), Xiaodan Zhu (Queen's University)
Started in 2012, the KDD WISDOM workshop series aims to provide an international forum for researchers to share their latest investigations in opinion mining and sentiment analysis. The broader context of the workshop includes opinion mining, information retrieval, and natural language processing, to name a few. The WISDOM'19 program consists of keynote speeches from world-renowned researchers along with peer-reviewed papers that showcase the latest developments across various areas, such as deep learning for sentiment analysis. We hope to see you at the KDD’19 WISDOM workshop in Anchorage on August 4th.
Workshop: 15th International Workshop on Mining and Learning with Graphs (MLG ‘19)
Albert Chen (LinkedIn), Lichao Sun (UIC), Wei Chen (Microsoft)
This talk demonstrates the practical application of theoretical influence spread models at a large-scale online social network. At LinkedIn, we are interested in understanding how content propagates in the feed and in identifying the key actors in this process. We define a member-member influence propagation graph based on historical feed interactions, and then use a model to identify members with high expected influence spread. These influence scores are sometimes surprising and provide a better measure of influence than intuitive baselines. Finally, we will present the ongoing product applications that use influence scores to improve the user experience.
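To make the idea concrete, here is a minimal sketch (not LinkedIn's production system) of how expected influence spread can be estimated over a member-member graph, using Monte Carlo simulation of the standard independent cascade model. The graph, edge probabilities, and member names below are illustrative assumptions.

```python
# Hypothetical example: estimating expected influence spread with Monte Carlo
# simulation of the independent cascade model over a member-member graph.
import random
from collections import deque

def simulate_cascade(graph, seed_member, rng):
    """One independent-cascade run; returns the set of influenced members."""
    influenced = {seed_member}
    frontier = deque([seed_member])
    while frontier:
        member = frontier.popleft()
        for neighbor, prob in graph.get(member, []):
            if neighbor not in influenced and rng.random() < prob:
                influenced.add(neighbor)
                frontier.append(neighbor)
    return influenced

def expected_spread(graph, seed_member, num_runs=1000):
    """Monte Carlo estimate of a member's expected influence spread."""
    rng = random.Random(42)
    total = sum(len(simulate_cascade(graph, seed_member, rng)) for _ in range(num_runs))
    return total / num_runs

# Toy graph: member -> [(neighbor, propagation probability from feed history), ...]
graph = {
    "a": [("b", 0.4), ("c", 0.2)],
    "b": [("d", 0.5)],
    "c": [("d", 0.1)],
    "d": [],
}
scores = {m: expected_spread(graph, m) for m in graph}
print(sorted(scores.items(), key=lambda kv: -kv[1]))  # members ranked by estimated influence
```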
Workshop: Keynote at Workshop on Offline and Online Evaluation of Interactive Systems (KDD’19)
Ya Xu (LinkedIn)
This talk will cover how LinkedIn evaluates changes to online products and engineering systems using controlled experiments and quasi-experimental techniques, as well as some of the challenges we face, including network interference and experiment sensitivity.
Tutorial: Fairness-Aware Machine Learning: Practical Challenges and Lessons Learned
Sarah Bird (Microsoft), Ben Hutchinson (Google), Krishnaram Kenthapadi (LinkedIn), Emre Kiciman (Microsoft), Margaret Mitchell (Google)
Researchers and practitioners from different disciplines have highlighted the ethical and legal challenges posed by the use of machine-learned models and data-driven systems, and the potential for such systems to discriminate against certain population groups due to biases in algorithmic decision-making. This tutorial presents an overview of algorithmic bias/discrimination and the techniques used to achieve fairness in machine learning systems. We will cover the motivations for adopting a "fairness-first" approach when developing machine learning-based models and systems in practice. Based on our experiences in industry, we will present case studies from different technology companies, highlight best practices, and identify open problems and research challenges for the data mining/machine learning community.
Tutorial: Explainable AI in Industry
Krishna Gade (Fiddler Labs), Sahin Cem Geyik (LinkedIn), Krishnaram Kenthapadi (LinkedIn), Varun Mithal (LinkedIn), Ankur Taly (Fiddler Labs)
Model explainability is a prerequisite for building trust and adoption of AI systems in high-stakes domains. In this tutorial, we will present an overview of model interpretability and explainability in AI, key regulations/laws, and techniques/tools for providing explainability as part of AI/ML systems. Then, we will focus on the application of explainability techniques in industry, wherein we present the practical challenges/guidelines for using explainability techniques effectively, and the lessons learned from deploying explainable models for several web-scale machine learning and data mining applications. We will present case studies across different companies, spanning application domains such as search and recommendation systems, sales, lending, and fraud detection. Finally, based on our experiences in the industry, we will identify open problems and research directions for the data mining/machine learning community.
Tutorial: Deep Natural Language Processing for Search and Recommender Systems
Weiwei Guo (LinkedIn), Huiji Gao (LinkedIn), Jun Shi (LinkedIn), Bo Long (LinkedIn), Liang Zhang (LinkedIn), Bee-Chung Chen (LinkedIn), Deepak Agarwal (LinkedIn)
Search and recommender systems process rich natural language text data, such as user queries and documents. Achieving high-quality search and recommendation results requires processing and understanding such information effectively and efficiently, which is where natural language processing (NLP) technologies are being widely deployed. In recent years, the rapid development of deep learning models has proven successful for improving various NLP tasks, indicating their great potential for advancing search and recommender systems. In this tutorial, we summarize the current efforts to apply deep learning for NLP in search and recommender systems. We will first provide an overview of search/recommender systems with NLP, then introduce the basic concepts of deep learning for NLP, covering state-of-the-art technologies in both language understanding and language generation. After that, we will share our hands-on experience with LinkedIn applications. In the end, we will highlight several important future trends.
Accepted Paper: Internal Promotion Optimization
Rupesh Gupta (LinkedIn), Guangde Chen (LinkedIn), Shipeng Yu (LinkedIn)
Most large Internet companies run internal promotions to cross-promote their different products and/or to educate members on how to obtain additional value from the products they already use. However, since these internal promotions can distract a member from the product or page where they are shown, showing them incurs a non-zero cannibalization loss. In addition, excessive internal promotions can also degrade the member experience. In this paper, we present a cost-benefit analysis of showing internal promotions, our formulation for optimizing internal promotions, the architecture of the system serving internal promotions, and experimental results from online A/B tests.
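The core trade-off behind such a cost-benefit analysis can be sketched in a few lines. The decision rule, variable names, and numbers below are illustrative assumptions, not the paper's actual formulation: a promotion is shown only when its expected benefit exceeds the expected cannibalization loss on the hosting page.

```python
# Illustrative sketch (not the paper's formulation): show an internal
# promotion only when its expected benefit outweighs the expected
# cannibalization loss it causes on the hosting page.

def should_show_promotion(p_click, value_per_click, p_distraction, value_lost_if_distracted):
    """All inputs are hypothetical model estimates for one member/impression."""
    expected_benefit = p_click * value_per_click
    expected_cannibalization = p_distraction * value_lost_if_distracted
    return expected_benefit > expected_cannibalization

# Example: a promotion with a 3% click probability worth 2.0 units per click,
# versus a 10% chance of distracting the member from an action worth 0.5 units.
print(should_show_promotion(0.03, 2.0, 0.10, 0.5))  # True: 0.06 > 0.05
```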
Accepted Paper: Feedback Shaping: A Modeling Approach to Nurture Content Creation
Ye (Iris) Tu (LinkedIn), Chun Lo (LinkedIn), Yiping Yuan (LinkedIn), Shaunak Chatterjee (LinkedIn)
Social media platforms bring together content creators and content consumers through recommender systems, like a newsfeed. The focus of such recommender systems has thus far been primarily on modeling the preferences of the content consumers and optimizing for their experiences. However, it is equally critical to nurture the creation of content by prioritizing the creators' interests. Quality content forms the seed for sustainable engagement and conversations, and will bring in new consumers while retaining existing ones. In this work, we propose a modeling approach to predict how feedback from content consumers incentivizes creators. We then leverage this model to optimize the newsfeed experience for content creators by reshaping the feedback distribution, leading to a more active content ecosystem. We will also discuss how to practically balance the user experience for both consumers and creators, and how we carry out online A/B tests with strong network effects. We will present a deployed use case on the LinkedIn newsfeed, where we used this approach to improve content creation significantly without compromising the consumers' experience.
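One simple way to picture "reshaping the feedback distribution" is to blend the usual consumer-side relevance score with a creator-side term. The scoring function, parameter names, and numbers below are illustrative assumptions, not the deployed model.

```python
# Hypothetical sketch: blend consumer relevance with the predicted value of the
# feedback this impression could generate for the content creator.

def blended_score(consumer_relevance, p_feedback, creator_uplift, alpha=0.1):
    """alpha trades off the consumer experience against creator incentives."""
    return consumer_relevance + alpha * (p_feedback * creator_uplift)

# Two feed candidates for the same viewer: a post from an established creator
# (small marginal uplift per extra piece of feedback) versus one from a dormant
# creator (large marginal uplift).
print(blended_score(0.80, 0.05, 0.2))  # 0.801
print(blended_score(0.78, 0.05, 2.0))  # 0.790
```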
Accepted Paper: Social Skill Validation at LinkedIn
Xiao Yan (LinkedIn), Jaewon Yang (LinkedIn), Mikhail Obukhov (LinkedIn), Lin Zhu (LinkedIn), Joey Bai (LinkedIn), Shiqi Wu (LinkedIn), Qi He (LinkedIn)
The main mission of LinkedIn is to connect over 630 million members to the right opportunities. To find the right opportunities, LinkedIn needs to understand each member’s skill set and expertise levels accurately. However, estimating these can be challenging due to the lack of ground truth. So far, the industry has relied on either hand-created, small-scale data or large-scale social gestures that contain a lot of social bias (e.g., endorsements). In this paper, we develop Social Skill Validation, a novel framework for collecting validations of members’ skill expertise at the scale of billions of member-skill pairs. Unlike social gestures, we collect signals in an anonymous way to ensure objectivity. We also developed a machine learning model to make smart suggestions that collect validations more efficiently. With the social skill validation data, we uncover insights into how people evaluate others in professional social networks. For example, we find that members with higher seniority do not necessarily get more positive evaluations than more junior members. We evaluate the value of social skill validation data on the task of predicting who is hired for a job requiring a certain skill; a model using social skill validation outperforms the state-of-the-art methods on skill expertise estimation by 10%. Our experiments show that the Social Skill Validation framework we built provides a novel way to estimate members’ skill expertise accurately at scale, while offering a benchmark for validating social theories on peer evaluation.
Accepted Paper: Fairness-Aware Ranking in Search & Recommendation Systems with Application to LinkedIn Talent Search
Sahin Cem Geyik (LinkedIn), Stuart Ambler (LinkedIn), Krishnaram Kenthapadi (LinkedIn)
We present a framework for quantifying and mitigating algorithmic bias in mechanisms designed for ranking individuals, typically used as part of web-scale search and recommendation systems. We first propose complementary measures to quantify bias with respect to protected attributes, such as gender and age. We then present algorithms for computing fairness-aware re-ranking of results. For a given search or recommendation task, our algorithms seek to achieve a desired distribution of top-ranked results with respect to one or more protected attributes. We show that such a framework can be tailored to achieve fairness criteria, such as equality of opportunity and demographic parity, depending on the choice of the desired distribution. We evaluate the proposed algorithms via extensive simulations over different parameter choices, and study the effect of fairness-aware ranking on both bias and utility measures. Finally, we present online A/B testing results from applying our framework to representative ranking in LinkedIn Talent Search, and discuss the lessons learned in practice. Our approach resulted in a tremendous improvement in the fairness metrics (a nearly threefold increase in the number of search queries with representative results) without affecting the business metrics, which paved the way for deployment to 100% of LinkedIn Recruiter users worldwide. Ours is the first large-scale deployed framework for ensuring fairness in the hiring domain, with potential positive impact for more than 630 million LinkedIn members.
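As a rough illustration of the re-ranking step, the sketch below greedily builds a ranked list while keeping each protected-attribute value close to its desired share of every prefix. It is a simplified stand-in for the paper's algorithms rather than the published method, and the candidate scores, attribute values, and target distribution are made up.

```python
# Simplified sketch of a fairness-aware re-ranking idea (not the exact
# published algorithm): at each rank position, prefer the highest-scoring
# candidate whose attribute value is furthest below its desired share of
# the prefix built so far.

def fair_rerank(candidates, desired_dist, k):
    """candidates: list of (score, attribute_value); desired_dist: value -> target share."""
    remaining = sorted(candidates, key=lambda c: -c[0])
    ranked, counts = [], {v: 0 for v in desired_dist}
    for position in range(1, min(k, len(remaining)) + 1):
        # Deficit of each attribute value relative to its target share of the prefix.
        deficit = {v: desired_dist[v] * position - counts[v] for v in desired_dist}
        # Pick the best-scoring candidate among the most under-represented value(s).
        best = max(remaining, key=lambda c: (deficit[c[1]], c[0]))
        remaining.remove(best)
        ranked.append(best)
        counts[best[1]] += 1
    return ranked

# Toy example: two attribute values with a 50/50 target distribution.
candidates = [(0.9, "A"), (0.85, "A"), (0.8, "B"), (0.7, "A"), (0.6, "B")]
print(fair_rerank(candidates, {"A": 0.5, "B": 0.5}, k=4))
```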
Accepted Paper: Top Challenges from the first Practical Online Controlled Experiments Summit
Somit Gupta (Microsoft), Ronny Kohavi (Microsoft), Diane Tang (Google), Ya Xu (LinkedIn), Reid Andersen (Airbnb), Eytan Bakshy (Facebook), Niall Cardin (Google), Sumitha Chandran (Lyft), Nanyu Chen (LinkedIn), Dominic Coey (Facebook), Mike Curtis (Google), Alex Deng (Microsoft), Weitao Duan (LinkedIn), Peter Forbes (Netflix), Brian Frasca (Microsoft), Tommy Guy (Microsoft), Guido W. Imbens (Stanford), Guillaume Saint Jacques (LinkedIn), Pranav Kantawala (Google), Ilya Katsev (Yandex), Moshe Katzwer (Uber), Mikael Konutgan (Facebook), Elena Kunakova (Yandex), Minyong Lee (Airbnb), MJ Lee (Lyft), Joseph Liu (Twitter), James McQueen (Amazon), Amir Najmi (Google), Brent Smith (Amazon), Vivek Trehan (Uber), Lukas Vermeer (Booking.com), Toby Walker (Microsoft), Jeffrey Wong (Netflix), Igor Yashkov (Yandex)
Online controlled experiments (OCEs) have become ubiquitous in evaluating the impact of changes made to software products and services. There are many practical challenges in running OCEs at scale that encourage further academic and industrial exploration. To understand these challenges, 34 representatives with experience in large-scale experimentation from 13 different organizations (Airbnb, Amazon, Booking.com, Facebook, Google, LinkedIn, Lyft, Microsoft, Netflix, Twitter, Uber, Yandex, and Stanford University) were invited to the first Practical Online Controlled Experiments Summit in Sunnyvale, CA, USA on December 13-14, 2018. Together these organizations tested more than one hundred thousand experiment treatments last year. This is the first paper to discuss the top challenges faced across the industry for running OCEs at scale, as well as the common solutions.
Workshop Poster: A Time-Aware Inductive Representation Learning Strategy for Heterogeneous Graphs
Bo Yan (LinkedIn), Matthew Walker (LinkedIn), Krzysztof Janowicz (UCSB)
Graphs are versatile data structures that have permeated a large number of application fields, such as biochemistry, knowledge graphs, and social networks. As a result, different graph representation learning models have been proposed as effective approaches to represent graph components in downstream machine learning tasks, such as node classification and recommendation. However, most representation learning models in graphs do not natively work on heterogeneous graphs and consequently, are unable to learn embeddings for different relations in the graph. In this paper, we extend and improve existing models by enabling an edge-based transformation procedure in order to learn embeddings for different relations in heterogeneous graphs. In addition, we show that by incorporating a sequential model to learn more expressive representations, we can capture temporal dynamics in social networks. Finally, we examine our model within the context of two very disparate heterogeneous graphs, a knowledge graph dataset and a professional social network dataset, to illustrate our point and show the effectiveness of our approach. By learning edge-based transformations, our model yields a Mean Reciprocal Rank score that is more than four times higher than the homogeneous counterpart for the knowledge graph dataset. By incorporating the temporal dynamics, our model improves the HITS@1 score by more than 15% compared with the baseline model for the professional social network dataset.
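The general flavor of an "edge-based transformation" can be sketched with a translation-style scoring function, where each relation gets its own vector so that the same node embeddings score differently under different edge types. The embeddings, relations, and node names below are illustrative assumptions, and this is not the paper's exact model.

```python
# Rough sketch (assumed): per-relation transformations over shared node
# embeddings, here modeled as relation-specific translation vectors as in
# translation-based knowledge graph embeddings.
import numpy as np

rng = np.random.default_rng(0)
dim = 16
node_emb = {n: rng.normal(size=dim) for n in ["alice", "bob", "acme"]}
rel_emb = {r: rng.normal(size=dim) for r in ["works_at", "connected_to"]}

def score(head, relation, tail):
    """Higher (less negative) score means the edge (head, relation, tail) is more plausible."""
    return -np.linalg.norm(node_emb[head] + rel_emb[relation] - node_emb[tail])

# The same node pair gets different scores under different relations.
print(score("alice", "works_at", "acme"))
print(score("alice", "connected_to", "acme"))
```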
KDD Social Impact Workshop: Building Features Which Benefit Every Member: Measuring Inequality in the Individual Treatment Effects in Online Experiments
Guillaume Saint-Jacques (LinkedIn), Amir Sepehri (LinkedIn)
Online controlled experiments have become a core decision-making tool in technology companies. However, most decisions made with experiments are based on average effects, i.e., they focus on whether a feature benefits the average user. This may be detrimental to the objectives of social responsibility and equality of opportunity, as some members may be "left behind": a new product change might benefit one subpopulation but hurt another. We use the Atkinson index, from the economics literature, in our A/B tests to measure whether a treatment increases inequality in engagement and contribution among members. We use this to identify not only problematic new product features, but also those that help close the gap among our members. We provide real examples from LinkedIn, as well as a highly scalable implementation of the Atkinson index and its A/B testing variance in Spark and Scala. This allows us to compute the Atkinson index and its variance for a sample of tens of millions of members in just five to ten minutes.
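For reference, the Atkinson index itself is straightforward to compute. The sketch below uses the standard definition from the economics literature (with inequality-aversion parameter epsilon) rather than LinkedIn's Spark/Scala implementation, and the engagement values are made up.

```python
# Standard Atkinson inequality index (illustrative implementation, not
# LinkedIn's production code). epsilon is the inequality-aversion parameter.
import numpy as np

def atkinson_index(values, epsilon=1.0):
    y = np.asarray(values, dtype=float)
    mean = y.mean()
    if epsilon == 1.0:
        # Uses the geometric mean; requires strictly positive values.
        return 1.0 - np.exp(np.log(y).mean()) / mean
    ede = (np.power(y, 1.0 - epsilon).mean()) ** (1.0 / (1.0 - epsilon))
    return 1.0 - ede / mean

# Comparing engagement distributions under control vs. treatment:
control   = [5, 5, 5, 5, 5]    # perfectly equal -> index ~ 0
treatment = [1, 1, 1, 1, 21]   # same mean, more unequal -> index > 0
print(atkinson_index(control), atkinson_index(treatment))
```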
Project Showcase: AnaBot - Lessons from Building a Serial Chatbot in Collaboration with Analysts and Linguists
Hongche Liu (LinkedIn), Jaewon Yang (LinkedIn), Qi He (LinkedIn)
AnaBot (Analytics chatBot) is the umbrella project for all the chatbots built to give our users access to insights from our knowledge base in the cloud. We’d like to share the lessons learned through several end-to-end chatbot launches, carried out in collaboration with data analysts and linguists, to hone the process of data collection, modeling, dialog management, and design. We believe there is universal value in sharing these lessons with researchers and developers working on chatbots and Q&A systems.
Panel Discussion: KDD Women's Lunch
KDD will be hosting a panel discussion during the Women’s Lunch. The panelists are Rukmini Iyer (Distinguished Engineer, Microsoft), Tina Eliassi-Rad (Associate Professor, Northeastern University), Lillian Carrasquillo (Senior Data Scientist, Spotify), and Romer Rosales-Demoral (Senior Director, LinkedIn). The group will discuss topics ranging from why environments that foster diversity, inclusion, and belonging are important in academia and industry to what it takes to build a successful career in data mining and artificial intelligence. Hema Raghavan from LinkedIn will be moderating the panel.