LinkedIn Tutorials at KDD 2016

See the full list on the KDD 2016 website

Tutorial 10: Business Applications of Predictive Modeling at Scale

Qiang Zhu, Songtao Guo, Paul Ogilvie, Yan Liu

Predictive modeling is the art of building statistical models that forecast probabilities and trends of future events. In this tutorial, we focus on best practices for predictive modeling in the big data era, with motivating examples across a range of business tasks and relevance products. We present an overview of how predictive modeling helps power and drive key business use cases. We introduce essential concepts and outline the state of the art in building end-to-end predictive modeling solutions. We will also discuss the challenges, key technologies, and lessons learned from our practice, including case studies of LinkedIn feed relevance and a platform for email response prediction. Moreover, we discuss some practical solutions for building a predictive modeling platform to scale the modeling efforts of data scientists and analysts, and provide an overview of popular tools and platforms used across the industry.

Hands-On: Streaming Analytics

Ashish Gupta (LinkedIn), Neera Agarwal

Recently we have seen the emergence and widespread adoption of social media, the internet of things for the home, the industrial internet of things, mobile applications and online transactions. These systems generate streaming data at very large scale. Building technologies and distributed systems that can capture, process and analyze this streaming data in real time is very important for gaining real-time insights. Real-time analysis of streaming data can be used for applications as diverse as fraud detection, in-session targeting and recommendations, control systems for transportation systems and smarter cities, earthquake prediction and control of autonomous vehicles.

This tutorial will provide an overview of streaming systems and a hands-on tutorial on building streaming analytics systems using open source technologies.

Hands-On: Recommender Systems

XianXing Zhang, Deepak Agarwal, Bee-Chung Chen, Paul Ogilvie

Recommendation systems have become ubiquitous for web applications. Given significant heterogeneity in user preference, providing personalized recommendations is key to the success of such systems. To achieve this goal at scale, using machine learned models to estimate user preference from user feedback data is essential. Providing an easy-to-use and flexible machine learning library for practitioners to build personalization models is the key to productivity, agility, and developer happiness. In this tutorial, we first give an overview of the components required for building an end-to-end web recommender system and then focus on how to use Photon ML (LinkedIn's open-sourced machine learning library) to train recommendation models and serve the results to users. Participants will get hands-on experience in training models of different levels of granularity to improve model performance and perform the “modeling loop” consisting of training a model, scoring candidate items using the model, seeing recommended items in a web UI, giving feedback to a number of recommended items, and then training a model again using the newly generated feedback.

More details on Photon ML can be found at

LinkedIn Workshops at KDD 2016

See the full list on the KDD 2016 website

KDD Workshop on Issues of Sentiment Discovery and Opinion Mining (WISDOM)

Tiger Zhang (LinkedIn Corp), Erik Cambria (Nanyang Tech. U., Singapore), Bing Liu (UIC), Yunqing Xia (Microsoft Research Asia)

The WISDOM series provides an international forum for researchers in the field of machine learning for opinion mining and sentiment analysis to share information on their latest investigations in social information retrieval and their applications both in academic research areas and industrial sectors. WISDOM aims to explore how the wisdom of the crowds is affecting (and will affect) the evolution of the Web and of businesses gravitating around it. In particular, the workshop series explores two different stages of sentiment analysis: the former focusing on the identification of opinionated text over the Web, the latter focusing on the classification of such text either in terms of polarity detection or emotion recognition. The broader context of the workshop encompasses opinion mining, social media marketing, information retrieval, and natural language processing.

For more information about workshop program and past events, please visit

Workshop on Enterprise Relevance

Abhishek Gupta (LinkedIn), George Karypis (University of Minnesota, Twin Cities)

Over $600 billion was spent on Enterprise Software in 2015. With the growing consumerization of the Enterprise, Enterprise Software is increasingly moving to the Cloud and is generating vast quantities of enterprise data that we have just started to tap for intelligence. Our goal with this workshop is to raise awareness within the research community of the phenomenally large enterprise intelligence market, which remains underserved by technology even today. Enterprise data offers unique technology challenges that don’t generally get discussed enough within the traditional consumer context. Uncovering newer insights and learning best practices of successful employees through advanced Data Mining algorithms on enterprise data is a huge untapped opportunity. These insights and best practices can be responsibly leveraged to help the world’s 750 million knowledge-working professionals become more productive and successful. KDD techniques are at the core of enterprise intelligence systems in today’s data-driven and connected enterprises. Almost all components of the enterprise intelligence system, such as big data analytics and modeling, ROI attribution, decision support systems, cloud-based data security, privacy and compliance concerns, CRM, ATS, business intelligence, data exploration and visualization, are strongly influenced by the data mining discipline.

For more information about the workshop, please visit

LinkedIn Research Papers at KDD 2016

See the full list on the KDD 2016 website

Evaluating Mobile Apps with A/B and Quasi A/B Tests

Ya Xu, Nanyu Chen

A/B tests on mobile apps are conducted very differently from tests on the web because of the lengthy build, review and adoption process for app release. In addition to discussing how to measure app features individually through randomized A/B tests, we also propose and establish quasi-experimental techniques for evaluating mobile app release, with results shared from a recent major app launch at LinkedIn.

Read the full paper

Audience Expansion for Online Social Network Advertising

Haishan Liu, David Pardoe, Kun Liu

Online social network advertising platforms generally allow marketers to specify targeting options so that their ads appear to a desired demographic. Audience Expansion is a technique developed at LinkedIn to identify new audiences similar to the original target group. With this technique we achieved the following objectives: 1) a simplified targeting process and increased reach for advertisers and 2) better utilization of ads inventory and more efficient market participation.

Read the full paper

GLMix: Generalized Linear Mixed Models For Large-Scale Response Prediction

XianXing Zhang, Bee-Chung Chen, Liang Zhang, Yitong Zhou, Yiming Ma, Deepak Agarwal

In recommender systems where the data is abundant, having a more fine-grained model at the user or item level to address idiosyncrasies would lead to more accurate prediction, for example, by introducing user/item ID-level regression coefficients in a GLMix setting. However, for big data sets with a large number of ID-level coefficients (e.g., on the order of a trillion), fitting GLMix can be computationally challenging. In this talk, we discuss how we successfully overcame the scalability bottleneck and deployed such a model at LinkedIn, which generated 20% to 40% more job applications.
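
To make the idea of ID-level coefficients concrete, here is a minimal sketch of how a logistic GLMix-style score could be assembled from a global model plus per-user and per-item terms. It is an illustration under stated assumptions, not the implementation from the paper: the function names, the feature layout, and the zero fallback for unseen IDs are all hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length sequences (0.0 if either is empty)."""
    return sum(ai * bi for ai, bi in zip(a, b))


def glmix_probability(global_x, item_x, user_x, user_id, item_id,
                      fixed_coef, per_user_coef, per_item_coef):
    """Hypothetical GLMix-style logistic prediction.

    logit = global_x . fixed_coef
          + item_x  . per_user_coef[user_id]   (user-specific coefficients)
          + user_x  . per_item_coef[item_id]   (item-specific coefficients)

    Assumption: an ID with no fitted coefficients contributes nothing,
    so the prediction falls back to the global model.
    """
    logit = dot(global_x, fixed_coef)
    logit += dot(item_x, per_user_coef.get(user_id, ()))
    logit += dot(user_x, per_item_coef.get(item_id, ()))
    return 1.0 / (1.0 + math.exp(-logit))


# Toy, made-up coefficients: one user has a personal coefficient, no items do.
fixed_coef = [1.0, -1.0]
per_user_coef = {"u1": [2.0]}
per_item_coef = {}

p_known = glmix_probability([1.0, 1.0], [1.0], [1.0], "u1", "j1",
                            fixed_coef, per_user_coef, per_item_coef)
p_new = glmix_probability([1.0, 1.0], [1.0], [1.0], "u2", "j1",
                          fixed_coef, per_user_coef, per_item_coef)
```

In this toy setup the global logit is zero, so the unseen user `u2` gets the global prediction while `u1` is shifted by the personal coefficient. The scalability challenge described above comes from fitting one such coefficient vector per user and per item.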

Read the full paper

An Empirical Study on Recommendation with Multiple Types of Feedback

Liang Tang, Bo Long, Bee-Chung Chen, Deepak Agarwal

Most recommender systems rely on models trained using a single type of feedback, e.g., ratings for movie recommendation and clicks for online news recommendation. However, in addition to the primary feedback, many systems also allow users to provide other types of feedback, e.g., liking or sharing an article, or hiding all articles from a source. This paper presents an empirical study on various training methods for incorporating multiple user feedback types based on LinkedIn recommendation products. We study three important problems that we face at LinkedIn: 1) whether to send an email based on clicks and complaints, 2) how to rank updates in LinkedIn feeds based on clicks and hides, and 3) how to jointly optimize for viral actions and clicks in LinkedIn feeds. Extensive offline experiments on historical data show the effectiveness of these methods in different situations. Online A/B testing results further demonstrate the impact of these methods on LinkedIn production systems.

Read the full paper

Ranking Universities Based on Career Outcomes of Graduates

Navneet Kapur (GoFundMe), Nikita Lytkin, Bee-Chung Chen, Deepak Agarwal, Igor Perisic (LinkedIn)

Publicly available rankings of academic programs play a key role in prospective students’ decisions regarding which universities to apply to and enroll in. In this paper, we develop a novel methodology for ranking and recommending universities for different professions on the basis of career outcomes of professionals who graduated from those schools. We have applied this methodology to LinkedIn’s Economic Graph data of over 400 million professionals from around the world. The resulting university rankings have been made available to the public and demonstrate that there are valuable insights to be gleaned from professional career data on LinkedIn.

Read the full paper

Email Volume Optimization at LinkedIn

Rupesh Gupta, Xiaoyu Chen, Guanfeng Liang, Romer Rosales, Hsiao-Ping Tseng, Ravi Kiran Holur Vijay

Email communication, if used judiciously, can provide enormous value to members by keeping them engaged. However, sending too many email messages may result in reduced effectiveness of communication. In this paper we present a cost-benefit analysis of sending emails, the key factors in administering effective email volume optimization, our algorithm for volume optimization, the architecture of the supporting system, and experimental results from online A/B tests.

Read the full paper

Dynamics of Large Multi-View Social Networks: Synergy, Cannibalization and Cross-View Interplay

Yu Shi (UIUC), Myunghwan Kim, Shaunak Chatterjee, Mitul Tiwari, Souvik Ghosh, Romer Rosales (LinkedIn)

Most social networking services support multiple types of relationships between users, which can be represented as dynamic multi-view networks. Different network views can have very distinctive properties, while affecting each other as they evolve. We observed synergy, cannibalization and cross-view interplay in a large multi-view social network. Taking these effects into account, we proposed scalable models that achieve improved prediction results on user activity levels.

Read the full paper

How to Get Them a Dream Job?

Jia Li (University of Illinois at Chicago), Dhruv Arya, Viet Ha-Thuc, Shakti Sinha (LinkedIn)

This paper proposes an approach to applying standardized entity data to improve job search quality and to make search results more personalized. Specifically, we explore three types of entity-aware features and incorporate them into the job search ranking function. The first is query-job matching features, which extract and standardize entities mentioned in queries and documents, then semantically match them based on these entities. The second type, searcher-job expertise homophily, aims to capture the fact that job searchers tend to be interested in jobs requiring expertise similar to their own. To measure the similarity, we use standardized skills in job descriptions and searchers’ profiles as well as skills that we infer searchers might have but do not explicitly list in their profiles. Third, we propose a concept of entity-faceted historical click-through-rates (CTRs) to capture job document quality. Faceting jobs by their standardized companies, titles, locations, etc., and computing historical CTRs at the facet level instead of the individual job level alleviates the sparseness issue in historical action data. This is particularly important in job search, where job lifetime is typically short. Both offline and online experiments confirm the effectiveness of the features. In the offline experiment, using the entity-aware features gives improvements of +20%, +12.1% and +8.3% on Precision@1, MRR and NDCG@25, respectively. The online A/B test shows that a new model with these features is +11.3% and +5.3% better than the baseline in terms of click-through-rate and apply rate, respectively.
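
The entity-faceted CTR idea can be sketched in a few lines: instead of one CTR per job, impressions and clicks are pooled over every job that shares a facet value such as company or title. The snippet below is only an illustration; the record fields, the `facet_ctr` helper, and the toy numbers are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def facet_ctr(events, facet_key):
    """Aggregate historical clicks and impressions by one facet.

    `events` is an iterable of dicts that carry the facet field plus
    'impressions' and 'clicks' counts (a hypothetical record layout).
    Returns {facet_value: clicks / impressions}.
    """
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for event in events:
        value = event[facet_key]
        impressions[value] += event["impressions"]
        clicks[value] += event["clicks"]
    return {v: clicks[v] / impressions[v] for v in impressions if impressions[v] > 0}


# Made-up history for three jobs.
history = [
    {"job_id": 1, "company": "A", "title": "Engineer", "impressions": 100, "clicks": 10},
    {"job_id": 2, "company": "A", "title": "Analyst", "impressions": 300, "clicks": 10},
    {"job_id": 3, "company": "B", "title": "Engineer", "impressions": 100, "clicks": 20},
]

company_ctr = facet_ctr(history, "company")
title_ctr = facet_ctr(history, "title")

# A brand-new job has no history of its own, but its facets do.
new_job = {"job_id": 4, "company": "A", "title": "Engineer"}
new_job_features = (company_ctr[new_job["company"]], title_ctr[new_job["title"]])
```

Because a newly posted job inherits the pooled statistics of its company and title, the ranking function has a usable quality signal even before the job itself has accumulated clicks.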

Read the full paper

CaSMoS: A Framework for Learning Candidate Selection Models over Structured Queries and Documents

Fedor Borisyuk, Krishnaram Kenthapadi, David Stein, Bo Zhao

Applications such as personalized search and recommendations require real-time scoring of millions of documents for each query, with strict latency constraints. We propose CaSMoS, a machine learned candidate selection framework that makes use of the Weighted AND query. Our deployment of this system as part of LinkedIn’s job recommendation engine has resulted in a significant reduction in latency (up to 25%) without sacrificing the quality of the retrieved results.
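
For readers unfamiliar with the Weighted AND (WAND) operator, the sketch below shows its usual definition: a document is selected as a candidate when the summed weights of the query clauses it matches reach a threshold. The abstract does not describe how CaSMoS obtains its weights and threshold, so the values here are hand-picked, and the clause names and the `weighted_and_matches` helper are hypothetical.

```python
def weighted_and_matches(doc_terms, query_weights, threshold):
    """Return True if the document passes a Weighted AND query.

    doc_terms: set of clauses the document satisfies.
    query_weights: {clause: weight} for the query.
    threshold: minimum total weight of matched clauses.
    """
    matched_weight = sum(w for clause, w in query_weights.items() if clause in doc_terms)
    return matched_weight >= threshold


# Illustrative weights only; in a learned setting these would come from a model.
query = {"title:engineer": 2.0, "skill:python": 1.0, "location:sf": 0.5}

doc_a = {"title:engineer", "skill:java"}    # matched weight 2.0
doc_b = {"title:engineer", "location:sf"}   # matched weight 2.5

passes_a = weighted_and_matches(doc_a, query, threshold=2.5)
passes_b = weighted_and_matches(doc_b, query, threshold=2.5)

# With positive weights, a threshold equal to the total weight behaves like AND,
# and a threshold equal to the smallest weight behaves like OR.
strict_and = weighted_and_matches(doc_b, query, threshold=sum(query.values()))
loose_or = weighted_and_matches(doc_a, query, threshold=min(query.values()))
```

Raising the threshold shrinks the candidate set that must be scored downstream, which is one way a candidate selection stage can trade recall against latency.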

Read the full paper

Joint Optimization of Multiple Performance Metrics in Online Video Advertising

Sahin Geyik (LinkedIn), Sergey Faleev, Jianqiang Shen, Sean O'Donnell, Santanu Kolay (Turn Inc.)

In this paper, we explore the newly popularized space of online video advertising, where brand recognition is the key focus. We propose a framework based on a feedback mechanism where we optimize multiple video-specific performance indicators while making sure the delivery constraints (budget and user reach) of advertisers are satisfied.

Read the full paper

Identifying Decision Makers from Professional Social Networks

Shipeng Yu (LinkedIn), Evangelia Christakopoulou (University of Minnesota), Abhishek Gupta (LinkedIn)

Sales professionals help organizations win clients for products and services. Generating new clients starts with identifying the right decision makers at the target organization. For the past decade, online professional networks have collected a tremendous amount of data on people’s identities, their networks, and the behavior of buyers and sellers building relationships with each other for a variety of use cases. Sales professionals are increasingly relying on these networks to research, identify and reach out to potential prospects, but it is often hard to find the right people effectively and efficiently. In this paper we present LDMS, the LinkedIn Decision Maker Score, which quantifies the ability of each of the 400M+ LinkedIn members to make a sales decision. It is the key data-driven technology underlying Sales Navigator, a proprietary LinkedIn product that is designed for sales professionals. We will specifically discuss the modeling challenges of LDMS, and present two graph-based approaches to tackle this problem by leveraging the professional network data at LinkedIn. Both approaches are able to leverage both the graph information and the contextual information on the vertices, deal with a small number of labels on the graph, and handle heterogeneous graphs among different types of vertices. We will show some offline evaluations of LDMS on historical data, and also discuss its online usage in multiple applications in live production systems as well as future use cases within the LinkedIn ecosystem.

Read the full paper

Towards Data Quality Assessment in Online Advertising

Sahin Geyik (LinkedIn), Jianqiang Shen, Santanu Kolay (Turn Inc.), Shahriar Shariat (Uber), Ali Dasdan (Vida Health)

Online advertising aims to match the advertisers with the most relevant users to optimize the campaign performance. Multiple data sources provided by the advertisers or third-party data providers can be utilized for this purpose to choose the set of users according to the advertisers’ targeting criteria. We present a framework that can be applied to assess the quality of such data sources at large scale. We also propose multiple methodologies within this framework and present some preliminary assessment results.

Presented at the KDD Enterprise Intelligence Workshop