LinkedIn Papers & Workshops at VLDB 2016

See the full list on the VLDB 2016 website

From Competition to Complementarity: Comparative Influence Diffusion and Maximization

Wei Lu (UBC/LinkedIn), Wei Chen (Microsoft Research), Laks V.S. Lakshmanan (UBC)

Influence maximization is a well-studied data mining problem that asks for a small set of influential users from a social network, such that by targeting them as early adopters, the expected total adoption through influence cascades over the network is maximized. However, almost all prior work focuses on cascades of a single propagating entity or purely competitive entities. In this work, we proposed the Comparative Independent Cascade (Com-IC) model, which covers the full spectrum of entity interactions from competition to complementarity. In Com-IC, users’ adoption decisions depend not only on edge-level information propagation, but also on a node-level automaton governed by a set of model parameters, enabling the model to capture both competition and complementarity to any possible degree. We designed efficient and effective approximation algorithms via non-trivial techniques based on reverse-reachable sets and a novel “sandwich approximation” strategy. The applicability of both techniques extends beyond our model and problems. Empirical studies showed that the proposed algorithms consistently outperform intuitive baselines on four real-world social networks, often by a significant margin.
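To make the underlying problem concrete, here is a minimal sketch of the classic single-entity Independent Cascade model with a simple Monte Carlo greedy seed selection. It is not the Com-IC model from the paper: there is no node-level automaton and no second entity, and the greedy loop stands in for the paper's reverse-reachable-set and sandwich approximation techniques. The graph layout (a dict mapping each node to a list of `(neighbor, probability)` pairs) and the function names `simulate_ic`, `expected_spread`, and `greedy_seeds` are assumptions made for illustration.

```python
import random


def simulate_ic(graph, seeds, rng):
    """Run one Independent Cascade trial and return the set of adopters.

    graph: dict mapping node -> list of (neighbor, activation_probability).
    """
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # A newly active node gets exactly one chance to
                # activate each still-inactive out-neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def expected_spread(graph, seeds, trials=1000, seed=0):
    """Estimate expected adoption by averaging over repeated trials."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(graph, seeds, rng)) for _ in range(trials))
    return total / trials


def greedy_seeds(graph, k, trials=200):
    """Pick k seeds, each time adding the node with the best estimated spread."""
    nodes = set(graph) | {v for nbrs in graph.values() for v, _ in nbrs}
    chosen = []
    for _ in range(k):
        candidates = sorted(nodes - set(chosen))
        best = max(
            candidates,
            key=lambda n: expected_spread(graph, chosen + [n], trials),
        )
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    # Toy network with hypothetical edge probabilities.
    toy = {"a": [("b", 0.5), ("c", 0.2)], "b": [("d", 0.4)], "c": [("d", 0.3)]}
    print(greedy_seeds(toy, k=1))
    print(expected_spread(toy, ["a"]))
```

Repeated simulation of this kind becomes expensive on large networks, which is the usual motivation for sampling-based techniques such as the reverse-reachable sets mentioned in the abstract.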

Workshop: LinkedIn’s Open Source Analytics Platform

Issac Buenrostro, Jean-Francois Im

Modern web-scale companies have outgrown traditional analytical products due to the challenges of analyzing massive-scale, fast-moving datasets at real-time latencies. As a result, the traditional integrated analytics database has been unbundled into its component architectural pieces that span data ingestion, data processing, and analytics query serving, with the associated challenges of integrating these systems. In this tutorial, we showcase the open-source LinkedIn Big Data Analytics Stack composed of Kafka, Gobblin, Hadoop and Pinot. We will particularly focus on solving the problems of data ingestion at scale with Gobblin (https://github.com/linkedin/gobblin) and high-performance real-time distributed OLAP serving with Pinot (https://github.com/linkedin/pinot), and present the simple, easy-to-maintain integration of the full pipeline.