Data Analytics Ecosystem Talk at Hadoop Summit

June 9, 2015

When dealing with massive amounts of data at scale, it’s important to have state-of-the-art infrastructure and algorithms that can make sense of all that information. I lead the Data Analytics Infrastructure team at LinkedIn; we work on core infrastructure components such as Hadoop and Spark, and develop other infrastructure to support analytics use cases. Our work helps all of LinkedIn meet the challenges of ingesting, managing, and analyzing large amounts of data, and helps us build better, more relevant data products.

At this year’s Hadoop Summit in San Jose, my colleague Shirshanka Das and I are giving a talk on our experience building a real-time, self-service data analytics ecosystem. We will talk about Gobblin (a unified data ingestion platform), Cubert (a highly performant computation framework for Hadoop), and WhereHows (a unified metadata platform). We will also talk about Pinot, a distributed OLAP engine we built at LinkedIn that has allowed us to create highly interactive, real-time analytics products. If you’ll be at Hadoop Summit, please join us Thursday afternoon to learn more, and come say hi after the talk.

Bigger, Faster, Easier: Building a real-time, self-service data analytics ecosystem at LinkedIn

Time: 2:20 pm
Date: Thursday, June 11
Place: Grand Ballroom 220A at the San Jose Convention Center

Big Data can often overwhelm enterprises due to the relative immaturity of the technology stack and the still-unsolved challenges in large-scale data ingestion, data management, speed of computation, and query serving. At a higher level, business analysts and data scientists continue to face challenges in data discovery, insight exploration and creation, and operationalizing their dashboards. Learn how LinkedIn is addressing these problems using a layered approach on top of Hadoop-based technologies. We’ll cover specific innovations around large-scale data ingestion, OLAP computation and serving, data lineage, and business intelligence, as well as how we leverage the lambda architecture.