Big Data Articles

  • Asynchronous Processing and Multithreading in Apache Samza, Part I: Design and Architecture

    January 4, 2017

    As part of the Apache Samza 0.11 release, we rebuilt Samza’s underlying event processing engine to use an asynchronous and parallel processing model. The new model is unique among current open source stream processors because it not only supports running traditional synchronous processing in parallel on multiple threads, but also provides first-class support for...

  • Stream Processing Hard Problems Part II: Data Access

    August 22, 2016

    This post is the second in a series that discusses some of the hard problems in stream processing. In the previous post, we explored the use of lambda architecture in stream processing and discussed techniques to avoid it. In this post, we’ll focus on one of the main bottlenecks in high-scale stream processing applications: “accessing data.”...

  • Announcing Gobblin 0.7.0: Going Beyond Ingestion

    June 29, 2016

    About a year ago, we open-sourced Gobblin, a universal data ingestion framework that aimed to solve data integration challenges faced by people working on big data problems. In many previous blog posts, publications, and talks, we have described how LinkedIn uses Gobblin to ingest data at massive scale from a variety of sources into HDFS. Today, we are very...

  • Stream Processing Hard Problems – Part 1: Killing Lambda

    June 27, 2016

    We live in an age where we want to know relevant things happening around the world as soon as they happen; an age where digital...

  • Video: Higher Ed Team Seeks to Predict Collegiate Futures

    November 13, 2015

    LinkedIn leverages professional and educational data from our more than 400 million members to create Student Decision tools that can...

  • Bridging Batch and Streaming Data Ingestion with Gobblin

    September 28, 2015

    Less than a year ago, we introduced Gobblin, a unified ingestion framework, to the world of Big Data. Since then, we’ve shared...