Big Data Articles

  • Behind "Big Data" and "AI": Elements of Modern Data Science

    April 5, 2018

    I’m sure everyone who has been following tech industry news has heard of “big data” and “AI.” Although there is no industry-wide definition of either term, most people agree that both are playing increasingly important roles, and that we need to understand and leverage them better in both our personal and professional lives. But wouldn’t it...

  • Gobblin Enters Apache Incubation

    January 17, 2018

    Gobblin is a distributed data integration framework that simplifies common aspects of big data integration, such as ingestion, replication, organization, and lifecycle management, for both streaming and batch ecosystems. Gobblin has been gobbling big data with ease in the open source world since December 2014. Over the years, Gobblin has evolved at a tremendous...

  • Venice Hybrid: Doing Lambda Better

    December 20, 2017

    Over the last two years at LinkedIn, I’ve been working on a distributed key-value database called “Venice.” Venice is designed to be a significant improvement over Voldemort Read-Only for serving derived data. In late 2016, Venice started serving production traffic for batch use cases very similar to the existing uses of Voldemort Read-Only. In the time...

  • Powering Helix’s Auto Rebalancer with Topology-Aware...

    July 26, 2017

    Typical distributed data systems are clusters composed of a set of machines. If the dataset does not fit on a single machine, we...

  • Managing "Exploding" Big Data

    June 15, 2017

    What is the shape of your big data? While we do love to talk about the size of our big data—terabytes, petabytes, and beyond—perhaps...

  • Asynchronous Processing and Multithreading in Apache Samza,...

    January 6, 2017

    This post is the second in a series discussing asynchronous processing and multithreading in Apache Samza. In the previous post, we...