Distributed Systems Articles

  • Powering Helix’s Auto Rebalancer with Topology-Aware Partition Placement

    July 26, 2017

    Typical distributed data systems are clusters composed of a set of machines. If the dataset does not fit on a single machine, we usually shard the data into partitions, and each partition can have multiple replicas for fault tolerance. Partition management needs to ensure that replicas are distributed among machines as evenly as possible. More crucially, when a...

  • Building Venice: A Production Software Case Study

    April 4, 2017

    We build a lot of our own infrastructure systems here at LinkedIn. Many people have heard of Kafka, our distributed message buffer. We also run various databases, blob stores, and stream and image processing systems, all of which we develop, deploy, and maintain in-house. One of the systems we've been working on recently is Venice, a distributed derived...

  • Building Venice with Apache Helix

    February 15, 2017

    Background: Like many internet companies, LinkedIn has faced data growth challenges. Naturally, distributed storage systems became the solution to handle larger volumes of data and queries per second (QPS). But, aside from scaling issues, the variability in access patterns also grew quickly. For example, some scenarios require no more than simple put/get...

  • Announcing Gobblin 0.7.0: Going Beyond Ingestion

    June 29, 2016

    About a year ago, we open sourced Gobblin, a universal data ingestion framework that aimed to solve data integration challenges faced...

  • Bridging Batch and Streaming Data Ingestion with Gobblin

    September 28, 2015

    Genesis: Less than a year ago, we introduced Gobblin, a unified ingestion framework, to the world of Big Data. Since then, we’ve shared...

  • Prototyping Venice: Derived Data Platform

    August 10, 2015

    This is an interview with Clement Fung, who interned with the Voldemort team last year and liked LinkedIn so much that he decided to...