Distributed Systems Articles

  • Serving Top Comments in Professional Social Networks

    September 20, 2017

    Co-authors: Divye Kapoor, Zheng Li, and Pujita Mathur. Introduction: As a professional social network serving more than 500 million worldwide members, LinkedIn is the premier destination for professional conversations. We have a wide variety of posts that attract significant engagement, and some of these posts go viral. These posts attract likes and comments in...

  • Powering Helix’s Auto Rebalancer with Topology-Aware Partition Placement

    July 26, 2017

    Typical distributed data systems are clusters composed of a set of machines. If the dataset does not fit on a single machine, we usually shard the data into partitions, and each partition can have multiple replicas for fault tolerance. Partition management needs to ensure that replicas are distributed among machines as evenly as possible. More crucially, when a...

  • Building Venice: A Production Software Case Study

    April 4, 2017

    We build a lot of our own infrastructure systems here at LinkedIn. Many people have heard of Kafka, our distributed message buffer. We also run various databases, blob stores, and stream and image processing systems, all of which we develop, deploy, and maintain in-house. One of the systems we've been working on recently is Venice, a distributed derived...

  • Building Venice with Apache Helix

    February 15, 2017

    Background: Like many internet companies, LinkedIn has faced data growth challenges. Naturally, distributed storage systems became the...

  • Announcing Gobblin 0.7.0: Going Beyond Ingestion

    June 29, 2016

    About a year ago, we open sourced Gobblin, a universal data ingestion framework that aimed to solve data integration challenges faced...

  • Bridging Batch and Streaming Data Ingestion with Gobblin

    September 28, 2015

    Genesis: Less than a year ago, we introduced Gobblin, a unified ingestion framework, to the world of Big Data. Since then, we’ve shared...