Distributed Systems Articles

  • Managing Distributed Tasks with Helix Task Framework

    January 24, 2019

    Co-authors: Junkai Xue and Hunter Lee

    Stateless tasks are widely used to serve large-scale data processing systems. Systems that rely on Apache Helix have made many requests for a stateless task management feature to be added to it. Recently, our team decided to explore new ways to manage stateless tasks, in addition to our ongoing work...

  • Incremental Data Capture for Oracle Databases at LinkedIn: Then and Now

    November 22, 2017

    Co-authors: Saurabh Goyal and Janardh Bantupalli

    In our previous blog post introducing Brooklin, we outlined the reasons why we created our own framework for near real-time incremental data capture from production. This framework feeds data to our larger data ingestion pipeline for the hundreds of nearline applications processing data that are distributed across...

  • Serving Top Comments in Professional Social Networks

    September 20, 2017

    Co-authors: Divye Kapoor, Zheng Li, and Pujita Mathur

    As a professional social network serving more than 500 million worldwide members, LinkedIn is the premier destination for professional conversations. We have a wide variety of posts that attract significant engagement, and some of these posts go viral. These posts attract likes and comments in...

  • Powering Helix’s Auto Rebalancer with Topology-Aware...

    July 26, 2017

    Typical distributed data systems are clusters composed of a set of machines. If the dataset does not fit on a single machine, we...

  • Building Venice: A Production Software Case Study

    April 4, 2017

    We build a lot of our own infrastructure systems here at LinkedIn. Many people have heard of Kafka, our distributed message buffer. We...

  • Building Venice with Apache Helix

    February 15, 2017

    Like many internet companies, LinkedIn has faced data growth challenges. Naturally, distributed storage systems became the...