Distributed Systems Articles

  • Espresso-online-data-flow-with-Netty4

    Improving Performance and Capacity for Espresso with New Netty Framework

    June 27, 2019

    In this blog post, we’ll share how we migrated Espresso, LinkedIn’s distributed data store, to a new Netty4-based framework and achieved large performance and capacity gains throughout the Espresso system as a result. This matters because Espresso is a master data hub that serves many important applications across LinkedIn,...

  • star-tree-data-structure

    Star-Tree Index: Powering Fast Aggregations on Pinot

    June 14, 2019

    Pinot is an open source, scalable, distributed OLAP data store that recently entered Apache incubation. Developed at LinkedIn, it works across a wide variety of production use cases to deliver real-time, low-latency analytics. One of the biggest challenges in Pinot is achieving and maintaining tight SLAs on latency and throughput on large data sets. Existing...

  • helixtask1

    Managing Distributed Tasks with Helix Task Framework

    January 24, 2019

    Co-authors: Junkai Xue and Hunter Lee

    Stateless tasks are widely used for serving large-scale data processing systems. Many systems that rely on Apache Helix have requested that a stateless task management feature be added to it. Recently, our team decided to explore new ways to manage stateless tasks, in addition to our ongoing work...

  • Incremental-Data-Capture-2

    Incremental Data Capture for Oracle Databases at LinkedIn: ...

    November 22, 2017

    Co-authors: Saurabh Goyal and Janardh Bantupalli

    In our previous blog post introducing Brooklin, we outlined the reasons why we...

  • commentrelevance1

    Serving Top Comments in Professional Social Networks

    September 20, 2017

    Co-authors: Divye Kapoor, Zheng Li, and Pujita Mathur

    As a professional social network serving more than 500 million...

  • helixupdate2

    Powering Helix’s Auto Rebalancer with Topology-Aware...

    July 26, 2017

    Typical distributed data systems are clusters composed of a set of machines. If the dataset does not fit on a single machine, we...