Distributed Systems Articles

  • Distributed tier merge: How LinkedIn tackles stragglers in search index build

    September 27, 2021

    Co-authors: Andy Li and Hongbin Wu. Indexing plays a key role in modern search engines for fast and accurate information retrieval, and the ability to build indexes swiftly is crucial for LinkedIn to provide up-to-date information, such as candidates to recruiters and job posts to members. In some instances, such as if a member profile is missing and...

  • The exabyte club: LinkedIn’s journey of scaling the Hadoop Distributed File System

    May 27, 2021

    Co-authors: Konstantin V. Shvachko, Chen Liang, and Simbarashe Dzinamarira. LinkedIn runs its big data analytics on Hadoop. During the last five years, the analytics infrastructure has experienced tremendous growth, almost doubling every year in data size, compute workloads, and all other dimensions. It recently reached two important milestones. LinkedIn now...

  • Jhubbub on Helix: Stateless and elastic made easy

    August 27, 2020

    Co-authors: Hunter Lee and Dru Pollini. LinkedIn was built to help professionals achieve more in their careers, and every day millions of people use our products to make connections, discover new opportunities, and get better at what they do. An important part of our mission is helping people find other professionals who are interested in the same things they...

  • LinkedIn NYC Tech Talk series: Engineering Excellence Meetup

    August 28, 2019

    We regularly play host to a series of meetups here at the LinkedIn office in the Empire State Building. Open to the community, these...

  • Improving performance and capacity for Espresso with new...

    June 27, 2019

    In this blog post, we’ll share how we migrated Espresso, LinkedIn’s distributed data store, to a new Netty4-based framework and...

  • Star-tree index: Powering fast aggregations on Pinot

    June 14, 2019

    Pinot is an open source, scalable distributed OLAP data store that recently entered Apache incubation. Developed at LinkedIn, it...