Kafka Articles

  • image-of-the-overview-of-brooklin

    Load-balanced Brooklin Mirror Maker: Replicating large-scale Kafka clusters at LinkedIn

    April 11, 2022

    At LinkedIn, Apache Kafka is used heavily to store all kinds of data, such as member activity, log storage, metrics storage, and a multitude of inter-service messaging. LinkedIn maintains multiple data centers with multiple Kafka clusters per data center, each of which contains an independent set of data. Mirroring (i.e., replicating) Kafka topics across the...

  • FastIngest: Low-latency Gobblin with Apache Iceberg and ORC format

    January 6, 2021

    Co-authors: Zihan Li, Sudarshan Vasudevan, Lei Sun, and Shirshanka Das Data analytics and AI power many business-critical use cases at LinkedIn. We need to ingest data in a timely and reliable way from a variety of sources, including Kafka, Oracle, and Espresso, bringing it into our Hadoop data lake for subsequent processing by AI and data science pipelines. We...

  • from-lambda-to-lambdaless-lessons-learned

    From Lambda to Lambda-less: Lessons learned

    December 1, 2020

    Co-authors: Xiang Zhang and Jingyu Zhu Introduction The Lambda architecture has become a popular architectural style that promises both speed and accuracy in data processing by using a hybrid approach of both batch processing and stream processing methods. But it also has some drawbacks, such as complexity and additional development/operational overheads. One of...

  • mock-screenshot-of-the-recruiter-usage-dashboard

    Bridging batch and stream processing for the Recruiter...

    July 14, 2020

    Co-authors: Khai Tran and Steve Weiss Batch and streaming computations are often combined together in the Lambda architecture, but...

  • top-blogs-logos

    The Top 2019 LinkedIn Engineering Blogs

    December 9, 2019

    As the year draws to a close, we’re taking a look back at ten of our most popular 2019 articles on the LinkedIn Engineering Blog....

  • lag-alert-graphs

    An inside look at LinkedIn’s data pipeline monitoring...

    October 30, 2019

    Co-authors: Krishnan Raman and Joey Salacup Editor's note: This blog has been updated. Monitoring big data pipelines often equates to...