Kafka Articles

  • FastIngest: Low-latency Gobblin with Apache Iceberg and ORC format

    January 6, 2021

    Co-authors: Zihan Li, Sudarshan Vasudevan, Lei Sun, and Shirshanka Das

    Data analytics and AI power many business-critical use cases at LinkedIn. We need to ingest data in a timely and reliable way from a variety of sources, including Kafka, Oracle, and Espresso, bringing it into our Hadoop data lake for subsequent processing by AI and data science pipelines. We...

  • From Lambda to Lambda-less: Lessons learned

    December 1, 2020

    Co-authors: Xiang Zhang and Jingyu Zhu

    The Lambda architecture has become a popular architectural style that promises both speed and accuracy in data processing by using a hybrid approach of both batch processing and stream processing methods. But it also has some drawbacks, such as complexity and additional development/operational overheads. One of...

  • Bridging batch and stream processing for the Recruiter usage statistics dashboard

    July 14, 2020

    Co-authors: Khai Tran and Steve Weiss

    Batch and streaming computations are often combined in the Lambda architecture, but carry the cost of maintaining two different code bases for the same logic. We have previously shared on the blog a behind-the-scenes look at our approach to enabling the seamless translation of declarative batch code into streaming...

  • The Top 2019 LinkedIn Engineering Blogs

    December 9, 2019

    As the year draws to a close, we’re taking a look back at ten of our most popular 2019 articles on the LinkedIn Engineering Blog....

  • An inside look at LinkedIn’s data pipeline monitoring...

    October 30, 2019

    Co-authors: Krishnan Raman and Joey Salacup

    Editor's note: This blog has been updated. Monitoring big data pipelines often equates to...

  • How LinkedIn customizes Apache Kafka for 7 trillion...

    October 8, 2019

    Co-authors: Jon Lee and Wesley Wu

    Apache Kafka is a core part of our infrastructure at LinkedIn. It was originally developed in-house...