Apache Kafka Articles

  • Open Sourcing Kafka Cruise Control

    August 28, 2017

    Apache Kafka's popularity has grown tremendously over the past few years. In fact, LinkedIn's deployment recently surpassed 2 trillion messages per day, with over 1,800 Kafka servers (i.e., brokers). While Kafka has proven to be very stable, there are still operational challenges when running Kafka at such a scale. Brokers fail on a daily basis, which results in...

  • Test Strategy for Samza/Kafka Services

    April 27, 2017

    Over a decade ago, test strategies invested heavily in UI-driven tests. Backend and mid-tier services were tested using automated UI-based tests. While UI-based tests have certain merits, such as testing user flows, they are also time-consuming and fragile. The strong coupling of tests with UI caused several problems: Tests needed frequent modification due to...

  • Asynchronous Processing and Multithreading in Apache Samza, Part II: Experiments and Evaluation

    January 6, 2017

    This post is the second in a series discussing asynchronous processing and multithreading in Apache Samza. In the previous post, we explored the design and architecture of the new AsyncStreamTask API and the asynchronous event loop. In this post, we will focus on the study of the performance of this feature with benchmark Samza jobs. Some of the interesting...

  • Asynchronous Processing and Multithreading in Apache Samza,...

    January 4, 2017

    As part of the Apache Samza 0.11 release, we rebuilt Samza’s underlying event processing engine to use an asynchronous and parallel...

  • Stream Processing Hard Problems Part II: Data Access

    August 22, 2016

    This post is the second in a series of posts that discuss some of the hard problems in stream processing. In the previous post, we...

  • Announcing Gobblin 0.7.0: Going Beyond Ingestion

    June 29, 2016

    About a year ago, we open sourced Gobblin, a universal data ingestion framework that aimed to solve data integration challenges faced...