Stream Processing Articles

  • Building a better and faster Beam Samza runner

    October 1, 2020

    Co-authors: Yixing Zhang, Bingfeng Xia, Ke Wu, and Xinyu Liu

    Since the Beam Samza runner was developed in 2018 at LinkedIn, we now have 100+ Samza Beam jobs running in production. As our usage grew, we wanted to better understand how the Samza runner performs compared to other runners and identify areas of improvement. In general, for stream processing platforms,...

  • Bridging batch and stream processing for the Recruiter usage statistics dashboard

    July 14, 2020

    Co-authors: Khai Tran and Steve Weiss

    Batch and streaming computations are often combined in the Lambda architecture, but carry the cost of maintaining two different code bases for the same logic. We have previously shared on the blog a behind-the-scenes look at our approach to enabling the seamless translation of declarative batch code into streaming...

  • Open sourcing Brooklin: Near real-time data streaming at scale

    July 16, 2019

    Editor's note: This blog has been updated. Brooklin—a distributed service for streaming data in near real-time and at scale—has been running in production at LinkedIn since 2016, powering thousands of data streams and over 2 trillion messages per day. Today, we are pleased to announce the open-sourcing of Brooklin and that the source code is available in our...

  • Using virtual private clusters for testing Apache Samza

    June 20, 2019

    If Apache Kafka is the lifeblood of all nearline processing at LinkedIn, then Apache Samza is the beating heart pumping that blood...

  • Bridging Offline and Nearline Computations with Apache...

    January 29, 2019

    The existing Lambda architecture: With the evolution of big data technologies over time, two classes of computations have been...

  • Samza 1.0: Stream Processing at Massive Scale

    November 27, 2018

    We are pleased to announce today the release of Samza 1.0, a significant milestone in the history of the project. Apache Samza is a...