Apache Samza Articles

  • Near real-time features for near real-time personalization

    March 1, 2022

    Co-authors: Rupesh Gupta, Sasha Ovsankin, Qing Li, Seunghyun Lee, Benjamin Le, and Sunil Khanal

    At LinkedIn, we strive to serve the most relevant recommendations to our members, whether that’s a job they may be interested in, a member they may want to connect with, or another type of suggestion. In order to do that, we need to know their intent and preferences,...

  • Building a better and faster Beam Samza runner

    October 1, 2020

    Co-authors: Yixing Zhang, Bingfeng Xia, Ke Wu, and Xinyu Liu

    Since the Beam Samza runner was developed at LinkedIn in 2018, we now have 100+ Samza Beam jobs running in production. As our usage grew, we wanted to better understand how the Samza runner performs compared to other runners and identify areas of improvement. In general, for stream processing platforms,...

  • Test Strategy for Samza/Kafka Services

    April 27, 2017

    Over a decade ago, test strategies invested heavily in UI-driven tests. Backend and mid-tier services were tested using automated UI-based tests. While UI-based tests have certain merits, such as testing user flows, they are also time-consuming and fragile. The strong coupling of tests with UI caused several problems: Tests needed frequent modification due to...

  • Asynchronous Processing and Multithreading in Apache Samza,...

    January 6, 2017

    This post is the second in a series discussing asynchronous processing and multithreading in Apache Samza. In the previous post, we...

  • Asynchronous Processing and Multithreading in Apache Samza,...

    January 4, 2017

    As part of the Apache Samza 0.11 release, we rebuilt Samza’s underlying event processing engine to use an asynchronous and parallel...

  • Stream Processing Hard Problems Part II: Data Access

    August 22, 2016

    This post is the second in a series of posts that discuss some of the hard problems in stream processing. In the previous post, we...