Stream Processing Articles

  • Bridging Offline and Nearline Computations with Apache Calcite

    January 29, 2019

    The existing Lambda architecture: With the evolution of big data technologies over time, two classes of computations have been developed for processing large-scale datasets: batch and streaming. Batch computation was developed for processing historical data, and batch engines, like Apache Hadoop or Apache Spark, are often designed to provide correct and complete,...

  • Samza 1.0: Stream Processing at Massive Scale

    November 27, 2018

    We are pleased to announce today the release of Samza 1.0, a significant milestone in the history of the project. Apache Samza is a distributed stream processing framework that we developed at LinkedIn in 2013. Samza became a top-level Apache project in 2014. Fast-forward to 2018, and we currently have over 3,000 applications in production leveraging Samza at...

  • Unstructured Data Transfer in Rest.li

    November 2, 2018

    A few years ago, we announced Rest.li 2.x and a Protocol Upgrade Story. Today, we are excited to share another major milestone: the release of Unstructured Data Reactive! The source code is available on GitHub and documented in the user guide. Give it a try and let us know what you think. Why Rest.li? Rest.li is an open source REST framework for building...

  • Gobblin Enters Apache Incubation

    January 17, 2018

    Gobblin is a distributed data integration framework that simplifies common aspects of big data integration, such as ingestion,...

  • Venice Hybrid: Doing Lambda Better

    December 20, 2017

    Over the last two years at LinkedIn, I’ve been working on a distributed key-value database called “Venice.” Venice is designed to be a...

  • Incremental Data Capture for Oracle Databases at LinkedIn: ...

    November 22, 2017

    Co-authors: Saurabh Goyal and Janardh Bantupalli. In our previous blog post introducing Brooklin, we outlined the reasons why we...