Apache Kafka Articles

  • Announcing Gobblin 0.7.0: Going Beyond Ingestion

    June 29, 2016

    About a year ago, we open-sourced Gobblin, a universal data ingestion framework that aimed to solve data integration challenges faced by people working on big data problems. In many previous blog posts, publications, and talks, we have described how LinkedIn uses Gobblin to ingest data at massive scale from a variety of sources into HDFS. Today, we are very...

  • Stream Processing Hard Problems – Part 1: Killing Lambda

    June 27, 2016

    We live in an age where we want to know relevant things happening around the world as soon as they happen; an age where digital content is updated instantly based on our likes and dislikes; an age where credit card fraud, security breaches, device malfunctions and site outages need to be detected and remedied as soon as they happen. It is an age where events are...

  • Open Sourcing Kafka Monitor

    May 25, 2016

    Apache Kafka has become a standard messaging system for large-scale, streaming data. In companies like LinkedIn, it is used as the backbone for various data pipelines and powers a variety of mission-critical services. It has become a core component of a company’s infrastructure that should be extremely robust, fault-tolerant and performant. In the past, Kafka...

  • Kafkaesque Days at LinkedIn – Part 1

    May 24, 2016

    This is the first post in a blog series adapted from my talk at the inaugural Kafka Summit. Apache Kafka is the backbone for various...

  • How We’re Improving and Advancing Kafka at LinkedIn

    September 2, 2015

    Kafka continues to be one of the key pillars in LinkedIn’s data infrastructure. One of our engineers has described it as LinkedIn’s...

  • Benchmarking Apache Samza: 1.2 million messages per second ...

    August 24, 2015

    Update Apr 13, 2016: There have been numerous improvements to the Samza cachestore (SAMZA-658, SAMZA-812, SAMZA-873, etc.) since our last test...