Announcing release of Apache Kafka 0.7.1

August 17, 2012

We are pleased to announce the release of Kafka 0.7.1. This is the second release of Kafka from the Apache Incubator. Kafka is a distributed, persistent, high-throughput messaging system for collecting and delivering a high volume of data with low latency. LinkedIn and several other companies use Kafka to track extremely large volumes of user-activity events.

In addition to several bug fixes, the 0.7.1 release offers an enhanced consumer API that supports regular-expression-based topic consumption. Also new in Kafka 0.7.1 is a dedicated tool (Mirror-Maker) that makes it easy to set up mirrors of a Kafka cluster in remote data-centers. This tool effectively provides a high-throughput pipeline for maintaining real-time mirrors.
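To make the idea of regular-expression-based topic consumption concrete, here is a minimal sketch of how a consumer might turn a pattern into a set of topics to subscribe to. It is an illustration only and does not use the Kafka consumer API: the function `select_topics`, its `exclude` flag, and the topic names are all hypothetical.

```python
import re


def select_topics(available_topics, pattern, exclude=False):
    """Pick topics by regular expression (hypothetical helper, not Kafka's API).

    With exclude=False, return the topics whose full name matches `pattern`.
    With exclude=True, return the topics whose full name does NOT match.
    """
    regex = re.compile(pattern)
    return [
        topic
        for topic in available_topics
        if bool(regex.fullmatch(topic)) != exclude
    ]


# Hypothetical topic names, used only for illustration.
topics = ["page-views", "page-clicks", "ad-impressions", "test-events"]

page_topics = select_topics(topics, r"page-.*")
non_test_topics = select_topics(topics, r"test-.*", exclude=True)
```

A pattern-based subscription like this lets one consumer follow a whole family of topics without listing each one by name, which is what makes it useful for mirroring many topics at once.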

These features play a key role in facilitating Kafka's active-active/fail-over solution within LinkedIn. The following figure illustrates one possible multi-data-center Kafka deployment. Each data-center contains two Kafka clusters: a local cluster that receives events from within that data-center, and an aggregate cluster which mirrors both the local cluster and a remote data-center's cluster.

Figure: Multi-data-center mirroring pipelines.

This topology enables services in each data-center to consume both local and remote events. Remote events are available for consumption in real time because the mirror cluster consumes multiple topics in bulk from the source cluster. This is far more efficient and reliable than configuring producers to send individual events to remote data-centers.

This topology also ensures that all the event-tracking data remains available even if a data-center becomes unavailable, since its tracking data is consumed in real time by the other data-center's aggregate cluster.
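The topology described above can be modeled with a small in-memory sketch. It is not Kafka code: the `Cluster` class, the `mirror` function, and the cluster and topic names are hypothetical, and the sketch assumes that each aggregate cluster mirrors the local cluster of both data-centers. Mirroring here is a single bulk copy, whereas a real mirror runs continuously.

```python
from collections import defaultdict


class Cluster:
    """Toy stand-in for a Kafka cluster: a map from topic name to events."""

    def __init__(self, name):
        self.name = name
        self.log = defaultdict(list)

    def publish(self, topic, event):
        self.log[topic].append(event)


def mirror(source, target):
    """Copy every topic from `source` into `target` in one bulk pass.

    Simplification: this is a one-shot copy, so calling it twice would
    duplicate events. A real mirror tracks its position in the source.
    """
    for topic, events in source.log.items():
        target.log[topic].extend(events)


# Two data-centers, each with a local and an aggregate cluster.
local_a, local_b = Cluster("dc-a-local"), Cluster("dc-b-local")
aggregate_a, aggregate_b = Cluster("dc-a-aggregate"), Cluster("dc-b-aggregate")

# Services publish only to the local cluster of their own data-center.
local_a.publish("page-views", "a1")
local_b.publish("page-views", "b1")

# Each aggregate cluster mirrors both local clusters (assumed wiring).
for aggregate in (aggregate_a, aggregate_b):
    mirror(local_a, aggregate)
    mirror(local_b, aggregate)

# If data-center A becomes unavailable, its events are still readable
# from data-center B's aggregate cluster.
surviving_events = sorted(aggregate_b.log["page-views"])
```

In this model, producers never write across data-centers; only the mirroring step does, which mirrors the point that bulk mirroring replaces per-event remote sends.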

The Kafka open source community continues to be very active with bug reports, patch submissions, feature requests and use cases. LinkedIn hosted the first Kafka user group meeting on June 14, 2012 with over 50 attendees. If you missed this event, you can watch the archived video presentation.

While we plan to make 0.7 bug-fix releases as needed, our primary focus in the near term is the development of the much-anticipated intra-cluster replication feature, which will improve both the availability and the durability of tracking data hosted by a Kafka cluster.