I wrote a blog post about how LinkedIn uses Apache Kafka as a central publish-subscribe log for integrating data between applications, stream processing, and Hadoop data ingestion. To actually make this work, though, this "universal log" has to be a cheap abstraction. If you want to use a system as a central data hub, it has to be fast, predictable, and easy to...
Posts by Jay Kreps
- Topics: Performance, Kafka, Distributed Systems
The Log: What every software engineer should know about real-time data's unifying abstraction
Jay Kreps December 16, 2013
I joined LinkedIn about six years ago at a particularly interesting time. We were just beginning to run up against the limits of our monolithic, centralized database and needed to start the transition to a portfolio of specialized distributed systems. This has been an interesting experience: we built, deployed, and still run a distributed graph database, a...
- Topics: Logs, Stream Processing, Hadoop, Data, Distributed Systems, ETL, Kafka