About a year ago, we open sourced Gobblin, a universal data ingestion framework that aimed to solve data integration challenges faced by people working on big data problems. In many previous blog posts, publications, and talks, we have described how LinkedIn uses Gobblin to ingest data at massive scale from a variety of sources into HDFS. Today, we are very...
ETL Articles
Topics: Hadoop, Big Data, Open Source, Data Ingestion, Distributed Systems, ETL, Gobblin, Kafka
Genesis
Less than a year ago, we introduced Gobblin, a unified ingestion framework, to the world of Big Data. Since then, we’ve shared ongoing progress through a talk at Hadoop Summit and a paper at VLDB. Today, we’re announcing the open source release of Gobblin 0.5.0, a big milestone that includes Apache Kafka integration. Our motivations for building Gobblin...
Topics: Big Data, Hadoop, Open Source, Distributed Systems, ETL, Gobblin, Kafka
I joined LinkedIn about six years ago at a particularly interesting time. We were just beginning to run up against the limits of our monolithic, centralized database and needed to start the transition to a portfolio of specialized distributed systems. This has been an interesting experience: we built, deployed, and still run to this day a distributed graph database, a...
Topics: Logs, Stream Processing, Hadoop, Data, Distributed Systems, ETL, Kafka