About a year ago, we open-sourced Gobblin, a universal data ingestion framework that aims to solve the data integration challenges faced by people working on big data problems. In many previous blog posts, publications, and talks, we have described how LinkedIn uses Gobblin to ingest data at massive scale from a variety of sources into HDFS. Today, we are very...
Data Ingestion Articles
- Topics: Hadoop, Big Data, Open Source, Data Ingestion, Distributed Systems, ETL, Gobblin, Kafka
We shared Gobblin with the open source community a year ago. Since then, we’ve seen increasing interest and adoption among engineers, researchers and analysts in using Gobblin to integrate data from a variety of sources into Hadoop. In previous blog posts, publications, and talks, we’ve described our motivations for building a unified ingestion framework that is...
- Topics: Hadoop, Data Ingestion, Gobblin, Kafka, Open Source
Authors: Shirshanka Das, Lin Qiao
The holiday season for gobbling is upon us, and at LinkedIn, we’ve been getting better at gobbling large amounts of different kinds of datasets to feed our data-hungry analysts. We thought we’d share with you our most recent efforts at simplifying big data ingestion for Hadoop-based warehouses.
Got Data?
Every day, the LinkedIn...
- Topics: Big Data, Hadoop, Data Ingestion, Distributed Systems, Gobblin, Open Source