Co-authors: Zihan Li, Sudarshan Vasudevan, Lei Sun, and Shirshanka Das
Data analytics and AI power many business-critical use cases at LinkedIn. We need to ingest data in a timely and reliable way from a variety of sources, including Kafka, Oracle, and Espresso, bringing it into our Hadoop data lake for subsequent processing by AI and data science pipelines. We...
Gobblin Articles
- Topics:
- Stream Processing
- Hadoop
- Data
- batch processing
- Open Source
- Gobblin
- Kafka
Co-authors: Khai Tran and Steve Weiss
Batch and streaming computations are often combined in the Lambda architecture, but this carries the cost of maintaining two different code bases for the same logic. We have previously shared on the blog a behind-the-scenes look at our approach to enabling the seamless translation of declarative batch code into streaming...
- Topics:
- Stream Processing
- batch processing
- Data
- Pinot
- Gobblin
- Kafka
- Samza
Co-authors: Krishnan Raman and Joey Salacup
Editor's note: This blog has been updated.
Monitoring big data pipelines often equates to waiting for a long-running batch job to complete and observing the status of the execution. The status can be “Failed,” “Successful,” or even “Incomplete.” From there, it’s the team’s job to understand the impact and...
- Topics:
- infrastructure
- Gobblin
- tools
- Kafka
- Data
Gobblin is a distributed data integration framework that simplifies common aspects of big data integration, such as ingestion,...
- Topics:
- Stream Processing
- Gobblin
- Data
- Open Source
About a year ago, we open sourced Gobblin, a universal data ingestion framework that aimed to solve data integration challenges faced...
- Topics:
- Hadoop
- Big Data
- Open Source
- Data Ingestion
- Distributed Systems
- ETL
- Gobblin
- Kafka
We shared Gobblin with the open source community a year ago. Since then, we’ve seen increasing interest and adoption among engineers,...
- Topics:
- Hadoop
- Data Ingestion
- Gobblin
- Kafka
- Open Source