Co-authors: Bhupendra Kumar Jain, Aditya Narain Gupta, Kuai Yu, and Hung Tran
At LinkedIn, trusted data platforms and quality data pipelines are essential to meaningful business metrics and sound decision-making. Today, a considerable percentage of data at LinkedIn comes from online data stores. Whether the online data systems fall into SQL or NoSQL categories,...
Gobblin Articles
-
Co-authors: Chris Li, Kevin Lau, and Subbu Sanka
Editor’s Note: Recently, the Apache Software Foundation (ASF) announced Apache® Gobblin™ as a Top-Level Project (TLP). For more information, visit https://gobblin.apache.org/ and https://twitter.com/ApacheGobblin.
Introduction
Our big data ecosystem is larger than 1 exabyte and growing, while ingesting and...
- Topics: Gobblin, Big Data, Open Source
-
Co-authors: Zihan Li, Sudarshan Vasudevan, Lei Sun, and Shirshanka Das
Data analytics and AI power many business-critical use cases at LinkedIn. We need to ingest data in a timely and reliable way from a variety of sources, including Kafka, Oracle, and Espresso, bringing it into our Hadoop data lake for subsequent processing by AI and data science pipelines. We...
- Topics: Stream Processing, Hadoop, Data, batch processing, Open Source, Gobblin, Kafka
-
Co-authors: Khai Tran and Steve Weiss
Batch and streaming computations are often combined together in the Lambda architecture, but...
- Topics: Stream Processing, batch processing, Data, Pinot, Gobblin, Kafka, Samza
-
Co-authors: Krishnan Raman and Joey Salacup
Editor's note: This blog has been updated.
Monitoring big data pipelines often equates to...
- Topics: infrastructure, Gobblin, tools, Kafka, Data
-
Gobblin is a distributed data integration framework that simplifies common aspects of big data integration, such as ingestion,...
- Topics: Stream Processing, Gobblin, Data, Open Source