Gobblin Articles

  • Solving the data integration variety problem at scale, with Gobblin

    February 24, 2021

    Co-authors: Chris Li, Kevin Lau, and Subbu Sanka. Editor’s Note: Recently, the Apache Software Foundation (ASF) announced Apache® Gobblin™ as a Top-Level Project (TLP). For more information, visit https://gobblin.apache.org/ and https://twitter.com/ApacheGobblin. Introduction: Our big data ecosystem is larger than 1 exabyte and growing, while ingesting and...

  • FastIngest: Low-latency Gobblin with Apache Iceberg and ORC format

    January 6, 2021

    Co-authors: Zihan Li, Sudarshan Vasudevan, Lei Sun, and Shirshanka Das. Data analytics and AI power many business-critical use cases at LinkedIn. We need to ingest data in a timely and reliable way from a variety of sources, including Kafka, Oracle, and Espresso, bringing it into our Hadoop data lake for subsequent processing by AI and data science pipelines. We...

  • Bridging batch and stream processing for the Recruiter usage statistics dashboard

    July 14, 2020

    Co-authors: Khai Tran and Steve Weiss. Batch and streaming computations are often combined in the Lambda architecture, but at the cost of maintaining two different code bases for the same logic. We have previously shared on the blog a behind-the-scenes look at our approach to enabling the seamless translation of declarative batch code into streaming...

  • An inside look at LinkedIn’s data pipeline monitoring...

    October 30, 2019

    Co-authors: Krishnan Raman and Joey Salacup. Editor's note: This blog has been updated. Monitoring big data pipelines often equates to...

  • Gobblin Enters Apache Incubation

    January 17, 2018

    Gobblin is a distributed data integration framework that simplifies common aspects of big data integration, such as ingestion,...

  • Announcing Gobblin 0.7.0: Going Beyond Ingestion

    June 29, 2016

    About a year ago, we open sourced Gobblin, a universal data ingestion framework that aimed to solve data integration challenges faced...