Hadoop Articles

  • FastIngest: Low-latency Gobblin with Apache Iceberg and ORC format

    January 6, 2021

    Co-authors: Zihan Li, Sudarshan Vasudevan, Lei Sun, and Shirshanka Das

    Data analytics and AI power many business-critical use cases at LinkedIn. We need to ingest data in a timely and reliable way from a variety of sources, including Kafka, Oracle, and Espresso, bringing it into our Hadoop data lake for subsequent processing by AI and data science pipelines. We...

  • Theory vs. Practice: Learnings from a recent Hadoop incident

    August 6, 2020

    Co-authors: Sandhya Ramu and Vasanth Rajamani

    For companies and organizations, failure tends to be far more illuminating than success and the lingering effects of a failure can be harmful if the team moves too quickly and does not resolve the issue in a thorough and transparent manner. We recently ran into a large incident that involved data loss in our big data...

  • Open sourcing Kube2Hadoop: Secure access to HDFS from Kubernetes

    June 10, 2020

    Co-authors: Cong Gu, Abin Shahab, Chen Qiang, and Keqiu Hu

    Editor's note: This blog has been updated.

    LinkedIn AI has been traditionally Hadoop/YARN based, and we operate one of the world’s largest Hadoop data lakes, with over 4,500 users and 500PB of data. In the last few years, Kubernetes has also become very popular at LinkedIn for Artificial Intelligence (AI...

  • The building blocks of LinkedIn Skill Assessments

    September 17, 2019

    Co-authors: Christian Mathiesen and Jie Zhang

    Your LinkedIn profile is intended to be a representative picture of your professional...

  • The Present and Future of Apache Hadoop: A Community Meetup...

    February 21, 2019

    On January 30, Hadoop developers gathered at LinkedIn’s offices in Mountain View to share their latest work, with presentations by...

  • Open Sourcing TonY: Native Support of TensorFlow on Hadoop

    September 12, 2018

    Co-authors: Jonathan Hung, Keqiu Hu, and Anthony Hsu

    LinkedIn heavily relies on artificial intelligence to deliver content and create...