Hadoop Articles

  • graph-of-linkedin-cluster-trends-for-hdfs-space-used-total-name-node-objects-and-yarn-compute-capacity

    Scaling LinkedIn's Hadoop YARN cluster beyond 10,000 nodes

    September 8, 2021

    Co-authors: Keqiu Hu, Jonathan Hung, Haibo Chen, and Sriram Rao At LinkedIn, we use Hadoop as our backbone for big data analytics and machine learning. With an exponentially growing data volume, and the company heavily investing in machine learning and data science, we have been doubling our cluster size year over year to match the compute workload growth. Our...

  • diagram-of-how-tony-works-with-horovod

    TonY joins LF AI & Data Foundation

    July 15, 2021

    Co-authors: Keqiu Hu, Jonathan Hung, and Junfan Zhang Today, TonY is joining the LF AI & Data Foundation, an umbrella foundation of the Linux Foundation supporting open source innovation in artificial intelligence, machine learning, and deep learning. “We’re thrilled to welcome TonY into incubation in LF AI & Data. The project offers functionalities that are not...

  • chart-showing-exponential-growth-of-data-metadata-and-compute-on-linkedins-largest-hadoop-cluster

    The exabyte club: LinkedIn’s journey of scaling the Hadoop Distributed File System

    May 27, 2021

    Co-authors: Konstantin V. Shvachko, Chen Liang, and Simbarashe Dzinamarira LinkedIn runs its big data analytics on Hadoop. During the last five years, the analytics infrastructure has experienced tremendous growth, almost doubling every year in data size, compute workloads, and in all other dimensions. It recently reached two important milestones. LinkedIn now...

  • FastIngest: Low-latency Gobblin with Apache Iceberg and...

    January 6, 2021

    Co-authors: Zihan Li, Sudarshan Vasudevan, Lei Sun, and Shirshanka Das Data analytics and AI power many business-critical use cases at...

  • diagram-showing-hadoop-dual-ingest-pipelines

    Theory vs. Practice: Learnings from a recent Hadoop incident

    August 6, 2020

    Co-authors: Sandhya Ramu and Vasanth Rajamani For companies and organizations, failure tends to be far more illuminating than success...

  • diagram-of-kube2hadoop-authentication-mechanism

    Open sourcing Kube2Hadoop: Secure access to HDFS from...

    June 10, 2020

    Co-authors: Cong Gu, Abin Shahab, Chen Qiang, and Keqiu Hu Editor's note: This blog has been updated. LinkedIn AI has been...