Infrastructure Articles

  • One-stop MLOps portal at LinkedIn

    May 19, 2022

    Co-authors: Eing Ong, Shannon Bain, and Daniel Qiu. What is MLOps? Before we dive into our MLOps portal, let’s begin by defining MLOps (Machine Learning Operations). MLOps is about continuously running ML correctly by managing the full lifecycle (developing, improving, and maintaining) of AI models. A structured and methodical approach that starts at problem...

  • HTTP/2 in infrastructure: Ambry network stack refactoring

    August 24, 2021

    Co-authors: Ze Mao, Matt Wise, Casey Getz, Justin Lin, Ashish Singhai, and Rob Block. Introduction: Ambry is LinkedIn's scalable geo-distributed object store. Developed in-house and open sourced in 2016, Ambry stores tens of petabytes of data. At LinkedIn, Ambry is used to store objects like photos, videos, and resume uploads, as well as internal binary data....

  • New Recruiter & Jobs: The largest enterprise data migration at LinkedIn

    July 23, 2021

    Co-authors: Xiaoyang Gu, Xie Lu, and Xiaoguang Wang. Introduction: In August 2019, we introduced our members and customers to the idea of moving LinkedIn’s two core talent products, Jobs and Recruiter, onto a single platform to help talent professionals be even more productive. This single platform is called the New Recruiter & Jobs. Figure 1: New Recruiter & Jobs...

  • The exabyte club: LinkedIn’s journey of scaling the Hadoop ...

    May 27, 2021

    Co-authors: Konstantin V. Shvachko, Chen Liang, and Simbarashe Dzinamarira. LinkedIn runs its big data analytics on Hadoop. During the...

  • Rethinking site capacity projections with Capacity Analyzer

    March 16, 2021

    While site outages are inevitable, it’s our job to minimize both the duration of outages and the likelihood of an outage occurring....

  • A/B testing at LinkedIn: Assigning variants at scale

    December 16, 2020

    Co-authors: Alexander Ivaniuk and Weitao Duan. Editor’s note: This blog post is the second in a series providing an overview and...