Infrastructure Articles

  • Reducing Apache Spark Application Dependencies Upload by 99%

    March 9, 2023

    Co-authors: Shu Wang, Biao He, and Minchu Yang

    At LinkedIn, Apache Spark is our primary compute engine for offline data analytics such as data warehousing, data science, machine learning, A/B testing, and metrics reporting. We execute nearly 100,000 Spark applications daily in our Apache Hadoop YARN clusters (more on how we scaled YARN clusters here). These applications...

  • Unifying Messaging Experiences across LinkedIn

    January 26, 2023

    Co-authors: Michele Ursino and Joe Xue

    Introduction

    At LinkedIn, we believe that an opportunity can arise from just one conversation, so reliable and powerful messaging capabilities that enable people to have meaningful, professional conversations are crucial. Over the years, we have evolved our messaging platform to meet the needs of our 900...

  • Accelerating Code Delivery By 97% With Yarn Workspaces

    December 15, 2022

    As teams and applications grow, it’s critical to adopt architectures that optimize for clear code ownership, build isolation, and efficient delivery of code. While many projects start small with just one or two repositories (for example, frontend and backend), this approach often becomes difficult to maintain as the codebases expand. At...

  • Co-authors: Kenneth Tay and Xiaofeng Wang. At LinkedIn, we constantly evaluate the value our products and services deliver, so that we...

  • Originally from Argentina, systems & infrastructure engineering leader Federico was a founding member of the Media Infrastructure team...

  • Migration madness: How to navigate the chaos of large...

    June 27, 2022

    Co-authors: Lily Wittle and Cathy Ji

    Introduction

    The Gateway-as-a-Platform (GaaP) team at LinkedIn builds infrastructure to support...