• title-card

    Project Magnet, providing push-based shuffle, now available in Apache Spark 3.2

    October 20, 2021

    Co-authors: Venkata Krishnan Sowrirajan and Min Shen

    We are excited to announce that push-based shuffle (codenamed Project Magnet) is now available in Apache Spark as part of the 3.2 release. Since the SPIP vote on Project Magnet passed in September 2020, there has been a lot of interest in getting it into Apache Spark. As of March 2021, 100% of LinkedIn’s Spark...

  • title-card

    Our approach to building transparent and explainable AI systems

    October 7, 2021

    Co-authors: Parvez Ahammad, Kinjal Basu, Yazhou Cao, Shaunak Chatterjee, David Durfee, Sakshi Jain, Nihar Mehta, Varun Mithal, and Jilei Yang

    Delivering the best member and customer experiences with a focus on trust is core to everything that we do at LinkedIn. As we continue to build on the Responsible AI program that we outlined three months ago, a...

  • an-illustration-of-the-distributed-tier-merge

    Distributed tier merge: How LinkedIn tackles stragglers in search index build

    September 27, 2021

    Co-authors: Andy Li and Hongbin Wu

    Indexing plays a key role in modern search engines for fast and accurate information retrieval, and the ability to swiftly build indexes is crucial for LinkedIn to provide up-to-date information, such as candidates to recruiters and job posts to members. In some instances, such as if a member profile is missing and...

  • graph-of-linkedin-cluster-trends-for-hdfs-space-used-total-name-node-objects-and-yarn-compute-capacity

    Scaling LinkedIn's Hadoop YARN cluster beyond 10,000 nodes

    September 8, 2021

    Co-authors: Keqiu Hu, Jonathan Hung, Haibo Chen, and Sriram Rao

    At LinkedIn, we use Hadoop as our backbone for big data analytics and...

  • encoded-activity-sequence-showing-requests-made-by-a-member-that-was-not-using-abusive-automation

    Using deep learning to detect abusive sequences of member...

    September 2, 2021

    Co-authors: James Verbus and Beibei Wang

    The Anti-Abuse AI Team at LinkedIn creates, deploys, and maintains models that detect and...

  • diagram-of-http2-network-client-architecture

    HTTP/2 in infrastructure: Ambry network stack refactoring

    August 24, 2021

    Co-authors: Ze Mao, Matt Wise, Casey Getz, Justin Lin, Ashish Singhai, and Rob Block

    Introduction: Ambry is LinkedIn's scalable...