  • architecture-diagram-of-magnet

    Magnet: A scalable and performant shuffle architecture for Apache Spark

    October 21, 2020

    Co-authors: Min Shen, Chandni Singh, Ye Zhou, and Sunitha Beeram

    At LinkedIn, we rely heavily on offline data analytics for data-driven decision making. Over the years, Apache Spark has become the primary compute engine at LinkedIn to satisfy such data needs. With its unique features, Spark empowers many business-critical tasks at LinkedIn, including data...

  • fixing-linux-file-system-performance-regressions

    Fixing Linux filesystem performance regressions

    October 16, 2020

    As companies grow, adapt, morph, and mature, one item remains the same: the need for reinvention. Technical infrastructure is no exception. As our member community grew, our priorities were to keep up with that growth, or as we say, ensure continuous “site up.” (Read: adding servers to scale from hundreds to hundreds of thousands.) We ran into challenges about...

  • pensieve-an-embedding-feature-platform

    Pensieve: An embedding feature platform

    October 14, 2020

    Co-authors: Benjamin Le, Daniel Gmach, Aman Grover, Roshan Lal, Jerry Lin, Austin Lu, Qingyun Wan, and Leighton Zhang

    Feature engineering is foundational for building artificial intelligence (AI) that powers products at LinkedIn. Recently, “representation learning” or “feature learning” has started replacing manually engineered features, as they provide...

  • sketching-out-what-a-heterogeneous-social-network-looks-like

    Building a heterogeneous social network recommendation...

    October 6, 2020

    Co-authors: Parag Agrawal, Ankan Saha, Yafei Wang, Aastha Nigam, and Eric Lawrence

    Figure 1: A heterogeneous social network

    LinkedIn’s...

  • table-comparing-the-nexmark-benchmark-results

    Building a better and faster Beam Samza runner

    October 1, 2020

    Co-authors: Yixing Zhang, Bingfeng Xia, Ke Wu, and Xinyu Liu

    Since Beam Samza runner was developed in 2018 at LinkedIn, we now have...

  • data-science-week-word-cloud

    Celebrating Innovation, Success, and the Future at Data...

    September 30, 2020

    Last week, the global LinkedIn Data Science team joined together for our third-annual Data Science Week. This virtual event included...