Open Source Articles

  • DataHub: Popular metadata architectures explained

    December 7, 2020

    When I started my journey at LinkedIn ten years ago, the company was just beginning to experience extreme growth in the volume, variety, and velocity of our data. Over the next few years, my colleagues and I on LinkedIn’s data infrastructure team built out foundational technology like Espresso, Databus, and Kafka, among others, to ensure that LinkedIn would...

  • Pegasus Data Language: Evolving schema definitions for data modeling

    November 19, 2020

    Pegasus Data Schema (PDSC) is a Pegasus schema definition language that has been used for data modeling with Rest.li services for years. It's the underlying language that helps define data models, describe the data returned by REST endpoints, and generate derivative schemas for other uses, such as XML schemas and various database schemas. However, writing PDSC...

  • Magnet: A scalable and performant shuffle architecture for Apache Spark

    October 21, 2020

    Co-authors: Min Shen, Chandni Singh, Ye Zhou, and Sunitha Beeram

    At LinkedIn, we rely heavily on offline data analytics for data-driven decision making. Over the years, Apache Spark has become the primary compute engine at LinkedIn to satisfy such data needs. With its unique features, Spark empowers many business-critical tasks at LinkedIn, including data...

  • GDMix: A deep ranking personalization framework

    September 29, 2020

    Our logo is inspired by the chameleon: You can enable personalization on your ranking model with GDMix, bringing a personalized...

  • Addressing bias in large-scale AI applications: The...

    August 25, 2020

    Co-authors: Sriram Vasudevan, Cyrus DiCiccio, and Kinjal Basu

    At LinkedIn, our imperative is to create economic opportunity for every...

  • DeText: A deep NLP framework for intelligent text...

    July 28, 2020

    Co-authors: Weiwei Guo, Xiaowei Liu, Sida Wang, Huiji Gao, and Bo Long

    Natural language processing (NLP) technologies are widely...