Open Source Articles

  • pegasus-data-language

    Pegasus Data Language: Evolving schema definitions for data modeling

    November 19, 2020

    Pegasus Data Schema (PDSC) is a Pegasus schema definition language that has been used for data modeling with Rest.li services for years. It's the underlying language that helps define data models, describe the data returned by REST endpoints, and generate derivative schemas for other uses, such as XML schemas and various database schemas. However, writing PDSC...

  • architecture-diagram-of-magnet

    Magnet: A scalable and performant shuffle architecture for Apache Spark

    October 21, 2020

    Co-authors: Min Shen, Chandni Singh, Ye Zhou, and Sunitha Beeram

    At LinkedIn, we rely heavily on offline data analytics for data-driven decision making. Over the years, Apache Spark has become the primary compute engine at LinkedIn to satisfy such data needs. With its unique features, Spark empowers many business-critical tasks at LinkedIn, including data...

  • gdmix-logo

    GDMix: A deep ranking personalization framework

    September 29, 2020

    Our logo is inspired by the chameleon: You can enable personalization on your ranking model with GDMix, bringing a personalized experience to every user, like a chameleon that can match its surroundings.

    Co-authors: Jun Shi, Chengming Jiang, Aman Gupta, Mingzhou Zhou, Alice Wu, Yunbo Ouyang, Jun Jia, Huiji Gao, and Bo Long

    Millions of members come to LinkedIn...

  • Addressing-bias-in-large-scale-AI-applications-with-lift

    Addressing bias in large-scale AI applications: The...

    August 25, 2020

    Co-authors: Sriram Vasudevan, Cyrus DiCiccio, and Kinjal Basu

    At LinkedIn, our imperative is to create economic opportunity for every...

  • detext-title-card

    DeText: A deep NLP framework for intelligent text...

    July 28, 2020

    Co-authors: Weiwei Guo, Xiaowei Liu, Sida Wang, Huiji Gao, and Bo Long

    Natural language processing (NLP) technologies are widely...

  • diagram-of-kube2hadoop-authentication-mechanism

    Open sourcing Kube2Hadoop: Secure access to HDFS from...

    June 10, 2020

    Co-authors: Cong Gu, Abin Shahab, Chen Qiang, and Keqiu Hu

    Editor's note: This blog has been updated.

    LinkedIn AI has been...