Open Source Articles

  • Magnet: A scalable and performant shuffle architecture for Apache Spark

    October 21, 2020

    Co-authors: Min Shen, Chandni Singh, Ye Zhou, and Sunitha Beeram

    At LinkedIn, we rely heavily on offline data analytics for data-driven decision making. Over the years, Apache Spark has become the primary compute engine at LinkedIn to satisfy such data needs. With its unique features, Spark empowers many business-critical tasks at LinkedIn, including data...

  • GDMix: A deep ranking personalization framework

    September 29, 2020

    Our logo is inspired by the chameleon: You can enable personalization on your ranking model with GDMix, bringing a personalized experience to every user, like a chameleon that can match its surroundings.

    Co-authors: Jun Shi, Chengming Jiang, Aman Gupta, Mingzhou Zhou, Alice Wu, Yunbo Ouyang, Jun Jia, Huiji Gao, and Bo Long

    Millions of members come to LinkedIn...

  • Addressing bias in large-scale AI applications: The LinkedIn Fairness Toolkit

    August 25, 2020

    Co-authors: Sriram Vasudevan, Cyrus DiCiccio, and Kinjal Basu

    At LinkedIn, our imperative is to create economic opportunity for every member of the global workforce, something that would be impossible to accomplish without leveraging AI at scale. We help members and customers make decisions by providing them with the most relevant insights based on the available...

  • DeText: A deep NLP framework for intelligent text...

    July 28, 2020

    Co-authors: Weiwei Guo, Xiaowei Liu, Sida Wang, Huiji Gao, and Bo Long

    Natural language processing (NLP) technologies are widely...

  • Open sourcing Kube2Hadoop: Secure access to HDFS from...

    June 10, 2020

    Co-authors: Cong Gu, Abin Shahab, Chen Qiang, and Keqiu Hu

    Editor's note: This blog has been updated.

    LinkedIn AI has been...

  • Spark-TFRecord: Toward full support of TFRecord in Spark

    May 4, 2020

    Co-authors: Jun Shi and Mingzhou Zhou

    Introduction

    In the machine learning community, Apache Spark is widely used for data processing due...