Open Source Articles

  • Addressing-bias-in-large-scale-AI-applications-with-lift

    Addressing bias in large-scale AI applications: The LinkedIn Fairness Toolkit

    August 25, 2020

    Co-authors: Sriram Vasudevan, Cyrus DiCiccio, and Kinjal Basu

    At LinkedIn, our imperative is to create economic opportunity for every member of the global workforce, something that would be impossible to accomplish without leveraging AI at scale. We help members and customers make decisions by providing them with the most relevant insights based on the available...

  • detext-title-card

    DeText: A deep NLP framework for intelligent text understanding

    July 28, 2020

    Co-authors: Weiwei Guo, Xiaowei Liu, Sida Wang, Huiji Gao, and Bo Long

    Natural language processing (NLP) technologies are widely deployed to process rich natural language text data for search and recommender systems. Achieving high-quality search and recommendation results requires that information, such as user queries and documents, be processed and understood...

  • diagram-of-kube2hadoop-authentication-mechanism

    Open sourcing Kube2Hadoop: Secure access to HDFS from Kubernetes

    June 10, 2020

    Co-authors: Cong Gu, Abin Shahab, Chen Qiang, Keqiu Hu

    Editor's note: This post was updated on June 10, 2020.

    LinkedIn AI has been traditionally Hadoop/YARN based, and we operate one of the world’s largest Hadoop data lakes, with over 4,500 users and 500PB of data. In the last few years, Kubernetes has also become very popular at LinkedIn for Artificial...

  • buliding-blocks-of-spark-tf-record

    Spark-TFRecord: Toward full support of TFRecord in Spark

    May 4, 2020

    Co-authors: Jun Shi, Mingzhou Zhou

    Introduction

    In the machine learning community, Apache Spark is widely used for data processing due...

  • apache-pinot-update

    Introducing Apache Pinot 0.3.0

    April 27, 2020

    Built at LinkedIn, Pinot is an open source, distributed, and scalable OLAP data store that we use as our de facto near-real-time...

  • datahub-logo

    Open sourcing DataHub: LinkedIn’s metadata search and...

    February 18, 2020

    Co-authors: Kerem Sahin, Mars Lan, and Shirshanka Das

    Finding the right data quickly is critical for any company that relies on big...