Open Source Articles

  • DeText: A deep NLP framework for intelligent text understanding

    July 28, 2020

    Co-authors: Weiwei Guo, Xiaowei Liu, Sida Wang, Huiji Gao, and Bo Long

    Natural language processing (NLP) technologies are widely deployed to process rich natural language text data for search and recommender systems. Achieving high-quality search and recommendation results requires that information, such as user queries and documents, be processed and understood...

  • Open sourcing Kube2Hadoop: Secure access to HDFS from Kubernetes

    June 10, 2020

    Co-authors: Cong Gu, Abin Shahab, Chen Qiang, and Keqiu Hu

    Editor's note: This post was updated on June 10, 2020.

    LinkedIn AI has traditionally been Hadoop/YARN based, and we operate one of the world’s largest Hadoop data lakes, with over 4,500 users and 500PB of data. In the last few years, Kubernetes has also become very popular at LinkedIn for Artificial...

  • Spark-TFRecord: Toward full support of TFRecord in Spark

    May 4, 2020

    Co-authors: Jun Shi and Mingzhou Zhou

    Introduction

    In the machine learning community, Apache Spark is widely used for data processing due to its efficiency in SQL-style operations, while TensorFlow is one of the most popular frameworks for model training. Although there are some data formats supported by both tools, TFRecord, the data format native to TensorFlow, is...

  • Introducing Apache Pinot 0.3.0

    April 27, 2020

    Built at LinkedIn, Pinot is an open source, distributed, and scalable OLAP data store that we use as our de facto near-real-time...

  • Open sourcing DataHub: LinkedIn’s metadata search and...

    February 18, 2020

    Co-authors: Kerem Sahin, Mars Lan, and Shirshanka Das

    Finding the right data quickly is critical for any company that relies on big...

  • How we retired Python 2 and improved developer happiness

    January 29, 2020

    Nearly 20 years after the first release of Python 2 and 11 years after the first release of Python 3, the Python development community...