Open Source Articles

  • Spark-TFRecord: Toward full support of TFRecord in Spark

    May 4, 2020

    Co-authors: Jun Shi, Mingzhou Zhou

    In the machine learning community, Apache Spark is widely used for data processing due to its efficiency in SQL-style operations, while TensorFlow is one of the most popular frameworks for model training. Although there are some data formats supported by both tools, TFRecord, the data format native to TensorFlow, is...

  • Introducing Apache Pinot 0.3.0

    April 27, 2020

    Built at LinkedIn, Pinot is an open source, distributed, and scalable OLAP data store that we use as our de facto near-real-time analytics service. We’ve previously discussed how and why we built Pinot to power a wide spectrum of use cases, including internal business intelligence dashboards to analyze high-dimensional data and “Who Viewed My Profile” to...

  • Open sourcing DataHub: LinkedIn’s metadata search and discovery platform

    February 18, 2020

    Co-authors: Kerem Sahin, Mars Lan, and Shirshanka Das

    Finding the right data quickly is critical for any company that relies on big data insights to make data-driven decisions. Not only does this impact the productivity of data users (including analysts, machine learning developers, data scientists, and data engineers), but it also has a direct impact on end...

  • How we retired Python 2 and improved developer happiness

    January 29, 2020

    Nearly 20 years after the first release of Python 2 and 11 years after the first release of Python 3, the Python development community...

  • LiTr: A lightweight video/audio transcoder for Android

    December 19, 2019

    If a picture’s worth a thousand words, then what about a video? In 2017, we launched video sharing to give our members the ability to...

  • How LinkedIn customizes Apache Kafka for 7 trillion...

    October 8, 2019

    Co-authors: Jon Lee and Wesley Wu

    Apache Kafka is a core part of our infrastructure at LinkedIn. It was originally developed in-house...