Metadata Articles

  • image-of-schemas-for-a-person

    Shifting left on governance: DataHub and schema annotations

    May 17, 2022

    Co-authors: Joshua Shinavier and Shirshanka Das

    Data governance is easy… as long as the data to be governed is small and simple. A handful of developers creating a startup company can get away with relatively lightweight solutions for managing their data, but things change as scale and complexity increase. Like a hermit crab outgrowing its shell, we constantly...

  • opal-data-flow

    Opal: Building a mutable dataset in data lake

    March 16, 2022

    Co-authors: Bhupendra Kumar Jain, Aditya Narain Gupta, Kuai Yu, and Hung Tran

    At LinkedIn, trusted data platforms and quality data pipelines are essential to meaningful business metrics and sound decision-making. Today, a considerable percentage of data at LinkedIn comes from online data stores. Whether the online data systems fall into SQL or NoSQL categories,...

  • explaining-metadata-architectures

    DataHub: Popular metadata architectures explained

    December 7, 2020

    When I started my journey at LinkedIn ten years ago, the company was just beginning to experience extreme growth in the volume, variety, and velocity of our data. Over the next few years, my colleagues and I in LinkedIn’s data infrastructure team built out foundational technology like Espresso, Databus, and Kafka, among others, to ensure that LinkedIn would...

  • metadata-library-updates

    Rapid experimentation through standardization: Typed AI...

    April 15, 2020

    Serving the most relevant information for LinkedIn members in the homepage feed requires a massive effort—hundreds of features are...

  • datahub-logo

    Open sourcing DataHub: LinkedIn’s metadata search and...

    February 18, 2020

    Co-authors: Kerem Sahin, Mars Lan, and Shirshanka Das

    Finding the right data quickly is critical for any company that relies on big...

  • data-hub-logo

    DataHub: A generalized metadata search & discovery tool

    August 14, 2019

    Co-authors: Mars Lan, Seyi Adebajo, and Shirshanka Das

    Editor’s note: Since publishing this blog post, the team open sourced DataHub in...