Data Articles

  • Towards data quality management at LinkedIn

    June 9, 2022

    Co-authors: Liangzhao Zeng, Ting Yu (Cliff) Leung, Jimmy Hong, and Kevin Lau

    Data is at the heart of all our products and decisions at LinkedIn, and the quality of our data is vital to our success. While not an uncommon problem, our scale, hundreds of thousands of pipelines and streams as well as over an exabyte of data in our data lake alone,...

  • Shifting left on governance: DataHub and schema annotations

    May 17, 2022

    Co-authors: Joshua Shinavier and Shirshanka Das

    Data governance is easy… as long as the data to be governed is small and simple. A handful of developers creating a startup company can get away with relatively lightweight solutions for managing their data, but things change as scale and complexity increase. Like a hermit crab outgrowing its shell, we constantly...

  • Opal: Building a mutable dataset in data lake

    March 16, 2022

    Co-authors: Bhupendra Kumar Jain, Aditya Narain Gupta, Kuai Yu, and Hung Tran

    At LinkedIn, trusted data platforms and quality data pipelines are essential to meaningful business metrics and sound decision-making. Today, a considerable percentage of data at LinkedIn comes from online data stores. Whether the online data systems fall into SQL or NoSQL categories,...

  • DARWIN: Data Science and Artificial Intelligence Workbench ...

    January 28, 2022

    Co-authors: Varun Saxena, Harikumar Velayutham, and Balamurugan Gangadharan

    LinkedIn is the largest global professional network and...

  • After joining LinkedIn Argentina, Juan took an Ireland-based opportunity to build a new EMEA (i.e., Europe, Middle East, and Africa)...

  • Co-authors: Steven Chuang, Qinyu Yue, Aravind Rao, and Srihari Duddukuru

    Having recently transitioned LinkedIn’s...