Data Articles

  • Using the LinkedIn Fairness Toolkit in large-scale AI systems

    February 8, 2021

    Co-authors: Preetam Nandy, Yunsong Meng, Cyrus DiCiccio, Heloise Logan, Amir Sepehri, Divya Venugopalan, Kinjal Basu, and Noureddine El Karoui

    LinkedIn’s vision to create economic opportunity for every member of the global workforce would be impossible to realize without leveraging AI at scale. We use AI in our core product offerings to: highlight...

  • Budget-split testing: A trustworthy and powerful approach to marketplace A/B testing

    January 21, 2021

    Co-authors: Min Liu, Vangelis Dimopoulos, Elise Georis, Jialiang Mao, Di Luo, and Kang Kang

    The LinkedIn ecosystem drives member and customer value through a series of marketplaces (e.g., the ads marketplace and the talent marketplace). We maximize that value by making data-informed product decisions via A/B testing. Traditional A/B tests on our marketplaces,...

  • FastIngest: Low-latency Gobblin with Apache Iceberg and ORC format

    January 6, 2021

    Co-authors: Zihan Li, Sudarshan Vasudevan, Lei Sun, and Shirshanka Das

    Data analytics and AI power many business-critical use cases at LinkedIn. We need to ingest data in a timely and reliable way from a variety of sources, including Kafka, Oracle, and Espresso, bringing it into our Hadoop data lake for subsequent processing by AI and data science pipelines. We...

  • Coral: A SQL translation, analysis, and rewrite engine for ...

    December 10, 2020

    Co-authors: Walaa Eldin Moustafa, Wenye Zhang, Sushant Raikar, Raymond Lam, Ron Hu, Shardul Mahadik, Laura Chen, Khai Tran, Chris Chen...

  • DataHub: Popular metadata architectures explained

    December 7, 2020

    When I started my journey at LinkedIn ten years ago, the company was just beginning to experience extreme growth in the volume,...

  • Pegasus Data Language: Evolving schema definitions for data...

    November 19, 2020

    Pegasus Data Schema (PDSC) is a Pegasus schema definition language that has been used for data modeling with Rest.li services for...