Spark Articles

  • Using the LinkedIn Fairness Toolkit in large-scale AI systems

    February 8, 2021

    Co-authors: Preetam Nandy, Yunsong Meng, Cyrus DiCiccio, Heloise Logan, Amir Sepehri, Divya Venugopalan, Kinjal Basu, and Noureddine El Karoui

    LinkedIn’s vision to create economic opportunity for every member of the global workforce would be impossible to realize without leveraging AI at scale. We use AI in our core product offerings to: highlight...

  • Magnet: A scalable and performant shuffle architecture for Apache Spark

    October 21, 2020

    Co-authors: Min Shen, Chandni Singh, Ye Zhou, and Sunitha Beeram

    At LinkedIn, we rely heavily on offline data analytics for data-driven decision making. Over the years, Apache Spark has become the primary compute engine at LinkedIn to satisfy such data needs. With its unique features, Spark empowers many business-critical tasks at LinkedIn, including data...

  • Addressing bias in large-scale AI applications: The LinkedIn Fairness Toolkit

    August 25, 2020

    Co-authors: Sriram Vasudevan, Cyrus DiCiccio, and Kinjal Basu

    At LinkedIn, our imperative is to create economic opportunity for every member of the global workforce, something that would be impossible to accomplish without leveraging AI at scale. We help members and customers make decisions by providing them with the most relevant insights based on the available...

  • Spark-TFRecord: Toward full support of TFRecord in Spark

    May 4, 2020

    Co-authors: Jun Shi and Mingzhou Zhou

    In the machine learning community, Apache Spark is widely used for data processing due...

  • Advanced schema management for Spark applications at scale

    March 25, 2020

    Co-authors: Walaa Eldin Moustafa, Wenye Zhang, Adwait Tumbde, and Ratandeep Ratti

    Over the years, the popularity of Apache...

  • Avro2TF: An open source feature transformation engine for...

    April 4, 2019

    Co-authors: Xuhong Zhang, Chenya Zhang, and Yiming Ma

    Today, we are announcing a new open source project called Avro2TF. This project...