Spark Articles

  • Addressing-bias-in-large-scale-AI-applications-with-lift

    Addressing bias in large-scale AI applications: The LinkedIn Fairness Toolkit

    August 25, 2020

    Co-authors: Sriram Vasudevan, Cyrus DiCiccio, and Kinjal Basu

    At LinkedIn, our imperative is to create economic opportunity for every member of the global workforce, something that would be impossible to accomplish without leveraging AI at scale. We help members and customers make decisions by providing them with the most relevant insights based on the available...

  • buliding-blocks-of-spark-tf-record

    Spark-TFRecord: Toward full support of TFRecord in Spark

    May 4, 2020

    Co-authors: Jun Shi and Mingzhou Zhou

    In the machine learning community, Apache Spark is widely used for data processing due to its efficiency in SQL-style operations, while TensorFlow is one of the most popular frameworks for model training. Although there are some data formats supported by both tools, TFRecord, the data format native to TensorFlow, is...

  • schema-management-workflow

    Advanced schema management for Spark applications at scale

    March 25, 2020

    Co-authors: Walaa Eldin Moustafa, Wenye Zhang, Adwait Tumbde, and Ratandeep Ratti

    Over the years, the popularity of Apache Spark at LinkedIn has grown, and users today continue to leverage its unique features for business-critical tasks. Apache Spark allows users to consume datasets using powerful, yet easy-to-use APIs such as the Dataset interface. The...

  • avro1

    Avro2TF: An open source feature transformation engine for...

    April 4, 2019

    Co-authors: Xuhong Zhang, Chenya Zhang, and Yiming Ma

    Today, we are announcing a new open source project called Avro2TF. This project...

  • sparksummit2

    Spark Summit 2017: Research, Open Source, and Community

    June 2, 2017

    Next Tuesday marks the start of the Spark Summit Conference in San Francisco. This year, LinkedIn engineers and data scientists are...

  • drelephant2

    A Checkup with Dr. Elephant: One Year Later

    March 6, 2017

    This post has been updated to note the release of Pepperdata's Application Profiler, a commercial project based on Dr. Elephant. Last...