Spark Articles

  • Advanced schema management for Spark applications at scale

    March 25, 2020

    Co-authors: Walaa Eldin Moustafa, Wenye Zhang, Adwait Tumbde, and Ratandeep Ratti. Over the years, the popularity of Apache Spark at LinkedIn has grown, and users today continue to leverage its unique features for business-critical tasks. Apache Spark allows users to consume datasets using powerful, yet easy-to-use APIs such as the Dataset interface. The...

  • Avro2TF: An Open Source Feature Transformation Engine for TensorFlow

    April 4, 2019

    Co-authors: Xuhong Zhang, Chenya Zhang, and Yiming Ma. Today, we are announcing a new open source project called Avro2TF. This project provides a scalable Spark-based mechanism to efficiently convert data into a format that can be readily consumed by TensorFlow. With this technology, developers can improve productivity by focusing on building models rather than...

  • Spark Summit 2017: Research, Open Source, and Community

    June 2, 2017

    Next Tuesday marks the start of the Spark Summit Conference in San Francisco. This year, LinkedIn engineers and data scientists are presenting four separate talks at the conference, and we’ll be hosting a meetup at our San Francisco office on the final day. All of this is an indication of the significant impact that Apache Spark has had on the way people process...

  • A Checkup with Dr. Elephant: One Year Later

    March 6, 2017

    This post has been updated to note the release of Pepperdata's Application Profiler, a commercial project based on Dr. Elephant. Last...

  • Open Sourcing Photon ML

    June 7, 2016

    Machine learning is a key component of LinkedIn’s relevance-driven products. We use machine learning to train the ranking algorithms...

  • Open Sourcing Dr. Elephant

    April 8, 2016

    We are proud to announce today that we are open sourcing Dr. Elephant, a powerful tool that helps users of Hadoop and Spark understand...