Hadoop Articles

  • Open Sourcing TonY: Native Support of TensorFlow on Hadoop

    September 12, 2018

    Co-authors: Jonathan Hung, Keqiu Hu, and Anthony Hsu

    LinkedIn heavily relies on artificial intelligence to deliver content and create economic opportunities for its 575+ million members. Following recent rapid advances of deep learning technologies, our AI engineers have started adopting deep neural networks in LinkedIn’s relevance-driven products, including...

  • Dynamometer: Scale Testing HDFS on Minimal Hardware with Maximum Fidelity

    February 8, 2018

    Co-authors: Erik Krogen and Min Shen

    In March 2015, LinkedIn’s Big Data Platform team experienced a crisis. As the team was preparing to head home for the day, signs of trouble began trickling in: our internal users were reporting that their applications were stalling or timing out. Job queues were backing up, and SLAs would be missed. A bit of investigation...

  • Dali Views: Functions as a Service for Big Data

    November 9, 2017

    Co-authors: Carl Steinbach and Vasanth Rajamani

    Big challenges in the big data ecosystem

    At LinkedIn, we have a number of challenges managing data in our complex data ecosystem. Changes to our infrastructure are often necessary to make progress, but they are difficult to accomplish without an expensive, large-scale, coordinated effort. Analytics processing...

  • Spark Summit 2017: Research, Open Source, and Community

    June 2, 2017

    Next Tuesday marks the start of the Spark Summit Conference in San Francisco. This year, LinkedIn engineers and data scientists are...

  • A Checkup with Dr. Elephant: One Year Later

    March 6, 2017

    This post has been updated to note the release of Pepperdata's Application Profiler, a commercial project based on Dr. Elephant. Last...

  • Announcing Gobblin 0.7.0: Going Beyond Ingestion

    June 29, 2016

    About a year ago, we open sourced Gobblin, a universal data ingestion framework that aimed to solve data integration challenges faced...