  • Reducing Apache Spark Application Dependencies Upload by 99%

    March 9, 2023

    Co-authors: Shu Wang, Biao He, and Minchu Yang

    At LinkedIn, Apache Spark is our primary compute engine for offline data analytics such as data warehousing, data science, machine learning, A/B testing, and metrics reporting. We execute nearly 100,000 Spark applications daily on our Apache Hadoop YARN clusters (more on how we scaled them here). These applications...