Co-authors: Venkata Krishnan Sowrirajan and Min Shen
We are excited to announce that push-based shuffle (codenamed Project Magnet) is now available in Apache Spark as part of the 3.2 release. Since the SPIP vote on Project Magnet passed in September 2020, there has been a lot of interest in getting it into Apache Spark. As of March 2021, 100% of LinkedIn’s Spark...
Spark Articles
Topics: Spark, Open Source

Co-authors: Andy Li and Hongbin Wu
Indexing plays a key role in modern search engines for fast and accurate information retrieval, and the ability to build indexes swiftly is crucial for LinkedIn to provide up-to-date information, such as candidates to recruiters and job posts to members. In some instances, such as when a member profile is missing and...
Topics: Spark, Data, Distributed Systems

Co-authors: Preetam Nandy, Yunsong Meng, Cyrus DiCiccio, Heloise Logan, Amir Sepehri, Divya Venugopalan, Kinjal Basu, and Noureddine El Karoui
Introduction
LinkedIn’s vision to create economic opportunity for every member of the global workforce would be impossible to realize without leveraging AI at scale. We use AI in our core product offerings to: highlight...
Co-authors: Min Shen, Chandni Singh, Ye Zhou, and Sunitha Beeram
At LinkedIn, we rely heavily on offline data analytics for...
Topics: Spark, infrastructure, Data, Open Source

Co-authors: Sriram Vasudevan, Cyrus DiCiccio, and Kinjal Basu
At LinkedIn, our imperative is to create economic opportunity for every...
Topics: Spark, artificial intelligence, machine learning, Data, Open Source

Co-authors: Jun Shi and Mingzhou Zhou
Introduction
In the machine learning community, Apache Spark is widely used for data processing due...
Topics: Spark, machine learning, TensorFlow, Data, Open Source