Co-authors: Walaa Eldin Moustafa, Wenye Zhang, Sushant Raikar, Raymond Lam, Ron Hu, Shardul Mahadik, Laura Chen, Khai Tran, Chris Chen, and Nagarathnam Muthusamy
Introduction
At LinkedIn, our big data compute infrastructure continually grows over time, not only to keep pace with the growth in the number of data applications, or their domains spanning data...
Apache Pig Articles
- Topics: scale, Apache Pig, Data, Dali, Open Source
DataFu is an open-source collection of user-defined functions for working with large-scale data in Hadoop and Pig. About two years ago, we recognized a need for a stable, well-tested library of Pig UDFs that could assist in common data mining and statistics tasks. Over the years, we had developed several routines that were used across LinkedIn and were thrown...
- Topics: DataFu, Hadoop, Apache Pig, Open Source
At LinkedIn, we make extensive use of Apache Pig for performing data analysis on Hadoop. Pig is a simple, high-level programming language that consists of just a few dozen operators and makes it easy to write MapReduce jobs. For more advanced tasks, Pig also supports User Defined Functions (UDFs), which let you integrate custom code in Java, Python, and...
- Topics: Apache Pig, Hadoop, Open Source
Co-authored by Nicholas Swartzendruber
This blog post tells the story of how you can use Apache Pig and Hadoop to turn terabytes of...
- Topics: Hadoop, MapReduce, Apache Pig, User Engagement