For a large-scale site such as LinkedIn, tracking metrics accurately and efficiently is an important task. For example, imagine we need a dashboard that shows the number of visitors to every page on the site over the last thirty days. To keep this dashboard up to date, we can schedule a query that runs daily and gathers the stats for the last 30 days. However,...
![Matthew Hayes](https://content.linkedin.com/content/dam/engineering/en-us/blog/migrated/me_8.jpg)
Posts by Matthew Hayes
As the use of Hadoop grows in an organization, scheduling, capacity planning, and billing become critical concerns. These are all open problems in the Hadoop space, and today, we’re happy to announce we’re open sourcing LinkedIn’s solution: White Elephant. At LinkedIn, we use Hadoop for product development (e.g., predictive analytics applications like People You...
- Topics: White Elephant, Hadoop, Open Source
Introducing DataFu: an open source collection of useful Apache Pig UDFs
Matthew Hayes January 10, 2012
At LinkedIn, we make extensive use of Apache Pig for performing data analysis on Hadoop. Pig is a simple, high-level programming language that consists of just a few dozen operators and makes it easy to write MapReduce jobs. For more advanced tasks, Pig also supports User Defined Functions (UDFs), which let you integrate custom code in Java, Python, and...
- Topics: Apache Pig, Hadoop, Open Source