What is the shape of your big data? While we love to talk about the size of our big data (terabytes, petabytes, and beyond), perhaps we are not giving due recognition to its shape. Big data comes in a variety of shapes. Extract-Transform-Load (ETL) workflows are more or less stripe-shaped (left panel in the figure above) and produce an output of a...

Posts by Maneesh Varshney
- Topics: Big Data, machine learning, data science, Analytics
Open Sourcing Cubert: A High Performance Computation Engine for Complex Big Data Analytics
Maneesh Varshney November 11, 2014
Authors: Maneesh Varshney, Srinivas Vemuri
What do you do when your Hadoop ETL script is mercilessly killed because it is hogging too many resources on the cluster, or when it starts missing completion deadlines by hours? We encountered this exact problem more than a year ago while building the computation pipeline for XLNT, LinkedIn’s A/B testing platform....
- Topics: Big Data, Hadoop, Cubert, Open Source