Next Tuesday marks the start of the Spark Summit Conference in San Francisco. This year, LinkedIn engineers and data scientists are presenting four separate talks at the conference, and we’ll be hosting a meetup at our San Francisco office on the final day. All of this is an indication of the significant impact that Apache Spark has had on the way people process...
Hadoop Articles
- Topics: Spark, Hadoop, research, Dr. Elephant, events

This post has been updated to note the release of Pepperdata's Application Profiler, a commercial project based on Dr. Elephant. Last April, we announced the first open source release of Dr. Elephant, a performance monitoring and tuning service for Hadoop and Spark jobs. That announcement marked the culmination of two years of internal development work and more...
- Topics: Spark, Hadoop, Dr. Elephant, Open Source

About a year ago, we open sourced Gobblin, a universal data ingestion framework that aimed to solve data integration challenges faced by people working on big data problems. In many previous blog posts, publications, and talks, we have described how LinkedIn uses Gobblin to ingest data at massive scale from a variety of sources into HDFS. Today, we are very...
- Topics: Hadoop, Big Data, Open Source, Data Ingestion, Distributed Systems, ETL, Gobblin, Kafka

We shared Gobblin with the open source community a year ago. Since then, we’ve seen increasing interest and adoption among engineers,...
- Topics: Hadoop, Data Ingestion, Gobblin, Kafka, Open Source

We are proud to announce today that we are open sourcing Dr. Elephant, a powerful tool that helps users of Hadoop and Spark understand...
- Topics: Spark, Hadoop, Dr. Elephant, Open Source

Co-authors: Vamshi Hardageri, Brian Jue

Sizr is an interactive visualization tool developed at LinkedIn for the Hadoop Distributed...