As part of LinkedIn’s data infrastructure organization, the Cluster Management (Hercules) team is responsible for developing and maintaining highly available, scalable cluster management solutions for LinkedIn’s stateful data systems and stateless services.
The team has built and maintains a diverse set of systems, including:
- Apache Helix, a generic cluster management framework to manage data placement and partition state for distributed data systems.
- Helix Task Framework, a high-throughput distributed task scheduler and workflow manager for both stateful and stateless jobs.
- Scalable Metadata Service, built on top of Apache ZooKeeper, for metadata and configuration management, failure detection, and service discovery.
- Platform Automation Orchestration for data system health monitoring, platform automation, and service coordination.
The systems built by the team power many of LinkedIn’s core distributed data systems and production services, including Espresso, LinkedIn’s horizontally scalable document store for primary data; Venice, our derived data serving platform for merging batch and streaming data sources; Ambry, LinkedIn’s distributed object store; Vector, LinkedIn’s media serving platform; Gobblin, the unified data ingestion framework; Pinot, the real-time analytics platform; and D2, our core Dynamic Discovery service.