Announcing Helix, an open source cluster management system

October 10, 2012

We are very pleased to announce the open source release of Helix- a generic cluster management system for managing partitioned and replicated resources in distributed systems.

Helix

Helix makes distributed system development simpler by separating cluster management from the primary component tasks of a distributed system.

What is Helix?

The LinkedIn infrastructure stack consists of a number of different distributed systems, each specialized to solve a particular problem. This includes online data storage systems (Voldemort, Espresso), messaging systems (Kafka), change data capture system (Databus), a distributed system that provides search as a service (SeaS), and a distributed graph engine.

Although each system services a different purpose, they share a common set of requirements:

  • Resource management: The resources in the system (such as the database and indexes) must be divided among nodes in the cluster.
  • Fault tolerance: Node failures are unavoidable in any distributed system. However, the system as whole must continue to be available in the presence of such failures, without losing data.
  • Elasticity: As workload grows, clusters must be able to grow to accommodate the increased demand.
  • Monitoring: The cluster must be monitored for node failures as well as other health metrics, such as load imbalance and SLA misses.

Rather than forcing each system to reinvent the wheel, we decided to build Helix, a cluster management framework that solves these common problems. This allows each Distributed System to focus on its distinguishing features, while leaving Helix to take care of cluster management functions.

Helix provides significant leverage beyond just code reuse. At scale, the operational cost of management, monitoring and recovery in these systems far outstrips their single node complexity. A generalized cluster management framework provides a unified way of operating these otherwise diverse systems, leading to operational ease.

Helix at Linkedin

Helix has been under development at LinkedIn since April 2011. Currently, it is used in production in three different systems:

  • Espresso: Espresso is a distributed, timeline consistent, scalable document store that supports local secondary indexing and local transactions. Espresso runs on a number of storage node servers that store and index data and answer queries. Espresso databases are horizontally partitioned across multiple nodes, with each partition having a specified number of replicas. Espresso designates one replica of each partition as master (which accepts writes) and the rest as slaves; only one master may exist for each partition at any time. Helix manages the partition assignment, cluster-wide monitoring, and mastership transitions during planned upgrades and unplanned failure. Upon failure of the master, a slave replica is promoted to be the new master.
  • Databus: Databus is a change data capture (CDC) system that provides a common pipeline for transporting events from LinkedIn primary databases to caches, indexes and other applications such as Search and Graph that need to process the change events. Databus deploys a cluster of relays that pull the change log from multiple databases and let consumers subscribe to the change log stream. Each Databus relay connects to one or more database servers and hosts a certain subset of databases (and partitions) from those database servers, depending on the assignment from Helix.
  • SeaS (Search as a Service): LinkedIn’s Search-as-a-service lets other applications define custom indexes on a chosen dataset and then makes those indexes searchable via a service API. The index service runs on a cluster of machines. The index is broken into partitions and each partition has a configured number of replicas. Each new indexing service gets assigned to a set of servers, and the partition replicas must be evenly distributed across those servers. When indexes are bootstrapped, the search service uses snapshots of the data source to create new index partitions. Helix manages the assignment of index partitions to servers. Helix also limits the number of concurrent bootstraps in the system, as bootstrapping is an expensive process.

Try it out

We invite you to download and try out Helix. In the past year, we have had significant adoption and contributions to Helix by multiple teams at Linkedin. By open sourcing Helix, we intend to grow our contributor base significantly and invite interested developers to participate.

We will also be presenting a paper on Helix at the upcoming SOCC (ACM Symposium on Cloud Computing) at San Jose, CA on Oct 15th, 2012.

Acknowledgments

Helix is currently actively developed and maintained by Kishore Gopalakrishna, Shi Lu and Zhen Zhang. We would like to thank the members of Espresso, Databus and Search teams for their help and support in development of Helix. We would also like to thank our management Bob Schulman, Bhaskar Ghosh, Kevin Scott for their enthusiastic support in the development and open sourcing of Helix.

Topics