Jhubbub on Helix: Stateless and elastic made easy
August 27, 2020
LinkedIn was built to help professionals achieve more in their careers, and every day millions of people use our products to make connections, discover new opportunities and get better at what they do. An important part of our mission is helping people to find other professionals who are interested in the same things they are and to have meaningful, useful conversations with them. To curate and deliver relevant and engaging news and updates for our members, we rely on various pipelines to pull the latest information from diverse and trusted sources.
An internal backend service called Jhubbub helps with this effort by tracking over 100,000 RSS feeds and periodically pulling in URL contents. These RSS feeds are published by external news sites to enable the discovery of newly published articles as well as changes to existing articles. Jhubbub plays a crucial role in making sure LinkedIn keeps up with a rapidly-evolving stream of information where titles and content of news articles can change as stories develop.
In this blog post, we’ll share the latest architectural improvements made to Jhubbub with Apache Helix, a generic cluster management framework.
Jhubbub is the RSS feed processor at LinkedIn. During each scheduled fetch by Jhubbub, the RSS feeds and their respective “last-read” timestamps are stored in an Espresso database table. Every few seconds, Jhubbub performs a query to identify all RSS feeds that are becoming stale (feeds that haven’t been refreshed in the last 5 minutes are considered stale). The feeds are fetched, and then Jhubbub is updated with any changes detected on the feeds.
To distribute the work, we deploy Jhubbub to several dozen hosts distributed across LinkedIn’s data centers in different geographical regions. To avoid assigning duplicate work across hosts, each host gets a dedicated subset of the table’s partitions and only queries records in those partitions.
Manual recovery and reconfiguration
For several years, we relied on machine-specific configuration files to distribute our table’s partitions among Jhubbub hosts. For example, Host A’s configuration file would list partitions 1, 2, 3 as Host A’s responsibility; Host B’s configuration file would list partitions 4, 5, 6; and so on. If we had to turn off a host, a site reliability engineer (SRE) would need to redistribute that host’s share of partitions by manually editing configuration files and redeploying instances.
This process was cumbersome and error prone. If a host failed unexpectedly, it could take hours or days before a new host was started and assigned partitions. In the meantime, thousands of RSS feeds could be ignored, causing stale news to show up on the feed. A human error in editing configs could cause even more severe outages. This led us to look for a solution that could help us:
- Automate the failover and recovery process
- Eliminate the need for human intervention, not leaving any room for human mistake and error
We sought to enhance the fault-tolerance of the system by adding automatic failover to improve reliability.
Scalability and flexibility with stateless and elastic architecture
For Jhubbub, all the mappings of the partitions were hard-coded to hosts with static instance identifiers. This made the application stateful because certain partitions would have to be handled by certain designated hosts according to the mapping manually created by SREs. Adding or removing partitions or hosts would therefore require manual reconfiguration. Because it was difficult to scale Jhubbub up or down, we maintained a constant capacity regardless of the volume.
On the other hand, a stateless application can accommodate the addition and removal of resources and nodes without impacting service availability. The number of hosts allocated to the application can be elastically provisioned up and down at will to meet demand without requiring manual intervention. As such, a stateless and elastic architecture makes it easier to manage a service and reason about its behavior.
As we scaled and grew our footprint as a content platform, we needed to be more agile in keeping our content up to date. One way to accomplish this was to scale Jhubbub to meet its needs. This involved doing away with the current way of manually distributing the RSS feed ingestion work.
Migration to Azure
Last year, we announced our multi-year migration to Azure. One of the goals for this migration was to make all of our internal applications stateless because stateless architectures are more suited for the public cloud infrastructure. When the move was announced, most of LinkedIn’s applications were already stateless, but Jhubbub was not.
Motivated by the pain points described above and our company-wide commitment to move to public cloud, we made Jhubbub application stateless and elastic with the help of Helix, a cluster management framework developed at LinkedIn.
Apache Helix is an open source cluster management framework for distributed systems. Helix automatically assigns and relocates partitions and their replicas in the events of node failure, cluster expansion, or reconfiguration.
Helix separates cluster management from application logic by representing an application’s lifecycle as a finite state machine. The Apache Helix website describes in detail Helix’s architecture and how it can be used to manage distributed systems.
Helix was initially developed at LinkedIn and currently manages data infrastructure services such as Espresso (No-SQL document store), Brooklin (a near real-time streaming system), Venice (derived-data store), and Pinot (offline OLAP store).
Modeling Jhubbub as a Helix application
In order to distribute Jhubbub and manage its distributed workloads, we needed a coordination solution with a central brain and a metadata store to store the metadata about the nodes and workloads. One possible option was to build a controller/coordinator using a distributed file-system like Apache ZooKeeper.
The downside of this approach was the cost: building a brand-new coordination service would require a nontrivial amount of engineering effort and is generally difficult to get right. Another downside to this approach was the difficulty of using ZooKeeper. We considered options like Apache Curator, which is an open source ZooKeeper library that makes ZooKeeper easier to use, but that was not the official way applications are programmed with ZooKeeper at LinkedIn. Finally, deploying and maintaining a complex infrastructure-level coordination service without prior expertise would be a major commitment for the Jhubbub team.
However, given Jhubbub was already partitioned manually, using Helix to make Jhubbub stateless and scalable seemed like a better alternative.
Picking the right state model
Helix uses a declarative state model to describe each resource of the application. For Jhubbub, we wanted to have one resource with multiple partitions. Each partition would match that of the NoSQL table that we wanted to source our RSS feeds from. The goal here for the resource was to have some degree of fault tolerance and failover built in via Helix’s state model.
Upon reviewing the predefined state models that Helix offers, we decided to use the LeaderStandby state model. The LeaderStandby state model allowed each of Jhubbub’s partitions to have one Leader replica and two Standby replicas, all located on different hosts. In the event of a node failure, Helix would automatically recognize any Leader replicas missing and temporarily bring one of the Standby replicas up to the Leader state. This meant that there would be no human intervention necessary for reconfiguration and reassignment, and that all of this could take place within a matter of seconds without affecting our availability.
Finite state machine for the Helix LeaderStandby state model
Integration with Helix
Once we had chosen the right state model, the remaining work was to implement the state transition logic. Although Helix, as a framework, could trigger state transitions between states like Leader and Standby automatically as needed, the responsibility was on the application to provide the actual transition logic. For Jhubbub, a node would only start processing RSS feeds when it is in the Leader state—Standby replicas would merely exist as backups. We translated these requirements into code and registered our implementation with Helix.
Moreover, there were other features of Helix that proved to be useful for Jhubbub.
A very common failure scenario one must consider in building a distributed service is network instability. We were already aware of such problems with Jhubbub nodes. In the event of transient connectivity loss, we did not want our Leader partitions to be shuffled onto other available nodes right away because doing so might cause more harm than good in the case that the blip in connection was temporary and could recover quickly on its own.
Helix allowed us to tackle this problem more intuitively with the delayed rebalance feature. We set the tolerance duration (for example,1 minute), for which Helix would overlook missing top state (Leader) partitions, to avoid unnecessary shuffling in the cluster.
Easy disabling/enabling/swapping of nodes
Another failure scenario in distributed systems is hardware failure. If a node is unhealthy and is unable to recover on its own, then it needs to either be retired from the cluster or swapped with a new instance. Helix provides APIs to easily disable and swap nodes.
Wins & execution
Helix is provided at LinkedIn as a managed service, meaning we did not have to go through the process of setting up our own ZooKeeper and the Helix REST/UI service ecosystem (available as part of Apache Helix). We simply created our cluster using the tools that were already available to us, and added resources and configs easily.
Programming with Helix was a simple process that needed minimal code change—we were able to reuse most of our existing Jhubbub application logic. All in all, we were able to finish the integration in two weeks.
By using the LeaderStandby model with the number of replicas set at 3, we were able to build high availability with redundancy into Jhubbub, which we didn’t have before.
Scalability and elasticity
Helix Controller immediately reacts to changes in the cluster and rebalances so that each instance has partitions evenly assigned to it. This makes 1) addition of partitions in Jhubbub and 2) addition and removal of nodes easy without the explicit need for reconfiguration.
Reliability and observability
Previously, the health check for Jhubbub nodes were done via metrics and alerts, and if there were anything abnormal, human intervention was needed to recover. However, since Helix automatically detects connectivity issues by using ZooKeeper’s ephemeral nodes and rebalances unhealthy partitions, the entire system is more stable and reliable. Bad nodes can be dealt with asynchronously instead of causing an outage of the whole service.
In addition to the metrics already emitted by Jhubbub instances, there was an added benefit of having metrics emitted by the Helix controller for cluster-level aggregation as well as the metrics emitted by each Helix Participant. Having application-level metrics and framework-level metrics alike now gives a fuller picture of what’s happening to the clusters in different datacenters.
Future work: Multi-region Helix clusters
Jhubbub is a multi-region service deployed across multiple datacenters. In the future, we may have virtual machines from multiple geographical regions and availability zones. In the on-prem world, we would set up a ZooKeeper server quorum in each region and create a Helix cluster in whichever regions the application operated in, but as we make our move to Azure, creating a Helix cluster per zone may not scale since it would only increase maintenance overhead.
Helix and ZooKeeper developers at LinkedIn are experimenting with creating a ZooKeeper that can be accessed from multiple regions. We have been operating a multi-region ZooKeeper server quorum where each member of the quorum is located in different regions. However, this came with limited success and we decided to try out another approach: create a server quorum across different availability zones in one Azure region, adding ZooKeeper observers in the other geographical regions, and then having Jhubbub nodes in each region connect to the corresponding region-local observers. This way, we will be able to accommodate VMs connecting from new zones without having to create new clusters and ZooKeeper quorums.
Multi-region Helix cluster with ZooKeeper
We would like to thank Nicholas Lee for his contributions to this initiative. We would also like to thank our leadership, Yaz Shimizu, Brandon Chesla, Lei Xia, and Ivo Dimitrov, and those who provided valuable feedback on our efforts: Andrew Ge, David Max, Li Xie, Henry Escobar, Logan Rosen, Ryan Underwood, Kiran Kanakkassery, Junkai Xue, and Jiajun Wang.