From Lambda to Lambda-less: Lessons learned
December 1, 2020
The Lambda architecture has become a popular architectural style that promises both speed and accuracy in data processing by using a hybrid approach of both batch processing and stream processing methods. But it also has some drawbacks, such as complexity and additional development/operational overheads. One of our features for Premium members on LinkedIn, Who Viewed Your Profile (WVYP), relied on a Lambda architecture for some time. The backend system supporting this feature had gone through a few architectural iterations in the past years: it started as a Kafka client processing a single Kafka topic, and eventually evolved to a Lambda architecture with more complicated processing logic. However, in an effort to pursue faster product iteration and lower operational overheads, we recently underwent a transition to make it Lambda-less. In this blog post, we’ll share some of the lessons learned in operating this system in the Lambda architecture, the decisions made in transitioning to Lambda-less, and the shifts necessary to undergo this transition.
How the system works
The WVYP system relies on a number of different input sources to provide members with an up-to-date record of who most recently viewed their profile. These details include:
- Capture profile view information and its deduplication
- Compute view sources (e.g., search, profile page, etc.)
- View relevance (e.g., a senior leader viewed your profile)
- View obfuscations based on the viewing member’s privacy settings
The following diagram describes a simplified version of the system using the Lambda architecture.
First, we had a nearline Kafka client that enabled near real-time processing and serving of profile view activities. When a member viewed another member’s profile, an event called ProfileViewEvent was generated and sent to a Kafka topic. The nearline processing job would consume the ProfileViewEvent and call approximately ten other online services to fetch additional information, such as member profile data, job application information, member network distance (first-degree, second-degree connection), etc. Then, the job would write the processed message to another Kafka topic, which in turn would be consumed by Pinot, a distributed OLAP datastore. Pinot appended the processed message to a table called real-time table. Pinot was ideal in that it can use both offline and real-time data.
In parallel to this, there was also a set of offline Hadoop MapReduce jobs performing all of the above operations in a different technology stack, using the ETL’ed ProfileViewEvent and the corresponding datasets of those online downstream services. The jobs loaded those datasets daily and performed data transformation operations, such as filtering, grouping, and joining. In addition, as depicted in the diagram above, the offline jobs would also process the NavigationEvent that would not be processed by the nearline job, which showed how the profile viewer found the profile owner. The processed dataset was then pushed to Pinot’s offline table.
The Pinot database handled data serving from the real-time and offline tables transparently. A midtier service queried Pinot for the processed profile view information, and sliced and diced the data based on query parameters (such as time range, view occupations, etc.) from the frontend API.
This implemented the Lambda architecture:
- The nearline job was the speed layer, providing fast computation of incomplete information
- The Hadoop offline jobs were the batch layer, aimed at accuracy and throughput
- The Pinot datastore was the serving layer, combining the views from the batch layer and speed layer
Relying on Lambda architecture brought a number of wins, thanks to the speed processing by the speed layer, and accuracy and re-processability by the batch layer. However, it also came with a lot of overheads. As we continued to iterate the product and add more complexity, it was time for a change.
It is well known that Lambda architectures come with overhead expenses that draw criticism. The overheads violate the “Don’t repeat yourself” (DRY) principle. More specifically, the WVYP system faced the following challenges:
- Developers had to build, deploy, and maintain two pipelines that essentially produce the same data for the most part
- The two processing flows needed to be in sync in terms of business logic
Both of the above challenges incurred significant costs in terms of developer time.
Systems evolve for a number of different reasons, including feature enhancements, bug fixes, changes for compliances or security, data migrations, etc. All of these changes for WYVP came with doubled efforts in part because of the Lambda architecture. What’s worse, the Lambda architecture also created additional liability in that new bugs could occur in either the batch or speed layer since we were essentially implementing most features in two different technology stacks. Additionally, as LinkedIn evolved its tools and stack, we were constantly tasked with keeping up with the work required in keeping more flows up to date. For example, we found unnecessary complications in a recent GEO location data migration due to the interplay of various moving parts and ramp schedules. The division of layers in the Lambda architecture created operational challenges all around. For instance, the nearline job would have lags in message processing, and the offline jobs would fail from time to time—both instances that we became much too familiar with. Ultimately, the upkeep was not worth the value in that it significantly slowed down development velocity. With these considerations in mind, we embarked upon an effort to move WVYP away from the Lambda architecture.
The Lambda-less architecture
We set off to simplify the architecture by removing the entire set of offline batch jobs in the old architecture and developing a new nearline message processor using Samza. Recognizing that streaming and offline batch processing each has its own merits applicable to different requirements, we focus in this blog on how we proceeded with the former for our needs at the time. The primary reason we chose to keep nearline processing and remove offline processing is the product requirements of near real-time profile view notifications. There are many other cases in which batch processing is more suitable or essential, such as computing business metrics impacts in A/B testing.
The new architecture is shown in the diagram below.
There are two major changes in this architecture:
- A new Samza job is created to consume both ProfileViewEvent and NavigationEvent, instead of the old consumer which only consumed the former.
- All existing offline jobs are removed and, in its place, we created a singular job that will be discussed later.
The Samza job
Samza, now an Apache project, was originally developed by LinkedIn and is LinkedIn’s de facto distributed stream processing service. We chose to migrate from the existing nearline processor job to a Samza job for a number of reasons.
First of all, Samza supports a variety of programming models, including the Beam programming model. Samza implements the Beam API: this allows us to easily create a pipeline of data processing units including filtering, transformations, joins, and more. For example, in our case, we can easily join the PageViewEvent and NavigationEvent to compute the source of views in near-real-time—this could not as easily have been done in the old processor. Secondly, deploying and maintaining Samza jobs at LinkedIn is straightforward once it’s set up because they're run on a YARN cluster maintained by the Samza team. The dev team does still need to manage the scaling, performance, etc., but it does help immensely on the regular maintenance side (e.g., not need to worry about machine failures). Finally, Samza is well-supported and well-integrated with other LinkedIn tooling and environments.
The new offline job
Some may wonder why we still incorporated an offline job in the Lambda-less architecture. The truth is it isn’t necessary from the architecture transition perspective. However, as depicted in the diagram above, the offline job reads from the ETLed data in HDFS indirectly produced by the Samza job via a Kafka topic. The only purpose of the offline job is to copy all the data that was written to the real-time table in Pinot to the offline table. This is done for two reasons: 1) the offline table has much better performance due to how data is organized (in short offline tables with much fewer data segments than the real-time table enabling faster query); and 2) we store the processed view data for up to 90 days with an automatic data purge, whereas the real-time table only retains the data for a few days. A key difference between the new offline job and its previous iterations in the old architecture is that the job has no overlap with the nearline job in processing logic. None of the logic implemented in the Samza job is implemented here. We can remove this job when Pinot is able to automatically support the consolidation of files from the real-time table to the offline table.
There is no bug-free software in its entire lifecycle; we recognize that things can still go wrong in different ways. In the case of WVYP, an event processed using the wrong logic will remain in the database until it’s reprocessed and fixed. Moreover, unexpected issues can happen outside of the system’s control (e.g., in a way that can impact your data sources). A big promise of batch processing is re-processability. If a job fails, it can reliably re-run and produce the same data. If source data is corrupted, it can re-run to reprocess it.
This becomes a lot more challenging in the streaming case, particularly when the processing relies on other stateful online services to provide additional data. Message processing becomes non-idempotent. WVYP not only relies on online services for states, but also sends notifications to members when messages are processed (important in that we do not want to be sending out duplicate notifications). If the chosen datastore does not support random updates by design, such as Pinot, we needed a dedupe mechanism in place.
We recognize that there is no silver bullet to this problem. Instead, we decided to treat each problem differently and use different strategies to mitigate issues:
- If we need to make minor changes to the processed messages, the best approach will be to write a one-off offline job script to read the processed messages in HDFS (just as we do for the offline job in the new architecture), correct whatever is needed, and push to Pinot to override the previous data files.
- If there is a major processing error or if the Samza job failed to process a large number of events, we can rewind the current processing offset to a previous point in Kafka.
- If the job only degrades its behavior for certain periods of time, such as if the computation of view relevance fails, we would skip the relevance of some views. In this scenario, the system would function in reduced capacity for that period of time.
Duplicate processing happens in various scenarios. One, mentioned above, is an instance in which we explicitly want to re-process the data. Another is inherent to Samza, which guarantees at-least-once processing. When Samza containers are restarted, it will likely process some messages again because the checkpoint it reads will likely not reflect the last message it processed. We are able to address this deduplication at two places:
- The serving layer: When the midtier service reads from the Pinot table, it performs deduplication and picks the view with the latest processed time.
- The notification layer: The notification infrastructure ensures that we do not send duplicate notifications to a member in a configurable period of time.
Lambda architecture has been around for many years and has gained its fair share of praise and critiques. In our case of migrating WVYP, we were able to see the following benefits:
- Significant improvements in development velocity by halving most measures of development time. Reduced maintenance overheads by more than half (the nearline flow had fewer maintenance overheads than the batch processing flow).
- Improved member experience. There is now a reduced likelihood of introducing bugs in the development process. We also have better nearline computation (e.g., fast computation of view sources which was not available earlier) allowing us to serve WVYP information to members faster.
By freeing up developer time, we’re now able to iterate much faster on the system and focus efforts elsewhere. By sharing a look into how the WVYP system was initially developed, operated, and rehauled, we hope that some of our takeaways will help individuals facing similar challenges or considerations in using the Lambda architecture make better decisions.
We’d like to thank Carlos Roman, Ke Wu, Josh Abadie, Chris Harris, David Liu, Hai Lu, Subbu Subramaniam, and Zhonggen Tao for their feedback on the design; Hai Lu, Jean-Francois Desjeans Gauthier, and Chris Ng for their review of this blog post; Priyam Awasthi for her contributions to the implementation; Hristo Danchev, Ray Manpreet Singh Matharu, and Manasa Gaduputi from the Samza team for support and helping troubleshooting issues; and Sriram Panyam and Bef Ayenew for their support on this project.