Migrating to Espresso
August 2, 2017
Espresso is LinkedIn's strategic distributed, fault-tolerant NoSQL database that powers many LinkedIn services. Espresso has a large production footprint at LinkedIn, with close to a hundred clusters in use, storing about 420 terabytes of Source of Truth (SoT) data and handling more than two million queries per second at peak load.
This post discusses our strategy for migrating one of our internal services (Babylonia) from using Oracle to using Espresso. Our core requirement was to keep Babylonia running uninterrupted throughout the migration process. This post focuses on the steps we took to keep the service running through the transition without affecting our clients. These same concerns are common to many database migrations, not only at LinkedIn.
What is Babylonia?
Babylonia is an internal service at LinkedIn that scrapes external web pages that are shared by our members and serves as a repository for that metadata. The most visible service that Babylonia performs happens when a member pastes a URL into their update box. Babylonia returns information that corresponds to that URL, such as the title, publisher, and images (see image below).
Metadata scraped from an article is used to create an informative panel to accompany a shared link.
When Babylonia receives a request for a URL that it is seeing for the first time, it downloads a copy of the web page. The metadata is then extracted from the web page and stored in a database. If that URL is requested again, for instance when another member shares the same link, Babylonia returns the data that it already has stored in the database.
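This lookup-then-scrape flow is essentially a read-through cache. A minimal Python sketch of the idea, where the page fetcher, the metadata parser, and the dict-like store are all invented stand-ins for Babylonia's real internals:

```python
def fetch_page(url):
    # Stand-in for the real HTTP download of the shared page.
    return "<html><title>Example</title></html>"

def extract_metadata(page):
    # Stand-in for the real parser: pull the <title> out of the page.
    start = page.index("<title>") + len("<title>")
    end = page.index("</title>")
    return {"title": page[start:end]}

def get_url_metadata(url, db):
    """Return stored metadata for a URL, scraping it on first sight."""
    cached = db.get(url)
    if cached is not None:
        return cached                  # seen before: serve from the database
    metadata = extract_metadata(fetch_page(url))
    db[url] = metadata                 # store for subsequent requests
    return metadata
```

On a repeat request the scrape is skipped entirely, which is what makes the pattern cheap for popular links shared by many members.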
One question you might ask is, if the system is working right now, why change it? As the system grew, we were both storing more data in the database and making more queries from it. Our Oracle DBAs were telling us that the periodic jobs that they needed to run on our database tables were getting unwieldy. They reduced the maintenance frequency to twice a year, which is the minimum they could get away with, but we were quickly approaching the point where we were going to need to either expand our Oracle resources or migrate to something else.
Meanwhile, LinkedIn was developing Espresso as a new strategic database platform, which could offer us greater scalability without the need for LinkedIn to purchase more licenses or hire more Oracle DBAs. So choosing to migrate from Oracle to Espresso would solve our immediate scaling problems while also reducing costs for the company and building up our own internal platform.
Evaluating the pre-migration state
Babylonia reads and writes to Oracle, and exposes a Rest.li API to other services. Every write to Oracle generates an event on Databus, which is a service that distributes the stream of database changes to listeners that can process them in near-real time. There is an Extract, Transform, Load (ETL) process that makes snapshots of the database available for offline processing.
There were also some legacy systems that had been written before Babylonia had a Rest.li interface that were accessing Oracle directly through SQL queries instead of using Babylonia’s API (see figure below).
The Babylonia service and Oracle database before starting migration
Before starting the migration, we needed to assess all the aspects of the system that interacted with the database. We identified all the code that was tightly coupled to the database. Some of this code would have to be reimplemented for Espresso. In other cases, the better solution was to eliminate the code or decouple it from Oracle so that it would not need to change during the migration.
We found that we could save ourselves a lot of effort during the migration by first cleaning up what we had. Babylonia had some deprecated Rest.li endpoints whose functionality overlapped with the preferred ones. We modified all the existing clients that accessed Oracle directly or called one of the deprecated endpoints to use only the preferred endpoints.
Pre-migration cleanup eliminated code in other services that called deprecated Babylonia endpoints or Oracle directly.
Our goal at this point was to reduce the overall number of code paths that would need to be reimplemented during the migration. By paying down this technical debt, we were exploiting the self-evident property of code that can be eliminated: the easiest lines of code to migrate are the lines of code that don’t exist.
Setting up Espresso
Setting up the new schema
Oracle is a relational database, and Espresso is not. Espresso records are stored in Avro format, whose schema definition language is very similar to the Pegasus schema definition language used to define Rest.li interfaces. We chose, therefore, to pattern the Espresso schema after Babylonia’s Rest.li interface schema as closely as possible. This made the conversion from the Oracle tables more complicated, but the transformation closely resembled the one we were already performing to implement the Rest.li endpoints.
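For illustration, an Avro record schema is declared in plain JSON. A hypothetical, heavily simplified fragment of what a Pegasus-aligned Espresso record might look like (all names here are invented; the real Babylonia schema is internal):

```json
{
  "type": "record",
  "name": "UrlMetadata",
  "namespace": "com.example.babylonia",
  "fields": [
    {"name": "url", "type": "string"},
    {"name": "title", "type": ["null", "string"], "default": null},
    {"name": "publisher", "type": ["null", "string"], "default": null},
    {"name": "imageUrls", "type": {"type": "array", "items": "string"}, "default": []}
  ]
}
```

Keeping field names and optionality aligned with the Rest.li schema means the Espresso record can map to the API response nearly one-to-one.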
Migrating the data
Once we created an empty Espresso database with our new schema, our next goal was to have the new Espresso database loaded with all the data in Oracle and for new changes in Oracle to be mirrored to Espresso in real time. This would give us a platform where we could make simultaneous queries to both databases and expect the results to match. We split the job into two parts.
For the initial load of data into Espresso, we used a recent snapshot of the Oracle database, translated all of the records from Oracle format to Espresso format, and bulk-loaded all of the records into the Espresso database. This loaded Espresso with a recent copy of the data in Oracle, but it missed all the updates that had occurred after the time of the snapshot.
To handle recent and live changes in the Oracle database, we implemented a Databus listener. Every database change in Oracle generates a Databus message. The Databus listener consumes those messages and translates them into Espresso writes.
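In outline, each listener callback performs a translate-and-write. A simplified sketch, where the event shape, the column names, and the dict-like Espresso client are all hypothetical stand-ins for the real internal APIs:

```python
def translate(oracle_row):
    # Stand-in for the Oracle-to-Espresso format conversion: here it
    # just maps the old column names onto the new schema's field names.
    return {"url": oracle_row["URL"], "title": oracle_row["PAGE_TITLE"]}

def on_databus_event(event, espresso):
    """Hypothetical Databus callback: mirror one Oracle change to Espresso.

    `event` carries the changed row; `espresso` is any dict-like store
    standing in for the real Espresso write API.
    """
    record = translate(event["row"])
    espresso[record["url"]] = record   # apply the change to Espresso
```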
The Databus listener keeps the Espresso database in sync with Oracle, while Babylonia performs shadow reads to validate that the data in Espresso matches Oracle.
After completing the bulk load, we rolled back the Databus listener to consume events starting from the timestamp of the end of the ETL snapshot. Once it was caught up, the Databus listener kept replicating Oracle updates to Espresso, typically within a few seconds.
To verify that the data we had in Espresso actually matched what was in Oracle, we added shadow read validation—whenever Babylonia processed a read request, it made the same request also from the Espresso data store, and compared the results. Any discrepancy was flagged for further investigation to track down any remaining problems.
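The shape of a shadow read is simple: serve the answer from the current SoT, compare against the new store on the side, and flag any mismatch. A sketch under the same assumptions as above (dict-like stores and a list standing in for the real monitoring pipeline):

```python
def shadow_read(url, oracle, espresso, mismatch_log):
    """Serve the read from Oracle (still the SoT) while comparing
    the result against Espresso on the side."""
    primary = oracle.get(url)
    shadow = espresso.get(url)
    if primary != shadow:
        # Flag the discrepancy for investigation; the caller still gets
        # the Oracle answer, so validation is invisible to clients.
        mismatch_log.append({"url": url, "oracle": primary, "espresso": shadow})
    return primary
```

Because the Oracle result is always the one returned, a bug in the Espresso pipeline shows up as a logged discrepancy rather than as bad data served to a member.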
Once we had confidence that Espresso was accurately mirroring the data in Oracle, we enabled Babylonia to directly write to Espresso while still writing to Oracle. The shadow read validation continued to ensure that the data directly written to Espresso matched the data going through Oracle.
Babylonia makes direct writes to Espresso.
At this point, we had three different processes writing data to our Espresso database: the bulk loader, the Databus listener, and Babylonia itself. One issue we needed to tackle was how we would allow these three writers to operate without conflicting.
Consider the system at this stage, where Babylonia was performing dual writes. After writing directly to Espresso, Babylonia would write to Oracle, which generated a Databus event. When this event reached the Databus listener, it would attempt to again write the same record to Espresso. If we allowed the Databus listener to overwrite the data from Babylonia, it could conceal any issues with the direct writes.
Complicating this further (not shown in the diagrams) is that in each colocation data center (colo) we have multiple instances of Babylonia running and multiple instances of the Databus listener running. Oracle and Espresso each have their own mechanisms for cross-colo replication. Once data is committed in one colo, those changes start propagating around the world. There’s a chance that, somewhere, the replicated Oracle data may reach a local Databus listener before the Espresso replication has updated the same data.
We have a similar problem with our LinkedIn Experimentation (LiX) Platform for controlling the ramp. When we change the state of a LiX, there is no way to ensure that all instances of Babylonia and the Databus listener see the new state simultaneously.
Essentially, the problem is that any scenario relying on timing or LiX states to ensure that only one process updates the record in Espresso will have some chance of dropped or duplicate writes, which could lead to inconsistencies between Oracle and Espresso or between Espresso databases in different colos.
Our solution to this problem was to add an additional optional field to the Espresso schema, which we called MigrationControl. When a process writes to Espresso, it sets the MigrationControl to indicate which type of process it is: bulk loader, Databus listener, or Babylonia.
In the write methods, we added logic that checks for an existing record in Espresso. If there is one, it examines the MigrationControl field. If the Databus listener finds that the record has been recently written by Babylonia, it aborts writing the record. That way the last write always comes from Babylonia.
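The precedence rule can be sketched as a small check-before-write, ignoring the recency test for brevity. The tag values and field names below are invented for illustration:

```python
# Hypothetical writer tags; the real values are internal.
BULK_LOADER, DATABUS_LISTENER, BABYLONIA = "bulk", "databus", "babylonia"

def should_write(writer, existing_record):
    """Decide whether a writer may overwrite the existing Espresso record.

    Direct writes from Babylonia always win; the Databus listener and
    bulk loader back off when they see a record Babylonia wrote.
    """
    if existing_record is None:
        return True                    # no existing record: no conflict
    if writer == BABYLONIA:
        return True                    # Babylonia always has priority
    # Databus listener / bulk loader: don't clobber a direct write.
    return existing_record.get("migration_control") != BABYLONIA

def write(writer, key, value, espresso):
    """Tag each write with its originating process (dict-like store)."""
    if should_write(writer, espresso.get(key)):
        espresso[key] = {"value": value, "migration_control": writer}
```

Note that the real check also considers how recently Babylonia wrote the record; that timing logic is omitted here.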
If we find ourselves in a situation where we need to patch up corrupted data, we can redefine this logic to allow the Databus listener or bulk uploader to overwrite Babylonia.
We are currently at this step in the migration process. Direct writes from Babylonia to Espresso are partially ramped, and we expect to complete that soon and begin the next step, which is establishing the Espresso database as the new SoT.
Declaring Espresso the new SoT
Once we have Babylonia writing to Espresso directly, and we have validated what we are writing to Espresso through shadow read validation, we will be ready to declare Espresso our new SoT. Babylonia will continue to write to both Oracle and Espresso, but from then on it will service read requests by reading only from Espresso.
Even though Babylonia will no longer be dependent on Oracle at this stage, we can’t shut off the writes to Oracle until all the other systems at LinkedIn that use the Oracle Databus and ETL snapshots have migrated to the Espresso-equivalent Brooklin and ETL data sources.
Espresso is the new Source of Truth (SoT). Babylonia continues to write to Oracle until all the systems that depend on Oracle through Databus and ETL have migrated to Espresso.
Once all consumers of Oracle data have migrated, we can turn off dual writes and shut down the Oracle database. Once this is done, we can remove Oracle-related code from Babylonia and remove the MigrationControl logic.
Final state post-migration, where Oracle turnoff is complete.
One challenging part of this migration was plotting a course from the starting point to the end state that didn’t involve shutting down the system in production. Key to achieving this goal were the Databus listener and the MigrationControl field, which will end up as throwaway code but, like scaffolding in building construction, are essential tools for the migration.
We designed the new Espresso tables with future features and improvements in mind that would have been much harder to implement on the old Oracle tables. However, we did not attempt to implement any of these new features as long as Oracle was still the SoT. We wanted the validation checks between Espresso and Oracle to be as simple and straightforward as possible.
The work related to this migration has been performed by the members, both past and present, of the Content Ingestion team at LinkedIn. Special thanks also to the Oracle, Databus, Espresso, and Nuage teams, whose support has been essential to this project’s success.