LinkedIn Integrates Protocol Buffers With Rest.li for Improved Microservices Performance
April 11, 2023
Each day, LinkedIn serves billions of member requests across all our platforms, including our web and mobile apps. It’s important that these member requests—such as viewing a company page, reading a LinkedIn article, or viewing network connections—are fulfilled quickly and that members aren’t faced with slow page load times (latency). Part of our mission is to make the world’s professionals more productive and successful, and page load times play a significant role. To ensure that our members have an enjoyable experience on the platform and can navigate quickly, we recently integrated a new encoding solution into Rest.li, which resulted in a reduction in latency and an improvement in resource utilization.
Rest.li is an open source REST framework built by and heavily used at LinkedIn for microservice development in order to cater to member requests. Microservices arranges our platforms as a collection of loosely coupled services, communicating through lightweight protocols. Rest.li enables us to build robust, scalable RESTful architectures using type-safe bindings, uniform interface design, and consistent data modeling principles. Across various tiers of our tech stack, there are 50,000+ Rest.li API endpoints used in production at LinkedIn.
Since its introduction, Rest.li has used JSON as its default serialization format. While JSON has served us well in terms of broad programming language support and human-readability (which eases debuggability), its runtime performance has been far from ideal. Though we made several attempts to improve the performance of JSON serialization in Rest.li, it continued to be a bottleneck in many performance profiles.
In this blog post, we’ll discuss some of the challenges we faced with JSON and the process we used to evaluate new solutions and ultimately move forward with Google Protocol Buffers (Protobuf) as a replacement. We’ll also share details of our integration and rollout of Protobuf to Rest.li and the benefits we have seen, including up to 60% reduction in latency and 8% improvement in resource utilization.
Addressing challenges with JSON
JSON offers a broad support of programming languages and is human-readable. However, at LinkedIn, we were faced with a few challenges that resulted in performance bottlenecks. The first challenge is that JSON is a textual format, which tends to be verbose. This results in increased network bandwidth usage and higher latencies, which is less than ideal. While the size can be optimized using standard compression algorithms like gzip, compression and decompression consumes additional hardware resources, and may be disproportionately expensive or unavailable in some environments. The second challenge we faced was that, due to the textual nature of JSON, serialization and deserialization latency and throughput was suboptimal. At LinkedIn, we use garbage collected languages, like Java and Python, so improving latency and throughput is crucial to ensure efficiency.
After a thorough evaluation of several formats like Protobuf, Flatbuffers, Cap’n’Proto, SMILE, MessagePack, CBOR, and Kryo, we determined Protobuf was the best option because it performed the most effectively across the above criteria.
Integrating Protobuf into Rest.li
Rest.li’s serialization abstraction is completely schema-agnostic and only deals with DataMaps and DataLists. Rest.li modeling, using PDL, has no provision to specify field numbers for fields. However, Protobuf serialization relies on both strongly-typed data as well as explicit field numbers, typically defined in Proto schemas.
To continue using Rest.li’s serialization abstraction and retain PDL modeling, we came up with the concept of symbol tables, a two-way mapping from PDL field names/enum values to integer ordinals. Since we know the type of objects stored in the DataMap/DataList at runtime, we can use this information in combination with the symbol table to generate the equivalent of a Protobuf schema at runtime, and use the Protobuf library primitives in various programming languages to serialize/deserialize payloads.
The next challenge was figuring out how to generate these symbol tables. To reduce toil for our backend services, we chose an approach of runtime generation and exchange, as illustrated below:
The symbol table is auto-generated at service startup by examining all the schemas used by all of the service’s endpoints in a deterministic order (so that all instances of the same version of a deployed service generate the same symbol table), and exposed via a special symbolTable endpoint hosted in the same service. When any client communicates with any server, it first checks its local cache for the symbol table, and if not found, requests the server for a copy of the symbol table by querying the symbolTable endpoint of that service. The retrieved symbol table is then cached at client side for future requests.
For the communication between our web/mobile apps and their API servers, we chose to not use this dynamic approach to avoid latency on app startup or bursty traffic on new API server version deployments. Instead, a build plugin in the API servers generates the symbol table with every build, but updates it in “append only” mode, where new symbols are added to the end, and old symbols are not removed. The generated symbol table is published by the build system as a versioned artifact. When the client app is built against this versioned artifact, it uses the symbols when serializing requests to or deserializing responses from the API server.
The rollout at LinkedIn
We first integrated Protobuf support into the open source Rest.li library using the mechanism described above. Once that was done, we bumped all services at LinkedIn to the new version of Rest.li and deployed them to production. To ramp gradually, we used client configurations. These configurations enabled the client to encode the payload using Protobuf instead of JSON and set the content-type header to Protobuf, or set the accept header to instruct the server to prefer Protobuf over JSON for encoding responses. By using this controlled mechanism, we were able to fix bugs, validate improvements, and gradually ramp to all of LinkedIn with minimal disruption to developers or the business.
Using Protobuf resulted in an average throughput per-host increase of 6.25% for response payloads, and 1.77% for request payloads across all services. For services with large payloads, we saw up to 60% improvement in latency. We didn’t notice any statistically significant degradations when compared to JSON in any service. Below is the P99 latency comparison chart from benchmarking Protobuf against JSON when servers are under heavy load.
The usage of Protobuf instead of JSON has resulted in real efficiency wins, translating into tangible savings at LinkedIn’s engineering scale. The rollout of this change using techniques like symbol tables and configuration based ramps made this almost completely transparent to application developers, and resulted in these wins being achieved with minimal engineering and business disruption.
Building on our learnings around rolling out a change of this scale smoothly across LinkedIn, we are planing to work on advanced automation to enable a seamless, LinkedIn-wide migration from Rest.li to gRPC. gRPC will offer better performance, support for more programming languages, streaming, and a robust open source community.
A special thanks to the Rest.Li team Junchuan Wang, Viktor Gavrielov, and Mahnaum Khan for working on this project. Thanks to the management team for supporting this project: Heather McKelvey, Goksel Genc, Maxime Lamure, and Qunzeng Liu.