Boosting Site Speed Using Brotli Compression
May 2, 2017
In the following sections, we’ll talk about Brotli, compare it with other compression algorithms, explain the implementation details of integrating Brotli with our system, discuss the challenges we met when we began to adopt it, and share the resulting site speed improvements.
What is Brotli?
Brotli is a new, open source compression algorithm developed by two Google engineers and released in 2015. It’s designed for text compression and provides a 20%-30% reduction in size compared to gzip. Its encoding speed is generally slower than gzip’s, depending on the quality setting, while its decoding speed is on par with gzip’s.
As of today, the following major browsers support Brotli:
Chrome has supported Brotli since version 49.
Microsoft Edge will support Brotli starting from its next version, 15.
Firefox implemented Brotli in version 44.
Opera has supported Brotli since version 36.
Safari has not yet made a public comment on Brotli. No Internet Explorer (IE) version supports Brotli, but the vast majority of Windows users are now coming via Edge, Chrome, or Firefox. For the mobile web, the browser landscape is more fragmented and support is much worse.
Brotli support table
The figure above is the global Brotli support table. The global penetration rate is approximately 56.75%. We also studied the penetration rate for LinkedIn members in three major countries, based on tracking data.
Desktop users are primarily on Chrome and Firefox worldwide, while mobile users are primarily on Safari in the U.S. and China and on Chrome in India. The fact that the majority of members in India use Chrome helps account for the higher penetration rate in that country.
Browsers that support Brotli automatically include br in the Accept-Encoding header sent with the HTTP request. It tells the server that the user agent supports decompressing Brotli responses.
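As a minimal sketch of how a server might read that signal, the function below checks whether an Accept-Encoding value lists br with a non-zero quality value. The function name is hypothetical, and the sketch deliberately ignores the `*` wildcard, so it only answers "did the client explicitly advertise Brotli?".

```python
def accepts_brotli(accept_encoding: str) -> bool:
    """Return True if the header value explicitly lists `br` with q > 0."""
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "br":
            continue
        q = 1.0  # a coding without a q parameter defaults to q=1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        return q > 0
    return False


if __name__ == "__main__":
    print(accepts_brotli("gzip, deflate, br"))
    print(accepts_brotli("gzip, deflate"))
```

A value such as `br;q=0` means the client explicitly refuses Brotli, which is why the q-value is checked rather than just searching the string for "br".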
Today, CDN support for Brotli is spotty. LinkedIn uses five different CDN providers. So far, none of these five has publicly advertised Brotli support. If you want to use Brotli with those CDNs, it’s very unlikely that they would work out of the box. They have built-in assumptions of gzip versus non-gzip use cases, and no standardized support for caching based on Vary: Accept-Encoding.
The good news is that, with proper design and configuration, we have successfully made all five CDN providers serve and cache Brotli static content. Details are covered in the Implementation section below.
Brotli vs Zopfli vs gzip
| Algorithm | Quality | Compression Time (ms) | Decompression Time (ms) |
Static content server
LinkedIn uses centralized, dedicated server clusters to serve as CDN origin for application static content, freeing the application web servers from that responsibility. Each static content file is uniquely identified by the MD5 hash of its content. A typical LinkedIn static content URL looks like the following:
static.licdn.com is an umbrella domain of all of LinkedIn’s CDN providers. The CDNs will fetch the object from our origin static content server if there’s a cache miss, or return the object directly when there’s a cache hit.
When a web application is deployed to production, the application’s static content is uploaded and stored in our Espresso database. MD5 hashes are calculated and used as keys to the DB to index the files. The raw file and the Brotli compressed file have different hash keys and URLs, as their content is different. The static content server recognizes files with .br extension, then stores the compressed binary and content encoding in the DB. When a GET request comes in, it serves the compressed binary directly and adds Content-Encoding: br to the response header.
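The storage and serving behavior described above can be sketched with an in-memory stand-in for the database. Everything here is illustrative: the class and method names are hypothetical, a plain dictionary replaces the real datastore, and the response is reduced to a status code, a header dictionary, and a body.

```python
import hashlib
from typing import Dict, NamedTuple, Optional, Tuple


class StoredObject(NamedTuple):
    body: bytes
    content_encoding: Optional[str]  # "br" for Brotli files, None for raw files


class StaticContentStore:
    """In-memory stand-in for the static content database (hypothetical)."""

    def __init__(self) -> None:
        self._objects: Dict[str, StoredObject] = {}

    def upload(self, filename: str, body: bytes) -> str:
        # The key is the MD5 of the bytes as stored, so the raw file and its
        # .br counterpart get different keys (and therefore different URLs).
        key = hashlib.md5(body).hexdigest()
        encoding = "br" if filename.endswith(".br") else None
        self._objects[key] = StoredObject(body, encoding)
        return key

    def get(self, key: str) -> Tuple[int, Dict[str, str], bytes]:
        obj = self._objects.get(key)
        if obj is None:
            return 404, {}, b""
        headers = {"Content-Length": str(len(obj.body))}
        if obj.content_encoding:
            # The stored bytes are served as-is; only the header is added.
            headers["Content-Encoding"] = obj.content_encoding
        return 200, headers, obj.body
```

Because the compressed bytes are stored once at upload time, the GET path does no compression work at all.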
The build pipeline compresses each file with the Brotli CLI and writes the output to a new file with a .br extension. It also creates a JSON file that maps each file name to the MD5 hash of that file’s content.
The hashes are later used by the web server to build the CDN URLs for any particular static content file.
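A rough sketch of that build step follows. The function names are hypothetical, and the compressor is passed in as a callable so the sketch does not assume any particular Brotli CLI flags or library bindings; in a real pipeline it would wrap the Brotli tool.

```python
import hashlib
import json
from typing import Callable, Dict


def build_outputs(
    sources: Dict[str, bytes],
    compress: Callable[[bytes], bytes],
) -> Dict[str, bytes]:
    """Return every raw file plus a compressed copy named `<name>.br`."""
    outputs: Dict[str, bytes] = {}
    for name, content in sources.items():
        outputs[name] = content
        outputs[name + ".br"] = compress(content)
    return outputs


def build_manifest(outputs: Dict[str, bytes]) -> str:
    """Serialize a JSON mapping of file name to MD5 hex digest of its content."""
    manifest = {
        name: hashlib.md5(content).hexdigest()
        for name, content in outputs.items()
    }
    return json.dumps(manifest, indent=2, sort_keys=True)
```

The manifest ends up with two entries per source file, one for the raw bytes and one for the .br bytes, each with its own hash.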
Application web server
The build pipeline produces two different versions for each file: raw and Brotli compressed. We implemented a special content negotiation mechanism in the application web server to decide which one to serve.
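One way such a negotiation could look is sketched below. This is not the actual implementation: the host name is a placeholder, the raw path prefix is an assumption, and the Brotli prefix borrows the /sc/h/br path segment described later in this post. The function takes the result of the Accept-Encoding check as a boolean so it can be read on its own.

```python
from typing import Dict

CDN_BASE = "https://static.example.com"  # placeholder host, not a real CDN domain
RAW_PREFIX = "/sc/h/"                    # assumed layout for raw objects
BROTLI_PREFIX = "/sc/h/br/"              # assumed layout for Brotli objects


def static_url(
    name: str,
    manifest: Dict[str, str],
    client_supports_brotli: bool,
) -> str:
    """Pick the Brotli URL when the client can decode it, else the raw URL."""
    br_name = name + ".br"
    if client_supports_brotli and br_name in manifest:
        return f"{CDN_BASE}{BROTLI_PREFIX}{manifest[br_name]}"
    # Fall back to the raw object for older browsers or uncompressed assets.
    return f"{CDN_BASE}{RAW_PREFIX}{manifest[name]}"
```

Making this decision in the web server, where the original Accept-Encoding header is still visible, means the URL itself carries the choice and nothing downstream has to inspect the header again.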
Working around CDN limitations
We encountered some problems when making CDNs work with Brotli. Here’s how we solved them.
1. CDNs normalize the incoming Accept-Encoding header. CDNs normally limit the number of cache entries by normalizing the header to a gzip case and a non-gzip case. Brotli is so new that CDNs haven’t caught up yet, so our static content server never receives an Accept-Encoding: br header on requests forwarded by a CDN.
We solved this by using separate URLs for Brotli objects, instead of the same URL as raw objects accompanied by Vary: Accept-Encoding. Even if br is absent from Accept-Encoding, we serve it anyway. This effectively makes the CDN think of the object as binary. We trust the application web server to make the right decision: if a Brotli object is chosen over the raw object, we can guarantee that the client can decompress it. Generally speaking, using a separate URL for a Brotli object is a cleaner and safer option than using the same URL with Vary: Accept-Encoding header. CDNs have different opinions on Vary regarding caching, but they always treat different URLs as different cache entries.
2. CDNs handle Brotli object caching differently. We encountered a problem where Brotli objects could not be cached by some of our CDN providers. The culprit was that the static content server sent back Content-Encoding: br in the response, and the CDN couldn’t handle it.
To solve this problem, we set up uniform behavior for Brotli objects across all our CDNs. In order to identify the Brotli response from the origin, other than looking at Content-Encoding, we added a path segment /sc/h/br for serving Brotli objects (see figure below). We set up some rules on CDNs by matching the path /sc/h/br to Brotli objects. If the path matches, we strip Content-Encoding: br on the way in, cache the Brotli object, and later add the header back to the response to the client.
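The path-based rule can be sketched as two small hooks, one applied when the CDN stores the origin response and one applied when it answers the client. Real CDN rules are written in each provider's own configuration language; the Python below, including the function names and the assumption that the hash follows /sc/h/br/ as its own path segment, is only an illustration of the logic.

```python
from typing import Dict

BROTLI_PATH_PREFIX = "/sc/h/br"


def is_brotli_path(path: str) -> bool:
    # Match the segment boundary so "/sc/h/brX" is not treated as Brotli.
    return path == BROTLI_PATH_PREFIX or path.startswith(BROTLI_PATH_PREFIX + "/")


def on_origin_response(path: str, headers: Dict[str, str]) -> Dict[str, str]:
    """Headers to cache: drop Content-Encoding so the object looks like binary."""
    if not is_brotli_path(path):
        return dict(headers)
    return {k: v for k, v in headers.items() if k.lower() != "content-encoding"}


def on_client_response(path: str, cached_headers: Dict[str, str]) -> Dict[str, str]:
    """Headers to send to the client: restore Content-Encoding for Brotli paths."""
    out = dict(cached_headers)
    if is_brotli_path(path):
        out["Content-Encoding"] = "br"
    return out
```

Since the rule keys only on the URL path, it behaves the same way regardless of how a given CDN treats Content-Encoding or Vary internally.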
A sample Brotli compressed static content response
Theoretically almost all CDNs can be made to work with Brotli even if they don’t directly support it. The main trick is to treat Brotli objects as raw binary data and set up rules based on the URL path that return the correct Brotli header.
Resulting site speed improvements
We created an A/B test and ramped 50% of LinkedIn members to the “use Brotli whenever possible” treatment. The other 50% stayed in the control group. We compared the 90th percentile of the LinkedIn feed’s initial page load time between the two groups. We saw a 2-3.6% improvement in site speed in the U.S., and a 6-6.5% improvement in India. Brotli offers better results for low-bandwidth clients: we saw a bigger improvement in India versus the U.S., and in mobile versus desktop.
The number still has room to grow because the penetration rate is not 100% yet. Not everyone who was ramped to the Brotli experiment was using browsers that support it.
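For readers who want to reproduce this kind of comparison on their own data, here is a minimal sketch of a 90th percentile comparison between a control and a treatment group. The nearest-rank percentile method and the function names are assumptions; the post does not say how the percentiles were actually computed.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% at or below it."""
    if not values:
        raise ValueError("percentile of an empty sample is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def relative_improvement(
    control: Sequence[float],
    treatment: Sequence[float],
    pct: float = 90.0,
) -> float:
    """Fractional reduction in the chosen percentile, treatment versus control."""
    control_p = percentile(control, pct)
    treatment_p = percentile(treatment, pct)
    return (control_p - treatment_p) / control_p
```

A positive result means the treatment group's page load time at that percentile is lower than the control group's.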
Brotli can only help speed up content download. To better understand how content download contributes to the overall page load time, we used Chrome DevTools to break down the timeline into CPU processing categories.
CPU processing categories for feed page load
By using Brotli for static content, we see impressive site speed wins, especially for low-bandwidth scenarios. We believe Brotli has a lot of potential and encourage every website to invest in enabling Brotli for better user experience. With new browser releases coming up and users upgrading to the latest browsers, the return on investment will be increasingly better over time.
Dynamic Brotli: We are actively experimenting with using Brotli to compress dynamic content (Base HTML document, JSON API responses, etc.). This needs to happen online, since dynamic content is generated on the fly. So, we’ll be using lower compression quality. Our initial experiment shows that for typical LinkedIn dynamic content, Brotli provides 10%-20% size reduction without sacrificing compression and decompression time compared to gzip.
Zopfli or other gzip-compatible algorithms: We need to better serve members who are using browsers that don’t support Brotli, so we are looking into a number of ways to optimize the gzip use cases.
Using Brotli at LinkedIn is a cross-team collaborative effort. I would like to thank Kristofer Baxter for bringing up the idea and leading the project, Mark Pascual for implementing the build pipeline, Michael Mamaril and Bhaskar Bhowmik for making CDN work with Brotli, and Ritesh Maheshwari and Michael Butkiewicz for analyzing performance data and providing ramp guidance.