Evaluating Connection Coalescing For Static and Media Asset Downloads

August 16, 2019

One of the core aspirations of the engineering teams at LinkedIn is to delight our members with instantaneous page load experiences. To achieve this, we strive for optimal configurations and settings across the different layers of our software stack. Recently, we completed a large-scale experiment in reusing TCP/IP connections and found a noticeable drop in the 90th percentile page load time. While the production rollout of this strategy is still in the planning stage, we wanted to share the initial findings, explain the details of connection coalescing, and walk through how we conducted the experiment.

As per Section 9.1.1 of the HTTP/2 specification in RFC 7540, browsers initiating a connection to HTTP/2 proxy servers may reuse an already established connection for a different domain as long as the server is authoritative for both domains. For example, say two domains, such as www.linkedin.com and www.static.licdn.com, resolve to the same IP, and the server is authoritative for both when the client establishes a new TLS connection to www.linkedin.com. At a later point in time, when the client tries to connect to www.static.licdn.com, the previously established connection to www.linkedin.com can be reused as long as the server continues to be authoritative for www.static.licdn.com. This eliminates connection setup time, which has traditionally been a bottleneck for site speed, especially in emerging markets. Connection setup time typically accounts for 5 to 7% of initial page load time given the TCP stack deployed at LinkedIn.

Here is the sequence of events that happens when a member launches the LinkedIn feed page in a desktop browser. The browser establishes a TCP/TLS connection with LinkedIn traffic servers, after which the client issues an HTTPS GET request to load content for the page. LinkedIn servers process the request and return a base HTML page. The client then parses the HTML page and discovers the static assets to be downloaded for the page from one of multiple CDN providers, LinkedIn CDN (LiCDN) among them.

We experimented with reusing the connection already established to www.linkedin.com (the one used to download the base HTML page) whenever static assets were downloaded from LinkedIn CDN. Here’s how our experimental setup works when a member loads a desktop feed page:

An HTTP/2-capable client establishes a connection to www.linkedin.com over HTTPS.
The client inspects the TLS certificate returned by the host serving www.linkedin.com and thereby learns which domains this host is authoritative for (i.e., the Subject Alternative Names, or SANs, in the TLS certificate).
The client parses the HTML in the base page.

Before making requests for assets served from static.licdn.com, the client resolves DNS for static.licdn.com. If the IP address for static.licdn.com matches the IP of www.linkedin.com, the client reuses the existing connection instead of opening a new one. Illustrated below is the sequence of events at the client, edge server, and datacenter, with and without connection coalescing, when a linkedin.com feed page is initially loaded.
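The client-side decision described in the steps above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular browser implements it: the `OpenConnection` class, the `san_matches` and `can_coalesce` helpers, and the IP addresses (taken from the documentation ranges) are all hypothetical, and real clients apply additional checks.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class OpenConnection:
    """Hypothetical model of an already-established HTTP/2 connection."""
    host: str                   # hostname the connection was opened for
    ip: str                     # IP address the connection is bound to
    cert_sans: Tuple[str, ...]  # DNS names listed in the certificate's SANs


def san_matches(pattern: str, hostname: str) -> bool:
    """Return True if one SAN entry covers the hostname.

    Handles exact names and a single left-most wildcard label
    (for example "*.licdn.com"), which stands in for exactly one label.
    """
    pattern, hostname = pattern.lower(), hostname.lower()
    if pattern == hostname:
        return True
    if pattern.startswith("*."):
        head, _, rest = hostname.partition(".")
        return bool(head) and rest == pattern[2:]
    return False


def can_coalesce(conn: OpenConnection, new_host: str,
                 resolved_ips: List[str]) -> bool:
    """Decide whether requests for new_host may reuse conn.

    Both conditions from the text must hold: DNS for new_host resolves
    to the IP of the existing connection, and the certificate seen on
    that connection is valid for new_host.
    """
    ip_matches = conn.ip in resolved_ips
    cert_covers = any(san_matches(san, new_host) for san in conn.cert_sans)
    return ip_matches and cert_covers


# Assumed example values; 192.0.2.0/24 and 198.51.100.0/24 are
# reserved documentation ranges, not real LinkedIn addresses.
conn = OpenConnection(
    host="www.linkedin.com",
    ip="192.0.2.10",
    cert_sans=("www.linkedin.com", "static.licdn.com"),
)
same_ip = can_coalesce(conn, "static.licdn.com", ["192.0.2.10"])
other_ip = can_coalesce(conn, "static.licdn.com", ["198.51.100.7"])
```

In this sketch `same_ip` is true and `other_ip` is false: when DNS hands back a different address for the static domain, the client opens a new connection even though the certificate would have covered the name.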

Without Connection Coalescing:

With Connection Coalescing:

We anticipated gains because TCP connect time is a major bottleneck for page load latency. With connection coalescing enabled, we noticed a drop in 90th percentile page load time for pages whose static content was downloaded from LiCDN. This drop roughly matched the TCP connect time experienced in a given point of presence (PoP). We conducted this experiment in selected PoPs throughout the world for the desktop feed page and observed site speed improvements in almost all locations, with a notable improvement in client-perceived download time of 8% in India, 4% in the U.S., and 5% in Germany. The chart below shows the 90th percentile desktop feed page load time in India with coalescing enabled (in orange) and coalescing disabled (in blue).

Page load time 90th percentile for India. Coalescing enabled is in orange, while coalescing disabled is in blue.

In addition to page load time, we also examined the number of new and reused connections with connection coalescing enabled. We found that 92% of connections were reused in India, 52% in Germany, and 72% in the U.S.

In investigating why the percentage of reuse differed by market, we found evidence that the sophisticated algorithms that the latest browsers use to prefetch static content into the browser cache played a role. Our analysis showed that browser cache hit ratios in some countries were as high as 60 to 80%. The higher the browser cache hit ratio, the smaller the impact of enabling connection coalescing.
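To make the relationship between cache hit ratio and coalescing impact concrete, here is a deliberately simple back-of-the-envelope model. It assumes that a page load whose static assets come entirely from the browser cache never opens a connection to the static domain, so coalescing only helps on cache misses. The function name and the 200 ms connect time are hypothetical illustrations, not measurements from this experiment.

```python
def expected_connect_savings_ms(connect_time_ms: float,
                                cache_hit_ratio: float) -> float:
    """Average connection setup time saved per page load (toy model).

    Assumes coalescing saves one full connection setup on every page
    load that misses the browser cache, and nothing on a cache hit.
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return connect_time_ms * (1.0 - cache_hit_ratio)


# Hypothetical connect time; the hit ratios span the 60 to 80% range
# mentioned above, plus a cold-cache baseline for comparison.
for hit_ratio in (0.0, 0.6, 0.8):
    saved = expected_connect_savings_ms(200.0, hit_ratio)
    print(f"hit ratio {hit_ratio:.0%}: about {saved:.0f} ms saved on average")
```

Under these assumptions the average saving shrinks linearly as the hit ratio grows, which is consistent with the smaller impact observed where browser caches are warm.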

There may be other interesting use cases for this connection reuse technique. This experiment has shown that such use cases are worth exploring as a way to substantially improve page load performance.

Acknowledgements

This experiment was a collaborative effort between the Performance team and the CDN and Traffic Infrastructure Dev team. We would like to thank Brandon Duncan, Ritesh Maheswari, Goksel Genc, Heather Mckelvey, Jon Sorenson, Samir Jafferali, Nitant Vaidya, Michael Mamaril, and Charanraj Prakash for their support and help in understanding LinkedIn traffic infrastructure, RUM data, the feasibility of this project, and the experiment setup. A special thanks to Ritesh Maheswari and Prasanna Vijayanathan for their expert guidance in all of the performance experiments conducted in LinkedIn’s traffic infrastructure.
