The TCP Tortoise: Optimizations for Emerging Markets

Serving fast pages is a core aspiration at LinkedIn. As part of this effort, we continuously experiment with and study the various layers of our stack to identify optimizations that ensure we use the best protocols and configurations at every layer.

As LinkedIn migrated to serving its pages over HTTP/2 earlier this year, it became imperative that we identify and use the best transport layer strategy for our users’ networks. Because our infrastructure is centered on the Transmission Control Protocol (TCP), we initiated an effort to study the effects of different TCP congestion control strategies in different geographies. We found that with the right strategy, we could significantly improve our site speed, resulting in up to 7% faster content downloads.

Why is this important?

Like many sites, LinkedIn uses different delivery methods for static and dynamic content. Cached static content, like fonts and images, is served from third-party Content Delivery Networks (CDNs), while all of our dynamic content is served through LinkedIn’s own Points of Presence (PoPs). The base page’s HTML is the first resource that a user’s browser needs in order to render a web page. It must be received by the client before the client can start requesting other content, so it is imperative that we serve the base page as quickly as possible from our PoPs, thereby speeding up the entire page load. To do this on HTTP/2, there were a few aspects of our network configuration that we looked into.

TCP congestion control
At a high level, TCP congestion control follows a conservative approach to avoid congestion in the network. The sender maintains a congestion window for each connection, which determines the number of packets that can be outstanding (sent but not yet acknowledged) at any given time, thereby limiting how quickly the sender can consume the link’s capacity.

When a new connection is set up, the congestion window is initialized to a predetermined value, usually a small multiple of the Maximum Segment Size (MSS). During slow start, the window roughly doubles for every successful round trip (measured by the Round Trip Time, or RTT) until either a threshold value is reached or a timeout occurs. Once the threshold is reached, the connection switches to congestion avoidance, where the window grows linearly (by roughly one MSS per round trip), following the conservative “additive increase, multiplicative decrease” strategy.

The sender also maintains a timer to ensure that acknowledgements for sent packets don’t take too long. When this timer expires, a timeout occurs, indicating packet loss and therefore congestion in the network. The threshold is then reduced (typically to half the current window), the congestion window is shrunk, and slow start is engaged again, so the window is only cautiously ramped back up once congestion is relieved.
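
To make these dynamics concrete, here is a minimal Python sketch of the behavior described above: slow start, linear growth once the threshold is reached, and a reset with a halved threshold on loss. It is a simplified toy model, not the kernel’s implementation, and the initial window, threshold, and loss schedule are illustrative assumptions.

```python
# Toy model of classic TCP congestion control dynamics, per round trip.
# Window values are in MSS units; the loss schedule is an illustrative assumption.

def simulate(rounds=20, init_cwnd=10, ssthresh=64, loss_rounds=(12,)):
    cwnd = init_cwnd
    history = []
    for rtt in range(rounds):
        history.append((rtt, cwnd, ssthresh))
        if rtt in loss_rounds:
            # Timeout/loss: halve the threshold, shrink the window, slow start again.
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1
        elif cwnd < ssthresh:
            # Slow start: the window roughly doubles every round trip.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: additive increase of ~1 MSS per round trip.
            cwnd += 1
    return history

if __name__ == "__main__":
    for rtt, cwnd, ssthresh in simulate():
        print(f"RTT {rtt:2d}: cwnd={cwnd:3d} MSS  ssthresh={ssthresh:3d} MSS")
```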

Clearly, from the sender’s perspective, the value of the congestion window determines how much data can be transmitted in each round trip, and therefore the throughput of the connection. In networks characterized by long delays or stray packet losses, these conditions can easily be mistaken for signs of congestion and drastically limit the congestion window. These are commonly referred to as the “high-bandwidth” and “lossy-link” problems, and they illustrate the default strategy’s intolerance of small losses that are not necessarily caused by actual congestion in the network.
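
The effect on throughput is straightforward to estimate: a sender can transmit at most one congestion window of data per round trip, so throughput is roughly bounded by cwnd / RTT. The quick calculation below, with an assumed window size and a few illustrative RTTs, shows how a modest window on a long-delay path caps throughput well below what the link could carry.

```python
# Rough upper bound on TCP throughput: at most one congestion window per RTT.
# The window size and RTT values below are illustrative, not measured numbers.

MSS = 1460          # bytes per segment (typical for Ethernet-sized MTUs)
cwnd_segments = 10  # a small congestion window, in segments

for rtt_ms in (50, 150, 300):
    window_bytes = cwnd_segments * MSS
    throughput_bps = window_bytes * 8 / (rtt_ms / 1000.0)
    print(f"RTT {rtt_ms:3d} ms -> at most {throughput_bps / 1e6:.2f} Mbit/s")
```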

Thus, the choice of congestion control strategy becomes critical to preventing these spurious slowdowns of our site. It is especially important with HTTP/2, because HTTP/2 reuses a single TCP connection per origin.

HTTP/2
Each HTTP/2 session establishes a single TCP connection and multiplexes its streams over that connection. While this strategy saves the network round trips needed to set up additional TCP connections, multiplexing many streams on a single connection can easily produce bursts of traffic that are misconstrued as congestion at the TCP layer. This has even deeper implications in emerging markets, where already suboptimal network conditions (e.g., longer round trips and higher bandwidth-delay products) can quickly compound to slow down connections.

TCP versus TCP

Over the years, numerous congestion control strategies have been proposed to address different shortcomings of TCP’s default behavior. From our initial round of experiments with 11 TCP congestion control algorithms, we picked the three best-performing strategies, each taking a different approach to the congestion control problem. We compared these to the default algorithm on our infrastructure, HTCP.

The table below highlights what each of the algorithms we compared does best.

Algorithm    | What it does best
TCP-Hybla    | Built for networks with long round trip delays. Window updates are based on the ratio of the current RTT to a reference RTT0.
TCP-Scalable | Built for performance on high-speed, wide area networks. Window updates use fixed increase and decrease parameters.
TCP-YeaH     | Built to be fair and efficient, and to avoid lossy-link penalties. Switches between fast and slow modes based on an estimate of queued packets.
HTCP         | Built for long-distance, high-speed transmission. Window updates are based on the time since the last loss event. This is the default algorithm on our Linux machines.

Because these algorithms approach the congestion control problem so differently, there isn’t one right choice for every type of network. For instance, TCP-Hybla is ideal for a network characterized by long round trip delays but not necessarily marred by congestion. Studying the behavior of these algorithms therefore allowed us to characterize certain networks and configure our servers accordingly.
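
As an illustration of how differently these algorithms behave, here is a simplified sketch of the Hybla-style window update summarized in the table above: growth is scaled by the ratio of the connection’s RTT to a reference RTT0, so long-delay connections can grow their window as fast as short-delay ones. The structure and constants below are a rough rendering of the published algorithm, not the Linux implementation.

```python
# Simplified per-ACK window update in the spirit of TCP-Hybla.
# rho scales growth by the connection's RTT relative to a reference RTT0,
# so long round trips are not penalized. Window values are in MSS units.

RTT0 = 0.025  # reference RTT in seconds (value suggested in the Hybla paper)

def hybla_update(cwnd, ssthresh, rtt):
    """Return the new congestion window after one ACK."""
    rho = max(rtt / RTT0, 1.0)
    if cwnd < ssthresh:
        # Slow start: grow by 2**rho - 1 per ACK instead of 1.
        return cwnd + 2 ** rho - 1
    # Congestion avoidance: grow by rho**2 / cwnd per ACK instead of 1 / cwnd.
    return cwnd + rho ** 2 / cwnd

# Example: a 100 ms connection (rho = 4) grows its window much faster per ACK
# than standard TCP would, compensating for the longer feedback loop.
print(hybla_update(cwnd=10.0, ssthresh=64.0, rtt=0.100))
```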

The setup

To study and evaluate the four chosen algorithms on real LinkedIn member traffic, we created a test bed in our PoPs that let us measure each algorithm’s impact on the content download time of our pages. The setup consisted of an equal number of PoP machines configured with each of the four algorithms. Each group of machines served statistically equivalent loads, with connections distributed at random. We recorded real user data and monitored the content download time for each strategy, as well as how it ultimately affected our page load time. We also performed statistical analysis on the resulting data, using bootstrapping and Wilcoxon tests, to make sure that the differences in the numbers were statistically significant.
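
On Linux, the congestion control algorithm can be selected system-wide via the net.ipv4.tcp_congestion_control sysctl, or per socket with the TCP_CONGESTION socket option. The sketch below is only an illustration of the per-socket mechanism, not our production configuration; it assumes a Linux host (Python 3.6+) with the relevant kernel module, e.g., tcp_hybla, loaded.

```python
# Illustrative only: pin a congestion control algorithm on a listening socket
# using the Linux TCP_CONGESTION socket option.
import socket

def listening_socket(port, algorithm=b"hybla"):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Requires the corresponding kernel module (e.g., tcp_hybla) to be loaded
    # and listed in /proc/sys/net/ipv4/tcp_allowed_congestion_control
    # (or root privileges for algorithms outside that list).
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, algorithm)
    sock.bind(("0.0.0.0", port))
    sock.listen()
    return sock

if __name__ == "__main__":
    srv = listening_socket(8080)
    name = srv.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16)
    print("using:", name.rstrip(b"\x00").decode())
```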

Site speed improvements

As we expected, the data from the experiments showed significant results in emerging markets like India and China. We observed consistent improvements in content download time in each of these geographies with all our experiment groups—Hybla, Scalable, and YeaH—in comparison to the control group, HTCP.

Desktop

In line with our expectations, LinkedIn’s desktop pages in India saw an improvement of 5-7% in content download time, which also translated to a 2% improvement in page load time (at the 90th percentile). The PoPs in China showed similar results. In both cases, TCP-Hybla consistently performed the best across our weekly comparisons. The results were also verified through statistical comparisons of the data sets.
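
For the statistical comparisons mentioned above, one common approach is to compare an experiment group against the control with a non-parametric test, plus a bootstrap of the percentile of interest. The sketch below, using NumPy and SciPy on synthetic timing data, is an illustration of that kind of check, not our actual analysis pipeline.

```python
# Illustrative significance check on synthetic timing data, not our real data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Hypothetical content download times in ms (heavy-tailed, log-normal-ish).
control = rng.lognormal(mean=6.0, sigma=0.5, size=5000)     # e.g., HTCP group
treatment = rng.lognormal(mean=5.95, sigma=0.5, size=5000)  # e.g., Hybla group

# Wilcoxon rank-sum (Mann-Whitney) test: do the two distributions differ?
stat, p_value = stats.mannwhitneyu(treatment, control, alternative="two-sided")
print(f"Mann-Whitney U p-value: {p_value:.4g}")

# Bootstrap the difference in the 90th percentile between the two groups.
diffs = []
for _ in range(2000):
    t = rng.choice(treatment, size=treatment.size, replace=True)
    c = rng.choice(control, size=control.size, replace=True)
    diffs.append(np.percentile(t, 90) - np.percentile(c, 90))
low, high = np.percentile(diffs, [2.5, 97.5])
print(f"95% CI for p90 difference (ms): [{low:.1f}, {high:.1f}]")
```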

Mobile
Our mobile comparisons mirrored our observations in desktop environments in emerging markets. For instance, the chart below depicts the trends observed for page load time in China on LinkedIn’s mobile app. Here, too, TCP-Hybla rules the roost, showing as much as a 6% improvement in page load time. Mobile users in India showed an 8% improvement for the same comparison.

[Chart: page load time for LinkedIn’s mobile app in China, by congestion control algorithm]

Rest of the world
We also conducted our experiment in our PoPs located around the rest of the world. We saw minor gains in some geographies, but nothing as significant as in our emerging markets, particularly India and China. Other geographies where we observed improvements included Singapore and South America.

Summary

Optimal tuning of TCP congestion control algorithms and configurations provides significant site speed gains in emerging markets. TCP-Hybla in particular performs the best under these network conditions, suggesting that these networks have long latencies but are not necessarily congested. We believe that further understanding these network characteristics and meticulously tuning our configurations will further improve our site speed.

Future work
We are actively studying the specifics of the algorithms that showed improvements, and we will use what we’ve learned to characterize our users’ networks so we can better tune our TCP configurations. An in-depth understanding of these characteristics, coupled with the data collected from around the world, will allow us to dynamically apply the optimal tuning and provide the best user experience to our customers.

Acknowledgements

The TCP experimentation and study is a collaborative effort between the Performance team and the Traffic Infrastructure Dev team at LinkedIn. I would like to thank Goksel Genc and Siddharth Agarwal for their support and help in understanding our traffic infrastructure and the feasibility of the project, and Rahul Malik and Shen Zhang, who helped with configuring and deploying the required TCP algorithms and settings throughout this experiment. I also want to thank Ritesh Maheshwari and Anant Rao for their continued guidance with performance analysis and insights throughout this project.