Understanding network quality: The rise of customized content delivery
June 10, 2019
“Make LinkedIn feel instantaneous” is a mantra our performance team lives by to deliver the best possible experience to our members. We are constantly on the lookout for ways to improve our site speed and user experience, be it through optimizing our infrastructure and edge, accelerating our web and mobile applications’ performance, or tuning our systems and making our services more efficient.
Experiments at the edge
A vital area of focus in making LinkedIn feel instantaneous is delivering content to our members as effectively as possible. This demands continuous brainstorming, devising optimizations, and experimenting to measure their effects. We run many such experiments at the “last mile”: the part of LinkedIn’s infrastructure that connects directly to our members over the internet and strongly influences how fast content reaches them.
In the recent past, some of the experiments we have run at our edge have yielded interesting results.
For example, when we looked at how three different experiments performed across four countries, we found that the same “optimizations” behave differently across geographies. A case in point is a recent study we did with TCP congestion control algorithms.
We continued to see such behavior in other experiments as well, as illustrated in the chart below. For example, experiment 1 shows much more improvement in India than in other regions, while experiment 2 was more divisive, showing big gains in India but degradations in all other regions.
These findings weren’t too surprising, but they reinforced the idea that we needed to tailor optimizations to specific regions.
Next, we analyzed the data from one experiment over a controlled population in a single region. This time, the results were more concerning. The chart below shows how much improvement we got each day from this experiment, and it varies a lot: from a 5% improvement one day to a 2% degradation the next! (You can get an idea of how much is a lot here.)
It is critical to understand the implications of this pattern. In a controlled environment, it is very rare for the results of such edge configurations to swing from very good one day to bad the next.
We conducted further analysis by breaking down the experiment data by Autonomous System, identified by its Autonomous System Number or ASN (think of it as your internet provider’s network), within the region. This gave us a better idea of what was happening behind the scenes. We eventually found that even within a single region, different networks have highly divergent characteristics. In other words, an optimization that we think works may help members on some networks but not on others. By rolling out such optimizations, we were imposing a “selective penalty” on some members while improving content delivery for others.
Here is the chart for the above experiment, considering the top eight ASNs in the region we were investigating.
This variability is ominous for performance work: there is no guarantee that all our members get a better experience from such “optimizations.”
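The per-ASN breakdown can be illustrated with a short Python sketch. It assumes the experiment metric is a latency such as page load time reported by RUM beacons, and that each group is summarized by comparing the median of the control and treatment arms. The `Sample` fields, the `relative_improvement` helper, and the ASN values are hypothetical illustrations, not LinkedIn’s actual analysis pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass
class Sample:
    """One hypothetical RUM beacon from an A/B experiment."""
    day: str
    asn: int
    arm: str            # "control" or "treatment"
    page_load_ms: float


def relative_improvement(samples, key):
    """Percent improvement of the treatment's median latency over control's, per group.

    `key` picks the grouping dimension (for example day or ASN). Positive values
    mean the treatment was faster for that group; negative values mean it was slower.
    """
    groups = defaultdict(lambda: {"control": [], "treatment": []})
    for s in samples:
        groups[key(s)][s.arm].append(s.page_load_ms)

    result = {}
    for group, arms in groups.items():
        # A group needs data in both arms before it can be compared.
        if not arms["control"] or not arms["treatment"]:
            continue
        control = median(arms["control"])
        treatment = median(arms["treatment"])
        result[group] = 100.0 * (control - treatment) / control
    return result


# Made-up samples: one day of traffic from two hypothetical networks.
samples = [
    Sample("day-1", 1001, "control", 1000), Sample("day-1", 1001, "treatment", 900),
    Sample("day-1", 2002, "control", 800), Sample("day-1", 2002, "treatment", 840),
]
print(relative_improvement(samples, key=lambda s: s.asn))  # {1001: 10.0, 2002: -5.0}
```

In this toy data, grouping by day (`key=lambda s: s.day`) yields a blended gain of about 3.3%, which hides the fact that network 2002 got 5% slower. Swapping the grouping key is what surfaces that kind of selective penalty.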
The missing piece
To test our hypothesis that network characteristics play a significant role in how changes affect our members, we used an experiment to understand how member engagement changes when members are given a faster site experience.
The experiment entailed analyzing members’ historical performance data from real user monitoring (RUM) and classifying their networks as having “fast,” “average,” or “slow” network quality. We then served a “faster” (lighter) application experience to a randomly chosen segment of members. After a few days of experimentation, we analyzed the data and compared how members engaged with the application across the three network quality classes.
We found that member experience and engagement improve when we deliver a faster application. Moreover, a faster application increases engagement much more for members on “slow” networks than for members on faster networks.
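A minimal sketch of the classification step in Python. It assumes each member’s RUM history has been reduced to a list of round-trip time (RTT) samples and that fixed cut-offs separate the three classes. The choice of metric, the threshold values, and the `classify_network` helper are illustrative assumptions, not the criteria LinkedIn actually used.

```python
from statistics import median

# Hypothetical cut-offs on a member's median historical RTT, in milliseconds.
# These values are illustrative only.
FAST_MAX_RTT_MS = 100.0
AVERAGE_MAX_RTT_MS = 300.0


def classify_network(rtt_history_ms, fast_max=FAST_MAX_RTT_MS, average_max=AVERAGE_MAX_RTT_MS):
    """Label a member's network "fast", "average", or "slow" from past RTT samples.

    Uses the median so that a few outlier measurements do not flip the label.
    Returns None when there is no history to classify from.
    """
    if not rtt_history_ms:
        return None
    typical = median(rtt_history_ms)
    if typical <= fast_max:
        return "fast"
    if typical <= average_max:
        return "average"
    return "slow"


history = {
    "member-a": [45, 60, 52, 400],  # one outlier, but the median stays low
    "member-b": [180, 220, 250],
    "member-c": [650, 900, 720],
}
for member, rtts in history.items():
    print(member, classify_network(rtts))
# member-a fast
# member-b average
# member-c slow
```

Summarizing history with a robust statistic rather than the latest measurement keeps a member’s class stable across a single bad session; an alternative design would derive the cut-offs from population percentiles instead of fixed values.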
To understand whether our members liked the faster experience, we looked at how many user sessions were created in each segment and how many members visited the application. For example:
Site speed improved in every network quality class, with members on fast networks seeing the smallest improvement and members on slow networks seeing the largest gains.
Correspondingly, the number of unique members visiting the application and the number of sessions both increased proportionally.
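The per-class comparison of unique members and sessions can be sketched in Python. This example assumes one record per session carrying a member ID, the member’s network class, and the experiment arm, and it assumes randomly assigned arms of equal size so that raw counts are directly comparable. The record layout and the `engagement_lift` helper are hypothetical.

```python
from collections import defaultdict


def engagement_lift(sessions):
    """Compare treatment vs. control engagement within each network class.

    `sessions` is an iterable of (member_id, network_class, arm) tuples, one per
    session, where arm is "control" or "treatment". Returns
    {network_class: (unique_member_lift_pct, session_lift_pct)}.
    Assumes both arms were assigned equally many members, so raw counts compare.
    """
    members = defaultdict(set)   # (class, arm) -> distinct member IDs seen
    counts = defaultdict(int)    # (class, arm) -> number of sessions

    for member_id, network_class, arm in sessions:
        members[(network_class, arm)].add(member_id)
        counts[(network_class, arm)] += 1

    def lift(treatment, control):
        return 100.0 * (treatment - control) / control

    result = {}
    for network_class in {c for c, _ in counts}:
        control, treatment = (network_class, "control"), (network_class, "treatment")
        # Skip classes that have no sessions in one of the arms.
        if counts[control] == 0 or counts[treatment] == 0:
            continue
        result[network_class] = (
            lift(len(members[treatment]), len(members[control])),
            lift(counts[treatment], counts[control]),
        )
    return result


sessions = [
    ("a", "slow", "control"),
    ("b", "slow", "treatment"), ("b", "slow", "treatment"), ("c", "slow", "treatment"),
]
print(engagement_lift(sessions))  # {'slow': (100.0, 200.0)}
```

Reporting the two lifts side by side makes an anomaly like the one below easy to spot: a class whose session lift lags its unique-member lift, or lags its site speed gain.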
One segment stood out to us as an anomaly: for the average class, the number of sessions did not grow in proportion to the site speed improvement. Further analysis led us to conclude that this is most likely because members in this class did not prefer the “lighter” experience over a slower one. This is likely where the trade-off between site speed and features becomes critical.
This analysis supported two hypotheses:
Members are more engaged on a faster application. As the experience gets faster, user engagement goes up. This is especially true for members already on slow networks.
Network quality plays a significant role in understanding our members’ experience. By assessing a member’s network quality, we can customize their experience, giving them what they prefer and what suits the network they are on.
Network quality as a service: Customizing content delivery
The experiments and analyses above helped us better understand how members’ network quality, among other factors, influences their experience on the application. We can use this knowledge to tailor content delivery to members’ needs and enrich their experience on LinkedIn by providing network quality as a service within LinkedIn.
Defining and measuring network quality metrics is fairly straightforward. The challenge arises when measurements are outdated or simply infeasible to collect. To handle such scenarios, we have built a deep learning pipeline that uses RUM data to predict the network quality of every connection to LinkedIn.
Stay tuned for Part 2 of this series, which will focus on delivering customized content to our members.
The entire story leading to network quality as a service has been a multi-team effort spanning many quarters. This has involved many engineers and managers across the Performance Engineering, Edge SREs, Traffic Infra Dev, Data Science, Flagship Web, LinkedIn Lite, and AI/ML teams at LinkedIn.
I would like to specifically thank Ritesh Maheshwari and Anant Rao for supporting us through this journey. Special mention to Brandon Duncan for putting up with us through the process and being supportive of such exploratory efforts!