Scaling LinkedIn’s Edge with Azure Front Door
June 16, 2020
Co-authors: Viranch Mehta, Jon Sorenson, Samir Jafferali
As LinkedIn has grown to more than 690 million members, we’ve expanded our edge platform over the years to 19 Points of Presence (PoPs) across 5 continents, bringing us closer to our members and providing a quick and reliable experience. When we started evaluating how best to scale for future business needs and member growth, the time and hefty resources required to keep building our own edge infrastructure made the cloud an enticing alternative. Today we’re excited to announce that we’ve migrated the LinkedIn experience and edge infrastructure to Azure Front Door (AFD). This move scales us to 165+ PoPs and drives a better member experience by improving median page load times by up to 25 percent.
Our edge infrastructure is how your device connects to LinkedIn. Data from your device traverses the internet to the closest PoP, which houses HTTP proxies that are the entryway into our network. This PoP forwards your device’s requests to an application server in one of our data centers, and the responses are returned to your device by reversing that route. By entering our network early in the journey, the network protocols that carry your traffic benefit from several latency optimizations and your traffic spends more time on our fast dedicated network instead of traversing the public internet.
LinkedIn’s global footprint of 19 PoPs across 5 continents
AFD is Microsoft's global application and content delivery network, which delivers its key cloud products such as Office, Xbox, Bing, Teams, and Azure. Switching to AFD nets us numerous benefits, including massive capacity, an expansive network backbone, and rich peering, all of which lead to vastly better last-mile latency and resilience. We’re also able to evolve our edge with Microsoft’s “Intelligent Edge” vision around 5G, edge computing, and a massive footprint expansion. AFD’s extensive compliance certifications and strong stance on security improve on our already strong security posture.
Azure Front Door’s global network of 165+ PoPs
Our migration was data-driven and methodical: we started with smaller LinkedIn subdomains such as blog.linkedin.com before moving toward a full-site migration, learning from and solving numerous technical challenges along the way.
We ran an A/B test in which an equal number of randomly selected members were directed to either the LinkedIn edge or the Azure edge for a week, so that we could accurately measure differences in performance and user engagement. The members directed to AFD saw steep reductions in Page Load Times (PLT), and we observed improved business metrics such as the number of page views and sessions. For example, in India, where both LinkedIn and AFD have a PoP in Mumbai, AFD's extensive peering and additional locations in Chennai, Hyderabad, and New Delhi reduced the median Android PLT by over 20%.
Median Android Page Load Time improved by up to 25%
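To make the comparison above concrete, here is a minimal sketch of how such an A/B measurement could be computed. It is not our production tooling: the hash-based bucketing, the function names `assign_edge` and `median_improvement`, the salt, and the sample timings are all assumptions invented for illustration.

```python
import hashlib
from statistics import median


def assign_edge(member_id: str, salt: str = "edge-ab-test") -> str:
    """Deterministically place a member in one of two equal-sized buckets.

    Hash-based bucketing is an assumption here; the post only says members
    were randomly selected in equal numbers.
    """
    digest = hashlib.sha256(f"{salt}:{member_id}".encode()).digest()
    return "afd" if digest[0] % 2 == 0 else "linkedin"


def median_improvement(control_plt_ms, treatment_plt_ms) -> float:
    """Relative reduction in median page load time (0.2 means 20% faster)."""
    control = median(control_plt_ms)
    treatment = median(treatment_plt_ms)
    return (control - treatment) / control


# Synthetic page load times in milliseconds, not real measurements.
control = [4000, 5000, 6000]      # members served by the LinkedIn edge
treatment = [3000, 4000, 5000]    # members served by AFD

print(f"{median_improvement(control, treatment):.0%}")  # prints 20%
```

Using the median rather than the mean keeps a handful of very slow page loads from dominating the comparison.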
In addition to accelerating page delivery, AFD also improves LinkedIn’s availability. Case in point: when the largest U.S. ISP had a national outage that prevented its subscribers from accessing large portions of the internet, including LinkedIn, Real User Measurements (RUM) from our mobile app and browsers revealed that LinkedIn was in fact fully accessible during this time through AFD and its strong peering relationships. Peering creates dedicated private connectivity between Microsoft infrastructure and ISP subscribers that bypasses the public internet, thus insulating Microsoft’s connections from internet instability. Microsoft is connected to most key ISPs through its massive peering arrangements, something that has been cost-prohibitive for LinkedIn to do.
Reachability of LinkedIn for real members during ISP outage degrades only via LinkedIn PoPs
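As a rough illustration of the reachability comparison, the sketch below aggregates RUM beacons by the edge path they were sent through. The beacon shape (an edge label plus a success flag), the labels `afd` and `linkedin_pop`, and the function name `reachability_by_edge` are hypothetical and are not the actual RUM schema.

```python
from collections import defaultdict


def reachability_by_edge(beacons):
    """Return the fraction of successful beacons for each edge path.

    `beacons` is an iterable of (edge_label, succeeded) pairs; this shape
    is an assumption made for the example.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for edge, succeeded in beacons:
        totals[edge] += 1
        if succeeded:
            successes[edge] += 1
    return {edge: successes[edge] / totals[edge] for edge in totals}


# Synthetic beacons, not real measurements.
sample = [
    ("afd", True),
    ("afd", True),
    ("linkedin_pop", True),
    ("linkedin_pop", False),
]

print(reachability_by_edge(sample))  # {'afd': 1.0, 'linkedin_pop': 0.5}
```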
In preparation for the ramp, teams across LinkedIn were mobilized to evaluate, plan, and operationalize our efforts. This ranged from redesigning our next-generation data center origins to deep analysis of performance gains, and even included obtaining Git access to the AFD source code to start co-developing. As part of our evaluations, we realized that the “pay-per-use” nature of AFD offered significant cost benefits over running our own infrastructure. AFD includes every new PoP Microsoft deploys, and we pay only for the bandwidth we use, whereas with our own PoP footprint we had to build and operate capital-intensive infrastructure that is necessarily over-provisioned with capacity for traffic surges, growth, and DDoS attacks.
Projected cumulative cost (in millions of US Dollars) of LinkedIn edge infrastructure versus cost of leveraging AFD
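The shape of this comparison can be sketched with a toy model: an owned edge accrues capital and operating costs every year regardless of traffic, while a pay-per-use edge accrues cost in proportion to bandwidth. Every figure below is an arbitrary placeholder in arbitrary units, not LinkedIn or Azure pricing, and the function names are made up for the example.

```python
from itertools import accumulate


def cumulative_owned_cost(yearly_capex, yearly_opex):
    """Running total of build-out (capex) plus operations (opex) per year."""
    return list(accumulate(c + o for c, o in zip(yearly_capex, yearly_opex)))


def cumulative_usage_cost(yearly_traffic, price_per_unit):
    """Running total when cost scales only with traffic actually served."""
    return list(accumulate(t * price_per_unit for t in yearly_traffic))


# Placeholder inputs in arbitrary units (hypothetical, for illustration only).
capex = [10, 2, 8]       # hardware purchases, lumpy and front-loaded
opex = [3, 3, 4]         # power, space, transit, staff
traffic = [2, 3, 4]      # traffic served per year
price = 2                # cost per unit of traffic

print(cumulative_owned_cost(capex, opex))     # [13, 18, 30]
print(cumulative_usage_cost(traffic, price))  # [4, 10, 18]
```

In this toy model the owned edge pays for capacity up front, including headroom it may never use, which is why the two curves can diverge even when traffic grows steadily.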
We’re looking forward to continued collaboration with the AFD team to optimize our traffic stack. We’re already planning a redesign of our origin infrastructure, wrapping the Azure edge with infrastructure-as-code automation, and testing Azure CDN with IP coalescing. We’re also starting to think about experimenting with AFD’s future offerings around HTTP/3, TLS 1.3, and 0-RTT handshakes, among other things.
This migration has been a multi-year, multi-org effort with exhaustive experimentation. There are many learnings and stories to share about pivoting our edge to the cloud, so be on the lookout for future related posts.
This project would not have been possible without the help of so many teams at LinkedIn, including: Network & Systems Engineering, Traffic Developers, Edge SREs, Mobile Infrastructure, Performance Engineering, Data Science, Slideshare, Microsites, Partner Engineering, IBE, Infosec, Legal, and Compliance.
At Microsoft, we want to thank the AFD Engineering and Product leads Daniel Gicklhorn, Isidro Hegouaburu, and Sharad Agrawal, as well as their teams. They championed this on the Azure side, supporting and enabling us throughout the journey.