The LinkedIn Data Center 100G Transformation
March 24, 2016
LinkedIn’s continued growth will require us to grow our data centers to mega scale over the next three to five years. Project Altair is our approach to creating a massively scalable data center fabric. The new LinkedIn data center being built in Oregon, known internally as LOR1, will be our first designed around a mega data center architecture, enabling us to step from a fleet of tens of thousands of servers to a fleet of hundreds of thousands. The new network architecture in this data center will support the larger server count and allow a mix of server tiers with different levels of network connectivity, using the 10/25/50/100G technology steps.
In this blog post, we will explain how we leveraged 100G technology to build highly scalable and cost-effective software that will transform our systems into next-generation data centers.
The LOR1 Network Architecture
The LOR1 data center network is built as a four-plane, five-stage folded Clos network. We decided to use only 1RU building blocks to build the full flat data center network, which initially supports a six-digit number of servers.
All four of LinkedIn’s data centers are built on a pod configuration, with thousands of servers per pod and a total of 64 pods. While the pods are uniquely built and can optimize localized traffic, we created a flat fabric with fixed end-to-end latency and oversubscription ratios to enable scaling and the transition to a mega data center environment. Some of the unique aspects of our network are:
- No oversubscription from the spine and deeper (1:1)
- End-to-end oversubscription better than 6:1
- Fixed end-to-end latency: all switches are single-chip, single-hop
- 1RU switch boxes only
- The same switch (32x100G) across the entire data center
- Six-digit server capacity at the above oversubscription
- Three levels of upgrade in network capacity and server count
- Easy to manage, easy to scale
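To make the oversubscription figures in the list above concrete, here is a minimal sketch of how such ratios are usually computed: downlink bandwidth divided by uplink bandwidth at each tier, multiplied across tiers for a worst-case end-to-end number. The port counts in the example are hypothetical and chosen only for illustration; they are not the actual LOR1 port plan, and the function names are our own.

```python
from fractions import Fraction


def oversubscription(down_ports, down_gbps, up_ports, up_gbps):
    """Ratio of server-facing to fabric-facing bandwidth on one switch tier."""
    if up_ports <= 0 or up_gbps <= 0:
        raise ValueError("a tier needs some uplink bandwidth")
    return Fraction(down_ports * down_gbps, up_ports * up_gbps)


def end_to_end(tier_ratios):
    """Worst-case end-to-end ratio: the product of the per-tier ratios."""
    total = Fraction(1)
    for ratio in tier_ratios:
        total *= ratio
    return total


# Hypothetical edge tier (NOT the real LOR1 port plan): 40 servers at 25G
# facing down, 4 uplinks at 50G facing the fabric.
edge = oversubscription(down_ports=40, down_gbps=25, up_ports=4, up_gbps=50)

# Tiers from the spine and deeper are non-blocking (1:1), as listed above.
spine = Fraction(1)
core = Fraction(1)

total = end_to_end([edge, spine, core])
print(f"edge {edge}:1, end to end {total}:1, within 6:1 target: {total < 6}")
```

Because every tier from the spine deeper is 1:1, the end-to-end ratio in this sketch is set entirely by the edge tier, which is why the split between server-facing and fabric-facing ports on that tier matters so much.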
The diagrams below show multiple representations of the fabric architecture for LOR1.
Note: The pictures above show an architectural implementation of LOR1 with about 100,000 servers, which does not specifically reflect the actual number of servers in LOR1.
100G as Base Technology
With the adoption of 100G technology, we found ourselves in a dilemma. On one hand, operating on the cutting edge brings a lot of benefits, like capacity, features, and scale. On the other, it usually comes with a price tag. At LinkedIn, we were able to resolve that conflict by leveraging PSM4 technology: we took 100G PSM4 and deployed it in a split 50G configuration. That gave us all the benefits of the latest switching technology at an optics price that is half the price of a 40G optical interconnect.
Looking at the per-port cost, the price of a 40G single-mode optical module such as LR4-Light is comparable to the price of a PSM4 module. However, the PSM4 module delivers two ports per module, and each 50G port carries 25 percent more bandwidth than a 40G LR4-Light port.
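The per-port arithmetic can be sketched as follows. Since actual prices are confidential, both module prices are normalized to 1.0, reflecting only the statement that the two modules are comparably priced. The function name and the normalized figures are assumptions for illustration.

```python
def per_port(module_price, ports_per_module, gbps_per_port):
    """Return (cost per port, cost per Gbps) for one optical module."""
    cost_per_port = module_price / ports_per_module
    cost_per_gbps = cost_per_port / gbps_per_port
    return cost_per_port, cost_per_gbps


# Normalized prices (assumption): both modules cost 1.0 unit.
lr4_port, lr4_gbps = per_port(module_price=1.0, ports_per_module=1, gbps_per_port=40)
psm4_port, psm4_gbps = per_port(module_price=1.0, ports_per_module=2, gbps_per_port=50)

# Extra bandwidth of one 50G port relative to one 40G port.
bandwidth_gain = 50 / 40 - 1

print(f"40G module: {lr4_port:.2f} per port, {lr4_gbps:.4f} per Gbps")
print(f"PSM4 split: {psm4_port:.2f} per port, {psm4_gbps:.4f} per Gbps")
print(f"bandwidth gain per port: {bandwidth_gain:.0%}")
```

Under this equal-price assumption, the split PSM4 port costs half as much as the 40G port while carrying 25 percent more bandwidth. Substituting real quotes for the normalized prices gives the figures for your own environment.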
On a large-scale data center, and even on a smaller-scale one, savings can reach millions of dollars in CapEx for a better and faster solution. For reasons of confidentiality, we cannot share actual cost calculations here, but I encourage all of you to do the math for your environment and see the resulting cost savings.
The only challenge with such a solution is sharing two logical ports over a single physical port, the Quad Small Form-factor Pluggable (QSFP) interface. Given the large-capacity switches available in the industry, we have enough ports to address this problem, but we need help from optical module suppliers to deliver 50G modules, PSM2 or CWDM2. These will be the optimal vehicle for migrating any data center from the 10/40G domain to the 10/25/50/100G domain and will drive the whole industry to the next level. I would like to issue a call to action: if you are interested in developing 50G technology, let’s work together and make it a reality.
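As a rough illustration of what sharing one physical QSFP port between two logical ports means, the sketch below groups the four 25G lanes of a 100G PSM4 interface into two 2-lane, 50G logical ports. The port naming scheme and data layout are hypothetical and do not correspond to any particular switch operating system.

```python
LANES_PER_QSFP = 4   # a 100G PSM4 module carries four parallel lanes
LANE_GBPS = 25       # each lane runs at 25G


def split_port(physical_port, lanes_per_logical):
    """Group the lanes of one physical port into equal-sized logical ports."""
    if lanes_per_logical <= 0 or LANES_PER_QSFP % lanes_per_logical != 0:
        raise ValueError("lanes must divide evenly across logical ports")
    logical_ports = []
    for first_lane in range(0, LANES_PER_QSFP, lanes_per_logical):
        index = first_lane // lanes_per_logical + 1
        logical_ports.append({
            "name": f"{physical_port}/{index}",  # hypothetical naming
            "lanes": tuple(range(first_lane, first_lane + lanes_per_logical)),
            "gbps": lanes_per_logical * LANE_GBPS,
        })
    return logical_ports


# One physical port split into two 50G logical ports.
halves = split_port("qsfp1", lanes_per_logical=2)
for port in halves:
    print(port)

# A 32x100G switch split this way exposes twice as many logical ports.
logical_port_count = 32 * len(halves)
print(f"logical 50G ports on a 32-port switch: {logical_port_count}")
```

A native 50G module such as PSM2 or CWDM2 would remove the need for this sharing, since each logical port would then have its own physical module.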
At LinkedIn, we plan to continue to drive and enable cutting-edge technology in optical interconnect and future data center interconnect. We are looking forward to 200G and an eight-channel version of QSFP that will fix the thermal shortcomings of QSFP and allow cost-effective 200G/400G technology to emerge. We will continue to support and enable innovation across all dimensions of the data center, from networks and interconnect to servers and storage.
We built a highly scalable, cost-effective data center technology using 100G as the baseline and splitting each 100G port into two 50G ports. We will continue these efforts to build the best data centers and to enable innovation in LinkedIn applications and services through the years of growth ahead of us.