IPv6 at LinkedIn Part I
"ChippIn'" Away at IPv4
July 13, 2016
Coauthor: Tim Crofts
To celebrate the anniversary of World IPv6 Day (June 6, 2011), we at LinkedIn wanted to mark the occasion in a significant way. We’ve worked for a number of months on enabling IPv6 in our data centers. We’ve designed the new architecture and prepared network, systems, and tools, so on June 6 we enabled IPv6 in one of our staging environments. This was a milestone toward having functional dual stack IPv4/IPv6 in all our data centers, which will be our next goal before we start to retire, or begin “chippIn” away at, IPv4.
In this series of posts, we’ll discuss different aspects of migrating from IPv4 to IPv6, specifically including some of the challenges larger organizations like LinkedIn face in that process. But first, a little background information on IPv4, IPv6, and the rationale behind the migration process.
A brief history of IPv6
The history of IPv6 is entwined with the history of IPv4. IPv4 was formalized in 1981 in RFC791, one of a series of documents called Request For Comments, or RFCs (each document name starts with RFC and is followed by a sequential number to help reference it). In 1992, proposals came from the internet community to extend the addressing scheme of IPv4. The Internet Engineering Task Force (IETF) created a working group, and by 1996, further RFCs had formalized IPv6.
Meanwhile, to delay the exhaustion of IPv4 space, three ranges of public IPv4 addresses were reserved for internal private networks (RFC1597). Most organizations expose very few services to the internet, so their need for public IP space is quite small. Once you minimize the need for public IP space for these organizations by allowing them to use a private IPv4 range, it reduces the number of requests for IPv4 addresses. Thus, the complete depletion of the public IPv4 address space is postponed by several years.
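As a quick illustration, the three reserved ranges can be checked with Python's standard ipaddress module. This is only a sketch: the helper names is_private_range and private_address_count are made up for this example.

```python
import ipaddress

# The three private IPv4 ranges reserved by RFC1597 (and kept by RFC1918).
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_private_range(address: str) -> bool:
    """Return True if the IPv4 address falls inside one of the three ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in PRIVATE_RANGES)


def private_address_count() -> int:
    """Total number of addresses available across the three ranges."""
    return sum(net.num_addresses for net in PRIVATE_RANGES)


if __name__ == "__main__":
    for candidate in ("10.1.2.3", "172.20.0.9", "192.168.1.1", "8.8.8.8"):
        print(candidate, is_private_range(candidate))
    print("private addresses:", private_address_count())
```

Together the three ranges hold just under 18 million addresses, which is plenty for most organizations but, as discussed later, not for the largest networks.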
In 1996, the IETF formalized RFC1918, which improves on RFC1597 by discussing the different needs of devices communicating with each other depending on whether they are in a private or public space. Three years later, in 1999, RFC2663 described the Network Address Translator (NAT), which automatically converts packets coming from a private network, forwards them to the public internet, and then converts the response received from the public internet back to the private network. With NAT, internal devices can initiate communications with public devices, but not vice versa. However, NAT did introduce a few issues. For instance, when two companies merge, sometimes they have to resolve internal routing conflicts because they had used the same private IPv4 range prior to merging. In addition, all the private IPv4 traffic sometimes looks like it’s coming from one single public IPv4 address, so as a public service provider, if you block that public IPv4 address from communicating with you, you may actually be blocking thousands of devices or users.
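The translation idea can be sketched with a toy model. The ToyNat class below is purely hypothetical and models only the port-translating flavor of NAT, with no timeouts, protocols, or real packets. It shows why replies to internally initiated traffic get through while unsolicited inbound traffic has nowhere to go.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class ToyNat:
    """A toy port-translating NAT: many private endpoints share one public IP."""

    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self.public_ip = public_ip
        self.next_port = first_port
        self.out_map: Dict[Endpoint, int] = {}  # private endpoint -> public port
        self.in_map: Dict[int, Endpoint] = {}   # public port -> private endpoint

    def outbound(self, src: Endpoint) -> Endpoint:
        """Rewrite the source of a packet leaving the private network."""
        if src not in self.out_map:
            self.out_map[src] = self.next_port
            self.in_map[self.next_port] = src
            self.next_port += 1
        return (self.public_ip, self.out_map[src])

    def inbound(self, public_port: int) -> Optional[Endpoint]:
        """Map a reply back to its private endpoint; None means unsolicited."""
        return self.in_map.get(public_port)


if __name__ == "__main__":
    nat = ToyNat("203.0.113.5")
    print(nat.outbound(("10.0.0.7", 5555)))  # translated source seen by the internet
    print(nat.inbound(40000))                # reply finds its way back
    print(nat.inbound(12345))                # no mapping: nothing to deliver to
```

Every private device in this model appears to the outside world as 203.0.113.5, which is exactly why blocking one public address can affect many users at once.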
At the end of 1995, RFC1883 was published and marked the beginning of IPv6. IPv6 was first deployed on operating systems and routers in a lab, then subsequently a public experimental network was built. As more RFCs were published and more implementations and issues were resolved, the IPv6 network became suitable for production quality traffic, but February 2008 could be considered the real start of IPv6 because that’s when the Internet Assigned Numbers Authority (IANA) added IPv6 addresses to the root zone of the Domain Name System (DNS). This allowed the IETF to run an IPv4 outage experiment at their conference in March that year. For about an hour, engineers tested what they could do on an internet with no IPv4. Many software patches were released just after this experiment.
On June 6, 2011, the Internet Society organized “World IPv6 Day,” during which major companies tested their websites on IPv6 (in addition to IPv4) for 24 hours. One year later to the day was World IPv6 Launch Day, when many companies switched on IPv6 on their websites and left it on for good. For LinkedIn, our website became accessible via IPv6 in 2014.
Today, IPv6 traffic has surpassed 10% of global internet traffic, and in some countries like the US, it represents more than 30% of internet traffic. In Belgium, it’s more than 50%. Certain mobile networks are mainly IPv6 as well, with more than 80% of traffic. Apple has now announced that all mobile applications must support an IPv6-only network, using NAT64 (like NAT, but to convert IPv6 packets to IPv4 and back) to reach the IPv4 parts of the internet. Facebook, Akamai, and we at LinkedIn have also found, independently of each other, that end-to-end usage of applications on IPv6 (especially mobile applications) is faster in significant ways.
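To make the NAT64 idea a little more concrete, here is a minimal sketch of how an IPv4 address can be embedded in an IPv6 address. It assumes the well-known prefix 64:ff9b::/96 defined in RFC6052; real deployments may use a network-specific prefix instead, and the function names are made up for this example.

```python
import ipaddress

# Assumption: the RFC6052 well-known prefix; operators may pick their own.
WELL_KNOWN_PREFIX = ipaddress.ip_network("64:ff9b::/96")


def synthesize_nat64(ipv4: str) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the low 32 bits of the NAT64 prefix."""
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(WELL_KNOWN_PREFIX.network_address) | int(v4))


def extract_ipv4(ipv6: str) -> ipaddress.IPv4Address:
    """Recover the embedded IPv4 address from a synthesized IPv6 address."""
    return ipaddress.IPv4Address(int(ipaddress.IPv6Address(ipv6)) & 0xFFFFFFFF)


if __name__ == "__main__":
    synthesized = synthesize_nat64("192.0.2.33")
    print(synthesized)
    print(extract_ipv4(str(synthesized)))
```

An IPv6-only client sends traffic to the synthesized address, and the NAT64 gateway uses the embedded 32 bits to reach the IPv4-only destination.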
Main differences between IPv4 and IPv6
IPv4 addresses use 32 bits, while IPv6 addresses use 128 bits. The IPv6 header is bigger because the IPv6 addresses are bigger, but it has a fixed length of 40 bytes, while an IPv4 header has a variable length of 20 to 60 bytes. Moreover, IPv6 has no header checksum, as the checksum is generally done either at the hardware level or by the upper layers. This reduces the router’s workload. Finally, in IPv6, routers do not fragment packets; instead, if a packet is too large for the next link, the router drops it and sends back an ICMPv6 Packet Too Big (PTB) message so that the sender can resend packets of the right size. These enhancements reduce how much processing routers need to do on each packet, making them more efficient.
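The fixed layout is easy to see in code. The sketch below packs and parses the 40-byte IPv6 base header with Python's struct module, and contrasts it with IPv4, where the header length has to be read from the packet itself. The function names and sample addresses are illustrative only.

```python
import ipaddress
import struct

IPV6_HEADER_FORMAT = "!IHBB16s16s"  # always 40 bytes, no checksum field
IPV6_HEADER_LEN = struct.calcsize(IPV6_HEADER_FORMAT)


def build_ipv6_header(src: str, dst: str, payload_len: int,
                      next_header: int = 6, hop_limit: int = 64) -> bytes:
    """Pack a base IPv6 header (traffic class and flow label left at 0)."""
    first_word = 6 << 28  # version 6 in the top 4 bits
    return struct.pack(IPV6_HEADER_FORMAT, first_word, payload_len,
                       next_header, hop_limit,
                       ipaddress.IPv6Address(src).packed,
                       ipaddress.IPv6Address(dst).packed)


def parse_ipv6_header(data: bytes) -> dict:
    """Unpack the fixed 40-byte header; no length field needs to be consulted."""
    first_word, payload_len, next_header, hop_limit, src, dst = struct.unpack(
        IPV6_HEADER_FORMAT, data[:IPV6_HEADER_LEN])
    return {
        "version": first_word >> 28,
        "payload_len": payload_len,
        "next_header": next_header,
        "hop_limit": hop_limit,
        "src": str(ipaddress.IPv6Address(src)),
        "dst": str(ipaddress.IPv6Address(dst)),
    }


def ipv4_header_len(first_byte: int) -> int:
    """IPv4 header length in bytes, from the IHL field (low 4 bits, in 32-bit words)."""
    return (first_byte & 0x0F) * 4


if __name__ == "__main__":
    header = build_ipv6_header("2001:db8::1", "2001:db8::2", payload_len=1200)
    print(len(header), parse_ipv6_header(header))
    print(ipv4_header_len(0x45), ipv4_header_len(0x4F))
```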
IPv6 has been designed so that a device can get its own IPv6 address without a centralized service, using Stateless Address AutoConfiguration (SLAAC). There are two main types of IPv6 addresses: link-local scope and global scope. The link-local address is only valid on the local network segment, while the global address is often built from the network prefix advertised by the router combined with the MAC address of the interface. A centralized service also exists, DHCPv6, which is based on the Dynamic Host Configuration Protocol (DHCP) but has some changes to coexist with SLAAC.
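As an illustration of the MAC-based construction, the sketch below derives an interface identifier using the modified EUI-64 method and combines it with a /64 prefix. The prefix and MAC address are documentation examples, and many operating systems now generate privacy or stable opaque identifiers instead, so treat this as one possible scheme rather than what any given host does.

```python
import ipaddress


def eui64_interface_id(mac: str) -> int:
    """Build a 64-bit interface identifier from a 48-bit MAC address."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Flip the universal/local bit and insert ff:fe in the middle.
    eui = bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:]
    return int.from_bytes(eui, "big")


def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix (as advertised by a router) with the interface identifier."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("SLAAC expects a /64 prefix")
    return ipaddress.IPv6Address(int(net.network_address) | eui64_interface_id(mac))


if __name__ == "__main__":
    mac = "00:1a:2b:3c:4d:5e"
    print(slaac_address("fe80::/64", mac))           # link-local address
    print(slaac_address("2001:db8:1:2::/64", mac))   # global address
```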
Migrating to IPv6
Migrating to IPv6 presents an interesting Catch-22: you need your most popular sites to be on IPv6, but it is actually much easier to migrate smaller networks onto IPv6 than larger ones. One server on IPv6 can be manually configured and set up quickly, but a large site has routers, load balancers, firewalls, monitoring tools, equipment, and software that need to transition to function on IPv6 as they do on IPv4.
The World IPv6 Day and World IPv6 Launch Day allowed the internet community to iron out several issues with migrating the special equipment and software larger sites often require. Fixes in routers, load balancers, and firewalls were applied, and better geolocation was implemented. More web caching services became available, security filters were tuned to allow the processing of packet too big (PTB) messages, DNS infrastructure was enabled on IPv6 for better response time, and many other improvements were made.
Converting an internal network to IPv6 is, however, another story—that task requires many other components to work together and to be scalable. For large deployments, even RFC1918 space is not enough to address all internal network needs. Offices in every country need their network, and the cloud needs a lot of machines; therefore, for big networks, eventually even 10.0.0.0/8 becomes too small, and issues with overlapping address spaces become a nuisance. Translation between address spaces using solutions such as NAT can help address this problem, but this solution also introduces a new set of issues, such as embedded IP addresses in applications, scaling to handle large numbers of translations, and the various security implications of introducing a “NAT firewall.”
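The overlap problem is easy to reproduce. The sketch below uses invented allocation names and ranges to show how much room 10.0.0.0/8 offers and how two independently chosen private ranges can collide.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def find_overlaps(allocations: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every pair of named allocations whose address ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in allocations.items()}
    return [(a, b)
            for (a, net_a), (b, net_b) in combinations(nets.items(), 2)
            if net_a.overlaps(net_b)]


if __name__ == "__main__":
    # Hypothetical allocations, e.g. after merging two networks.
    allocations = {
        "office-a": "10.1.0.0/16",
        "acquired-co": "10.1.128.0/17",
        "cloud": "10.64.0.0/10",
    }
    print(ipaddress.ip_network("10.0.0.0/8").num_addresses)  # total size of 10/8
    print(find_overlaps(allocations))
```

With IPv6, each site can receive its own globally unique prefix, so this class of conflict does not arise in the same way.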
In an ideal world, you would be able to throw a switch to turn off IPv4 and turn on IPv6, but in most cases this is not possible. We believe that the best strategy for transitioning an internal network is to first migrate to a dual-stack environment with a well-defined exit strategy for moving off IPv4 entirely. This gives everyone some time to become familiar with IPv6 and start migrating their tools, software, and processes to IPv6, so that eventually very little traffic needs to go over IPv4. However, the obvious downside is that a dual-stack environment is more complex to support: the operational overhead of maintaining two networks increases the deployment time of new services and introduces issues when the two protocols are not in sync. IP address management can also become a complex task, as a mapping between the IPv4 and IPv6 address spaces is typically needed for various functions such as ACL implementation, route summarization, load balancers, etc.
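One way to catch the two protocols drifting apart is to check rules on one stack against the other through such a mapping. The sketch below is entirely hypothetical: the rule format, the prefix map, and the function names are invented for illustration and do not describe any particular ACL system.

```python
import ipaddress
from typing import Dict, List, Set, Tuple

Rule = Tuple[str, int]  # (CIDR, destination port)


def normalize(cidr: str) -> str:
    """Canonical text form, so differently written prefixes compare equal."""
    return str(ipaddress.ip_network(cidr))


def missing_ipv6_rules(v4_rules: Set[Rule], v6_rules: Set[Rule],
                       prefix_map: Dict[str, str]) -> List[Rule]:
    """List IPv4 rules that have no equivalent rule on the mapped IPv6 prefix."""
    v6_normalized = {(normalize(cidr), port) for cidr, port in v6_rules}
    mapping = {normalize(v4): normalize(v6) for v4, v6 in prefix_map.items()}
    missing = []
    for cidr, port in sorted(v4_rules):
        mapped = mapping.get(normalize(cidr))
        if mapped is None or (mapped, port) not in v6_normalized:
            missing.append((cidr, port))
    return missing


if __name__ == "__main__":
    prefix_map = {"10.1.0.0/16": "2001:db8:1::/48", "10.2.0.0/16": "2001:db8:2::/48"}
    v4_rules = {("10.1.0.0/16", 443), ("10.2.0.0/16", 22)}
    v6_rules = {("2001:0db8:0001::/48", 443)}
    print(missing_ipv6_rules(v4_rules, v6_rules, prefix_map))
```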
It is therefore best to prepare for the future and start migrating to IPv6 as soon as possible and to then gradually move an increasing amount of traffic onto IPv6. Since supporting two network stacks is nearly twice as complicated, once you get onto IPv6, you will want to remove IPv4 as fast as you can.
IPv6 migration at LinkedIn
A dual-stack environment allows you to migrate servers to IPv6 on a case-by-case basis. Some applications are easier than others; for instance, some of the core UNIX services are definitely low-hanging fruit.
On June 6, we decided to pull the trigger on one of our staging environments, specifically the one where the “new” LinkedIn is tested before you see it each day. We planned to add a global IPv6 address on all the systems in that environment and to migrate as many of the core services as possible in order to drive up traffic over IPv6.
We needed to make sure that we didn’t break services that had not been tested, so to control for this, we relied on the basics of how the two IP stacks work. For example, a lot of our software is Java, and by default, Java applications need a declaration at run time that they are using IPv6. For that reason, we did not have to worry about the software suddenly finding IPv6 and starting to use it on its own. We also decided not to put an AAAA DNS record on the hostname of any of the machines that got a global IPv6 address; thus, other languages initiating a connection would not do so over IPv6. This helped us ensure that other software, tools, and utilities continued to serve traffic over IPv4 as before, and the environment remained undisturbed.
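The effect of withholding the AAAA record can be sketched with a simplified model of how a dual-stack client orders its connection targets. This is not a real resolver, and the record data is invented. For Java, the run-time declaration is commonly the java.net.preferIPv6Addresses system property, though whether that is the exact mechanism meant above is an assumption.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (record type, address)


def pick_targets(records: List[Record], prefer_ipv6: bool = True) -> List[str]:
    """Order the addresses a dual-stack client would try, by address family."""
    v6 = [addr for rtype, addr in records if rtype == "AAAA"]
    v4 = [addr for rtype, addr in records if rtype == "A"]
    return v6 + v4 if prefer_ipv6 else v4 + v6


if __name__ == "__main__":
    # Host has a global IPv6 address configured, but DNS only publishes an A record.
    before = [("A", "10.1.2.3")]
    # Same host after the AAAA record is added.
    after = [("A", "10.1.2.3"), ("AAAA", "2001:db8:1::3")]

    print(pick_targets(before))                     # only IPv4 is ever tried
    print(pick_targets(after))                      # IPv6 is tried first
    print(pick_targets(after, prefer_ipv6=False))   # client that keeps preferring IPv4
```

Until the AAAA record exists, a client that resolves the hostname has no IPv6 address to try, regardless of its own preference.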
Starting in the early afternoon that day, we added a global static IPv6 address on more than 1,500 systems within a few minutes. Automation, which is important for any system at this scale, is what allowed us to deploy these changes so quickly. After we had validated that there were no issues running on IPv6, we added AAAA records for these hosts, which let the OS prefer IPv6 over IPv4. Once all this was in place, we started to bounce some of our core infrastructure services (DNS, syslog, SMTP, NTP, central authentication, etc.) to move these services’ traffic from IPv4 to IPv6. Within about an hour, we had ramped our infrastructure up to running IPv6 in a controlled way for several of these services. Over the coming months, we will continue to work on moving more infrastructure services to IPv6 traffic, as well as work on getting our software frameworks like Rest.li to begin preferring IPv6 traffic over IPv4.
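Generating DNS records for many hosts is the kind of step that lends itself to automation. The sketch below is not LinkedIn's tooling; the hostnames, addresses, TTL, and helper names are made up. It produces AAAA lines and the matching reverse (PTR) lines in zone-file style from a host inventory.

```python
import ipaddress
from typing import Dict, List


def aaaa_record(hostname: str, address: str, ttl: int = 300) -> str:
    """Forward record: hostname -> IPv6 address (in compressed form)."""
    addr = ipaddress.IPv6Address(address)
    return f"{hostname}. {ttl} IN AAAA {addr.compressed}"


def ptr_record(hostname: str, address: str, ttl: int = 300) -> str:
    """Reverse record: nibble-reversed ip6.arpa name -> hostname."""
    addr = ipaddress.IPv6Address(address)
    return f"{addr.reverse_pointer}. {ttl} IN PTR {hostname}."


def zone_lines(inventory: Dict[str, str]) -> List[str]:
    """Emit AAAA and PTR lines for every host in a name -> address inventory."""
    lines = []
    for hostname, address in sorted(inventory.items()):
        lines.append(aaaa_record(hostname, address))
        lines.append(ptr_record(hostname, address))
    return lines


if __name__ == "__main__":
    inventory = {
        "app1.stg.example.com": "2001:0db8:0000:0000:0000:0000:0000:0001",
        "app2.stg.example.com": "2001:db8::2",
    }
    print("\n".join(zone_lines(inventory)))
```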
In parallel with these efforts, we are currently building a new data center to serve production traffic to our members. This data center has been designed to accommodate dual-stack IPv4 and IPv6 from the start. It also has many other new technical advancements. We refer to them as Project Altair, and several are already described elsewhere on this blog.
With an IPv6-enabled staging environment and a production environment, we will be able to move increasing amounts of our internal traffic to IPv6. Based on our initial success on June 6, we are confident we’ll be able to add IPv6 to additional internal environments.
Once we are comfortable with IPv6 traffic, it will be time to think about removing IPv4 and having a data center that runs IPv6 only. This is not a distant goal; we would like to achieve it in 2017. Supporting both IPv4 and IPv6 is nearly twice the work, so once you decide to adopt IPv6, you want the transition phase to be as short as possible. However, many vendors and software products don't currently support IPv6 fully; this will have to change in order for us to achieve our migration goals.
We have an “AAAA team” with “IPv4 disposal experts” here at LinkedIn. This is an awesome technical project, and we need more organizations like ours to migrate their data centers to IPv6, so that, collectively, we can solve the remaining issues that hinder IPv6-only deployment in data centers and in the cloud. Please join us in this endeavour.
The following people contributed to this blog post through their participation in our AAAA team:
Zaid Ali, Sriram Akella, Andrey Bibik, Donaldo Carvalho, Bo Feng, David Fontaine, Prakash Gopinadham, David Hoa, Sanaldas KB, Henry Ku, Prasanth Kumar, Vikas Kumar, Tommy Lee, Leigh Maddock, Navneet Nagori, Marijana Novakovic, Ved Prakash Pathak, Stephanie Schuller, Chintan Shah, Harish Shetty, Andrew Stracner, Veerabahu Subramanian, Shawn Zandi, Andreas Zaugg, David Paul Zimmerman, Paul Zugnoni.