How we retired Python 2 and improved developer happiness

January 29, 2020

Nearly 20 years after the first release of Python 2 and 11 years after the first release of Python 3, the Python development community has retired Python 2.7, the last of the Python 2 series. This marks the end of all upstream support for Python 2, including bug and security fixes, and allows developers to devote their time fully to Python 3, which is faster, delivers more consistency, and brings lots of great features for developers to take advantage of (e.g., asyncio and type hinting).

In 2018, LinkedIn embarked on a multi-quarter effort to fully transition to a Python 3 code base. After approximately two quarters of planning and two quarters of execution, we phased out the use of Python 2 across new products and even new builds of existing products. The transition was led by our Python Foundation team, with multiple teams and departments playing an integral role in keeping it relatively smooth. In total, the effort entailed the migration of about 550 code repositories (libraries, applications, and services). Kudos to our colleagues for understanding the need for this migration and for doing their part in moving our infrastructure into the future.

At LinkedIn, Python is used not only to deliver online experiences to our members, but also for accessing internal systems and services, including deployment tools, our CI/CD framework, scripting, command line interfaces, data science tools, and more. Alongside Java and JavaScript, Python is a critical language for our engineers to get stuff done. We don't use Python in a monorepo or as a monolithic web service; instead, we have hundreds of independent microservices and tools, and dozens of supporting libraries, all owned by independent teams in separate repositories. The relationships between these repositories are represented by complicated dependency graphs.

Our transition story started long before the official migration kicked off. As Python 3 became more important within the larger open source ecosystem, most of our internal libraries were ported to be “bilingual,” meaning they could be used in either Python 2 or 3. This approach has long been considered the ideal standard because it provides a smooth transition period for consumers of those libraries. It’s also not that difficult to do if the developers are careful and the libraries have a very clear model of which data is bytes and which data is text (human-readable strings, i.e., Unicode). Python 2 conflated these two distinct concepts, but Python 3 forces you to understand and be explicit about textual data models. This is required to enforce data consistency and avoid most of the dreaded UnicodeErrors that can plague Python 2 code bases. 
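To make the bytes/text discipline concrete, here is a minimal sketch of the kind of boundary helpers a bilingual library might use. The names `ensure_text`, `ensure_bytes`, and `greeting_payload` are our own hypothetical illustration (the `six` compatibility library ships similar `six.ensure_text` and `six.ensure_binary` helpers), and UTF-8 is assumed as the default encoding. The idea is to decode bytes into text as soon as data enters the library, work only with text internally, and encode back to bytes only at the output boundary.

```python
# Hypothetical boundary helpers for a bilingual (Python 2.7 and 3) library.
# Assumes UTF-8 as the default encoding.
try:  # Python 2: the text type is `unicode`
    text_type = unicode  # noqa: F821
except NameError:  # Python 3: the text type is `str`
    text_type = str
binary_type = bytes  # `bytes` is an alias of `str` on Python 2.7


def ensure_text(value, encoding="utf-8", errors="strict"):
    """Return `value` as text, decoding bytes at the input boundary."""
    if isinstance(value, text_type):
        return value
    if isinstance(value, binary_type):
        return value.decode(encoding, errors)
    raise TypeError("expected bytes or text, got %s" % type(value).__name__)


def ensure_bytes(value, encoding="utf-8", errors="strict"):
    """Return `value` as bytes, encoding text at the output boundary."""
    if isinstance(value, binary_type):
        return value
    if isinstance(value, text_type):
        return value.encode(encoding, errors)
    raise TypeError("expected bytes or text, got %s" % type(value).__name__)


def greeting_payload(raw_name):
    # Hypothetical example: decode on the way in, work with text
    # internally, and encode on the way out.
    name = ensure_text(raw_name)
    return ensure_bytes(u"Hello, " + name.title() + u"!")
```

For example, `greeting_payload(b"caf\xc3\xa9")` returns `b"Hello, Caf\xc3\xa9!"` on both Python versions, while passing anything that is neither bytes nor text raises `TypeError` instead of silently guessing.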

In my long experience with such transitions, this clarity is the single most critical prerequisite for a successful port. Bilingual libraries mean you can’t take advantage of many of Python 3’s most appealing new features. However, as long as Python 2 is still supported, the tradeoff is well worth it, because bilingual libraries let their consumers port to Python 3 at a convenient time with minimal resources. Applications, on the other hand, generally do not need to be bilingual: they only ever run under one version of Python, so they can take advantage of all the new features, modules, and improvements of Python 3.
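As a hypothetical illustration of that tradeoff, the two functions below do the same job. The first sticks to the subset of syntax that runs on both Python 2.7 and 3; the second, written as Python 3-only application code, uses a keyword-only argument, `pathlib`, and an f-string (Python 3.6+). The function names and the path are invented for the example.

```python
import pathlib


# Bilingual style (hypothetical example): valid on Python 2.7 and 3,
# so no keyword-only arguments, no f-strings, and plain string paths.
def describe_config_compat(path, strict=False):
    mode = "strict" if strict else "lenient"
    return "{0} ({1})".format(path, mode)


# Python 3-only application style: `strict` is keyword-only (the bare `*`),
# the path is a pathlib object, and formatting uses an f-string.
def describe_config(path, *, strict=False):
    config = pathlib.PurePath(path)
    mode = "strict" if strict else "lenient"
    return f"{config.name} in {config.parent} ({mode})"
```

Here `describe_config("etc/app.cfg", strict=True)` returns `'app.cfg in etc (strict)'`, while `describe_config("etc/app.cfg", True)` raises `TypeError`, so callers cannot pass the flag positionally by accident.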

Given that the migration affected all of LinkedIn engineering across so many disparate teams and thousands of engineers, the effort was overseen by our Horizontal Initiatives (HI) program. The Python Foundation team served as the focal point, actively engaging with developers across all of engineering to analyze our existing code bases. They identified product owners, created work tickets, answered questions as they arose, reviewed changes, and tracked impending deadlines. We had about 550 repositories that needed porting, so we gave ourselves and our partners two quarters to complete the entire initiative. We split the work into two phases, implemented in consecutive quarters:

  • Phase 1: In the first quarter of 2019, we performed detailed dependency graphing, identifying a number of repositories that were more foundational and thus needed to be fully ported first, because they blocked the ports of everything that depended on them. These included some internal libraries that were not yet compatible with Python 3, as well as applications (command line tools and microservices) that had no blocking dependencies. This came to about 75 repositories, which was fairly easy to accomplish within a single quarter and gave us a lot of important experience, feedback, and confidence to kick off the overall porting effort.

  • Phase 2: In the second quarter of 2019, we identified the remaining repositories that needed porting, along with any carryovers from Phase 1 that had missed their deadlines for whatever reason.
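The phase split above amounts to ordering repositories by their dependency graph. Our actual tooling isn't described here, so the following is only a sketch of one way to compute such an ordering: a Kahn-style topological sort that groups not-yet-ported repositories into "waves," where every repository in a wave depends only on earlier waves or on libraries that are already Python 3 ready. The first wave corresponds to the Phase 1 candidates (foundational libraries and applications with no blocking dependencies). The function name, its parameters, and the repository names are hypothetical.

```python
from collections import defaultdict


def porting_waves(deps, ready=()):
    """Group unported repositories into porting waves (Kahn's algorithm).

    deps:  hypothetical mapping of repository -> repositories it depends on.
    ready: repositories already Python 3 compatible (e.g. bilingual
           libraries); they never block anything.
    Every repository in wave N depends only on ready repositories or on
    repositories in earlier waves.
    """
    ready = set(ready)
    nodes = set(deps).union(*deps.values()) - ready

    blocking = {repo: 0 for repo in nodes}  # number of unported dependencies
    dependents = defaultdict(list)          # reverse edges: dep -> its users
    for repo in nodes:
        for dep in set(deps.get(repo, ())):
            if dep in nodes:
                blocking[repo] += 1
                dependents[dep].append(repo)

    waves = []
    wave = sorted(repo for repo in nodes if blocking[repo] == 0)
    while wave:
        waves.append(wave)
        next_wave = []
        for repo in wave:
            for user in dependents[repo]:
                blocking[user] -= 1
                if blocking[user] == 0:
                    next_wave.append(user)
        wave = sorted(next_wave)

    if sum(len(w) for w in waves) != len(nodes):
        # Anything left over sits in a dependency cycle and must be
        # ported together.
        raise ValueError("dependency cycle among unported repositories")
    return waves
```

For example, with `deps = {"app-cli": ["util-lib", "http-lib"], "http-lib": ["util-lib"], "svc": ["http-lib"], "tool": ["six"]}` and `ready={"six"}`, the function returns `[["tool", "util-lib"], ["http-lib"], ["app-cli", "svc"]]`. Raising on cycles, rather than breaking them arbitrarily, surfaces groups of repositories whose owners need to coordinate a joint port.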

With this phased approach, we met our target date for completion. We then pushed a change to our build system that disabled building Python 2 applications, and we stopped building the Python 2 versions of our internal Python libraries.

Note that it was explicitly not a goal to remove Python 2 support from any bilingual libraries. However, now that we’ve completed the migration, our library owners are opportunistically dropping Python 2 and modernizing their code bases, taking advantage of whatever appropriate Python 3 features make their code more readable or efficient. (Let’s all take a moment to celebrate good test suites!)

Post-migration reflections

Our primary indicator that a multiproduct's migration was complete was that it built successfully and passed its unit and integration tests. For repositories with high-coverage test suites, this worked well in practice. However, some of our code bases had low coverage numbers. While passing builds and tests was still our best signal, our confidence in the fidelity of those ports was at times tempered by their low test coverage.

For other organizations planning or in the midst of their own migration paths, we offer the following guidelines:

  • Plan early, and engage your organization’s Python experts. Find and leverage champions in your affected teams, and promote the benefits of Python 3.
  • Adopt the bilingual approach to supporting libraries so that consumers of your libraries can port to Python 3 on their own schedules.
  • Invest in tests and code coverage—these will be your best success metrics.
  • Ensure that your data models are explicit and clear, especially in identifying which data are bytes and which are human-readable text.

Now that LinkedIn engineering has fully embraced Python 3, we no longer have to worry about supporting Python 2 and have seen our support loads decrease. We can now depend on the latest open source libraries and tools, and free ourselves from the constraints of having to write bilingual Python. We are opportunistically and enthusiastically adopting type hinting and the mypy type checker, improving the overall quality, craft, and readability of our Python code bases. We currently support Python 3.6 and 3.7, and are planning to roll out Python 3.8 early in 2020.
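As a small, hypothetical example of what that looks like in practice, the annotated functions below use the `typing` module and a variable annotation, both available on Python 3.6. Run over code that calls them, mypy would reject passing a list of `bytes` where `List[str]` is expected (exactly the kind of bytes/text mix-up Python 2 made easy to miss) and would flag calling a `str` method on the `Optional[str]` result without a `None` check. The function names and the `repo: owner` line format are invented for illustration.

```python
from typing import Dict, List, Optional


def parse_owners(lines: List[str]) -> Dict[str, str]:
    """Parse hypothetical 'repo: owner' lines into a mapping."""
    owners: Dict[str, str] = {}  # variable annotation (Python 3.6+)
    for line in lines:
        repo, sep, owner = line.partition(":")
        if sep:  # skip lines without a "repo: owner" separator
            owners[repo.strip()] = owner.strip()
    return owners


def owner_of(owners: Dict[str, str], repo: str) -> Optional[str]:
    # Returns None for unknown repositories; under mypy's default
    # strict-optional checking, callers must handle that case.
    return owners.get(repo)


# mypy would report errors on both of these calls, so they are
# left commented out:
#   parse_owners([b"util-lib: python-foundation"])  # bytes is not str
#   owner_of({}, "svc").upper()                     # result may be None
```

For example, `owner_of(parse_owners(["util-lib: python-foundation", "malformed line"]), "util-lib")` returns `'python-foundation'`; the malformed line is simply skipped.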