Journey to GitHub: Ensuring Developer Happiness Every Step of the Way
February 16, 2023
At LinkedIn, we have traditionally maintained our source code in large, monolithic repositories (repos). While this worked for over a decade, it proved challenging to maintain and support as our repo grew larger, and issues like long build times, difficulties managing merge queues for reverts, and slow deployments led to differing levels of frustration for our developers. The developers’ message back to us was clear: They weren’t satisfied with the status quo and wanted to adopt industry standard tooling that would solve these pain points. In 2014, we began moving code from the monolithic repo on our Subversion (SVN) source code management (SCM) system to thousands of smaller git repos.
For several years, we had repos on both SVN and git, and as you could imagine, the separation proved challenging. Even basic functionality like conducting a code review required the use of third-party tools, which were difficult to maintain and scale internally. In 2020, we finally completed our migration from SVN to git and were in a position to consolidate both code hosting and code reviews onto one platform. After a comprehensive analysis of the various platforms, comparing each across a number of key pillars, LinkedIn chose GitHub as the platform to unify its development ecosystem.
In this post, we will talk more about how we approached the key stages of this journey, what led us to choose GitHub, how we prepared for the migration, and what actions we took to ensure developer happiness remained at the forefront of our strategy.
Making the choice
Evaluating SCMs was no small feat. We had more than 150 requirements across six key pillars: scalability, performance, visibility, reliability, extensibility, and security. The following are just some of the criteria that we considered:
Scalability: LinkedIn has more than 34,000 repos holding over a terabyte of data, with our largest repos consisting of thousands of files and more than half a million lines of code. Our active code contributors make more than 200,000 commits every year, with double that number in code reviews.
Performance: We have continuous integration (CI) jobs that run for every commit that goes into the main branch, adding up to roughly 12 million jobs in 2019, with a peak concurrency of about 1,200 jobs running at once across approximately 4,000 machines. Performance and scalability go hand in hand, and we needed to ensure that our SCM was capable of meeting our requirements.
Visibility: Statistics around code authors and reviewers, signals to distinguish pipeline features from code and test failures, the ability to tag people, clear documentation, intuitive workflow UIs, code search, and the ability to report bugs were just some of our visibility considerations.
Reliability: While our platform of choice needed to be resilient and provide proper service level agreements (SLAs), we also needed to ensure that there were recovery strategies and disaster recovery management tools in place.
Extensibility: We needed to be able to integrate with existing tools, such as Jira for internal issue tracking, but also extend SCM features to accommodate our workflows. This meant command line interface (CLI) parity and strong APIs for querying, commits, code requests, and so on. Our platform of choice needed support for creating custom plugins and extensions in the UI or within our network to ensure a successful integration.
Security: With our breadth of user personas, we needed a platform that offered proper authorization policies and permission controls, such as Single Sign-On (SSO) and Multi-factor Authentication (MFA), while government regulations also demanded features like branch protection policies.
Once we determined that GitHub could handle our scale and our requirements, it was time to start preparing for the migration.
Choosing a migration path
The first task at hand was to decide what method we should use to migrate LinkedIn’s thousands of repositories over to GitHub. One option was a “big bang” approach that would move all of LinkedIn’s repositories to GitHub in one fell swoop. The SCM is fundamental to our business operations in more ways than hosting code. We have built countless tools with certain assumptions of the underlying SCM, such as the git protocol or an auxiliary API. Switching everything over all at once could cause problems that would break these tools and bring development to a halt. Similarly, if the SCM was unable to handle the traffic of a “big bang” approach, we could again find ourselves stuck.
We quickly realized that migrating at each team’s own pace was preferred. Not only might teams have other competing priorities, they might simply feel more comfortable moving after spending time learning about the new workflow experience on GitHub. This approach would also allow us to ensure that all tooling was compatible before being brought over to the new system.
To that end, we built a self-serve migration tool with a command line interface (CLI) that guided users through the migration process, bringing the repository’s associated metadata—such as versions, committers, artifacts, and deployment statuses—with it. Once a team submitted a migration request via the CLI tool, it ran asynchronously, notifying relevant parties via email of its progress. This tool made our migration process both granular and uniform.
Laying the foundation
LinkedIn builds and maintains numerous internal tools to facilitate the developer experience, and GitHub plays a key role in any workflow that requires making code changes. As an enterprise using GitHub in the cloud, we built a unified interface between GitHub and LinkedIn called Cloud Bridge that acts as a gateway for incoming webhooks and as a management tool for internal GitHub Apps. Today, Cloud Bridge ensures that traffic between GitHub and our servers, which comes in the form of HTTP requests, isn’t lost, whether because of a momentary spike in load, downtime, or interference by the layers of network filtering in place, such as firewalls.
A GitHub App is an entity on GitHub that has its own identity and webhook URL. Cloud Bridge provisions GitHub Apps on GitHub and generates a unique webhook URL for each app, while also securely storing encrypted credentials such as client ID, secret, and private key. The owners of each GitHub App can then choose which of their services should be allowed to assume the identity of their GitHub App on GitHub. Those services authenticate to Cloud Bridge with certificates and request a refreshed token that they can use to make API calls. Additionally, Cloud Bridge relays every incoming webhook event from GitHub into a dedicated Kafka topic for each GitHub App, to which one or more internal services can subscribe. This is especially handy for situations like a redeployment of a GitHub App, where there could be a couple of seconds of downtime and the webhooks would otherwise be lost. Instead, the app can pick up where it left off without missing anything.
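The "pick up where it left off" behavior can be illustrated with a minimal sketch. It is not Cloud Bridge itself: the class `WebhookRelay`, its methods, and the app and service names are hypothetical, and an in-memory list stands in for each Kafka topic. The idea shown is only that events are appended to a per-app log and each consumer tracks its own read offset, so events that arrive while a consumer is down are still there when it returns.

```python
from collections import defaultdict


class WebhookRelay:
    """Hypothetical in-memory stand-in for a gateway relaying webhooks to per-app topics."""

    def __init__(self):
        # One append-only log per GitHub App, playing the role of a Kafka topic.
        self._topics = defaultdict(list)
        # Next offset to read, keyed by (app_id, consumer name).
        self._offsets = defaultdict(int)

    def receive(self, app_id, event):
        """Append an incoming webhook event to the app's log and return its offset."""
        topic = self._topics[app_id]
        topic.append(event)
        return len(topic) - 1

    def poll(self, app_id, consumer):
        """Return the events this consumer has not seen yet and advance its offset."""
        topic = self._topics[app_id]
        start = self._offsets[(app_id, consumer)]
        self._offsets[(app_id, consumer)] = len(topic)
        return topic[start:]


relay = WebhookRelay()
relay.receive("owner-approval", {"action": "opened", "number": 1})
first_batch = relay.poll("owner-approval", "service-a")

# The consuming service is redeployed here; webhooks keep arriving in the meantime.
relay.receive("owner-approval", {"action": "synchronize", "number": 1})
relay.receive("owner-approval", {"action": "closed", "number": 1})

# After the restart, the service resumes from its stored offset.
second_batch = relay.poll("owner-approval", "service-a")
```

Because offsets are kept per consumer, a second subscriber to the same app's log would read all three events independently of the first.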
Empowering developers during migration with custom workflows
Over the years, LinkedIn has been considered one of the most trusted digital platforms. To uphold that trust, we have strict security policies and compliance requirements for the code review process. Every line of code that a developer writes needs to be peer-reviewed by at least one code owner who is an expert in that particular code base. To counterbalance the friction added by the regulations, we introduced static analysis tools, alongside other automations, such as an “Owner Approval” GitHub App, to free up reviewer time so they can focus on the actions that are difficult, if not impossible, to automate.
From day one of a repository’s life on GitHub, we loop in the developers who will be working with this code on GitHub to ensure that the GitHub workflow applies to all use cases at LinkedIn. We started by migrating a group of pilot users, whose valuable feedback helps us continuously improve both the migration process and the tooling integrations.
When a pull request (PR) is opened, the changed files’ code owners are notified and assigned to review the PR, while the Owner Approval App tells both the reviewers and the author who has yet to review the PR. At the same time, a series of static analysis, or static application security testing (SAST), tools run automatically to ensure that the proposed changes conform to our coding standards before they are merged and published.
If a problem is detected, an inline annotation is attached to the problematic line(s) of code. And to save time, the PR author can enable auto-merge functionality, which will auto-merge the PR if the static analysis tools don’t find any problems. Once the PR is merged, the change goes through our CI pipeline, which not only builds but also validates that the changes will not break any downstream consumers of the code, since one library can have as many as 6,000 consumers. The Owner Approval App reports the build result in the PR, which helps keep developers on GitHub as much as possible, avoiding the context switching that could impact their productivity.
Figure 1: The static analysis tools run on one of the PRs
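The review gate described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Owner Approval App's actual logic: the `PullRequest` type, the prefix-based ownership map, and the function names are all hypothetical. It also assumes that auto-merge waits for code owner approval as well as clean static analysis, and that a file with no listed owner blocks the merge.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    changed_files: list
    approvals: set = field(default_factory=set)    # users who approved the PR
    findings: list = field(default_factory=list)   # static analysis annotations
    auto_merge: bool = False                       # enabled by the PR author


def required_owners(pr, owners_by_prefix):
    """Map each changed file to the owners of its longest matching path prefix."""
    result = {}
    for path in pr.changed_files:
        matches = [prefix for prefix in owners_by_prefix if path.startswith(prefix)]
        # Assumption: a file with no matching prefix has no owners and stays pending.
        result[path] = owners_by_prefix[max(matches, key=len)] if matches else set()
    return result


def pending_files(pr, owners_by_prefix):
    """Files that none of their code owners has approved yet."""
    return [path for path, owners in required_owners(pr, owners_by_prefix).items()
            if not owners & pr.approvals]


def can_auto_merge(pr, owners_by_prefix):
    """Merge only if the author opted in, analysis is clean, and every file is approved."""
    return pr.auto_merge and not pr.findings and not pending_files(pr, owners_by_prefix)


owners = {"api/": {"alice"}, "api/auth/": {"bob"}}
pr = PullRequest(["api/auth/token.py", "api/routes.py"],
                 approvals={"alice"}, auto_merge=True)
still_pending = pending_files(pr, owners)   # bob has not reviewed api/auth/ yet
```

The `pending_files` result is what a notification about outstanding reviews would be built from, and the same data decides whether the merge may proceed.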
Administrating LinkedIn’s GitHub instance
Similar to other enterprise users, LinkedIn has its own set of requirements and business logic when it comes to code collaboration workflows, which often require customized configurations on GitHub. For instance, enforcing SOC2 compliance requires certain branch protection rules for repositories at LinkedIn. To manage those settings at scale, we decided to build a separate, internal administration application.
Built on top of a suite of GitHub APIs, our application applies organization and repository settings using YAML files, which are created by end-users, in an infrastructure-as-code pattern. This method not only removes LinkedIn’s GitHub core team from the critical path, giving control back to users, but also provides auditing trails that are useful for troubleshooting and security purposes.
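As a rough sketch of the infrastructure-as-code pattern, the function below compares the settings a team declares with the settings currently in effect and produces a list of changes, which can serve both as the plan to apply and as an audit record. The setting names are hypothetical, a Python dict stands in for the parsed YAML file, and the step that would call the GitHub APIs to apply each change is deliberately left out.

```python
def plan_changes(desired, current, path=""):
    """Return (setting_path, old_value, new_value) for each declared value that differs.

    Only settings present in `desired` are managed; anything else in
    `current` is left untouched.
    """
    changes = []
    for key, want in desired.items():
        full = f"{path}.{key}" if path else key
        have = current.get(key)
        if isinstance(want, dict):
            # Recurse into nested sections, treating a missing section as empty.
            nested = have if isinstance(have, dict) else {}
            changes += plan_changes(want, nested, full)
        elif have != want:
            changes.append((full, have, want))
    return changes


# Hypothetical settings, as they might look after parsing a team's YAML file.
desired = {
    "branch_protection": {
        "main": {"required_approvals": 1, "require_code_owner_review": True},
    },
    "allow_force_push": False,
}
# Hypothetical settings currently in effect for the repository.
current = {
    "branch_protection": {
        "main": {"required_approvals": 0, "require_code_owner_review": True},
    },
    "allow_force_push": False,
}

plan = plan_changes(desired, current)
```

Computing the difference first, rather than overwriting every setting, keeps each run idempotent: applying the same file twice yields an empty plan the second time.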
Quantifying migration progress (and happiness)
Since we first decided to move to GitHub, adoption has grown to the point where more than 60% of our code review requests are now created on GitHub, and we expect the migration to be complete in 2023. As we move toward fully adopting GitHub across the entire company, we have made use of Customer Satisfaction (CSAT) scores to verify our developers’ satisfaction, with the Developer Productivity and Happiness organization within LinkedIn performing regularly scheduled surveys. While it can be difficult to quantify qualitative terms like productivity and happiness, we have found that CSAT scores offer a strong indicator of those qualities. Getting this sort of developer feedback regarding their tools and experiences helps us fine-tune what we’re building and adjust our future roadmaps.
With our core engineering team’s hard work, amazing cross-functional internal collaborations, and great support from GitHub, we are proud to share that GitHub is LinkedIn’s new code collaboration platform, and has consistently been rated by LinkedIn engineers as one of the more favorable tooling products, with a CSAT score over 4.49 (on a 5-point scale) for the last six quarters. Before GitHub, this same CSAT score sat in the 3.3 to 3.8 range.
While the scores might speak for themselves, our engineers have also offered testimonials that are even more encouraging:
It is so awesome to be able to standardize on a modern and friendly interface like GitHub. Ty!
I don’t think there’s been any tooling change that's impacted me (positively!) anywhere near as much as switching to GitHub in the nearly five years I’ve been here.
… there’s nothing else even in the same weight class …
In choosing a path forward, it is really important to understand your needs and to see the gaps between where you are and where you want to be. You should gather input from various stakeholders across your organization to understand those needs and use them to evaluate potential platforms. For us, it was a pivotal moment to realize that GitHub basically met all of our needs, and where it didn’t, its extensibility allowed us to fill the gaps. Having a vision to work towards, while always keeping your customer (the developer, in this case) in mind, was key to our success. Our basic philosophy is that, even if the path to get there requires additional effort, if the customer is satisfied, it’s worth it.
I would like to thank Aishwarya Agarwal, Jacek Suliga, Jason Toy, Jinzheng Sha, Joshi Kosuru, Joyce Wang, Kimberly Bautista, Melody Park, Nakarin Kamkheaw, Stephen Yeung, Terrence Hung, Zhao Chen, Ziming Yang, and many partner teams that helped us along the way!