Accelerating Code Delivery By 97% With Yarn Workspaces
December 15, 2022
As teams and applications grow, it’s critical to adopt architectures that optimize for clear code ownership, build isolation, and efficient delivery of code. While many projects start small with just one or two repositories (for example, frontend and backend), this approach often becomes difficult to maintain as the codebases expand. At LinkedIn, we develop many applications that receive regular contributions from a multitude of teams, with each team owning distinct products or features. Our infrastructure teams enable developers to work effectively within these large applications without being impacted by the sheer scale of each codebase. In the face of challenging productivity problems, our LinkedIn Talent Solutions (LTS) teams recently adopted yarn workspaces, unlocking a 97% improvement in lead time for delivering commits to our deployment pipeline, reduced from 39 hours to 70 minutes.
LinkedIn Talent Solutions is the central piece of our hiring ecosystem, which houses a broad spectrum of products including LinkedIn Recruiter, Jobs, Talent Hub, Career Pages, Talent Insights, and more. We own the foundations of this ecosystem and build distributed, highly scalable products that connect talent with opportunity at a massive scale. These suites of products enable recruiters, job seekers, and enterprises to source, connect, and hire talent from LinkedIn’s economic graph, generating eight hires a minute on LinkedIn. This monumental task is made possible by our ongoing efforts to invest in building consistent, quality code at scale.
When we first began developing what is now our largest Talent Solutions product suite, the frontend codebase was structured as a classic monolithic application. As we built out features, the repository grew organically according to the product needs, much like most projects. However, over time, the monolith outgrew its usefulness as unclear ownership, increasing build times, and other pain points cropped up in the ever-growing application. Maintenance work, such as migrations and upgrades, was becoming difficult and required multiple teams to coordinate closely in order to land fixes across the codebase. Changes made to any part of the application required execution of our full test suite, even for unaffected features. Although the monolith had been a reasonable architecture for launching the project, our needs had outgrown it.
To solve these problems, we began extracting portions of our code into separate repositories, each aligned with an area of functionality. These codebases were owned by the team responsible for that part of the product, and they could each be built and tested fully in isolation from the overall application. By only containing a portion of the code shipped to production, our engineers experienced improved build times and faster feedback cycles during local development. Each repository could be versioned and published independently, decoupling unrelated product areas. This approach also enabled the separation of foundational infrastructure from our core application, as well as code sharing between applications as we expanded into new ventures.
The multi-repo architecture served us well for several years, but our continued growth trajectory led to a rapid expansion of code. Four years later, we had over 70 distinct repositories housing frontend code exclusively for Talent Solutions applications. While we still reaped many of the benefits intended by this approach, several pain points had cropped up over the years.
With our code spread across so many repositories, developers relied heavily on tools like yarn link to aid local development and test features end-to-end within our application. Yarn link is a command that allows local packages to be connected to one another, enabling developers to run their code across projects with unmerged changes. We found the complexity of our dependency graph made such tooling unreliable, and instead of linking one package to another, we’d often be linking three or four codebases together at once. This also meant dependency management became difficult, with version upgrades and migrations requiring boilerplate changes to be repeated upwards of 70 times depending on how many packages were impacted. We also relied heavily on automated tooling to upgrade our packages as they were published, but with dozens of commits being merged every day, even automation could not keep up with our pace. Given the scale of the problem, we would need to invest much more in custom automation tooling to keep up with our growth trajectory.
Since each codebase was versioned independently, developers would often find themselves writing multiple tightly-coupled pull requests (PRs) across the ecosystem to ship a single change, having to wait for each change to publish before integrating it further. Features took longer to reach production as a result of going through several cycles of our tooling pipeline. For the 20 top-level repositories consumed directly by our application, our analysis found that for PRs to reach our deploy pipeline (a metric otherwise known as “Commit-to-Publish”), the P90 (90th percentile) measurement was ~39 hours, not counting time to complete code reviews. Developers were waiting multiple business days to ship a change to production, and that timing was even longer for lower-level libraries in our dependency graph.
Analysis of our pipeline revealed that even automating the upgrades of our individual packages as they were published could only go so far in managing our complex dependency graph, requiring some test suites to be run as many as four times. For example, the application tests were run after merging a library PR to ensure upstream compatibility, then again by the automation tool to confirm the upgrade PR could be created. They would be run two more times after that: once in the pull request itself, and a fourth time to ensure everything passed after merging.
As our number of repositories increased, this growth created stress for the automated tooling handling our upgrades. With so many package versions being published throughout the day, upgrade PRs were encountering merge conflicts and frequently needed to be reconciled. Over time, the P90 for landing an automated upgrade crept upward, eventually reaching the point of taking more than a day and a half to upgrade a single package version. This delay was significant because our tooling was creating over 2,000 upgrades each month, around 100 version bumps per business day, to keep our dependencies fresh across all repositories. This trend threatened to worsen as our needs continued to scale, so we re-evaluated our architecture with an eye on improving these pain points.
Enter Yarn Workspaces
For the past few years, we had been eyeing the potential of workspaces, a technology offered by several package managers including npm, pnpm, and yarn (our current package manager of choice) that enables first-class support for a new type of monorepo. Unlike the monolith we began with, workspaces can house multiple distinct projects that cross-reference one another within the same repository. This meant we would be able to maintain the clear ownership and build isolation of our multi-repo architecture while eliminating the need to utilize yarn link. Additionally, all code changes within a package are instantly available to the application and other consumers in the workspace.
Moving our repositories to workspaces would also eliminate the need to version and publish packages independently. This, in turn, meant we no longer needed to run our test suites as often to ship a change. By co-locating the code within a single repository, we could run package tests at the same time as the application tests, ensuring their compatibility with one another. Our average package had a P90 of 18.63 minutes to run its own tests, but that would now be replaced by a single workspace build that ran application tests in parallel with each package’s tests, at an initial P90 of 44.2 minutes. We then eliminated the steps that were normally required to publish and upgrade individual packages, removing the P90 of 85.96 minutes for the average package’s publishing step as well as the P90 of 1.55 days for automatically upgrading the package within our application.
Even though running our application tests was more expensive than running the test suite for a smaller library, removing the intermediate build steps would save significant time, at no cost to our ability to capture regressions quickly. Both test suites would be run prior to merging a pull request and again after merging into the main branch, ensuring sufficient test coverage was maintained. With this streamlined pipeline in place, we estimated our Commit-to-Publish P90 for library code would be 125.1 minutes, decreasing by almost 95%!
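As a rough sanity check on the figures above, the per-stage P90s can be added up and compared against the ~39 hour baseline. This is only a sketch: it assumes the stages simply add (percentiles do not strictly combine this way) and that "1.55 days" means 24-hour days. The variable and function names are illustrative, not taken from our tooling.

```python
MINUTES_PER_HOUR = 60
MINUTES_PER_DAY = 24 * MINUTES_PER_HOUR

# P90 figures quoted in the article, converted to minutes.
package_tests_min = 18.63
publish_step_min = 85.96
auto_upgrade_min = 1.55 * MINUTES_PER_DAY  # assumption: 24-hour days

# Assumption: stage P90s are simply summed. This approximates the
# multi-repo Commit-to-Publish time; it is not the original analysis.
multi_repo_total_min = package_tests_min + publish_step_min + auto_upgrade_min
multi_repo_total_hours = multi_repo_total_min / MINUTES_PER_HOUR


def percent_reduction(before_min: float, after_min: float) -> float:
    """Return how much smaller after_min is than before_min, in percent."""
    return (before_min - after_min) / before_min * 100


baseline_min = 39 * MINUTES_PER_HOUR  # the ~39 hour Commit-to-Publish P90
estimated_reduction = percent_reduction(baseline_min, 125.1)

print(f"{multi_repo_total_hours:.1f} hours")    # prints "38.9 hours"
print(f"{estimated_reduction:.1f}% reduction")  # prints "94.7% reduction"
```

Under these assumptions the three stages account for essentially all of the 39 hours, with the automated upgrade step dominating, which is why removing the publish and upgrade steps matters far more than the extra cost of running the application tests.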
To create our workspace, we wrote a script that could automate the migration of each repository. Given the name of a package, the script first cloned it into a temporary directory, then removed any files that were unnecessary for workspaces (e.g. .gitignore, .npmignore, and yarn.lock). It leveraged git mv to move the files to their new workspace destination before adding the cloned directory as a temporary remote and utilizing git merge with the --allow-unrelated-histories flag to merge the external library into the application’s git history.
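The sequence of steps described above can be sketched as a function that only plans the commands rather than running them. This is not our actual script: the packages/ destination directory, the main branch name, the temporary path, and the function and parameter names are all assumptions made for illustration.

```python
from typing import List

# Files the article lists as unnecessary once a package lives in the workspace.
UNNEEDED_FILES = [".gitignore", ".npmignore", "yarn.lock"]


def plan_migration(
    package: str,
    clone_url: str,
    tracked_entries: List[str],
    tmp_dir: str = "/tmp/workspace-migration",  # hypothetical location
    dest_root: str = "packages",                # hypothetical workspace layout
    branch: str = "main",                       # hypothetical default branch
) -> List[List[str]]:
    """Return the commands, in order, for migrating one package.

    Nothing is executed; each command is an argv list that could be passed
    to subprocess.run. Commands using "git -C <clone>" act on the temporary
    clone; the remaining git commands are meant to run inside the
    application repository.
    """
    clone = f"{tmp_dir}/{package}"
    dest = f"{dest_root}/{package}"
    in_clone = ["git", "-C", clone]

    commands = [["git", "clone", clone_url, clone]]

    # Drop files that the workspace root now provides.
    commands.append(in_clone + ["rm", "--ignore-unmatch", *UNNEEDED_FILES])

    # Move every remaining top-level entry under its future workspace path.
    commands.append(["mkdir", "-p", f"{clone}/{dest}"])
    for entry in tracked_entries:
        if entry not in UNNEEDED_FILES:
            commands.append(in_clone + ["mv", entry, f"{dest}/{entry}"])
    commands.append(in_clone + ["commit", "-m", f"Move {package} into {dest}"])

    # Merge the relocated clone into the application's history.
    commands.append(["git", "remote", "add", package, clone])
    commands.append(["git", "fetch", package])
    commands.append(
        ["git", "merge", "--allow-unrelated-histories", f"{package}/{branch}"]
    )
    commands.append(["git", "remote", "remove", package])
    return commands
```

Moving the files inside the clone before merging means the incoming history already uses the workspace paths, so the merge does not collide with the application's own files. Returning a plan instead of executing it makes the sequence easy to review or dry-run before touching a real repository.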
Finally, we registered the new package by adding it to the application repository’s root package.json, making sure to declare any additional dependencies that had previously been transitively required. We also adopted a strategy of syncing dependency versions across the entire workspace, ensuring every library was being built and tested against the same packages deployed to our vendor bundle in production.
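A minimal sketch of those two steps, operating on package.json contents loaded as plain dictionaries. It assumes the root manifest uses the array form of yarn's workspaces field and that previously transitive dependencies are declared in the root dependencies; the function names and the example package names are hypothetical.

```python
from typing import Any, Dict


def register_package(
    root_manifest: Dict[str, Any],
    workspace_path: str,
    extra_dependencies: Dict[str, str],
) -> Dict[str, Any]:
    """Return a copy of the root package.json data with the package registered."""
    updated = dict(root_manifest)

    # Assumption: "workspaces" is a list of paths or globs in the root manifest.
    workspaces = list(updated.get("workspaces", []))
    if workspace_path not in workspaces:
        workspaces.append(workspace_path)
    updated["workspaces"] = workspaces

    # Declare dependencies that used to arrive transitively, without
    # overriding versions the application already pins.
    dependencies = dict(updated.get("dependencies", {}))
    for name, version in extra_dependencies.items():
        dependencies.setdefault(name, version)
    updated["dependencies"] = dependencies
    return updated


def sync_versions(
    package_manifest: Dict[str, Any],
    workspace_versions: Dict[str, str],
) -> Dict[str, Any]:
    """Align a package's dependency versions with the workspace-wide ones."""
    updated = dict(package_manifest)
    for field in ("dependencies", "devDependencies"):
        if field in package_manifest:
            updated[field] = {
                name: workspace_versions.get(name, version)
                for name, version in package_manifest[field].items()
            }
    return updated
```

For example, calling sync_versions on a migrated library's manifest with the root dependencies as workspace_versions rewrites any range that differs from the application's, so the library is built and tested against the same versions that ship in production.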
Rethinking Our Builds
The code migration itself was neither the beginning nor the end of this project, however. Even before adopting workspaces, the scale of our application had been pushing the limits of our test infrastructure. In just one year of growth, the duration of a single test run had increased over 100% from a P90 of ~45 minutes to nearly 100 minutes, even with tests run in parallel on the machine. As we considered the impact of adding thousands more library tests to the build, it became clear our current trajectory was unsustainable.
To address this need, we converted to a distributed build, spawning test runs on separate machines for each library and the application itself, since every package already supported being built and tested in isolation. This approach ensured our only bottleneck would be the slowest individual build, which we knew to be the application suite. We further distributed unrelated steps of the application build to reduce the impact of that bottleneck. The resulting test run, prior to our workspace migration, cut execution times by over 50%. In fact, even as we migrated workspaces into the repository, build durations remained consistent, even declining slightly as we continued to increase our capacity for distributed builds.
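The reasoning about the bottleneck can be made concrete with a small idealized model: run one after another, build time is the sum of every suite, while on separate machines it is bounded by the slowest suite. The sketch below assumes one machine per suite and no scheduling overhead, and the function names are illustrative only.

```python
from typing import Dict


def serial_duration(durations: Dict[str, float]) -> float:
    """Total time when every suite runs one after another on one machine."""
    return sum(durations.values())


def distributed_duration(durations: Dict[str, float]) -> float:
    """Wall-clock time when each suite runs on its own machine.

    Idealized: assumes enough machines for every suite and no queueing
    or setup overhead, so the slowest suite determines the result.
    """
    return max(durations.values())


def bottleneck(durations: Dict[str, float]) -> str:
    """Name of the suite that determines the distributed wall-clock time."""
    return max(durations, key=lambda name: durations[name])
```

In this model, adding another library suite only lengthens the build if it is slower than the current bottleneck, which matches the observation that build durations stayed consistent as packages were migrated. It also shows why splitting the application build into unrelated steps helps: that is the only way to lower the maximum.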
Distributed test runs were only part of the solution, however. In addition to the core application tests, which already existed within the repository, we now also included the tests from each individual package being migrated. With the vast number of tests to be included in our builds moving forward, the chance of a flaky test or infrastructure error causing build failures would increase significantly. So, we embarked on a strategy of dynamic minimal testing, which avoids running tests for packages unaffected by a particular change. Meanwhile, our core application tests continue to execute in every build, so test coverage does not regress relative to our multi-repo setup. In effect, a change made to one package will run the tests for that package and any package that depends on it, including the original suite of tests housed within the application. These are exactly the same tests that would have run in the past when these packages lived in separate repositories.
In each pull request, our tooling reads the list of files in the GitHub changeset and matches each file to the package it belongs to. We also traverse the application’s dependency graph to construct a list of packages that cross-reference one another within the workspace. From there, we build a filtered list of impacted packages, including those that are directly altered as well as packages that depend on them directly or transitively within the graph. This list is used to determine the distributed test builds for a particular PR, minimizing the surface area of testing necessary to ship a change.
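The selection logic described above can be sketched in two steps: map changed files to packages, then walk the dependency graph in reverse to collect every dependent. This is an illustration rather than our tooling; the function names, the package_roots mapping, and the always_run parameter (standing in for the core application suite) are assumptions.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def packages_for_files(
    changed_files: Iterable[str], package_roots: Dict[str, str]
) -> Set[str]:
    """Map changed file paths to the packages whose directory contains them."""
    touched: Set[str] = set()
    for path in changed_files:
        # Prefer the longest matching root so nested packages resolve correctly.
        matches = [
            (len(root), name)
            for name, root in package_roots.items()
            if path.startswith(root.rstrip("/") + "/")
        ]
        if matches:
            touched.add(max(matches)[1])
    return touched


def impacted_packages(
    touched: Iterable[str],
    dependencies: Dict[str, Iterable[str]],
    always_run: Iterable[str] = (),
) -> List[str]:
    """Return touched packages plus all direct and transitive dependents."""
    # Invert "package -> its dependencies" into "package -> its dependents".
    dependents: Dict[str, Set[str]] = {}
    for name, deps in dependencies.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(name)

    # Breadth-first walk up the graph from every touched package.
    impacted: Set[str] = set(touched)
    queue = deque(impacted)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in impacted:
                impacted.add(dependent)
                queue.append(dependent)

    # Suites that run on every build regardless of the changeset.
    return sorted(impacted | set(always_run))
```

Each name in the returned list would correspond to one distributed test build for the pull request. Files that match no package root (a top-level README, for instance) contribute nothing on their own, which is why always_run exists for suites that must execute regardless.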
The adoption of yarn workspaces generated a sizable impact on developer productivity within LinkedIn Talent Solutions, with 28 repositories migrated to our workspace thus far. Our initial analysis, which predicted a 95% improvement in code delivery, seemed incredibly ambitious at first. However, by the conclusion of the project, we were able to hit that milestone. At present, the six-week trailing Commit-to-Publish P90 for our application is 70 minutes, a 97% reduction from the comparable metric for our external libraries prior to their migration. Our new architecture removed the need for over 2,000 version upgrades per month, significantly reducing strain on our automated tooling. Instead of waiting multiple business days for their code to reach our deployment pipeline, our engineers can now ensure same-day readiness for the vast majority of their code!
Workspaces also enabled qualitative benefits such as improving code discoverability by co-locating our code within a single repository. This created further opportunities to apply codemods and migrations simultaneously across multiple libraries, making these rollouts more efficient. Simplifying our dependency management by aligning all packages to the same versions ensures better dependency freshness across the board. Engineers gain confidence and peace of mind, knowing their local development environment will always match the versions being deployed to production.
With our new workspace in place for several months now, we have already begun to see the impact on developer satisfaction, efficiency, and productivity from this new platform. In fact, recent surveys showed that engineers within Talent Solutions overwhelmingly report positive experiences with our workspace architecture. When asked whether this project improved their developer experience, 54.1% of respondents strongly agreed, while another 29.7% agreed with the statement.
Yarn workspaces enabled the evolution of our application architecture without sacrificing the benefits of our previous multi-repo strategy. Restructuring our test configuration to leverage distributed builds yielded more than a 50% improvement in test durations for pull requests without any regressions in coverage as we streamlined it further with a dynamic minimal testing strategy. With our core focus on accelerating code delivery for Talent Solutions engineers, reducing the Commit-to-Publish P90 by 97% made great strides toward that goal!
It’s important to re-evaluate application architecture as codebases grow and teams’ needs evolve. While one approach may serve well at a given point in time, there is no one-size-fits-all solution. For the LinkedIn Talent Solutions team, workspaces were the perfect tool to handle our current and future scale, which has already been proven out by even larger applications that adopted workspaces at LinkedIn. With this architecture in place, we continue to reap the benefits of build isolation and clear ownership of each library while gaining a more robust local development experience and a streamlined testing pipeline. We would encourage any team or organization to consider these benefits and whether a shift to workspaces fits their own development needs!
Major thanks go out to our colleagues on the LTS UI Infrastructure team, in particular Yang Piao, Victoria Shi, and Angela Pan. Furthermore, we greatly appreciate our many partner teams across LinkedIn, including Flagship Infrastructure and Productivity, Client Application Frameworks (Robert Jackson), Code Collaboration (Jinzheng S. and Prince Valluri), and the many product teams within LinkedIn Talent Solutions who supported our efforts. In particular, we’d like to acknowledge the efforts of Arthi Ravishankar and Brenden Palmer from Flagship Infrastructure, whose contributions were vital in transforming our testing infrastructure to support workspaces. Finally, this project would not have been possible without the support of our engineering leaders: Abilash Badri, Matt Burdick, and Rahul Sule.