Getting Code to Production With Less Friction and High Quality
June 18, 2015
A key point of frustration for many developers is a slow and inflexible release cycle. The slower the release cycle, the harder it is for a developer to move features from inception to production. This creates an ever-increasing backlog that a developer needs to manage and sustain when they could be building new features. LinkedIn's solution to this challenge is a shift to an increasingly agile release cycle. This allows our developers to build the various parts of a feature iteratively and makes the development cycle easier to manage.
Problem Space
When a developer wants to push a feature to production servers, rigorous certification testing is typically required. This release ritual often changes each time, usually consisting of running tests, manually analyzing failures, re-testing the failures, and iterating until all the bugs in the release build are fixed. A lower release frequency increases the susceptibility to issues in production because each changeset is proportionately larger.
Case Study
To improve our release frequency and quality, let's study a recent upgrade to the Payments system at LinkedIn.
The LinkedIn Payments system consists of the checkout UI and the backend systems. We originally used end-to-end testing for our service certification. This introduced two issues: 1) end-to-end tests are slow to execute, and 2) relying only on end-to-end tests made certification less reliable and less useful for testing specific services.
To achieve fully automated, high-quality, low-latency releases, we identified the following as key parts of our new automation strategy:
- Test services in isolation
- Execute tests earlier in the development cycle
- Team ownership of quality
Initial State
We primarily used end-to-end testing for our service certification. These tests were slow to execute, and relying on them alone made certification less reliable and less useful for testing specific services.
*Release latency = time it takes for a published build to reach production servers.
Testing Services in Isolation
Correlating a failure to a service requires ample tests targeting that specific service. Testing services in isolation improves the association between tests and services. Instead of relying only on end-to-end UI tests, we found a mix of testing strategies to be more effective for our build certification process. Unit, functional, and integration tests work in concert to quickly ensure the quality of a build.
Note: rINse is our internal integration test library. RTF (Restli Test Framework) is our internal mocking library. UITF (UI Test Framework) is our UI end-to-end testing framework. rINse lets us author API tests declaratively to ensure proper coverage at each component. RTF enables service isolation by recording and replaying the responses exchanged between components.
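To make the record-and-replay idea concrete, here is a minimal sketch in Python. It is not RTF's API, which is internal and not shown in this post. Every name in it (`RecordReplayClient`, `fake_tax_service`, the request keys) is hypothetical and chosen only for illustration.

```python
import json
from typing import Callable, Dict


def _unreachable(request_key: str) -> dict:
    # Guard used in replay mode: the real dependency must never be contacted.
    raise RuntimeError("downstream must not be called in replay mode")


class RecordReplayClient:
    """Hypothetical wrapper around a call to a downstream service.

    In "record" mode each request is sent to the real downstream callable and
    the response is stored. In "replay" mode responses come only from the
    stored recordings, so the service under test runs in isolation.
    """

    def __init__(self, downstream: Callable[[str], dict], mode: str = "record"):
        if mode not in ("record", "replay"):
            raise ValueError(f"unknown mode: {mode}")
        self._downstream = downstream
        self._mode = mode
        self._recordings: Dict[str, dict] = {}

    def get(self, request_key: str) -> dict:
        if self._mode == "record":
            response = self._downstream(request_key)
            self._recordings[request_key] = response
            return response
        if request_key not in self._recordings:
            raise KeyError(f"no recorded response for {request_key!r}")
        return self._recordings[request_key]

    def dump(self) -> str:
        # Serialize recordings so they can be stored alongside the tests.
        return json.dumps(self._recordings, sort_keys=True)

    @classmethod
    def from_recordings(cls, serialized: str) -> "RecordReplayClient":
        client = cls(downstream=_unreachable, mode="replay")
        client._recordings = json.loads(serialized)
        return client


def fake_tax_service(request_key: str) -> dict:
    # Stand-in for a real downstream dependency (hypothetical).
    return {"request": request_key, "tax_rate": 0.08}
```

A test would record once against the real dependency, save the output of `dump()`, and later build a replay client with `from_recordings()`. A failure in a replayed test then points at the service under test rather than at one of its dependencies.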
Execute tests earlier in the development cycle
Running tests at every step of the development process is critical to identifying bugs as early as possible. The cost of fixing an issue increases as the software travels further down the development process. LinkedIn Test Tooling allows us to execute tests at different stages of the development process. LinkedIn has a continuous integration system that allows quality checkpoints to be integrated at each step of the software development cycle.
Typically the phases where tests may be run are:
- Before code commit
- Start of code commit
- Build completion
- Deploying a build to the staging environment
- Deploying a build to the production environment
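The phases above can be modeled as an ordered list of quality checkpoints, each holding its own checks. The sketch below is illustrative only and does not describe LinkedIn's continuous integration system. The `Checkpoint` type, the `first_failing_stage` function, and the stage labels are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Stage labels mirror the phases listed above.
STAGES = [
    "before code commit",
    "start of code commit",
    "build completion",
    "deploy to staging",
    "deploy to production",
]


@dataclass
class Checkpoint:
    """A stage in the pipeline plus the checks that gate it (hypothetical)."""
    stage: str
    checks: List[Callable[[], bool]] = field(default_factory=list)


def first_failing_stage(checkpoints: List[Checkpoint]) -> Optional[str]:
    """Run checkpoints in order and stop at the first stage with a failing check.

    Returns the name of that stage, or None if every check passed.
    """
    for checkpoint in checkpoints:
        if not all(check() for check in checkpoint.checks):
            return checkpoint.stage
    return None
```

Because evaluation stops at the first failing stage, a defect caught by a pre-commit check never reaches the staging or production checkpoints, which is the point of running tests as early as possible.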
Team ownership of quality
Quality is the responsibility of the whole team. Quality control is most efficiently achieved if software quality is considered at every step in the development cycle. A software quality process will benefit from an appropriate distribution of test automation ownership between teams cooperating in a software development effort.
Results
By adopting an improved automation strategy, the LinkedIn Payments Engineering team achieved more frequent releases with greater confidence in measured quality. Our release metrics are now:
Our release process has become more deterministic. Issues are isolated earlier in the process, and a developer can push a feature to a production server within a time frame that can be predicted with greater certainty.
Thanks to the LinkedIn Payments Team and the LinkedIn Test Tools Team for all their help in designing and implementing this process, and special thanks to Randal Moore for helping author this blog post.