3x3: Speeding Up Mobile Releases
February 3, 2016
LinkedIn recently released Project Voyager, our codename for the new version of our flagship application for Android, iOS, and mobile web. Voyager is the result of more than a year of product development work by over 250 engineers. We took the opportunity to rethink the LinkedIn experience from the ground up, not only from a product perspective, but on the engineering side as well.
Before Voyager, we released the LinkedIn app once per month. Prior to each release, we picked a release candidate and handed it off to the testing team, which performed a manual, four-day regression test suite. If they found any bugs, we made a hotfix and gave the testing team a new build. Depending on the severity of the bug, this might require another round of testing. Engineers rushed to get their code checked in before the monthly release candidate deadline, or waited an entire month to deliver their features or bug fixes to our members. Product managers and marketing partners had to plan their schedules around this release process, and hope that each release went out on time. It wasn’t easy to iterate on member feedback because we had only 12 releases a year. This setup wasn’t ideal for anyone. We wanted the release cadence to be a product decision, not something determined by engineering and testing constraints.
Continuous integration is old hat in the web frontend and backend world, but it’s still rare for native mobile apps. True continuous deployment would ship every commit directly to production – but this is clearly not realistic in the world of native apps, where binaries must be published in the App Store (with Apple’s week-long review process) and Google Play, then downloaded by members. On the engineering side, we needed an aggressive goal to ensure we didn’t fall back into the pattern of manual verification of each release. We also wanted our product partners to be free to ship as often as they want (without annoying members of course!). We settled on a rule that we call “3x3”:
Release three times per day, with no more than three hours between when code is committed and when that code is available to members.
Obviously, we can’t ship to the App Store and Google Play every three hours, but we can give a new build to internal members (i.e. LinkedIn employees) multiple times per day.
Why the three-hour goal? There are two main reasons:
First, three hours is not enough time to conduct any manual testing steps, so holding ourselves to this constraint ensures we won’t revert to using manual validation to certify our releases.
Second, three hours is not enough time to test everything end-to-end. This may seem counter-intuitive: the more tests we have the better, right? But remember that the goal is not just to automate our releases, but also to enable us to iterate faster. If engineers spend 20% of their time writing new code and the rest of their time refactoring the fragile UI tests that break every time code is changed, we aren't achieving our goal. We prefer an approach where product owners decide which production-critical paths require UI tests, and resolve some of the edge/negative cases with unit tests, which are much faster and easier to maintain.
To achieve our goal, we needed a complete automation pipeline for every step, from code commit to production release. This pipeline needed to be fast enough to fit inside our three-hour window, and reliable enough for us to maintain confidence in our releases.
We begin by running a number of static analysis and code style checks and compiling the code. Next, we generate the production binaries, and then run our unit tests and UI tests on several different versions of Android/iOS. Finally, we run basic upgrade tests to make sure the user can upgrade from the previous stable version of the app to this new build without any major issues or crashes.
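The gating sequence above can be sketched as a small pipeline driver. This is a simplified illustration, not LinkedIn’s actual tooling; the step names and commands are hypothetical stand-ins, and the command runner is injected so the logic stays testable.

```python
import subprocess

# Hypothetical gating steps, in the order described above.
# The commands are illustrative placeholders.
PIPELINE = [
    ("static analysis", ["./gradlew", "lint"]),
    ("unit tests",      ["./gradlew", "test"]),
    ("UI tests",        ["./gradlew", "connectedAndroidTest"]),
    ("upgrade test",    ["./scripts/upgrade_test.sh"]),
]

def certify(commit, run=lambda cmd: subprocess.run(cmd).returncode):
    """Run every gating step in order, stopping at the first failure.

    Returns (True, None) if the commit can be certified "known good",
    otherwise (False, name_of_failed_step).
    """
    for name, cmd in PIPELINE:
        if run(cmd) != 0:
            return False, name
    return True, None
```

A build is only certified when every step returns success; a single red step disqualifies the commit, which is what keeps the three-hour window honest.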
Once all of these tests have passed, we certify a build as “known good.” Every three hours, the deployment process picks the latest “known good” build and releases it to our internal alpha testers on the Voyager team. For Android, we use Google Play’s alpha testing functionality coupled with their developer console API to automate the release process. For iOS, we use a custom alpha testing setup using Enterprise builds. Every week, we pick the last “known good” build and promote it to Google Play and the App Store.

In addition to our automated tests, we mitigate the risk of real members encountering bugs by having our developers place their new code behind feature flags, which can be enabled and disabled without releasing a new build to members. We can release a feature to internal members to identify issues without affecting external members. For both platforms, we provide tooling that generates internal release notes detailing what is in the new version, and sends them to the product managers and marketing team, who create public-facing release notes.
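The selection step — “pick the latest known good build” — reduces to a small, deterministic query over build records. A minimal sketch, assuming each build record carries an id, a completion timestamp, and a certification flag (all hypothetical field names):

```python
def latest_known_good(builds):
    """Return the newest build certified "known good", or None.

    Each build is a dict with hypothetical fields:
      {"id": str, "finished_at": float, "known_good": bool}
    """
    good = [b for b in builds if b["known_good"]]
    if not good:
        return None
    return max(good, key=lambda b: b["finished_at"])
```

Because only certified builds are candidates, a run of failing commits simply means the three-hourly deploy re-ships the last good build rather than blocking.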
A few areas didn’t fit easily into our new 3x3 philosophy. Before past releases, several partner teams needed to review and sign off on each build. For example, the localization team would confirm that translations of new features were displayed properly, free of any language-specific bugs. If we wanted a release process without any manual steps, we had to find another way to perform this type of testing.
We worked with the localization team to break down the various types of internationalization bugs they frequently found. We identified many bugs that arose from improper use of localized string templates, and implemented static analysis checks to ensure that developers passed proper arguments to the string templates to prevent crashes. The other major category of bugs the localization team found involved UI layouts that did not render properly when given languages with special characters, or text that was especially long or short. We were able to catch most of these bugs using what we call “layout tests.”
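The first category — string-template misuse — lends itself to a mechanical check: every placeholder must have a matching argument, and every argument should be consumed. A simplified sketch, assuming positional templates of the `{0}`-style (the real checks and template format may differ):

```python
import re

# Matches positional placeholders like {0}, {1} in a localized template.
PLACEHOLDER = re.compile(r"\{(\d+)\}")

def check_template_call(template, args):
    """Return a list of problems; an empty list means the call looks safe."""
    indices = {int(i) for i in PLACEHOLDER.findall(template)}
    problems = []
    for i in sorted(indices):
        if i >= len(args):
            problems.append(f"placeholder {{{i}}} has no matching argument")
    for i in range(len(args)):
        if i not in indices:
            problems.append(f"argument {i} is never used by the template")
    return problems
```

Running a check like this over every translated template at build time turns a whole class of localization crashes into static-analysis failures.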
Layout tests take a particular view from the app and populate it with permutations of data, then verify that views do not overlap and that text does not get cut off (among other checks). The final category of localization bugs involves situations where the semantic meaning of a translated string does not make sense in the context of the surrounding UI. There is no way to avoid human validation for these bugs, but they are a relatively small percentage of the issues found and are covered using periodic validation by the localization team.
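The core assertions of a layout test — no overlapping views, no truncated text — are simple geometry once each view’s frame is known. A minimal sketch under that assumption (the `Frame` type and field names are illustrative, not our actual test harness):

```python
from dataclasses import dataclass

@dataclass
class Frame:
    # A view's rectangle in window coordinates (hypothetical fields).
    x: float
    y: float
    w: float
    h: float

    def intersects(self, other):
        return (self.x < other.x + other.w and other.x < self.x + self.w and
                self.y < other.y + other.h and other.y < self.y + self.h)

def overlapping_pairs(frames):
    """Return index pairs of views whose frames overlap."""
    return [(i, j)
            for i in range(len(frames))
            for j in range(i + 1, len(frames))
            if frames[i].intersects(frames[j])]

def is_truncated(frame, required_text_width):
    """True if the rendered text needs more width than the view provides."""
    return required_text_width > frame.w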
Please see this post for more details on our iOS layout testing methodology.
Need for speed
If that sounds like a lot of testing, you’re right. As we focused on implementing the checks we needed to be confident in our releases, our commit-to-publish times started creeping higher and higher. The biggest culprits were building production binaries and running our UI tests.
Both our Android and iOS projects build multiple binaries, including debug binaries, special internal beta/enterprise binaries, and multiple flavors of release binaries for different device types. Together, these various builds were adding ~40 minutes to our iOS builds and ~15 minutes to our Android builds. To speed things up, we developed a process for distributing our builds across multiple machines in our continuous integration pool. The parent node builds the minimum set of binaries needed to run tests, and fires off “child jobs” to other nodes that build the remaining binaries while the UI tests run in parallel.
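The fan-out of the remaining binary flavors to child nodes is essentially a parallel map over build flavors. A sketch of that shape, with the actual build invocation injected as a function so the fan-out logic is shown in isolation (names hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_build(flavors, build, workers=4):
    """Build every flavor concurrently and return {flavor: result}.

    `build` stands in for whatever invokes the real toolchain (or
    dispatches a child CI job) for one flavor.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs results correctly.
        return dict(zip(flavors, pool.map(build, flavors)))
```

In our pipeline the “workers” are separate CI machines rather than threads, but the principle is the same: the parent only blocks on the binaries the tests need, and everything else overlaps with the UI test run.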
On the iOS side, we faced numerous performance-related problems with the new Swift compiler that made our builds extremely slow. We spent significant time digging into the inner workings of the compiler in order to speed up our builds by up to 4x. Apple also fixed a number of performance and stability issues that we reported in the Swift compiler, which helped reduce our iOS build times.
For UI tests, we took advantage of Google’s Espresso tool for Android and Square’s KIF utility for iOS. Both of these are already quite fast, but we wanted to be able to run a large number of tests on different Android/iOS API levels and in different configurations (left-to-right vs right-to-left UI layout, for example). Similar to our distributed build concept, we developed a system for distributing our test suites across multiple machines in our CI pool and collecting the results when all of them had completed. On Android, we were also able to take advantage of running multiple emulators (up to 15!) on a single build machine, which greatly increased our capacity and scalability.
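Distributing a test suite across machines boils down to a deterministic partitioning scheme. One simple approach (an illustrative sketch, not necessarily the strategy we use) is a sorted round-robin, which lets every machine compute its own shard without any coordination:

```python
def shard_tests(tests, num_shards):
    """Deterministically split a test suite across machines (round-robin).

    Sorting first makes the assignment stable across runs, so shard N
    always gets the same tests for a given suite.
    """
    shards = [[] for _ in range(num_shards)]
    for i, name in enumerate(sorted(tests)):
        shards[i % num_shards].append(name)
    return shards
```

Smarter schemes weight shards by historical test duration, but even naive round-robin sharding cuts wall-clock time roughly by the shard count when tests are similarly sized.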
Struggle with flaky tests
Fast tests aren’t useful if they don’t accurately report bugs in the code. Flaky tests decrease confidence in the entire suite and can lead engineers to ignore valid failures, assuming “that test is just flaky.” Several articles have already described our efforts to improve test stability and reliability, and more are on the way.
Just getting started…
As we move forward in 2016, we will continue to improve both the speed and stability of our 3x3 pipeline. We want to empower our engineers to iterate quickly and develop world-class products without worrying about release schedules and deadlines.
We’re excited to share many of our learnings with the wider mobile development community, both through blog posts like this one and by releasing some of the tools we’ve developed as open source libraries. Stay tuned to the LinkedIn Engineering Blog for more information soon.
Most importantly, now that we have released Voyager to the world, we are already reaping the benefits of our infrastructure investments. The faster development cycle enables us to make rapid improvements to the mobile LinkedIn application, bringing even more value to our members.
We have accomplished all this through a dedicated company-wide collaboration of teams across LinkedIn, including Tools, Testing, Localization, Marketing, Mobile Infrastructure, and more. Thanks to everyone for your effort!