Open Sourcing Bluepill: Run iOS Tests in Multiple Simulators
January 18, 2017
Testing is a key component of LinkedIn’s 3x3 strategy. As we continue improving our iOS continuous delivery pipeline, we are faced with two major obstacles—tooling stability and scalability. We needed a tool to run iOS UI tests both reliably and quickly. For this reason, we created a project, called Bluepill, that we are open sourcing today. Bluepill is a reliable iOS testing tool that runs UI tests using multiple simulators on a single machine. Bluepill has saved LinkedIn thousands of developer hours, and we believe it can also provide a great benefit to anyone running iOS UI tests at scale.
There are two major limitations with the standard, out-of-the-box iOS tooling: stability and scalability.
As with other companies doing iOS development and testing at scale, we faced many challenges dealing with iOS simulator stability. In a blog post last year, we elaborated on how we dealt with simulator flakiness by experimenting with different environment configurations in order to find an optimal workaround. However, since iOS Simulator is a black box and keeps evolving with every Xcode update, we were always chasing a moving target in terms of stability. We gradually came to accept that no matter how robust or resilient it appeared to be in any given version, we were still fragile and unprepared for any volatility from future changes in iOS Simulator.
Xcode only supports one simulator at a time. Therefore, tests must be run sequentially. In the case of the LinkedIn app, which has around 2,000 UI tests, this would take about 15 hours. In order to achieve a commit-to-publish time under three hours, we must run the tests in parallel.
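As a rough back-of-the-envelope check of these numbers, the sketch below computes the average time per test and the minimum degree of parallelism needed to fit the suite into a time budget. It assumes the work divides evenly across runners and that the entire three-hour budget is available for testing, which is optimistic because building and publishing also take time. The function name is illustrative only.

```python
import math


def min_parallel_runners(sequential_hours: float, budget_hours: float) -> int:
    """Smallest number of equally fast runners that fits the suite in the budget."""
    return math.ceil(sequential_hours / budget_hours)


total_tests = 2000        # approximate UI test count quoted in the post
sequential_hours = 15.0   # approximate sequential run time quoted in the post

# Average wall-clock time per test when run one at a time.
seconds_per_test = sequential_hours * 3600 / total_tests

# Lower bound on parallel runners for a three-hour budget (idealized).
runners = min_parallel_runners(sequential_hours, 3.0)

print(f"{seconds_per_test:.0f} s per test, at least {runners} runners")
```

Under these assumptions, each test averages 27 seconds and at least five runners are required. In practice more are needed, since the split is never perfectly even.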
As we looked to solve these problems, we found two existing solutions, neither of which ultimately met our needs adequately.
1. Distributed testing
The first stab we took to scale iOS UI testing was to split the test target into several smaller targets and then distribute them among different machines. As mentioned in the post iOS Build Speed and Stability, we rolled out distributed building and testing support for our products. However, there were two problems with this approach:
Tooling stability: The tooling stability on our Mac machine pool is around 98%. If we distribute the test target between 10 machines, a build can only pass when all 10 child jobs succeed. With this approach, the tooling flakiness is exacerbated, since the chance of a clean run shrinks exponentially with each additional node. With 10 nodes running tests, the tooling reliability drops to 0.98^10 ≈ 0.82, or about 82%.
Capacity requirement: The second problem with hardware parallelization is capacity. At peak time (e.g., during lunch and dinner rushes), we have around 80 concurrent continuous integration jobs. Running them with hardware parallelization would require 80*10 = 800 machines. When we approached the problem, we only had a fourth of that capacity available. The excess jobs had to be queued, and developers’ commits would have to stay in the queue for hours before being tested.
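The arithmetic behind both problems can be reproduced with a short sketch. It assumes that node failures are independent and that every job is split across the same number of machines. The function names are illustrative and not part of any LinkedIn tooling.

```python
def build_pass_rate(per_node_reliability: float, nodes: int) -> float:
    """Probability that every node succeeds, assuming independent failures."""
    return per_node_reliability ** nodes


def machines_needed(concurrent_jobs: int, nodes_per_job: int) -> int:
    """Machines required when each job fans out to a fixed number of nodes."""
    return concurrent_jobs * nodes_per_job


pass_rate = build_pass_rate(0.98, 10)   # about 0.82
peak_machines = machines_needed(80, 10) # 800
available = peak_machines // 4          # "a fourth of that capacity"

print(f"pass rate: {pass_rate:.0%}, machines at peak: {peak_machines}, available: {available}")
```

Because the pass rate is a product of per-node reliabilities, adding nodes always lowers it. That is why spreading tests over more machines trades speed for stability.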
2. Project Hydra: a Python wrapper to run tests in multiple simulators
Our next attempt, aimed at the problems with distributed testing, was an initiative to run tests in multiple simulators on a single machine. Inspired by Facebook’s xctool and a proof of concept of running iOS tests on multiple simulators from Johannes, we built a Python wrapper on top of xctool to run iOS tests on multiple simulators. This approach helped stabilize our continuous delivery environment. However, we found several problems with this approach:
It was based on xctool: xctool is a great tool to make it easy to test iOS products. However, active development on xctool was stopped, and the project owners no longer maintain it. This left us with only two choices: either fork and refactor xctool, or build our own testing tool. After some investigation, we found it was easier to build a simpler tool to focus on running tests in multiple simulators.
It was a mere Python wrapper of xctool binary: The Python wrapper was built on top of the xctool binary and it didn’t have access to the CoreSimulator APIs. Managing simulators was difficult, since we couldn’t talk to the simulators directly.
As existing solutions couldn’t satisfy our requirements, we decided to build our own iOS UI test runner to execute tests in multiple simulators. The tool is written in Objective-C, and is built on top of Apple’s CoreSimulator framework.
The project name Bluepill is inspired by the “blue pill” in The Matrix, which represents an illusion. Bluepill creates an illusion that tooling just works magically, so that engineers can focus on coding.
Bluepill runs tests in parallel using multiple simulators. The main features supported are:
- Running tests in parallel by using multiple simulators.
- Automatically packing tests into groups with similar running time.
- Running tests in headless mode to reduce memory consumption.
- Generating a JUnit report after each test run.
- Reporting test running stats, including test running speed and environment robustness.
- Retrying when the Simulator hangs or crashes.
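To illustrate what packing tests into groups with similar running time can look like, here is a minimal sketch using a greedy longest-first heuristic: each test, taken from slowest to fastest, goes to the group with the smallest total so far. This is an assumption for illustration, not Bluepill's actual algorithm, and the test names and durations are made up.

```python
import heapq


def pack_tests(durations: dict[str, float], num_groups: int) -> list[list[str]]:
    """Greedily assign tests to groups so that group totals stay roughly equal."""
    if num_groups < 1:
        raise ValueError("num_groups must be at least 1")

    # Min-heap of (total seconds so far, group index).
    heap = [(0.0, i) for i in range(num_groups)]
    heapq.heapify(heap)
    groups: list[list[str]] = [[] for _ in range(num_groups)]

    # Place the slowest tests first, each into the currently lightest group.
    for name, seconds in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        total, index = heapq.heappop(heap)
        groups[index].append(name)
        heapq.heappush(heap, (total + seconds, index))

    return groups


# Hypothetical test names and historical durations in seconds.
sample = {"testLogin": 40.0, "testFeed": 30.0, "testSearch": 20.0, "testProfile": 10.0}
groups = pack_tests(sample, 2)
print(groups)
```

With this sample data, both groups end up with 50 seconds of work, so two simulators would finish at about the same time. The heuristic does not guarantee an optimal split, but it is simple and usually close.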
Here’s a demo of Bluepill in action:
It is quick and easy to start using Bluepill! In a simplified scenario, you just need to run the following command, and Bluepill will kick off four simulators to run your tests in parallel. By the end of the test run, it will generate a report in ./output.
./bluepill -a ./Sample.app -s ./SampleAppTestScheme.xcscheme -o ./output/
Alternatively, you can have a configuration file like the one below:
And run ./bluepill -c config.json
A full list of supported options is available here.
Open sourcing Bluepill
We’re more than happy to announce that Bluepill has been open sourced under the BSD 2-Clause license and that the code is available on GitHub. Contributions and suggestions are welcome!