Faster testing on Android with Mobile Test Orchestrator
June 26, 2020
When testing Android applications (for example, the LinkedIn mobile app for Android), thorough testing practices can easily lead to a growth in the number of overall tests being performed—from hundreds to thousands to even tens of thousands. Clearly, there is a need to scale significantly, while at the same time keeping test run times reasonably short. This is particularly true in the context of a continuous automated pipeline for development to release, because large applications can have multiple commits in a short time frame, and a pipeline that is fast prevents backups.
To help solve this challenge at LinkedIn, we created the Mobile Test Orchestrator (MTO), a framework that orchestrates Android tests to run across multiple emulators and servers. With MTO, we’re able to keep our test run times to a reasonable limit and can easily scale our testing capacity as needed. In this post, we’ll provide an overview of MTO.
As the LinkedIn mobile app was being developed, thoroughness in testing caused the number of tests being performed as part of the software development process to grow at a rapid pace. As more features were added, more tests were added, causing test run times to grow quickly into tens of minutes. This increased the amount of time it took for developers to get feedback on their work, which in turn led to decreased productivity. If we didn’t take action, developer testing would become unsustainable. The goal was to contain test execution time to 10 minutes or less, while providing an easily scalable platform for distributing tests for simultaneous execution.
Growth in features and tests can quickly grow test run times
In one sense, the solution might be obvious: distribute tests across multiple resources to execute simultaneously. The real challenge, then, is to determine how to achieve this distribution. To understand that, let's first introduce two key concepts: concurrency and parallelism. These concepts are not unrelated—after all, concurrency can be implemented through parallelism—but the connotative meaning behind them is very different. Concurrency is about working smarter: interleaving tasks over the same span of time and using resources efficiently and wisely in doing so. Parallelism is about working harder, in this case by throwing more resources at the problem to execute tasks in true parallel fashion.
To understand this better, let’s first explore a little bit more about the basics of testing on Android.
The basics of Android testing
A large part of testing on any mobile system is UI and scenario testing against actual devices, via a USB bridge or simulators/emulators running on the host within their own VMs. Android provides a single tool, the Android Debug Bridge (ADB), to abstract the difference away. This makes it transparent to the clients of ADB whether they are communicating with a real device or an emulator. MTO leverages this fact to support testing on both, without any need for special logic. However, for the purpose of this post, the focus will be on testing against emulators.
Essentially, ADB allows for execution of commands and file transfers to and from the emulator. For example, you can use ADB to install APK bundles as apps onto the emulator, to push files to or pull files from the emulator, to capture a screenshot to a local file on the host, and much more. The ADB tool, of course, is also used to execute tests, and more importantly allows for execution of subsets of tests. This latter point is important, as it allows us to easily divide up our testing for distribution.
While there are many additional details of ADB, for this post it’s sufficient to know that ADB provides a bridge for conducting actions against an emulator or device and for processing responses. As we will see later, a key point is that ADB is I/O-centric from the host machine's perspective.
There are also some things specific to Android emulators to know. First, they are resource-intensive; they consume gigabytes of memory and in a multi-core configuration will consume multiple cores on their host when under load pressure. Second, when launching multiple emulators on a single host, the time to launch the emulators increases (non-linearly) with each emulator you add; startup times can therefore be long if launching a group of emulators at once.
Concurrency vs. parallelism
Returning to the notions of concurrency and parallelism, you may already begin to see how the distinction plays out. Emulators, each running a subset of the tests, consume CPU as they execute in parallel. This is an ideal implementation of parallelism, as there is no interaction between emulators (only with the host conducting test execution). Each emulator is fully space- and time-partitioned. In practice, as we began implementing MTO, we saw each emulator take at peak greater than 200% CPU (i.e., greater than 2 fully dedicated cores). With multiple emulators working in parallel, the system is definitely "working harder" in this regard to achieve our goal of reduction in test run times, utilizing most of the available resources through simultaneous test execution.
The actions of the software coordinating the work of the emulators are a different story. This host software, the MTO software, acts as an "orchestrator," arranging the distribution of test execution across the multiple emulators. The interactions here are primarily I/O driven (through the ADB, as discussed previously), and are expected to be of low CPU utilization. In basic tests with MTO, the orchestration software was found to take roughly 1% of a single core. This orchestration is more a matter of concurrency than parallelism. In fact, the best and most efficient means is to execute this orchestration on a single core, interspersing the processing of I/O events as they happen in time. In other words, the software is "working smarter," containing the orchestration to a single core and executing concurrent tasks on an I/O-triggered basis. This orchestration software is where MTO comes into the picture.
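A minimal asyncio sketch of this single-core style (illustrative only, not MTO's actual code): each coroutine below "drives" one emulator, but all of them interleave on one event loop, yielding to each other whenever they would be waiting on I/O:

```python
"""Single-core orchestration sketch: coroutines interleave on one event loop.
Names and structure are illustrative, not MTO's real implementation."""
import asyncio


async def drive_emulator(serial: str, tests: asyncio.Queue, results: list) -> None:
    # Pull tests until the shared queue is drained
    while not tests.empty():
        test = tests.get_nowait()
        # In real orchestration this would await an adb subprocess; the
        # zero-length sleep stands in for that I/O wait, handing control
        # to the other emulator-driving coroutines in the meantime.
        await asyncio.sleep(0)
        results.append((serial, test))


async def orchestrate(serials, test_names):
    tests: asyncio.Queue = asyncio.Queue()
    for name in test_names:
        tests.put_nowait(name)
    results: list = []
    # One coroutine per emulator, all running on a single event loop
    await asyncio.gather(*(drive_emulator(s, tests, results) for s in serials))
    return results


results = asyncio.run(orchestrate(["emulator-5554", "emulator-5556"],
                                  [f"test_{i}" for i in range(6)]))
```

All six tests complete even though only one coroutine ever runs at a time: the "work" of orchestration is just dispatching I/O and reacting to its completion.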
Of course, you cannot scale the number of emulators on a single host without limit; the resource needs are too great. For this, there is one more layer of parallelism: multiple machines (“worker nodes”), each running multiple emulators, with another machine coordinating the overall execution (the “root node”).
In practice: Tying it all together
The overall basic system architecture of MTO, which allows it to support scalable distribution of tests, is shown below:
The nature of running distributed tasks across multiple machines and the I/O-centric nature of the coordinating software elements make Python a very good fit for implementing this design. MTO leverages Python's built-in multiprocessing module to ease development in a distributed environment. For example, the multiprocessing module supports proxy objects that add transparency to a local client when interacting with remote objects, as well as distributed queues. MTO also leverages Python's deep support for asynchronous I/O through its asyncio module and capabilities. This aligns well with the I/O-centric operations conducted by each host, which by their nature are asynchronous. Lastly, the choice to use Python also fits well with our internal tooling infrastructure at LinkedIn, which is primarily Python-based.
The test execution model
Another aspect we had to reconcile within the design of MTO is which execution model to use in the distribution of tests. There are a number of choices, but here we will focus on two. The first is a "push" model, where tests are divided into subsets up front and each subset is then distributed to the workers. The second is a "pull" model, where the root node holds a queue of tests to run, and each worker pulls the next test from that queue as it becomes available.
MTO implements the "pull" model as the means of test distribution and execution. There are a number of reasons for this choice. First, the order of test distribution and execution is important, and this is easier to manage in a pull model than in a push model. In general, you want the longer-running tests to execute sooner; if instead you were to run the longest test as the very last test, you would be extending the overall test time because this test will be the only one running, making poor utilization of resources and parallelism. In a push model, if you want to strategically order the tests, then the way in which tests are divided initially has to be re-tuned and re-balanced as you add more tests or scale to more emulators. In a pull model, on the other hand, there is a single queue of tests to be ordered. The ordering is independent of how the distribution occurs and is simpler to perform.
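A toy simulation makes the ordering argument concrete (the durations here are made up for illustration): workers repeatedly pull the next test from a shared queue, and running the long test first keeps it from being the lone straggler at the end.

```python
"""Toy pull-model simulation: why longest-first ordering shortens runs."""
from collections import deque


def makespan(durations, n_workers):
    # Greedy pull model: whichever worker frees up first takes the next test
    queue = deque(durations)
    workers = [0.0] * n_workers          # each worker's busy-until time
    while queue:
        i = workers.index(min(workers))  # next free worker pulls a test
        workers[i] += queue.popleft()
    return max(workers)                  # overall wall-clock time


tests = [9, 1, 1, 1, 1, 1]               # minutes; one long test
longest_first = makespan(sorted(tests, reverse=True), n_workers=2)
longest_last = makespan(sorted(tests), n_workers=2)
# With 2 workers: longest-first finishes in 9 minutes; longest-last in 11,
# because the 9-minute test runs alone after everything else is done.
```

Note that the ordering lives entirely in the queue; nothing about the workers or the distribution mechanism changes, which is exactly why the pull model makes ordering easy.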
The pull model is also more robust to failures. If an emulator misbehaves or a worker node goes offline, the rest of the system can pick up the slack, since the remaining workers will simply pull the remaining tests from the queue one by one. Tracking has to be put in place to know which test (for a failed emulator) or tests (for a failed worker node) were lost, of course, but this is fairly simple to do. In a push model, if an emulator goes down, the full set of remaining tests has to be re-distributed according to some sort of potentially awkward logic.
Emulator cloud options
So far, we have focused solely on test distribution and execution. Mobile Test Orchestrator also provides a framework for launching emulators and creating device pools, in order to easily scale testing operations. These two concerns are kept separate and independent within MTO. Indeed, you have the choice of managing the device/emulator pool within the same process as test execution or in an independent process. Emulator pools use an underlying queue to manage the emulators on a worker host. Clients reserve a device from the pool, making it unavailable to other clients, and relinquish the emulator back to the queue (pool) when finished. This prevents the conflict of multiple workers utilizing the same emulator.
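A minimal sketch of such a queue-backed pool (the class and method names are hypothetical, not MTO's API): reserving pops a serial off the queue so no other client can take it, and the context manager guarantees it is relinquished afterwards.

```python
"""Illustrative queue-backed emulator pool; not MTO's actual API."""
import queue
from contextlib import contextmanager


class EmulatorPool:
    def __init__(self, serials):
        self._q = queue.Queue()
        for s in serials:
            self._q.put(s)

    @contextmanager
    def reserve(self, timeout=None):
        # Blocks until an emulator is free; popping it from the queue
        # is what makes it unavailable to other clients
        serial = self._q.get(timeout=timeout)
        try:
            yield serial
        finally:
            self._q.put(serial)   # relinquish back to the pool


pool = EmulatorPool(["emulator-5554", "emulator-5556"])
with pool.reserve() as serial:
    held = serial                 # run tests against `serial` here
```

Because reservation is just a queue get, the same pattern works whether the pool lives in the test-execution process or in a separate one.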
MTO provides two built-in means for the workers to get a reference to a device/emulator pool. The first is through discovery: the API provides a means to discover emulators already present on the host and instantiate a pool from those. The second is through launching emulators explicitly to create the pool. A third alternative is to use custom code outside of MTO to launch the emulators and provide them through a shared multiprocessing.Queue.
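Discovery ultimately rests on what adb already knows about the host. As a sketch (the parsing helper is hypothetical; MTO's discovery API may differ), the serials of usable emulators can be recovered from `adb devices` output:

```python
"""Sketch: discover running emulators by parsing `adb devices` output.
The helper name is made up; MTO's real discovery API may differ."""


def parse_adb_devices(output: str) -> list:
    serials = []
    for line in output.splitlines()[1:]:     # skip the header line
        parts = line.split()
        # Only devices in the "device" state are usable; "offline"
        # and "unauthorized" entries are skipped
        if len(parts) == 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


# Example output as produced by `adb devices`
sample = """List of devices attached
emulator-5554\tdevice
emulator-5556\toffline
emulator-5558\tdevice
"""
found = parse_adb_devices(sample)
```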
Regardless of how the emulator pools are created, several choices exist for operating an emulator cloud. Although MTO does not provide code for bootstrapping and managing the host machines that it will run on, the framework does not preclude options for how the overall emulator cloud is managed. One option is a persistent, potentially elastic cloud. Here, a pool of emulators across multiple hosts is kept active, growing or shrinking to meet the demand. In this model, resource allocation software would have to be implemented to manage a broad pool of emulators. Emulators would be allocated based on each client's needs and that subset of emulators would be handed off to the MTO software to execute the client's test needs. This model has the advantage of having an always-ready pool of emulators. It also carries the additional burden of monitoring the health of each emulator to ensure reliability and stability of the system on an ongoing basis.
The second option is a "dynamic cloud." Here, each worker host is responsible for launching the emulators, creating the workers to run the tests, and shutting the emulators down at the end of execution. The cloud of emulators exists only for the purpose of satisfying the client's test needs and only during the time span needed to execute the client's test plan. This solution can be ideal in the context of an existing continuous integration pipeline that already provides the ability to allocate host machines, but not the more specific allocation of Android emulators. One disadvantage is the overhead of emulator startup time. However, when run as part of a full build-and-test pipeline, the process can be kicked off at the same time as the build to mitigate or even eliminate that overhead. Here, since the cloud of emulators is a dedicated pool with a fixed lifetime, existing only during a single test run, there is no need to continually monitor the health of the emulator pool. At LinkedIn, we utilize our CI/CD infrastructure as a means to provision resources to run MTO tests. MTO is responsible for orchestrating the software on the resources that run our emulators and mobile tests.
As the number of tests for the LinkedIn flagship Android app was approaching 10,000, the execution time for tests alone was approaching 80 minutes on a single machine, albeit with 16 emulators. Through use of the MTO framework and distribution across 10 hosts each running 16 emulators, we have brought our test time down significantly, nearly reaching our target of 10 minutes. More importantly, the implementation is easy to scale to a larger number of hosts to support future growth in testing.
In testing of the MTO framework itself, the CPU utilization of the orchestration Python code was also measured to be low, as expected. For tests involving emulator interaction (installation of apps, running apps, pushing and pulling files, etc.), the total CPU utilization of the orchestration software was between 1 and 2%, versus 200%+ utilization by the emulators. As expected and desired, the lion's share of resource utilization resides in the emulators. Of course, these numbers were obtained from a test application that was neither overly simplistic nor overly taxing of the emulator. Real utilization of the orchestration software can be lower for longer-running tests and higher for shorter-running ones.
Mobile Test Orchestrator provides a framework for distributing and executing Android tests across multiple emulators running across multiple machines. MTO helps contain test execution times and makes it easy to scale to a larger number of machines and emulators as needed.
We expect to fully open source the MTO code on GitHub in the coming weeks. As your tests start to grow in number, consider using MTO to keep test execution times in check—quicker feedback to developers is key to higher productivity.