Writing Maintainable Integration Tests


In software development, writing integration tests is sometimes an afterthought. Many people think exclusively in terms of unit tests, and perhaps many more don't think about automated tests at all. Thus, the very idea of writing integration tests that are maintainable, manageable, and scalable may seem foreign to most.

I personally had never felt the limitations of a large codebase of integration tests until working on the Voldemort project. This post is an overview of the pain points of the Voldemort integration tests, as well as our stab at architecting better integration tests in our next project, Venice.

Integration test pain points

The two main problems with the Voldemort tests are that they're flaky, meaning they fail intermittently, and that they're slow to run. These two characteristics inhibit their regular use and erode the trust placed in them.

We do have infrastructure that runs the tests automatically following every commit to the default branch, and we do inspect the root cause of failing tests, but we have grown accustomed to skipping over some of the flakier ones, which is obviously undesirable.

A test in each port
In the case of Voldemort, the root cause of both the flakiness and slowness of the test suite is the assignment of ports. As part of our integration tests, we spin up individual Voldemort servers or even entire clusters locally, with each such process binding to a certain port.

Some of the Voldemort tests use hard-coded port numbers, which is convenient in the short term, but ultimately is a bad idea, since any given port may be already busy on the local host, or there may be two tests that are hard-coded to the same port but run concurrently. Attempting to assign different hard-coded ports to each test seems like an exercise in futility.
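
To make the failure mode concrete, here is a minimal sketch (not actual Voldemort test code, and the port number is made up) of what happens when two tests, or a test and any other process on the host, hard-code the same port:

```java
import java.io.IOException;
import java.net.ServerSocket;

public class HardCodedPortExample {
    // Hypothetical hard-coded port, standing in for what some tests do.
    private static final int SERVER_PORT = 6666;

    public static void main(String[] args) throws IOException {
        // The first bind succeeds...
        ServerSocket first = new ServerSocket(SERVER_PORT);

        // ...but a second test (or any other process on the host) that
        // hard-codes the same port fails with a BindException.
        try (ServerSocket second = new ServerSocket(SERVER_PORT)) {
            // never reached
        } catch (IOException e) {
            System.out.println("Port clash: " + e.getMessage());
        } finally {
            first.close();
        }
    }
}
```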

Finding ports dynamically
Some other Voldemort tests try to be a little cleverer and dynamically seek available ports, but the rest of the test suite is not architected in such a way as to fully leverage this capability. There are many variations of this in the code base, but at a high level, many Voldemort tests follow a sequence of events similar to the following:

  1. Invoke ServerUtils.findFreePorts() to get a list of available ports.

  2. Invoke ServerUtils.getLocalCluster(int[] ports) for a cluster, which listens to the ports obtained in Step 1.

  3. Invoke ServerUtils.startVoldemortCluster() to start the various servers in the cluster obtained in Step 2.

The problem with this approach is that the ports are freed between Steps 1 and 3, creating a race condition: two tests might concurrently determine that the same ports are free, and then clash with each other when the time comes to actually use those ports. Step 3 does retry starting the failed Voldemort server several times before giving up, but if the port is occupied by another test for a long period of time, it may never succeed. Furthermore, since these steps operate on a list of ports, there is a possibility of gridlock, where two independent tests prevent each other from succeeding or from moving forward (until the maximum number of retries is reached, at which point the tests finally fail).
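
The sketch below condenses that sequence into a few lines. The helper is a simplification loosely based on the steps above rather than the actual Voldemort utility code, but it shows where the window for the race opens:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.List;

public class RacyPortAllocation {
    /** Step 1: find N "free" ports by binding and immediately releasing them. */
    static List<Integer> findFreePorts(int count) throws IOException {
        List<Integer> ports = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            try (ServerSocket socket = new ServerSocket(0)) {
                ports.add(socket.getLocalPort());
            } // The socket is closed here, so the port is free again...
        }
        // ...and nothing prevents another concurrently running test from
        // "finding" and using these very same ports before Step 3 below.
        return ports;
    }

    public static void main(String[] args) throws IOException {
        List<Integer> ports = findFreePorts(3);
        // Step 2 (sketched): bake the ports into the cluster metadata.
        // Step 3 (sketched): only now do the servers actually bind; by this
        // point the ports may already be taken, and the retry loop keeps
        // hammering the same, possibly still busy, ports until it gives up.
        System.out.println("Ports chosen, but not yet held: " + ports);
    }
}
```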

Test dystopia
Tweaking the Voldemort test suite so that it sidesteps the issues described above is certainly doable, but because there are now so many tests, the anticipated cost of such an effort is prohibitive. Therefore, the pragmatic solution to the lack of isolation has been to make the Gradle build run the test suite sequentially.

This lack of parallelism, in turn, means the whole test suite is slow to complete. There is also a lost opportunity to leverage today's multi-core computers, which are perfectly suitable for this type of highly parallel work.

More importantly, even though running tests sequentially minimizes the risk of port clashes, it does not fully solve the issue. For example, other tests may be running simultaneously in a shared CI environment.

A brave new world of tests

The idea of finding ports dynamically is a good one, but it seems like the Voldemort implementation does not embrace it all the way. For this reason, when it came time to design the testing for our new project, Venice, we wanted to see if we could create a fresh solution early on that would solve this problem. We began by asking ourselves, “What do integration tests actually need?”

A change of paradigm
What do integration tests actually need? They need certain processes to interact with. Do tests need to decide which specific ports are going to be used by those services? Probably not. So why, then, would we leave it up to tests to determine these details? Why not instead ask the test framework to provide us with a process that is already spun up, no matter which port it's running on?

This is essentially what we did in our new project, Venice. Instead of putting the onus on test writers to configure the processes their tests need so that they do not clash with those of other tests, we took that responsibility out of their hands. Tests can ask the framework to give them a wrapper for a Kafka broker, a ZooKeeper server, or anything else they need, and then they can interrogate the wrapper to discover which host and port the wrapped process runs on. The difference between this strategy and the Voldemort strategy is admittedly a bit subtle, but it changes everything.
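
To give a feel for the shape of this interaction, here is an illustrative sketch; the names ServiceFactory, KafkaBrokerWrapper, and the getters are stand-ins rather than the exact Venice API:

```java
// Illustrative sketch only: the wrapper and factory below are assumptions,
// not the real Venice test framework classes.
interface KafkaBrokerWrapper extends AutoCloseable {
    String getHost();
    int getPort();
}

class ServiceFactory {
    // In a real framework this would bootstrap an actual broker; here it is
    // just a placeholder to show the shape of the interaction.
    static KafkaBrokerWrapper getKafkaBroker() {
        return new KafkaBrokerWrapper() {
            public String getHost() { return "localhost"; }
            public int getPort() { return 0; /* chosen by the framework */ }
            public void close() { /* tear the wrapped process down */ }
        };
    }
}

public class WrapperUsageSketch {
    public static void main(String[] args) throws Exception {
        try (KafkaBrokerWrapper kafka = ServiceFactory.getKafkaBroker()) {
            // The test never picks the port; it only asks where the
            // already-running broker happens to live.
            String bootstrapServers = kafka.getHost() + ":" + kafka.getPort();
            System.out.println("Talk to Kafka at " + bootstrapServers);
        }
    }
}
```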

Retries that work
The main advantage of this approach is that retries can be implemented correctly. In the Voldemort tests, whenever a process failed to bind to its intended port, we would wait and retry, spinning up the process on the same busy port and hoping that it would somehow free up later down the line. In Venice's test framework, when a process fails to bind to its port, instead of waiting, we immediately find another free port and try to spin up the process on that one instead.
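
Here is a minimal sketch of that retry strategy, using plain sockets and hypothetical helper names rather than the actual Venice bootstrapping code: on every failed attempt, a new candidate port is chosen instead of retrying the old one.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class RetryWithNewPort {
    /** Binds and immediately releases an ephemeral port to get a candidate. */
    static int findFreePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    /**
     * Picks a *new* candidate port on every attempt, instead of retrying the
     * same possibly-busy port, until the bind succeeds or attempts run out.
     */
    static ServerSocket startOnSomeFreePort(int maxAttempts) throws IOException {
        IOException lastFailure = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int candidate = findFreePort();
            try {
                return new ServerSocket(candidate); // stands in for starting the process
            } catch (IOException e) {
                lastFailure = e; // someone grabbed the port first; try another one
            }
        }
        throw new IOException("Could not bind after " + maxAttempts + " attempts", lastFailure);
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = startOnSomeFreePort(5)) {
            System.out.println("Bound to port " + server.getLocalPort());
        }
    }
}
```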

Thus, even though finding free ports dynamically is unreliable, we can circumvent that limitation if we have a single piece of code which owns all of the configuration and bootstrapping, end-to-end. But this only works if the test code is not allowed to make any specific assumptions about the port configurations of the processes it needs.

An abstraction that enables performance/isolation trade-offs
Another interesting benefit of this approach is that it lets us decide where we want to stand in terms of performance versus isolation. For now, we always run with maximum isolation (i.e., every integration test that needs a process gets a fresh new one), but this is obviously more costly, because every process we spin up carries a startup cost. If, at some point down the line, we decide to strike a different compromise, we will have the ability to do so.

For example, we have many tests that need a Kafka broker to interact with. Each test can ask the framework for a Kafka broker, and then interrogate it to find out its host and port. We don't stop there, however; we also ask the framework what Kafka topic name our test should use. The test does not care what specific topic name is used; it only needs a topic name, and any will do. This way, if our test suite grows big enough to warrant it, we could decide to spin up just one Kafka broker for all tests (or one broker per N tests), and then hand a different topic name to each test that needs one, thus still ensuring that tests don't clash with one another. In other words, our tests ask for the resources they need and dynamically deal with what they receive, no matter what the unimportant details, like host, port, or topic name, happen to be. In this way, we retain the flexibility to provide full resource isolation (at the cost of performance) or to share some resources across tests.
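
As a rough sketch of what handing out topic names might look like (the class and method names here are illustrative, not the actual Venice framework), a simple framework-side helper can guarantee uniqueness per test, which is what makes sharing a broker safe:

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch: a framework-side helper that hands each test a unique
// topic name, so a single broker (or one broker per N tests) can be shared.
public class TopicNameFactory {
    private static final AtomicLong COUNTER = new AtomicLong();

    /** Returns a topic name guaranteed to be unique within this test run. */
    public static String getUniqueTopicName(String testName) {
        return testName + "_" + System.currentTimeMillis() + "_" + COUNTER.incrementAndGet();
    }

    public static void main(String[] args) {
        // Two tests asking for "a topic" never collide, regardless of whether
        // they ended up on the same broker or on separate ones.
        System.out.println(getUniqueTopicName("testProducerPath"));
        System.out.println(getUniqueTopicName("testConsumerPath"));
    }
}
```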

With this approach, we can even make the decision based on the environment where the tests are running. For example, a CI post-commit hook might have access to more computing resources than a developer's computer. Likewise, a developer doing iterative work may want to assess very quickly whether tests pass or not, whereas a post-commit hook runs asynchronously and can afford to take more time to execute.
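
One possible way to wire up such an environment-dependent choice is sketched below; the system property name and the policy are made up purely for illustration:

```java
// Sketch of choosing an isolation level from the environment. The property
// name and the enum are hypothetical, for illustration only.
public class IsolationPolicy {
    enum Level { FULL_ISOLATION, SHARED_RESOURCES }

    static Level fromEnvironment() {
        // e.g., the CI job could pass -Dtest.isolation=full, while a
        // developer's local run defaults to sharing resources for speed.
        String setting = System.getProperty("test.isolation", "shared");
        return "full".equalsIgnoreCase(setting) ? Level.FULL_ISOLATION
                                                : Level.SHARED_RESOURCES;
    }

    public static void main(String[] args) {
        System.out.println("Running with: " + fromEnvironment());
    }
}
```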

Going forward

As the number of tests we write increases, our test suite takes longer to run, and we need to consider new ways to run our tests more quickly and efficiently. Having a well-designed API for requesting the resources that our tests need will hopefully give us enough flexibility to continue doing so.

In a future post, I might talk about the various knobs we have been tuning in our Gradle and TestNG configurations in order to manage parallelism and maintain a low total runtime for our test suite even as it grows bigger.