UI Automation: Keep it Functional – and Stable!

Jacek Suliga

Dev Experience @ LinkedIn

January 6, 2016

When we started designing our mobile infrastructure for the new LinkedIn Flagship app last year, we decided to build an extensive automated testing suite to complement our manual testing. Our goal was to have high confidence in the stability of every single build coming out of our CI (Continuous Integration) pipeline.

When deciding which iOS UI automation framework to use, we looked at a number of options, including our own in-house software which was being built at the time. During our research, we focused on three aspects that we cared about the most:

Reliability: A failing test should mean one of two things: app logic is broken, or a test is broken (not adjusted after an app logic change). This means that at any given point in time, a test should either consistently fail or consistently pass. Non-deterministic (flaky) behavior is counter-productive and hard to troubleshoot.
Execution speed: We wanted to create an unprecedented, truly comprehensive test suite for our apps that would minimize the amount of time and effort we spent doing manual mobile testing, not just a simple set of "smoke tests". Because of the size and complexity of LinkedIn Flagship, this involved creating and maintaining thousands of test cases. With our goal to get from code submission to an App Store-ready build in 3 hours, this meant UI automation needed to be really fast.
Authoring effort: We believe that ensuring your application code works properly is your responsibility as an engineer, but writing tests should be as effortless and as fast as possible, to encourage people to write more tests. We wanted to minimize the overhead around writing tests, so that engineers could spend their time writing quality tests, not learning new languages for tests or installing and learning new tools on their dev machines in order to run the tests.

Cross-platform compatibility was not a priority for us. Even though Android and iOS apps look similar on the surface, the platforms and user experiences are different enough that we couldn’t maintain one set of flexible tests for both without limiting ourselves when writing comprehensive scenario tests.

Skipping the details of other frameworks we considered for another time, I’m going to explain why we chose Keep It Functional (KIF), which was developed and open sourced by Square.

There are several reasons why we chose KIF:

It has a relatively strong and active open source community;
Tests are easy to write, in the same language as the code (ObjC/Swift), using familiar concepts (XCTest and driving automation through accessibility hooks);
No extra infrastructure/tooling is needed to execute the tests in our CI pipeline;
KIF was fast enough for our use cases at the time;
It’s been widely adopted;
And there is no risk of shipping test code within the production builds of the app.

Our engineers embraced KIF and added a couple of APIs on top of it, specific to our networking layer, to stub out network requests with mocked data. Then, we started writing some really great scenario tests.

All was well, until one day we started noticing non-deterministic behavior:

Engineers would write tests, run and verify them locally, and push changes to our CI;
Our CI would run the tests again on the build machines;
Every now and then, some of the tests would fail.

On top of this, tests would be very slow to run. A couple hundred tests would take about 90 minutes to execute on only one simulated device type (and our goal was to test on various combinations of different iOS versions and device types). We were only at 10% test coverage at that point.

The flakiness in test runs was really painful. Changes unrelated to the failing tests would get marked as 'bad' by our CI. Engineers would get upset - and rightly so. Our goal to have a reliable framework with reliable tests was not met. We were in trouble.

A small team was locked in a room to investigate and address this issue. We took a close look at the failing tests, and dug deeper into the KIF framework code. It was fun to see even our VP of Engineering join us, and hack some code. Here’s what we found:

A lot of KIF APIs that perform UI automation actions rely on specific hard-coded timings. These timings worked well for the majority of the cases and on faster dev machines, but on slower machines they would fail occasionally. For example, when entering text into a text field, the test framework would wait for up to 1 second, and bail out with a failure if the expected text wasn't there within that time. Another instance was when looking for views inside of a scroll view: KIF would scroll it, and wait for 0.3 seconds hoping that scrolling had finished in that time. Again - most of the time this delay would be sufficient, but sometimes it would not, and a test would occasionally fail.
KIF has a number of problematic APIs, namely "waitForAnimationsToFinish" and "waitForTimeInterval", which wait for a fixed amount of time. Depending on the timings of the action taken in the UI, this might not be a sufficient delay. Plus, these waits extended the run time of our tests - the test would keep waiting even though everything might already have been updated in the UI.

After patching up some of the issues we found, we realized we needed a more comprehensive change in how KIF was waiting on things:

The way KIF would look up views by accessibility identifiers (or labels) was too slow. Walking the view hierarchy over and over again was the main cause of the slow performance of our scenario tests.
Instead of using explicit delays and waits, KIF and the application code should have used notifications (checkpoints) to reliably wait for an event. With notifications, if something happens fast, we don’t wait long. On a slower machine, when the same action takes longer, we would wait longer - without hard coding how long. Of course, for the error cases, we still needed a fixed timeout after which we give up and call it a failure - but this timeout could be quite long without affecting the performance for the positive cases (like 30 seconds, or even a minute).
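To make the contrast between a fixed delay and an event-driven wait concrete, here is a minimal sketch in current Swift syntax. It is an illustration only, not code from KIF: `EventWaiter` and `runEventWaitDemo` are hypothetical names, and the waiter blocks the calling thread on a semaphore for simplicity. A real UI test runs on the main thread and would have to spin the run loop instead of blocking, so that the app can keep processing events.

```swift
import Foundation

/// Hypothetical helper (not a KIF API): waits for an event instead of
/// sleeping for a fixed interval. DispatchSemaphore is thread-safe, which is
/// why the class is marked as safe to share across threads.
final class EventWaiter: @unchecked Sendable {
    private let semaphore = DispatchSemaphore(value: 0)

    /// Called by the code under test when the awaited event has happened.
    func signal() {
        semaphore.signal()
    }

    /// Returns true as soon as `signal()` has been called, or false once
    /// `timeout` seconds have passed without a signal.
    func wait(timeout: TimeInterval) -> Bool {
        return semaphore.wait(timeout: .now() + timeout) == .success
    }
}

/// Demo: simulated asynchronous work signals after about 50 ms, while the
/// caller waits with a generous 30 s timeout.
func runEventWaitDemo() -> (succeeded: Bool, elapsed: TimeInterval) {
    let waiter = EventWaiter()
    DispatchQueue.global().asyncAfter(deadline: .now() + 0.05) {
        waiter.signal()
    }
    let started = Date()
    // Returns as soon as the signal arrives; the 30 s limit is only reached
    // when the event never happens.
    let succeeded = waiter.wait(timeout: 30)
    return (succeeded, Date().timeIntervalSince(started))
}
```

The wait returns shortly after the signal arrives, so the long timeout only costs time when the event never happens; a fixed delay pays its full price on every run.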

We found a way to fix both of the above issues with one simple idea: have views post a notification carrying their accessibility identifier whenever they appear or change state. Instead of repeatedly checking whether a view is there, we wait for a notification telling us that a view with the given accessibility identifier is present, and then for the desired state (visible, tappable). We implemented this quickly with the help of ObjC swizzling and by maintaining a mapping from accessibility identifier to a view reference.
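The identifier-to-view mapping could look roughly like the sketch below. Everything in it is an assumption made for illustration: `FakeView` stands in for `UIView` so the example runs without UIKit, `ViewRegistry` and the notification name are invented, and the article does not say which methods were swizzled to feed the real mapping.

```swift
import Foundation

/// Stand-in for UIView so the sketch runs without UIKit (assumption).
final class FakeView {
    let accessibilityIdentifier: String

    init(accessibilityIdentifier: String) {
        self.accessibilityIdentifier = accessibilityIdentifier
    }
}

/// Hypothetical registry mapping accessibility identifiers to weakly held views.
final class ViewRegistry {
    /// Hypothetical notification name, posted whenever a view registers.
    static let viewDidRegister = Notification.Name("ViewRegistry.viewDidRegister")

    /// Weak wrapper, so the registry never keeps a view alive.
    private struct WeakView {
        weak var view: FakeView?
    }

    private var viewsByIdentifier: [String: WeakView] = [:]

    /// In a real setup this would be called from swizzled view hooks
    /// (an assumption; the hooks themselves are not shown here).
    func register(_ view: FakeView) {
        viewsByIdentifier[view.accessibilityIdentifier] = WeakView(view: view)
        NotificationCenter.default.post(
            name: ViewRegistry.viewDidRegister,
            object: self,
            userInfo: ["identifier": view.accessibilityIdentifier]
        )
    }

    /// A dictionary lookup instead of a walk over the whole view hierarchy.
    func view(withIdentifier identifier: String) -> FakeView? {
        return viewsByIdentifier[identifier]?.view
    }
}
```

Holding the views weakly means a view that has been deallocated simply stops resolving, and the test side can observe the notification rather than polling the hierarchy.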

We also fixed scroll view handling by adding a proxy delegate in KIF, so that we would know reliably when scrolling stopped. This meant test authors needed to reference their cells explicitly by index path instead of relying on KIF's smart logic to scroll and find the cell. That violates the principle of encapsulation, where the user of an API should not be aware of its implementation, but the tradeoff was worth it: in the context of a test, you had better know where to find the item you need.

To ensure tests for asynchronous logic are written in a reliable way, we added new "waitForCheckpoint" and "postCheckpoint" APIs – an asynchronous operation would notify the test framework when it's finished. Example use cases:

Asynchronous tasks changing UI state: Imagine your app has a complex computation taking place when a user taps a button, and once the computation has finished, the label on the button changes. Instead of waiting for an arbitrary amount of time and testing if the button has changed, we can notify the test code when the async operation has finished, and then reliably check the button label.
Animations: There are cases where you animate in a view with custom transitions. You want to test for the state (or user taps) at specific points - in particular, when the animations have finished. With our new APIs, you can post a checkpoint in the animation's completion block to trigger the checks in the test code, which is more reliable and flexible than polling on the view state changes.
Updating table views: After switching to a view that shows previously cached values for a table view, you want to automate "pull to refresh" logic. The user pulls down on the screen to fetch new data and update the cells accordingly. The cells are already there, and the refresh is asynchronous. How does your test code know when it should check for the expected cell state? If you’re showing a progress spinner for the duration of the refresh, you could wait for this view to disappear. What if you have a background refresh triggered by a timer? With our new APIs, we post the checkpoint after having updated the data source with the refreshed data, just after telling the table view to reload. The test code waits for this checkpoint before verifying the content of the cells.
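For readers who want to see how such a checkpoint pair might work, here is a minimal sketch built on `NotificationCenter`. It is not the implementation from the KIF fork: the `Checkpoints` and `CheckpointFlag` names, the 30-second default timeout, and the run-loop spinning strategy are all assumptions chosen for illustration.

```swift
import Foundation

/// Thread-safe flag, since a checkpoint may be posted from a background queue.
final class CheckpointFlag: @unchecked Sendable {
    private let lock = NSLock()
    private var value = false

    func set() {
        lock.lock()
        value = true
        lock.unlock()
    }

    var isSet: Bool {
        lock.lock()
        defer { lock.unlock() }
        return value
    }
}

/// Hypothetical stand-ins for the "postCheckpoint" / "waitForCheckpoint" APIs.
enum Checkpoints {
    /// App side: announce that an asynchronous operation has finished.
    static func post(_ name: String) {
        NotificationCenter.default.post(name: Notification.Name(name), object: nil)
    }

    /// Test side: start observing, run `action`, then spin the run loop until
    /// the checkpoint arrives or `timeout` seconds pass.
    static func wait(for name: String,
                     timeout: TimeInterval = 30,
                     action: () -> Void) -> Bool {
        let flag = CheckpointFlag()
        let observer = NotificationCenter.default.addObserver(
            forName: Notification.Name(name), object: nil, queue: nil
        ) { _ in flag.set() }
        defer { NotificationCenter.default.removeObserver(observer) }

        // Observe before acting, so that a checkpoint posted synchronously
        // by the action is not missed.
        action()

        let deadline = Date().addingTimeInterval(timeout)
        while !flag.isSet && Date() < deadline {
            // Let the run loop process pending work for up to 10 ms.
            _ = RunLoop.current.run(mode: .default,
                                    before: Date().addingTimeInterval(0.01))
        }
        return flag.isSet
    }
}
```

Registering the observer before running the action is the important design choice: if the order were reversed, a checkpoint posted during the action could slip past unnoticed and the wait would run into its timeout.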

The "postCheckpoint" calls needed to be added to application code (which, as a result, somewhat broke the wall between main program code and testing code), but we ensured that these calls would get compiled out for production builds.
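One common way to compile such calls out is Swift's conditional compilation. The sketch below assumes a hypothetical `UI_TESTING` compilation condition (set with `-D UI_TESTING`, or through Active Compilation Conditions in Xcode) that only test builds define; the article does not say which mechanism was actually used, and `checkpointsEnabled` is an extra property added here purely to make the behavior observable.

```swift
import Foundation

enum UITestUtils {
    /// True only when the hypothetical UI_TESTING condition is defined.
    static var checkpointsEnabled: Bool {
        #if UI_TESTING
        return true
        #else
        return false
        #endif
    }

    /// Posts a checkpoint in test builds; the body is empty in builds that
    /// do not define UI_TESTING.
    static func postCheckpoint(_ name: String) {
        #if UI_TESTING
        NotificationCenter.default.post(name: Notification.Name(name), object: nil)
        #endif
    }
}
```

With this shape, call sites in the app stay identical in every configuration, and only the build settings decide whether a checkpoint is actually posted.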

Example pattern in the app code:

@IBAction func reloadData(sender: UIButton) {
    // ask the server for new data
    dataManager.refreshData { newData in
        // update the data source and reload the table
        self.tableItems = newData
        self.tableView.reloadData()
        // tell the waiting test code that the refreshed data is in place
        UITestUtils.postCheckpoint(kDataRefreshedCheckpoint)
    }
}

Test code would then use the checkpoint like so:

// tap to refresh the data
waitForCheckpoint(kDataRefreshedCheckpoint) {
    self.KIFtester().tapViewWithAccessibilityIdentifier(kRefreshDataButtonIdentifier)
}

// Here we know for sure that the table view has been refreshed
// and can reliably query the cells for the expected content

let firstCellPath = NSIndexPath(forRow: 0, inSection: 0)
if let profileHeaderCell = KIFtester().waitForCellAtIndexPath(firstCellPath) as? ProfileHeaderCell {
    XCTAssertEqual(profileHeaderCell.nameLabel.text, "Geralt of Rivia")
} else {
    XCTFail("Invalid first cell")
}

With this improved KIF API, we were able to remove all explicit waits and delays, and extend the intended timeouts for more reliable runs. We also disabled animations when running tests to further reduce the execution time of our test suite (note that doing so might not be desirable in all use cases, as disabling animations for view transitions changes asynchronous patterns in the app, which might hide some race conditions). When combined, all these changes fixed reliability issues within the framework, and made the tests run 10 times faster. Our test suite of 300 tests took only nine minutes to run at that time. We have added many more tests since then, which would not have been possible with the slower performing framework.

Here’s a video comparing executions for the same tests before and after the optimizations (captured on a very early build of the new Flagship app):

Until Apple’s new UI automation API has matured enough to be ready for production adoption, we will continue using KIF.

We will be contributing our fixes and improvements back to the open source community soon, along with all the lessons learned. We hope others can further extend and improve KIF, to keep it functional - and stable! Please watch our KIF fork here.

Thanks to the team that worked on improving our test framework: Ashit Gandhi, Keqiu Hu, Frederick Fung, Serguei Kasianov, Kamilah Taylor, Yuichi Sasaki, Ankit Goyal, Raja Sekara, and others.
