LinkedIn’s approach to automated accessibility (A11y) testing

May 21, 2020

Co-authors: Oliver Tse, Andrew Lee, Melanie Sumner, Renato Iwashima

Accessibility (A11y) Engineering at LinkedIn aims to streamline accessible product development and maintenance. We think about how tooling can help achieve accessibility success and increase the efficiency of our fellow engineers across the organization. We design tools and the infrastructure necessary for engineers to build products that are accessible, elegant, delightful, and more easily testable.

Our overall aim is to bring together the amazing talent across all product teams at LinkedIn and to facilitate inclusive design through innovative tooling, collaborative projects, and consultations. Just as LinkedIn ensures that everyone is connected to economic opportunity, A11y Engineering ensures that every product and platform does so inclusively.

With this vision in mind, we have strategically embraced accessibility test automation to accelerate our detection of common issues in new features and reduce regressions in existing features. At its core, accessibility test automation involves running a suite of accessibility rules on essential user flows and user interfaces of our applications.

The main strength of automated accessibility testing is its ability to find “low-hanging fruit” caused by oversights at the code level. This frees up engineering time and covers common defects that are often overlooked.

The other key benefit is the ability to guard against regressions. Good accessibility test coverage provides a clear signal for overall a11y health. A sudden introduction of new violations will act as a smoke signal, serving as a leading indicator for the existence of other issues. In an environment characterized by multiple releases per day, where it is not feasible to accompany each new release with manual testing, this becomes an indispensable tool.

Running accessibility test automation rules during continuous integration prevents common accessibility issues from ever reaching LinkedIn members, allowing us to address problems before we ship. While we see automated testing as an essential part of our arsenal for scaling accessibility, we fully acknowledge that, depending on the source, accessibility test automation identifies only about 20% to 30% of accessibility issues (“Automated accessibility checking... can only auto-check about 30% of #a11y issues” and “Manual Accessibility Testing: Why & How”).

At LinkedIn, we integrate various open-source and licensed automation frameworks into the continuous integration pipelines for our web, iOS, and Android applications. A commit is merged into the master branch only if it passes accessibility checks. These frameworks include:

  1. Deque’s axe-core for the web
  2. Google’s Toolbox for Accessibility for iOS (GTXiLib)
  3. Google’s Accessibility Test Framework for Android (ATF)

LinkedIn web applications are built using the Ember JavaScript framework. It features an extensive ecosystem of plugins and extensions commonly referred to as Ember Add-ons. Testing is a core tenet of the framework and is built in as a first-class citizen.

Web apps

LinkedIn uses Deque’s axe-core accessibility testing engine for our web apps. Axe analyzes the rendered DOM of websites and other HTML-based user interfaces against a set of accessibility rules. It is integrated into our Ember testing infrastructure through the Ember A11y Testing addon.

There are three types of Ember tests:

  1. Unit tests verify individual pieces of code. They are insufficient to assess a11y but are very quick to execute.
  2. Integration tests verify user interface components. Because components are examined in isolation, these tests are fairly quick to execute, but they lack the fidelity to accurately assess a11y.
  3. Acceptance tests are the ideal point of integration, as they most closely resemble what a user would experience. However, they are much more costly in terms of runtime and resources.

Our continuous integration and deployment process follows a “3x3” model, consisting of at least three daily deployments and three stages of validation: pre-commit, pre-merge, and post-merge.

  • Pre-commit occurs on push-to-remote and includes static linting checks and dependency validation.
  • Pre-merge occurs after a successful push and includes build verification and test suite execution.
  • If successful, post-merge checks queue up associated changes for deployment.

In terms of accessibility, any resulting violations prevent changes from moving on to the next stage. Specifically, failing a11y ember-template-lint checks block commits, while failing a11y test assertions block merging and cause the associated changes to be automatically reverted.
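To make the stage gating concrete, here is a minimal TypeScript sketch that models the three validation stages as ordered lists of checks and stops a change at the first stage with a failing check. The `Stage` and `Check` types, the check names, and the always-failing lint check are hypothetical illustrations, not LinkedIn's CI system, and the automatic revert is not modeled.

```typescript
// Hypothetical model of the three validation stages in the 3x3 process.
type StageName = "pre-commit" | "pre-merge" | "post-merge";

interface Check {
  name: string;
  run: () => boolean; // true when the check passes
}

interface Stage {
  name: StageName;
  checks: Check[];
}

interface PipelineOutcome {
  lastPassedStage: StageName | null; // null if the change failed pre-commit
  failedStage: StageName | null; // null if every stage passed
  failures: string[]; // names of the failing checks
}

// Runs stages in order; a failing check stops the change at that stage,
// so it never moves on to the next one.
function runPipeline(stages: Stage[]): PipelineOutcome {
  let lastPassedStage: StageName | null = null;
  for (const stage of stages) {
    const failures = stage.checks.filter((c) => !c.run()).map((c) => c.name);
    if (failures.length > 0) {
      return { lastPassedStage, failedStage: stage.name, failures };
    }
    lastPassedStage = stage.name;
  }
  return { lastPassedStage, failedStage: null, failures: [] };
}

// Example: an a11y template-lint failure blocks the change at pre-commit.
const outcome = runPipeline([
  {
    name: "pre-commit",
    checks: [
      { name: "ember-template-lint (a11y rules)", run: () => false },
      { name: "dependency validation", run: () => true },
    ],
  },
  { name: "pre-merge", checks: [{ name: "a11y test assertions", run: () => true }] },
  { name: "post-merge", checks: [] },
]);
console.log(outcome.failedStage, outcome.failures);
```

Because each stage returns early on failure, later and more expensive stages never spend resources on a change that has already been rejected.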

So, we have a multi-faceted approach to accessibility engineering: we use linting for static analysis, automated testing for dynamic analysis, and manual testing for the things we can’t automate yet. Each of these approaches has its strengths. Linting gives developers feedback right in their IDE, automated testing can more robustly check the rendered code to make sure it provides what screen readers expect, and manual audits ensure usability. As such, we consider all of them important parts of the whole.

As engineers run local tests during their day-to-day work, existing a11y assertions automatically identify a11y regressions resulting from UI changes. Since the entire test suite runs after a code commit is pushed, any a11y failure blocks the associated commit from being merged. It is therefore imperative that all new a11y assertions are introduced in a clean state, meaning that all identified violations are addressed before submission.
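At its core, each a11y assertion is a check that fails when an audit returns any violations, which is why a new assertion has to start from a clean state. The sketch below assumes a simplified violation shape loosely modeled on axe-core's results (`id`, `impact`, `nodes[].target`); `assertNoViolations` is a hypothetical helper, not the Ember A11y Testing addon's API.

```typescript
// Simplified violation shape, loosely modeled on axe-core results;
// only the fields used below are included.
interface Violation {
  id: string; // rule id, e.g. "image-alt"
  impact: "minor" | "moderate" | "serious" | "critical" | null;
  nodes: Array<{ target: string[] }>; // CSS selectors of affected nodes
}

// Hypothetical assertion helper: passes only when the audit found nothing,
// otherwise fails with the rule ids and selectors that need fixing.
function assertNoViolations(violations: Violation[]): void {
  if (violations.length === 0) return;
  const details = violations.map((v) => {
    const targets = v.nodes.map((n) => n.target.join(" ")).join(", ");
    return `${v.id} (${v.impact ?? "unknown impact"}): ${targets}`;
  });
  throw new Error(`Accessibility violations found:\n${details.join("\n")}`);
}

// A new assertion is ready to land only once this passes for its scope.
assertNoViolations([]);
```

Listing rule ids and selectors in the failure message lets the engineer whose commit was blocked see exactly which elements regressed.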

Impact on performance
Commit to Publish (C2P) time is an important metric signaling the overall health of our build and testing infrastructure. Unfortunately, the initial implementation of automated accessibility tests regressed C2P: tests with added a11y assertions more than doubled in execution time. With over 8,000 acceptance tests, that additional load would have substantially degraded the CI/CD pipeline.

Mitigation strategies
Mitigating the negative performance impact required collaboration with several partner teams. Intensive investigation led us to formulate the following best practices for integrating a11y assertions:

  1. Indiscriminate a11y assertions cause redundancy, as multiple tests end up checking the same view.
  2. To avoid such redundancy, assertions should be deliberate and scoped into specific segments of the screen.
  3. Assertions should cover high-traffic, high-impact transaction paths.
  4. 100% accessibility test coverage is not the goal. Tests are a signal of overall accessibility health, but not the only signal. They are but a single tool in our overall arsenal.
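One way to apply the first two practices at run time is to key each audit by the view and the screen segment it covers, and to skip audits that a previous test in the same run already performed. This is a minimal sketch that complements, rather than replaces, choosing assertions deliberately; the `ScopedAuditTracker` class, the route, and the CSS selectors are hypothetical.

```typescript
// Hypothetical tracker that lets each (route, segment) pair be audited once
// per test run, so several tests visiting the same view don't repeat it.
class ScopedAuditTracker {
  private readonly audited = new Set<string>();

  // Returns true the first time a pair is seen, false for repeats.
  shouldAudit(route: string, segmentSelector: string): boolean {
    const key = `${route}::${segmentSelector}`;
    if (this.audited.has(key)) return false;
    this.audited.add(key);
    return true;
  }
}

const tracker = new ScopedAuditTracker();
// Audit requests from three tests; selectors scope each audit to one segment.
const requests = [
  { route: "/feed", segment: "#share-box" },
  { route: "/feed", segment: "#share-box" }, // same segment, different test
  { route: "/feed", segment: ".feed-update" },
];
const toRun = requests.filter((r) => tracker.shouldAudit(r.route, r.segment));
console.log(toRun.length); // 2
```

Scoping by segment also keeps each audit small: a failure points at one region of the screen instead of the whole page.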

Further investigation yielded improvements from optimized configuration settings and from omitting slow, low-impact rules. By far, however, the biggest improvement came from creating a dedicated accessibility distributed test job. This job automatically pipes all a11y assertions into a parallel process so that the main test execution isn’t affected, thus preserving C2P time.
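The dedicated job can be pictured as a second runner that executes only the tests carrying a11y assertions, with audits switched on, while the main job runs the full suite with audits off. In this minimal sketch, `Promise.all` stands in for separate CI workers, and the `TestCase` shape and `auditsEnabled` flag are hypothetical; the rule-configuration tuning mentioned above is not modeled.

```typescript
// Hypothetical test shape: `run` executes the test, optionally with a11y audits.
interface TestCase {
  name: string;
  hasA11yAssertions: boolean;
  run: (auditsEnabled: boolean) => Promise<void>;
}

interface JobPlan {
  main: TestCase[]; // every test, audits disabled
  a11y: TestCase[]; // only tests with a11y assertions, audits enabled
}

function planJobs(tests: TestCase[]): JobPlan {
  return { main: tests, a11y: tests.filter((t) => t.hasA11yAssertions) };
}

// Runs tests sequentially within one job and collects failing test names.
async function runJob(tests: TestCase[], auditsEnabled: boolean): Promise<string[]> {
  const failures: string[] = [];
  for (const t of tests) {
    try {
      await t.run(auditsEnabled);
    } catch {
      failures.push(t.name);
    }
  }
  return failures;
}

// Both jobs run concurrently, so audit cost does not lengthen the main job.
async function runDistributed(tests: TestCase[]): Promise<{ main: string[]; a11y: string[] }> {
  const plan = planJobs(tests);
  const [main, a11y] = await Promise.all([runJob(plan.main, false), runJob(plan.a11y, true)]);
  return { main, a11y };
}

const suite: TestCase[] = [
  { name: "feed loads", hasA11yAssertions: false, run: async () => {} },
  {
    name: "compose post",
    hasA11yAssertions: true,
    // Simulates a test whose a11y audit finds a violation
    run: async (audits) => {
      if (audits) throw new Error("a11y violation");
    },
  },
];
runDistributed(suite).then((r) => console.log(r.main, r.a11y));
```

Here the main job passes because its audits are off, while the a11y job reports "compose post", so the violation is still caught without adding audit time to the main run.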

iOS and Android apps

We use the same testing approach for iOS and Android that we use for web. However, we run fewer rules and we do not run them using the same library. This is because mobile platforms have their own APIs and different ways to interact with assistive technologies. We do plan to increase the number of accessibility checks as the accessibility capability of both platforms evolves.

iOS
We have six rules, which check whether:

  1. Label is present: ensure that all accessibility elements have a label.
  2. Trait is not in the label: ensure that elements don’t redundantly describe accessibility traits such as “Button” in the label, since these roles are announced by the screen reader automatically.
  3. Label is not redundant: ensure that accessibility labels are unique so that they are distinguishable when using a screen reader.
  4. Traits don't conflict: ensure that incompatible traits such as “Button” and “Link” are not used at the same time.
  5. Touch target size: ensure that all interactive elements have a touch target size of at least 44pt.
  6. Contrast is sufficient: ensure that text contrast is at least 4.5 to 1.
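Most of these rules reduce to simple predicates over an element's label, traits, and frame. The following TypeScript sketch illustrates five of them over a hypothetical, platform-neutral element model; it is not GTXiLib's API (which runs natively against UIKit views), traits are modeled as plain strings for readability, and the contrast rule is omitted because it needs rendered colors.

```typescript
// Hypothetical, platform-neutral element model (not GTXiLib's API).
interface A11yElement {
  id: string;
  label?: string;
  traits: string[]; // e.g. ["button"] or ["link"]
  interactive: boolean;
  width: number; // points
  height: number; // points
}

interface Finding {
  rule: string;
  elementId: string;
}

const MIN_TOUCH_TARGET_PT = 44;
const CONFLICTING_TRAITS: Array<[string, string]> = [["button", "link"]];

const normalize = (label?: string): string => (label ?? "").trim().toLowerCase();

function checkElements(elements: A11yElement[]): Finding[] {
  const findings: Finding[] = [];
  const labelCounts = new Map<string, number>();

  for (const el of elements) {
    const label = normalize(el.label);
    if (label === "") {
      findings.push({ rule: "label-present", elementId: el.id });
    } else {
      labelCounts.set(label, (labelCounts.get(label) ?? 0) + 1);
      // Traits are announced by the screen reader, so naming them in the label is redundant.
      const words = label.split(/\s+/);
      if (el.traits.some((t) => words.includes(t.toLowerCase()))) {
        findings.push({ rule: "trait-not-in-label", elementId: el.id });
      }
    }
    if (CONFLICTING_TRAITS.some(([a, b]) => el.traits.includes(a) && el.traits.includes(b))) {
      findings.push({ rule: "traits-dont-conflict", elementId: el.id });
    }
    if (el.interactive && (el.width < MIN_TOUCH_TARGET_PT || el.height < MIN_TOUCH_TARGET_PT)) {
      findings.push({ rule: "touch-target-size", elementId: el.id });
    }
  }

  // Second pass: flag every element whose label is shared with another element.
  for (const el of elements) {
    const label = normalize(el.label);
    if (label !== "" && (labelCounts.get(label) ?? 0) > 1) {
      findings.push({ rule: "label-not-redundant", elementId: el.id });
    }
  }
  return findings;
}

const findings = checkElements([
  { id: "send", label: "Send button", traits: ["button"], interactive: true, width: 44, height: 44 },
  { id: "icon", traits: ["button"], interactive: true, width: 24, height: 24 },
  { id: "more-1", label: "More", traits: ["button"], interactive: true, width: 48, height: 48 },
  { id: "more-2", label: "More", traits: ["button", "link"], interactive: true, width: 48, height: 48 },
]);
console.log(findings.map((f) => `${f.rule}:${f.elementId}`));
```

The redundant-label rule needs a second pass because a label only becomes a duplicate once every element has been counted.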

For more information, check Google’s GTXiLib.

Android
We have 10 rules, which check for:

  1. Unsupported item type: ensure that the specified accessibility class name (role) is supported by TalkBack.
  2. Clickable span: ensure that ClickableSpan is not used in a TextView, since individual spans within a single TextView cannot be selected independently, which makes them inaccessible.
  3. Traversal order: ensure that the traversal order specified by the developer is free of problems such as loops or conflicting constraints.
  4. Contrast check: ensure that text contrast is at least 4.5 to 1.
  5. Label present: ensure that all accessibility elements have a label.
  6. Duplicate clickable bounds: ensure that multiple clickable elements do not share the same bounds on screen.
  7. Duplicate speakable text: ensure that accessibility labels are unique so that they are distinguishable when using a screen reader.
  8. Editable content description: ensure that an editable TextView is not labeled by a contentDescription.
  9. Touch target size: ensure that all interactive elements have a touch target size of at least 48dp.
  10. Link purpose is unclear: ensure that each link’s purpose is clear and sufficiently descriptive.
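The contrast and touch-target rules rest on well-defined formulas. This sketch computes the WCAG 2.x contrast ratio from sRGB colors (the same 4.5 to 1 threshold appears in the iOS rules), converts pixels to dp using Android's `px = dp * (densityDpi / 160)` relationship, and groups clickable views whose bounds are identical; detecting partial overlap would need an intersection test instead. The `ViewNode` shape is a hypothetical stand-in for Android views, and ATF's actual implementation differs.

```typescript
type RGB = [number, number, number]; // sRGB channels, 0-255

// Linearizes one sRGB channel as defined for WCAG 2.x relative luminance.
function channelToLinear(c255: number): number {
  const c = c255 / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance([r, g, b]: RGB): number {
  return 0.2126 * channelToLinear(r) + 0.7152 * channelToLinear(g) + 0.0722 * channelToLinear(b);
}

// WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21.
function contrastRatio(fg: RGB, bg: RGB): number {
  const l1 = relativeLuminance(fg);
  const l2 = relativeLuminance(bg);
  return (Math.max(l1, l2) + 0.05) / (Math.min(l1, l2) + 0.05);
}

const MIN_TEXT_CONTRAST = 4.5;
const MIN_TOUCH_TARGET_DP = 48;

// Android: px = dp * (densityDpi / 160), so dp = px * 160 / densityDpi.
function pxToDp(px: number, densityDpi: number): number {
  return (px * 160) / densityDpi;
}

// Hypothetical stand-in for a view; bounds are in screen pixels.
interface ViewNode {
  id: string;
  clickable: boolean;
  bounds: { left: number; top: number; right: number; bottom: number };
}

function touchTargetTooSmall(v: ViewNode, densityDpi: number): boolean {
  const widthDp = pxToDp(v.bounds.right - v.bounds.left, densityDpi);
  const heightDp = pxToDp(v.bounds.bottom - v.bounds.top, densityDpi);
  return v.clickable && (widthDp < MIN_TOUCH_TARGET_DP || heightDp < MIN_TOUCH_TARGET_DP);
}

// Groups clickable views with identical bounds; any group of two or more is flagged.
function duplicateClickableBounds(views: ViewNode[]): string[][] {
  const groups = new Map<string, string[]>();
  for (const v of views) {
    if (!v.clickable) continue;
    const { left, top, right, bottom } = v.bounds;
    const key = `${left},${top},${right},${bottom}`;
    groups.set(key, [...(groups.get(key) ?? []), v.id]);
  }
  return Array.from(groups.values()).filter((ids) => ids.length > 1);
}

// #767676 on white just meets 4.5:1; #777777 on white just misses it.
console.log(contrastRatio([0x76, 0x76, 0x76], [255, 255, 255]).toFixed(2)); // 4.54
console.log(contrastRatio([0x77, 0x77, 0x77], [255, 255, 255]) >= MIN_TEXT_CONTRAST); // false
```

Working in dp rather than raw pixels keeps the 48dp threshold meaningful across screen densities.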

For more information, check Google’s Accessibility Test Framework and Google’s Accessibility Scanner.

Libraries for accessibility checking on iOS and Android
We use internal wrapper libraries for both Android and iOS called android-autocheck and ios-autocheck respectively.

[Diagram: libraries used for accessibility checking on mobile]

These wrapper libraries use the Accessibility Test Framework for Android and GTXiLib for iOS.

We use wrapper libraries because they let us:

  1. Add custom rules
  2. Override existing rules
  3. Define sets of rules
  4. Define suppression lists
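To illustrate what these four capabilities might look like, here is a minimal sketch of a rule registry with custom rules, overrides by rule id, named rule sets, and a suppression list. The class and method names are hypothetical; android-autocheck and ios-autocheck are internal libraries, and their actual APIs are not described here.

```typescript
// Hypothetical wrapper-style rule engine; not the API of android-autocheck or ios-autocheck.
interface Finding {
  ruleId: string;
  elementId: string;
}

interface Rule<E> {
  id: string;
  check: (element: E) => Finding[];
}

class RuleRegistry<E> {
  private readonly rules = new Map<string, Rule<E>>();
  private readonly ruleSets = new Map<string, string[]>();
  private readonly suppressions = new Set<string>(); // "ruleId:elementId"

  // Adds a custom rule; registering an existing id overrides that rule.
  register(rule: Rule<E>): this {
    this.rules.set(rule.id, rule);
    return this;
  }

  // Names a subset of rules to run together.
  defineRuleSet(name: string, ruleIds: string[]): this {
    this.ruleSets.set(name, ruleIds);
    return this;
  }

  // Silences one known finding without disabling the rule elsewhere.
  suppress(ruleId: string, elementId: string): this {
    this.suppressions.add(`${ruleId}:${elementId}`);
    return this;
  }

  run(ruleSetName: string, elements: E[]): Finding[] {
    const ruleIds = this.ruleSets.get(ruleSetName);
    if (!ruleIds) throw new Error(`Unknown rule set: ${ruleSetName}`);
    const findings: Finding[] = [];
    for (const id of ruleIds) {
      const rule = this.rules.get(id);
      if (!rule) throw new Error(`Unknown rule: ${id}`);
      for (const element of elements) {
        for (const f of rule.check(element)) {
          if (!this.suppressions.has(`${f.ruleId}:${f.elementId}`)) findings.push(f);
        }
      }
    }
    return findings;
  }
}

// Usage with a hypothetical element type and a custom label rule.
interface El {
  id: string;
  label?: string;
}

const labelPresent: Rule<El> = {
  id: "label-present",
  check: (el) => (el.label ? [] : [{ ruleId: "label-present", elementId: el.id }]),
};

const registry = new RuleRegistry<El>()
  .register(labelPresent)
  .defineRuleSet("default", ["label-present"])
  .suppress("label-present", "divider");

const findings = registry.run("default", [
  { id: "avatar" },
  { id: "divider" },
  { id: "send", label: "Send" },
]);
console.log(findings); // one finding, for "avatar"
```

Keying suppressions by both rule and element keeps them narrow: a known false positive is silenced without turning the rule off for every other element.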

Android also provides a linting tool that performs some basic accessibility checks. We do not treat any of its findings as errors, because static analysis is prone to false positives, but they still help engineers detect possible issues.

Conclusion

To be effective, a11y automated testing must be executed thoughtfully, deliberately, and in conjunction with other types of testing (including manual testing by humans, engineers doing their due diligence, and linting), each deployed at the right point in the continuous integration process.

A11y automated testing is forward-looking. When executed early and often during the development process, it is a strong leading indicator of how accessible our products will be. While a11y automated testing cannot detect issues that only manual testing uncovers, its key benefit is efficiency: it can identify issues in less time than it takes a manual tester to file a single bug. By exploiting this efficiency, we use it as a litmus test to gain insight into the accessibility of our products.

Overall, a11y automated testing is an essential tool that empowers our engineers to build accessible products for our members in service of our vision to create economic opportunity for everyone, including the more than 1 billion people around the world with disabilities.
