Automated Accessibility Testing with axe-core

axe-core is the accessibility rules engine that powers most of the tooling frontend teams already use: browser DevTools panels, Lighthouse's a11y category, jest-axe, and the Playwright accessibility integrations. Understanding the engine directly—rather than the wrappers around it—lets you configure the right WCAG tags, interpret "incomplete" results correctly, and avoid the trap of treating a green automated run as proof of conformance. This guide sits under Testing & Automating Accessibility and explains how axe-core works, how to wire it into React and other framework dev loops, and—critically—what it cannot detect on its own.

What this guide covers

  • How the rules engine walks the DOM, builds a partial accessibility tree, and classifies results into violations, incomplete, and passes.
  • Running @axe-core/react in development for live, render-time auditing.
  • Using axe DevTools and the headless API in the browser.
  • Configuring rules and WCAG tags (wcag2a, wcag21aa, wcag22aa, best-practice).
  • Handling false positives and "needs review" results without disabling rules wholesale.
  • The categories of barriers automation structurally cannot catch.

Target WCAG 2.2 criteria

  • 4.1.2 Name, Role, Value — axe checks that interactive elements expose a computed accessible name and a valid role.
  • 1.4.3 Contrast (Minimum) — the color-contrast rule computes ratios from resolved styles.
  • 1.3.1 Info and Relationships — structural rules verify labels, headings, list and table semantics.
  • 4.1.1 Parsing — duplicate-id and ARIA-id-reference checks (note 4.1.1 is obsolete in WCAG 2.2, but axe still surfaces the underlying DOM defects).

What axe-core Is and How the Rules Engine Works

axe-core is a JavaScript library that runs inside the page context. When you call axe.run(), it does not statically parse your JSX or templates—it inspects the live, rendered DOM after the framework has committed it. That distinction matters: server-rendered output, hydration mismatches, and conditionally rendered ARIA all show up exactly as the browser sees them.

The engine executes in four phases:

  1. Tree collection — axe walks the DOM from the configured context (default document), flattening shadow roots and <iframe> boundaries it can reach, and skipping nodes that are not rendered.
  2. Accessibility tree resolution — for each candidate node it computes the accessible name, role, and relevant state using the same name-computation algorithm browsers use for 4.1.2 Name, Role, Value.
  3. Rule + check execution — each rule targets a CSS selector, gathers matching nodes, then runs one or more checks (small evaluate functions returning true/false/undefined) against them.
  4. Result classification — checks are aggregated into one of four buckets.
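
As a rough illustration of phases 3 and 4, the sketch below models how one node's check outcomes could roll up into a verdict. It is a simplified toy, not axe-core's source: the NodeChecks shape and the classifyNode name are hypothetical, and it assumes the any/all/none grouping that axe reports on each result node.

```typescript
// A check either passes (true), fails (false), or cannot decide (undefined).
type CheckOutcome = boolean | undefined;
type NodeVerdict = 'violation' | 'incomplete' | 'pass';

// Hypothetical shape for this sketch: outcomes grouped the way axe groups checks.
interface NodeChecks {
  all: CheckOutcome[];  // every check must pass
  any: CheckOutcome[];  // at least one must pass (when the group is non-empty)
  none: CheckOutcome[]; // no check may return true
}

function classifyNode(checks: NodeChecks): NodeVerdict {
  // A definite failure wins, even if other checks were undecided.
  const failed =
    checks.all.some((r) => r === false) ||
    checks.none.some((r) => r === true) ||
    (checks.any.length > 0 && checks.any.every((r) => r === false));
  if (failed) return 'violation';

  // No definite failure, but something could not be decided: needs review.
  const undecided =
    checks.all.some((r) => r === undefined) ||
    checks.none.some((r) => r === undefined) ||
    (checks.any.length > 0 && !checks.any.some((r) => r === true));
  return undecided ? 'incomplete' : 'pass';
}
```

The ordering is the point of the sketch: an undecided check never upgrades a node to a pass, and it never hides a definite failure.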

The four result types are the single most misread part of axe output:

  • violations — at least one check failed. This is a real defect with high confidence. Fix these.
  • incomplete ("needs review") — a check returned undefined: the engine could not decide automatically and needs a human. Common with color-contrast over images, gradients, and overlapping elements.
  • passes — checks succeeded. This only means the things axe can test are fine, not that the node is accessible.
  • inapplicable — no nodes matched the rule selector, so it was skipped.

[Figure: How axe-core processes a page into result buckets. A left-to-right pipeline: the rendered DOM is walked, resolved into a partial accessibility tree, rule checks run against matching nodes, and results are classified into violations, incomplete, passes, and inapplicable.]

The mental model: a violation is a confident "no," an incomplete is "I can't tell—you look," and a pass is "nothing I'm able to measure is wrong here." Treating incompletes as passes is how broken color contrast and ambiguous labels ship.
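
One way to keep that mental model honest in tooling is to give incompletes their own status rather than folding them into a pass. The sketch below assumes only the four result arrays described above; AuditResults, AuditSummary, and summarize are hypothetical names, and the types list just the fields the function reads.

```typescript
// Minimal structural types: only the fields this sketch reads.
interface RuleResult {
  id: string;
  nodes: unknown[];
}

interface AuditResults {
  violations: RuleResult[];
  incomplete: RuleResult[];
  passes: RuleResult[];
  inapplicable: RuleResult[];
}

interface AuditSummary {
  status: 'fail' | 'needs-review' | 'clean';
  violationIds: string[];
  reviewIds: string[];
}

function summarize(results: AuditResults): AuditSummary {
  const violationIds = results.violations.map((v) => v.id);
  const reviewIds = results.incomplete.map((i) => i.id);

  // Violations outrank incompletes, and incompletes never count as clean.
  const status: AuditSummary['status'] =
    violationIds.length > 0 ? 'fail' : reviewIds.length > 0 ? 'needs-review' : 'clean';

  return { status, violationIds, reviewIds };
}
```

A report built on this returns 'clean' only when both lists are empty, so a run with pending reviews cannot be mistaken for a green one.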


Running @axe-core/react in Development

@axe-core/react (the successor to react-axe) runs axe automatically after React commits, logging violations to the browser console with the offending node and a docs link. It gives the tightest feedback loop available: issues appear as you build the component, not in a separate test run. Check the package README for the React versions it supports before adopting it; the setup below assumes a compatible version.

// src/main.tsx — initialise axe ONLY in development
import React from 'react';
import ReactDOM from 'react-dom/client';
import App from './App';

if (process.env.NODE_ENV !== 'production') {
  // Dynamic import keeps axe out of the production bundle entirely.
  import('@axe-core/react').then(({ default: axe }) => {
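    // Arguments: React, ReactDOM, debounce in ms, config. The 1000ms debounce
    // avoids re-auditing on every keystroke during rapid re-renders.
    axe(React, ReactDOM, 1000, {
      // Scope to the WCAG tags you actually gate on (see below).
      // Assumption: this wrapper takes runOnly as a plain array of tags,
      // unlike axe.run(), which accepts { type: 'tag', values: [...] }.
      runOnly: ['wcag2a', 'wcag2aa', 'wcag21aa', 'wcag22aa'],
    });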
  });
}

ReactDOM.createRoot(document.getElementById('root')!).render(<App />);

Because it audits the committed DOM, @axe-core/react catches defects that only exist after state changes—an open menu missing aria-expanded, a dynamically inserted error that lacks an associated label. For those dynamic-state announcements specifically, pair it with the patterns in dynamic content & state announcements.

For Vue, Svelte, or Angular, use the framework-agnostic API directly in a dev-only effect or route hook:

// dev-audit.ts — works in any framework's client runtime
import axe from 'axe-core';

export async function auditNow(context: Element | Document = document) {
  const results = await axe.run(context, {
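    // Cumulative A + AA tags; 'wcag22aa' alone runs only the 2.2-specific rules.
    runOnly: { type: 'tag', values: ['wcag2a', 'wcag2aa', 'wcag21aa', 'wcag22aa'] },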
  });
  // Group so incompletes are visible, not buried under passes.
  if (results.violations.length) console.error('a11y violations', results.violations);
  if (results.incomplete.length) console.warn('a11y needs review', results.incomplete);
}

How to verify: open the console after each interaction. A correctly wired setup logs nothing for clean components and a grouped table for defects. Manually trigger the dynamic states (open the modal, submit the invalid form) and confirm axe re-runs and reports against the current DOM, not the initial render.


axe DevTools and Browser Usage

The axe DevTools browser extension is the fastest way to audit a running page without touching code. It runs the same engine, so its findings are identical to your programmatic runs—useful for confirming that a CI failure reproduces in a real browser, and for the "Inspect" affordance that highlights the failing node and shows the computed accessible name.

For scripted or one-off checks, run the engine straight from the console on any page that loads axe, or inject it:

// Paste into the browser console after loading axe-core via a snippet/bookmarklet
axe.run({ runOnly: { type: 'tag', values: ['wcag2a', 'wcag2aa', 'wcag21aa', 'wcag22aa'] } })
  .then((r) => {
    console.table(r.violations.map((v) => ({ id: v.id, impact: v.impact, nodes: v.nodes.length })));
    console.log('needs review:', r.incomplete.map((i) => i.id));
  });

The impact field (minor, moderate, serious, critical) lets you triage. Treat serious/critical 4.1.2 Name, Role, Value and 1.4.3 Contrast (Minimum) failures as release blockers; schedule minor best-practice items.
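
The triage rule above can be written down as a small policy function. This is a sketch under stated assumptions: it blocks on every serious or critical finding that is not tagged best-practice, which is slightly broader than the two criteria named above, and the Finding type and triage name are hypothetical.

```typescript
type Impact = 'minor' | 'moderate' | 'serious' | 'critical';

// Minimal structural type: only the fields the policy reads.
interface Finding {
  id: string;
  impact: Impact | null;
  tags: string[];
}

// Assumed policy for this sketch: these impact levels block a release.
const BLOCKING_IMPACTS = new Set<Impact>(['serious', 'critical']);

function triage(findings: Finding[]): { blockers: Finding[]; scheduled: Finding[] } {
  const blockers: Finding[] = [];
  const scheduled: Finding[] = [];
  for (const finding of findings) {
    const isBestPractice = finding.tags.indexOf('best-practice') !== -1;
    const isBlocking =
      finding.impact !== null && BLOCKING_IMPACTS.has(finding.impact) && !isBestPractice;
    (isBlocking ? blockers : scheduled).push(finding);
  }
  return { blockers, scheduled };
}
```

Keeping the policy in one function makes the release rule reviewable: changing what counts as a blocker is a visible diff rather than an ad hoc judgement per run.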

How to verify: run the extension, then re-run the same tag set via the console snippet on the identical page state—the violation IDs and node counts must match. A mismatch usually means the DOM changed between runs (animation, async data) and you need to freeze that state first.
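
To make that comparison mechanical, the two runs can be reduced to per-rule node counts and diffed. The sketch below is one possible helper, not part of axe; RunFinding, countByRule, and diffRuns are hypothetical names, and it assumes you have exported or copied the violations array from each run.

```typescript
// Minimal structural type: a violation with its affected nodes.
interface RunFinding {
  id: string;
  nodes: unknown[];
}

// Total affected nodes per rule id.
function countByRule(violations: RunFinding[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const v of violations) {
    counts[v.id] = (counts[v.id] ?? 0) + v.nodes.length;
  }
  return counts;
}

// Returns one line per rule whose node count differs between the two runs.
function diffRuns(first: RunFinding[], second: RunFinding[]): string[] {
  const a = countByRule(first);
  const b = countByRule(second);
  const ids = Object.keys({ ...a, ...b }).sort();
  return ids
    .filter((id) => (a[id] ?? 0) !== (b[id] ?? 0))
    .map((id) => `${id}: ${a[id] ?? 0} vs ${b[id] ?? 0}`);
}
```

An empty array means the runs agree; any entry points at the rule to re-check once the page state is frozen.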


Configuring Rules and WCAG Tags

axe ships ~90 rules, each tagged. The runOnly option restricts execution to the tags you care about; this is how you keep results aligned with the conformance level you actually claim.

Tag            Maps to
wcag2a         WCAG 2.0/2.1/2.2 Level A rules
wcag2aa        WCAG 2.0 Level AA
wcag21aa       New AA criteria added in WCAG 2.1 (e.g. reflow, non-text contrast)
wcag22aa       New AA criteria added in WCAG 2.2 (e.g. 2.4.11 Focus Not Obscured, 2.5.8 Target Size)
best-practice  Strong recommendations not tied to a specific SC

To target a typical "Level AA, WCAG 2.2" gate, combine the cumulative A and AA tags:

const axeConfig = {
  runOnly: {
    type: 'tag',
    // Cumulative: A + AA across 2.0, 2.1 and 2.2.
    values: ['wcag2a', 'wcag2aa', 'wcag21aa', 'wcag22aa'],
  },
  rules: {
    // Disable a single rule globally only with a documented reason.
    'region': { enabled: false }, // landmark requirement handled by app shell
  },
};

You can also configure per-rule behaviour with axe.configure(): enable best-practice rules selectively, change a rule's tags, or register project-specific checks. That is its own topic; see writing custom axe-core rules.

How to verify: log results.testEngine (engine name and version) and results.toolOptions (the options the run actually used); assert in a test that the tag list matches your stated conformance target so a stray runOnly change can't silently narrow coverage. Manually spot-check one criterion per tag (e.g. measure a small pointer target against 2.5.8 Target Size) to confirm the gate behaves as configured.
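
The tag-list assertion could look like the sketch below. The required list is the cumulative set used in this guide; REQUIRED_TAGS and missingTags are hypothetical names, and RunOnlyByTag mirrors only the tag form of runOnly shown in the config above.

```typescript
// The cumulative tag set this guide uses for a "WCAG 2.2 AA" gate.
const REQUIRED_TAGS: readonly string[] = ['wcag2a', 'wcag2aa', 'wcag21aa', 'wcag22aa'];

// Mirrors the tag form of runOnly from the config above.
interface RunOnlyByTag {
  type: 'tag';
  values: string[];
}

// Returns the required tags that the given runOnly setting does not cover.
function missingTags(
  runOnly: RunOnlyByTag | undefined,
  required: readonly string[] = REQUIRED_TAGS,
): string[] {
  const active = new Set(runOnly?.values ?? []);
  return required.filter((tag) => !active.has(tag));
}
```

In a test, asserting that missingTags(axeConfig.runOnly) is empty turns a silently narrowed tag list into a failing build.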


Handling False Positives and "Incomplete" Results

True false positives in axe are rare—most "false positives" are real defects the author disagrees with, or incompletes mistaken for failures. Work through them in order:

  1. Read the incomplete entry first. axe tells you why it couldn't decide. The color-contrast rule returns incomplete when the background is an image, a gradient, or a semi-transparent overlay it can't sample. This is expected behaviour, covered in depth in catching color contrast failures with axe-core.
  2. Reproduce in a frozen DOM state. Async content and animations cause flaky incompletes. Snapshot the state, then re-run.
  3. Prefer narrowing context over disabling rules. Exclude a known third-party widget by selector rather than turning a rule off everywhere:
await axe.run(
  { exclude: [['#third-party-chat-widget']] }, // not our DOM; audited separately
  axeConfig,
);
  4. Suppress at the node level, with a paper trail. If a finding is genuinely a false positive, record it as a reviewed exception in your test harness rather than deleting the rule—so the suppression is visible in review and revisited when the dependency updates.

How to verify: every suppressed or excluded item should have a comment naming the reason and an owner. Manually screen-reader-test anything you excluded—exclusion means "tested elsewhere," not "ignored."
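
A reviewed-exception list can live next to the tests as plain data. The sketch below is one possible shape, not an axe feature: ReviewedException and applyExceptions are hypothetical, node targets are simplified to a string array, and matching on the last selector in target is an assumption that fits nodes outside iframes and shadow roots.

```typescript
// Each suppression must carry a reason and an owner.
interface ReviewedException {
  ruleId: string;
  selector: string;
  reason: string;
  owner: string;
}

// Simplified result shapes: only the fields this sketch reads.
interface ViolationNode {
  target: string[];
}
interface Violation {
  id: string;
  nodes: ViolationNode[];
}

function applyExceptions(violations: Violation[], exceptions: ReviewedException[]) {
  const used = new Set<ReviewedException>();
  const remaining: Violation[] = [];

  for (const violation of violations) {
    // Drop only the nodes that match a documented exception for this rule.
    const nodes = violation.nodes.filter((node) => {
      const selector = node.target[node.target.length - 1];
      const match = exceptions.find(
        (e) => e.ruleId === violation.id && e.selector === selector,
      );
      if (match) used.add(match);
      return !match;
    });
    if (nodes.length > 0) remaining.push({ ...violation, nodes });
  }

  // Exceptions that matched nothing are stale and due for removal.
  const stale = exceptions.filter((e) => !used.has(e));
  return { remaining, stale };
}
```

Reporting stale entries is what keeps the paper trail current: when a dependency update fixes the underlying issue, the exception shows up for deletion instead of lingering.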


What axe Cannot Detect

This is the section that prevents false confidence. axe verifies machine-checkable properties; it cannot judge meaning. Commonly cited estimates put automated detection at roughly 30–40% of WCAG issues, and the exact share varies with the site and how issues are counted. The rest require human testing.

  • Meaningful alternative text. axe flags a missing alt, but alt="image" on a product photo passes every check while telling a screen-reader user nothing (1.1.1 Non-text Content).
  • Logical focus order. axe confirms elements are focusable; it cannot tell whether tabbing through them follows a sensible sequence (2.4.3 Focus Order).
  • Sensible, unambiguous labels. A button named "Click here" or three different "Edit" buttons with no distinguishing context pass 4.1.2 Name, Role, Value but fail real users.
  • Whether an ARIA pattern actually behaves correctly. axe checks attribute validity, not interaction—a role="dialog" with no focus trap and no Escape handling passes structurally.
  • Reading order, error recovery, and cognitive load. Entirely out of scope for any automated rules engine.

The takeaway: a clean axe run is necessary but not sufficient. Combine it with keyboard-only walkthroughs and screen-reader testing; see screen reader compatibility testing.


Common Pitfalls

  1. Treating "incomplete" as "pass." Needs-review results are the engine asking for help. Surface them in your reporting; never hide them behind a violations-only assertion.
  2. Auditing the initial render only. Most framework defects appear in dynamic states. Audit after interactions, not just on mount.
  3. Running with the default tag set and claiming a specific conformance level. If you gate on "WCAG 2.2 AA," your runOnly must include wcag22aa—the defaults won't.
  4. Disabling a rule globally to clear one node. Use exclude context or a documented node-level exception so coverage stays intact elsewhere.
  5. Shipping axe in the production bundle. Always guard @axe-core/react behind a NODE_ENV check and a dynamic import.
  6. Equating a green run with accessibility. Pair every automated pass with manual keyboard and screen-reader verification before you make conformance claims.

Frequently Asked Questions

Why does axe report "needs review" (incomplete) instead of a pass or fail? Some checks can't be fully automated. When a check's evaluate function returns undefined—for example, color-contrast over a background image or gradient it can't sample—axe records the node as incomplete and asks a human to verify. Incompletes are not passes; treat them as a to-do list for manual review.

Is a clean axe-core run enough to claim WCAG 2.2 AA conformance? No. Automated rules reliably catch roughly 30–40% of WCAG issues—the machine-checkable ones. Meaningful alt text, logical focus order, unambiguous labels, and correct interactive behaviour all require manual keyboard and screen-reader testing. A green run is necessary but not sufficient evidence.

What's the difference between @axe-core/react and jest-axe? @axe-core/react runs in the browser after every React commit and logs to the console for live development feedback. jest-axe runs the same engine inside unit tests against rendered component output and fails the test on violations. Use @axe-core/react while building and jest-axe to gate components in CI; see component testing with jest-axe.

How do I scope axe to a specific WCAG version and level? Pass runOnly: { type: 'tag', values: [...] }. For WCAG 2.2 AA, include the cumulative tags wcag2a, wcag2aa, wcag21aa, and wcag22aa. Omitting wcag22aa silently drops the 2.2-specific criteria like 2.5.8 Target Size.

Can axe-core test framework code directly, or only the rendered output? Only the rendered DOM. axe runs in the page context and inspects what the browser has committed after hydration—it never reads your JSX, templates, or component source. This is why it catches hydration mismatches and conditionally rendered ARIA that static linters miss.

Should I disable a noisy rule to get a green build? Almost never globally. Prefer excluding a specific context (e.g. a third-party widget by selector) or recording a documented, reviewed node-level exception. Disabling a rule everywhere removes coverage from your own components too, and the regression it was catching will eventually ship.