
Accessibility Audits with Lighthouse

Lighthouse is the most widely deployed accessibility auditing engine in the frontend toolchain—it ships in Chrome DevTools, runs from the command line, and embeds inside CI pipelines through Lighthouse CI. Yet its single most visible output, the green Accessibility score, is also its most misunderstood. A 100 does not mean your application is accessible; it means the subset of WCAG checks that can be automated all passed. This guide, part of Testing & Automating Accessibility, explains exactly what the Lighthouse Accessibility category measures (it runs axe-core under the hood), how to execute audits across DevTools, CLI, and programmatic Node, how to read the audit list and weighted score correctly, which checks Lighthouse silently skips, and how to enforce regression budgets with Lighthouse CI (lhci)—including the SSR and hydration gotchas specific to Next.js and Nuxt.

Mapped WCAG 2.1/2.2 Success Criteria:

  • 1.4.3 Contrast (Minimum) – Lighthouse flags low-contrast text/background pairs as a high-weight automated audit.
  • 4.1.2 Name, Role, Value – Detects missing accessible names on buttons, links, and form controls.
  • 1.3.1 Info and Relationships – Validates table headers, list structure, and ARIA role nesting.
  • 2.4.1 Bypass Blocks – Checks for landmark structure that automated heuristics can partially infer.

Core Considerations:

  • The Accessibility category is powered by axe-core, so it inherits axe's coverage and its blind spots.
  • The score is a weighted pass/fail aggregate of automated audits only—manual audits never affect the number.
  • Keyboard operability, focus order, and screen-reader meaning are outside what the score can verify.
  • In SSR frameworks you must audit the rendered, hydrated page, not the static shell, or results are misleading.

The Lighthouse Accessibility Category (Powered by axe-core)

The Accessibility category sits alongside Performance, Best Practices, and SEO (releases before Lighthouse 12 also shipped a PWA category). Critically, it does not implement its own rule engine: it delegates to a bundled build of Deque's axe-core, the same engine behind the axe DevTools extension and the automated accessibility testing workflow. Each Lighthouse accessibility audit maps to one or more axe rules: the color-contrast audit runs axe's color-contrast rule, button-name runs button-name, and so on.

This delegation has two practical consequences. First, Lighthouse's accessibility coverage is a snapshot of whatever axe-core version is bundled into that Lighthouse release—so upgrading Lighthouse can surface new audits or refine existing ones. Second, because axe-core is intentionally conservative (it only reports issues it can prove deterministically, to keep false positives near zero), Lighthouse inherits that conservatism. The engine would rather stay silent than wrongly fail you. That design choice is exactly why a perfect score is necessary but never sufficient evidence of compliance.

Understanding this lineage reframes the score: you are not running "Google's accessibility opinion," you are running axe-core with a Lighthouse-shaped report on top. Everything axe can catch, Lighthouse catches; everything axe cannot catch, Lighthouse cannot either.

# Print the Lighthouse version, then the axe-core version installed as its dependency
npx lighthouse --version
npm ls axe-core 2>/dev/null   # if Lighthouse is installed locally
# axe rules are documented at github.com/dequelabs/axe-core/blob/develop/doc/rule-descriptions.md

Testing Hook: When a Lighthouse audit name looks unfamiliar, search the axe-core rule descriptions for the matching rule ID. The audit's description field in the JSON report links straight to Deque's documentation for the underlying check.
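
As a rough illustration of that lookup, the sketch below pulls the documentation links out of an audit's description and reads the axe-core version recorded in a report. It assumes the description is Markdown containing [text](url) links and that the report exposes the version under environment.credits['axe-core']; the helper names and the sample report are hypothetical, so check the field names against your own JSON output.

```javascript
// Hypothetical helpers for inspecting a parsed Lighthouse JSON report (lhr).

// Return every URL found in Markdown-style [text](url) links in an audit description.
function docLinks(lhr, auditId) {
  const description = lhr.audits?.[auditId]?.description ?? '';
  return [...description.matchAll(/\[[^\]]*\]\((https?:[^)\s]+)\)/g)].map((m) => m[1]);
}

// Assumption: the report records the bundled axe-core version under environment.credits.
function axeVersion(lhr) {
  return lhr.environment?.credits?.['axe-core'] ?? null;
}

// Trimmed-down sample shaped like a report; every value here is illustrative only.
const sampleReport = {
  environment: { credits: { 'axe-core': '4.8.4' } },
  audits: {
    'button-name': {
      description:
        'Buttons need a discernible name. ' +
        '[Learn how to make buttons more accessible](https://dequeuniversity.com/rules/axe/4.8/button-name).',
    },
  },
};

console.log(docLinks(sampleReport, 'button-name'));
console.log(axeVersion(sampleReport));
```

To run it against a real report, parse the JSON file written by --output-path and pass the resulting object in place of sampleReport.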


Running Lighthouse: DevTools, CLI, and Programmatic Node

There are three execution surfaces, and choosing the right one depends on whether you are debugging interactively or automating.

Chrome DevTools is the fastest path for ad hoc audits. Open DevTools, select the Lighthouse panel, check only the Accessibility category, choose Navigation mode, and click Analyze page load. DevTools audits the live, rendered DOM—including any client-side hydration that has completed—which makes it the most accurate surface for SPA debugging.

The CLI is the reproducible, scriptable surface. It runs a headless Chrome instance, navigates to a URL, and emits HTML or JSON.

# Audit only the accessibility category and emit a machine-readable JSON report
npx lighthouse https://staging.example.com \
  --only-categories=accessibility \
  --output=json \
  --output-path=./lh-a11y.json \
  --chrome-flags="--headless=new --no-sandbox"

# Extract just the score (0-1) for a quick sanity check (requires jq on your PATH)
npx lighthouse https://staging.example.com \
  --only-categories=accessibility --output=json --quiet --chrome-flags="--headless=new" \
  | jq '.categories.accessibility.score'

Programmatic Node gives you full control—custom Chrome launch flags, authenticated sessions, and the ability to interact with the page (open a modal, expand a menu) before auditing dynamic states.

// audit.mjs: run the Lighthouse accessibility category from Node
import { launch } from 'chrome-launcher';
import lighthouse from 'lighthouse';

const chrome = await launch({ chromeFlags: ['--headless=new'] });

try {
  const runnerResult = await lighthouse('https://staging.example.com', {
    port: chrome.port,
    onlyCategories: ['accessibility'], // skip perf/seo to speed up the run
    output: 'json',
  });

  const { categories, audits } = runnerResult.lhr;
  const score = Math.round(categories.accessibility.score * 100);
  console.log(`Accessibility score: ${score}`);

  // Enumerate failing audits with their weights for triage
  for (const ref of categories.accessibility.auditRefs) {
    const audit = audits[ref.id];
    if (audit.score !== null && audit.score < 1) {
      console.log(`FAIL  ${audit.id}  (weight ${ref.weight}): ${audit.title}`);
    }
  }
} finally {
  // Always shut Chrome down, even if the audit throws
  await chrome.kill();
}

Testing Hook: Run the CLI against both your production URL and a local dev build. Diverging scores usually mean an environment-specific issue (a missing lang attribute injected only in production, or a contrast regression from a dev-only theme).


Reading the Audits: Names, Contrast, ARIA, and Labels

The report groups results into passed audits, failed audits, manual audits, and not-applicable audits. The failed audits are where triage begins, and they cluster into recognizable families:

  • Accessible names: button-name, link-name, image-alt, input-image-alt, label. These fire when an interactive element exposes no programmatic name, violating 4.1.2 Name, Role, Value. The fix is almost always native text content, an aria-label, or an associated <label>.
  • Contrast: color-contrast checks text against its computed background for the 1.4.3 Contrast (Minimum) 4.5:1 (or 3:1 for large text) ratio. It reports the exact foreground/background hex pair and the failing ratio.
  • ARIA correctness: aria-valid-attr, aria-required-children, aria-required-parent, aria-roles. These catch malformed ARIA: a role="tab" without a role="tablist" parent, an invalid attribute value, or a non-existent role, all mapping to 1.3.1 Info and Relationships and 4.1.2 Name, Role, Value.
  • Structure: heading-order, list, definition-list, duplicate-id-aria, html-has-lang.

Each audit row in the HTML report expands to show the offending DOM nodes with a CSS selector and a snippet, so you can jump straight to the source. In the JSON report the same nodes live under audits[id].details.items.

# List every failing audit with its count of offending nodes, largest first (requires jq on your PATH)
npx lighthouse https://staging.example.com \
  --only-categories=accessibility --output=json --quiet --chrome-flags="--headless=new" \
  | jq -r '
    [ .audits[]
      | select(.score != null and .score < 1)
      | { title, nodes: ((.details.items // []) | length) } ]
    | sort_by(-.nodes)[]
    | "\(.title)  (\(.nodes) nodes)"'

Testing Hook: Don't stop at the audit title—open the node list. A single color-contrast failure can represent one stray badge or two hundred body-text instances, and the node count tells you which.
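
One way to open the node list outside the HTML report is to read it from the JSON. The sketch below is a minimal example that assumes each entry in audits[id].details.items carries a node object with selector and snippet fields; offendingNodes and the sample data are hypothetical names used for illustration.

```javascript
// Hypothetical helper: list the selector and snippet of each node an audit flagged.
function offendingNodes(lhr, auditId) {
  const items = lhr.audits?.[auditId]?.details?.items ?? [];
  return items
    .filter((item) => item.node) // keep only rows that describe a DOM node
    .map((item) => ({ selector: item.node.selector, snippet: item.node.snippet }));
}

// Illustrative sample shaped like the relevant slice of a report.
const sampleLhr = {
  audits: {
    'color-contrast': {
      score: 0,
      details: {
        items: [
          { node: { selector: 'span.badge', snippet: '<span class="badge">New</span>' } },
          { node: { selector: 'p.lead', snippet: '<p class="lead">Welcome</p>' } },
        ],
      },
    },
    'html-has-lang': { score: 1 },
  },
};

for (const node of offendingNodes(sampleLhr, 'color-contrast')) {
  console.log(`${node.selector}  ${node.snippet}`);
}
```

Two rows for color-contrast point at two distinct elements here; on a real page the same call shows whether a failure is one stray badge or a site-wide pattern.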


The Weighted Score: Why 100 ≠ Accessible

The Accessibility score is not a percentage of WCAG criteria met. It is a weighted average where each automated audit carries a weight, audits pass (1) or fail (0) as a binary, and the score is sum(weight × passOrFail) / sum(weight), scaled to 0–100. Higher-impact audits carry heavier weights: color-contrast, button-name, image-alt, and link-name are weighted heavily, while narrower checks weigh less.

Three properties follow directly, and each one is a reason the number lies if read naively:

  1. Audits are binary. One failing <img> fails the entire image-alt audit regardless of how many images pass. The score drops by the full audit weight, not proportionally.
  2. Weights are uneven. Fixing one heavy audit can move the score more than fixing five light ones, which is useful for triage but means the number doesn't track effort linearly.
  3. Only automated audits count. Manual audits and not-applicable audits are excluded from the denominator entirely.
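
The arithmetic behind these three properties can be sketched in a few lines. The example below applies the formula sum(weight × passOrFail) / sum(weight) to a made-up audit list; the weights are illustrative assumptions, not the values from any particular Lighthouse release, and weightedScore is a hypothetical helper rather than Lighthouse's own scoring code.

```javascript
// Hypothetical re-implementation of the weighted pass/fail average described above.
// pass: true = passed, false = failed, null = manual or not applicable (excluded).
function weightedScore(audits) {
  const scored = audits.filter((a) => a.weight > 0 && a.pass !== null);
  const total = scored.reduce((sum, a) => sum + a.weight, 0);
  if (total === 0) return null; // nothing to score
  const earned = scored.reduce((sum, a) => sum + (a.pass ? a.weight : 0), 0);
  return Math.round((earned / total) * 100);
}

// Illustrative weights only.
const exampleAudits = [
  { id: 'color-contrast', weight: 7, pass: true },
  { id: 'image-alt', weight: 10, pass: false }, // one bad <img> fails the whole audit
  { id: 'button-name', weight: 10, pass: true },
  { id: 'html-has-lang', weight: 7, pass: true },
  { id: 'video-caption', weight: 10, pass: null }, // not applicable: left out of the denominator
];

console.log(weightedScore(exampleAudits)); // 24 of 34 weight points earned -> 71
```

A single failing image costs the full 10 points of weight here, and the not-applicable audit changes nothing, which is the behavior the list above describes.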

The diagram below shows what the score actually encloses—and the larger surface of WCAG conformance that lives outside it.

[Figure: What the Lighthouse accessibility score measures versus what it cannot. An outer region labeled "WCAG conformance (the real target)" contains an inner box labeled "Automated audits", the only part the weighted score covers: color-contrast (1.4.3), button-name / link-name (4.1.2), image-alt, label, aria-*, heading-order, list, html-has-lang. Outside the inner box sit the manual checks the score is blind to: keyboard operability, logical focus order, screen-reader meaning, live-region timing, and visible focus indicators.]

A 100 confirms that no automated audit found a provable defect. It says nothing about whether a keyboard user can reach the menu, whether the focus order matches the visual order, or whether your live region actually announced. For a deeper breakdown of weighting and triage, see Interpreting Lighthouse Accessibility Scores.

Testing Hook: Read categories.accessibility.auditRefs[].weight in the JSON report to see the exact weight of every audit in your Lighthouse version, then sort your failures by weight to prioritize the highest-impact fixes first.
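
A minimal sketch of that triage step follows. It assumes the report shape described above (categories.accessibility.auditRefs with id and weight, and audits keyed by id with a score that is null for manual and not-applicable audits); failuresByWeight and the pointsAtStake field are hypothetical names, and the weights in the sample are made up.

```javascript
// Hypothetical helper: rank failing audits by weight and estimate how many
// score points each one is holding back.
function failuresByWeight(lhr) {
  const { auditRefs } = lhr.categories.accessibility;
  // Only weighted audits that produced a score take part in the average.
  const scored = auditRefs.filter((ref) => ref.weight > 0 && lhr.audits[ref.id]?.score != null);
  const totalWeight = scored.reduce((sum, ref) => sum + ref.weight, 0);
  return scored
    .filter((ref) => lhr.audits[ref.id].score < 1)
    .sort((a, b) => b.weight - a.weight)
    .map((ref) => ({
      id: ref.id,
      weight: ref.weight,
      pointsAtStake: Math.round((ref.weight / totalWeight) * 100),
    }));
}

// Illustrative sample; real reports contain many more audits.
const triageSample = {
  categories: {
    accessibility: {
      auditRefs: [
        { id: 'color-contrast', weight: 7 },
        { id: 'image-alt', weight: 10 },
        { id: 'button-name', weight: 10 },
        { id: 'html-has-lang', weight: 7 },
        { id: 'focus-traps', weight: 0 },
      ],
    },
  },
  audits: {
    'color-contrast': { score: 0 },
    'image-alt': { score: 0 },
    'button-name': { score: 1 },
    'html-has-lang': { score: 1 },
    'focus-traps': { score: null }, // manual audit
  },
};

console.table(failuresByWeight(triageSample));
```

Sorting by weight rather than by node count answers a different question: which fix moves the number most, not which fix touches the most markup.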


The Manual Checks Lighthouse Skips

Lighthouse explicitly lists a set of manual audits—"Additional items to manually check"—precisely because it knows it cannot verify them. These are not optional extras; they are where most real accessibility failures hide, and they map to success criteria no static engine can confirm:

  • Keyboard operability (2.1.1 Keyboard, 2.1.2 No Keyboard Trap) — Lighthouse cannot tab through your UI. It can't know whether a custom dropdown opens on Enter or whether a modal traps focus.
  • Logical focus order (2.4.3 Focus Order) — DOM order and tabindex are inspectable, but whether the resulting sequence is meaningful is a human judgment.
  • Screen-reader meaning (1.3.1, 4.1.2) — an element can have a technically valid accessible name that is still useless ("Click here", "Button"). The string passes; the meaning fails.
  • Live-region timing (4.1.3 Status Messages) — Lighthouse sees that an aria-live region exists but cannot confirm it announced at the right moment, or at all.
  • Visible focus indicators (2.4.7 Focus Visible) — a computed style can be read, but contrast and visibility of the focus ring under real interaction need an eye.

This is the boundary the diagram above draws. To cover it, pair Lighthouse with manual passes (keyboard-only navigation, NVDA/VoiceOver), and automate the interaction-heavy layer with end-to-end accessibility testing in Playwright, which can drive the keyboard and assert focus across real user flows.

Testing Hook: Treat the manual audit list as a checklist, not a footnote. Walk every flow with the keyboard only and a screen reader before you trust any green score.
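
If it helps to make that checklist concrete, the manual audits can be read out of the JSON report and pasted into a pull-request template. The sketch below assumes manual audits are marked with scoreDisplayMode set to 'manual'; manualChecklist is a hypothetical helper and the sample titles are illustrative.

```javascript
// Hypothetical helper: turn the report's manual audits into checklist lines.
function manualChecklist(lhr) {
  return lhr.categories.accessibility.auditRefs
    .map((ref) => lhr.audits[ref.id])
    .filter((audit) => audit && audit.scoreDisplayMode === 'manual')
    .map((audit) => `[ ] ${audit.title}`);
}

// Illustrative sample shaped like the relevant slice of a report.
const checklistSample = {
  categories: {
    accessibility: {
      auditRefs: [{ id: 'color-contrast' }, { id: 'logical-tab-order' }, { id: 'focus-traps' }],
    },
  },
  audits: {
    'color-contrast': { title: 'Background and foreground colors have a sufficient contrast ratio', scoreDisplayMode: 'binary' },
    'logical-tab-order': { title: 'The page has a logical tab order', scoreDisplayMode: 'manual' },
    'focus-traps': { title: 'User focus is not accidentally trapped in a region', scoreDisplayMode: 'manual' },
  },
};

console.log(manualChecklist(checklistSample).join('\n'));
```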


Lighthouse CI (lhci): From Audit to Assertion

Running Lighthouse by hand catches today's problems; it does nothing for tomorrow's regression. Lighthouse CI (@lhci/cli) turns the audit into an enforceable gate. It collects runs, asserts thresholds against the results, and can upload reports to temporary public storage or a self-hosted server. The assertion layer is the heart of it: you declare a minimum accessibility score and, optionally, promote individual audits to hard errors.

// lighthouserc.js — minimal accessibility-gating config
module.exports = {
  ci: {
    collect: {
      // Audit the running app; start your server before lhci, or use startServerCommand
      url: ['http://localhost:3000/'],
      numberOfRuns: 3, // several runs damp flaky layout- and timing-dependent results
    },
    assert: {
      assertions: {
        // Fail CI if the weighted accessibility category drops below 0.95
        'categories:accessibility': ['error', { minScore: 0.95 }],
        // Promote individual high-value audits to hard failures regardless of score
        'color-contrast': 'error',
        'button-name': 'error',
        'image-alt': 'error',
      },
    },
    upload: { target: 'temporary-public-storage' },
  },
};

This is an introduction; the full configuration of assertions, per-audit allowlists for known debt, and wiring lhci autorun into a pull-request gate is covered in Setting Lighthouse CI Accessibility Budgets. For the broader strategy of failing builds on accessibility regressions across tools, see gating accessibility in CI/CD pipelines.
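
To see why the config pairs a category threshold with per-audit errors, it can help to evaluate the same two rules by hand. The sketch below is not lhci's assertion engine; it is a hypothetical checkBudget function, applied to made-up data, that mimics the intent of minScore plus a list of audits that must pass.

```javascript
// Hypothetical stand-in for the two kinds of assertion in the config above:
// a minimum category score, plus audits that must pass regardless of score.
function checkBudget(lhr, { minScore, requiredAudits }) {
  const violations = [];
  const score = lhr.categories.accessibility.score;
  if (score === null || score < minScore) {
    violations.push(`categories:accessibility score ${score} is below ${minScore}`);
  }
  for (const id of requiredAudits) {
    const audit = lhr.audits[id];
    // Audits that are absent or not applicable (score null) are not violations.
    if (audit && audit.score !== null && audit.score < 1) {
      violations.push(`${id} failed`);
    }
  }
  return violations;
}

// Illustrative report slice: the category clears 0.95 while image-alt still fails.
const budgetSample = {
  categories: { accessibility: { score: 0.96 } },
  audits: {
    'color-contrast': { score: 1 },
    'button-name': { score: null },
    'image-alt': { score: 0 },
  },
};

console.log(
  checkBudget(budgetSample, {
    minScore: 0.95,
    requiredAudits: ['color-contrast', 'button-name', 'image-alt'],
  })
);
```

In this sample the score threshold alone would let the build through; the per-audit rule is what catches the missing alt text.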

# .github/workflows/lighthouse-a11y.yml — run lhci on every pull request
name: Lighthouse Accessibility
on: [pull_request]
jobs:
  lhci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20 }
      - run: npm ci
      - run: npm run build
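      # Assumes lighthouserc.js starts the built app (collect.startServerCommand)
      # or that an earlier step leaves it listening on the URL being audited
      - name: Run Lighthouse CI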
        uses: treosh/lighthouse-ci-action@v12
        with:
          configPath: ./lighthouserc.js
          uploadArtifacts: true   # attach HTML reports to the run for review
          temporaryPublicStorage: true

Testing Hook: Set numberOfRuns to at least 3. Contrast and ARIA audits are deterministic, but layout-dependent and timing-dependent checks can flake; the median run is far more stable than a single pass.
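
For scripts that run Lighthouse several times outside lhci, the same idea can be approximated by hand. The sketch below takes the median accessibility score across a set of parsed reports; medianScore is a hypothetical helper, the sample scores are made up, and lhci's own aggregation is configured separately and may behave differently.

```javascript
// Hypothetical helper: median accessibility score across several parsed reports.
function medianScore(reports) {
  const scores = reports
    .map((lhr) => lhr.categories.accessibility.score)
    .filter((score) => score !== null)
    .sort((a, b) => a - b);
  if (scores.length === 0) return null;
  const mid = Math.floor(scores.length / 2);
  // Odd count: the middle value. Even count: the mean of the two middle values.
  return scores.length % 2 === 1 ? scores[mid] : (scores[mid - 1] + scores[mid]) / 2;
}

// Illustrative scores from three runs, one of which flaked low.
const runs = [0.97, 0.91, 0.97].map((score) => ({ categories: { accessibility: { score } } }));

console.log(medianScore(runs));
```

With one low outlier among three runs, the median stays at the value the other two runs agree on.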


SSR Considerations: Audit the Rendered, Hydrated Page

For Next.js and Nuxt apps the single most common cause of misleading Lighthouse results is auditing the wrong DOM state. Server-rendered frameworks ship an initial HTML payload, then hydrate it on the client—attaching event handlers, mounting client-only components, and in some cases injecting ARIA attributes or accessible names that exist only after hydration.

If Lighthouse audits before hydration settles, you get false positives: a button-name failure on a control whose label is injected by a client effect, or a missing live region that mounts only after JavaScript runs. Conversely, a client-only error (a modal that traps focus incorrectly) is invisible to a navigation-mode audit that never opens the modal.

Two rules keep SSR audits honest:

  1. Audit the deployed/built artifact, not the dev server. Dev builds carry extra markup, source maps, and warning banners. Run lhci against a production build (next build && next start, or nuxt build && node .output/server/index.mjs).
  2. Let the page settle, and audit interactive states programmatically. Lighthouse's Node API can audit a page driven by a scripted browser such as Puppeteer, so a script can wait for hydration and drive the UI into the states (open dialog, expanded menu) you actually care about before the audit runs.

// Audit a Next.js/Nuxt route served from a production build.
// A navigation audit waits for the page to load and go quiet, but it does not
// open dialogs or menus; those states need a scripted browser session.
import { launch } from 'chrome-launcher';
import lighthouse from 'lighthouse';

const chrome = await launch({ chromeFlags: ['--headless=new'] });
try {
  const result = await lighthouse('http://localhost:3000/dashboard', {
    port: chrome.port,
    onlyCategories: ['accessibility'],
    // Audit at desktop form factor with Lighthouse's screen emulation turned off
    formFactor: 'desktop',
    screenEmulation: { disabled: true },
  });
  console.log(Math.round(result.lhr.categories.accessibility.score * 100));
} finally {
  // Always shut Chrome down, even if the audit throws
  await chrome.kill();
}

Testing Hook: Compare a navigation-mode audit of the static URL against a programmatic audit of the hydrated, interacted page. If the scores differ, your accessibility tree changes during hydration—document which audits flip and pin the audit to the hydrated state.
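
Documenting which audits flip is easier with a small diff over the two JSON reports. The sketch below is one possible approach: flippedAudits is a hypothetical helper that classifies each audit as pass, fail, or n/a (score null or audit absent) in both reports and returns the ones whose status changed. The sample data is illustrative.

```javascript
// Hypothetical helper: list audits whose status differs between two reports.
function flippedAudits(staticLhr, hydratedLhr) {
  const status = (audit) =>
    audit == null || audit.score == null ? 'n/a' : audit.score < 1 ? 'fail' : 'pass';
  const ids = new Set([...Object.keys(staticLhr.audits), ...Object.keys(hydratedLhr.audits)]);
  return [...ids]
    .sort()
    .map((id) => ({
      id,
      before: status(staticLhr.audits[id]),
      after: status(hydratedLhr.audits[id]),
    }))
    .filter((row) => row.before !== row.after);
}

// Illustrative slices of a static-shell report and a hydrated-state report.
const staticReport = {
  audits: {
    'button-name': { score: 0 }, // label not yet injected
    'color-contrast': { score: 1 },
    'aria-allowed-attr': { score: null }, // nothing to check before hydration
  },
};
const hydratedReport = {
  audits: {
    'button-name': { score: 1 },
    'color-contrast': { score: 1 },
    'aria-allowed-attr': { score: 0 }, // client-only component adds bad ARIA
  },
};

console.table(flippedAudits(staticReport, hydratedReport));
```

An audit moving from fail to pass suggests the static shell was audited too early; one moving from n/a to fail points at a defect that only exists after client code runs.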


Key Takeaways

  • The Lighthouse Accessibility category runs axe-core; its coverage and its silence both come from that engine.
  • The score is a weighted, binary, automated-only aggregate—useful for triage, never proof of conformance.
  • Lighthouse explicitly lists manual audits (keyboard, focus order, screen-reader meaning) because it cannot verify them.
  • Use Lighthouse CI assertions to convert a one-off audit into a regression gate on every pull request.
  • In SSR apps, audit the built, hydrated, interacted page or your results describe a DOM your users never see.

Frequently Asked Questions

Does a Lighthouse accessibility score of 100 mean my site is WCAG compliant? No. A 100 means every automated audit that Lighthouse runs passed. Those audits cover a subset of WCAG—roughly the checks axe-core can verify deterministically. Keyboard operability, logical focus order, screen-reader meaning, and live-region timing are listed as manual audits precisely because the engine cannot confirm them. Treat 100 as a clean automated baseline, then do manual keyboard and screen-reader testing for actual compliance.

Why does Lighthouse give different accessibility scores on the same page? Most variance comes from auditing different DOM states or from timing-sensitive checks. A score taken before client-side hydration completes can differ from one taken after, and layout-dependent audits can occasionally flake. Run multiple passes (Lighthouse CI's numberOfRuns with the median), audit the production build rather than the dev server, and ensure the page has fully hydrated before the audit begins.

How is the Lighthouse accessibility score actually calculated? It is a weighted average of automated audits, each scored as a binary pass or fail. The formula is the sum of weight × pass divided by the sum of all weights, scaled to 0–100. Heavily weighted audits like color-contrast and button-name move the number more than narrow ones, and a single failing node fails the whole audit. Manual and not-applicable audits are excluded entirely. See Interpreting Lighthouse Accessibility Scores for the full breakdown.

Can Lighthouse test keyboard navigation and focus management? No. Lighthouse cannot tab through your interface or open interactive components, so it cannot verify keyboard operability, focus traps, or focus order. It can inspect static tabindex and DOM order, but it lists these concerns as manual audits. Cover them with manual keyboard passes and automated interaction tests using Playwright.

How should I run Lighthouse on a Next.js or Nuxt app? Audit the production build, not the dev server, and make sure the page is fully hydrated before the audit runs. For client-only UI (modals, menus), use the programmatic Node API to drive the component into its open state before calling Lighthouse, so the audit reflects the accessibility tree your users actually experience after hydration rather than the static server shell.

Is Lighthouse the same as the axe DevTools extension? They share the same engine—axe-core—but differ in packaging and depth. Lighthouse wraps axe in a weighted score and a curated audit list aimed at a quick pass/fail signal, while the standalone axe-core tooling exposes the full rule set, richer node details, and configuration for specific WCAG tag sets. Use Lighthouse for the score and CI gate; use axe directly for granular debugging.