AI Processing

| Key | Value |
| --- | --- |
| Status | Active |
| Owner | QA Automation |
| Updated | 2026-03-26 |
| Scope | Healing, incident intelligence, recovery logic, and AI-assisted investigation workflows |

AI in PW-Tests is not one monolithic feature. It shows up in several practical places: structured failure analysis, selector healing, incident memory, recovery detection, historical confidence, and visual review. The point is not to make every decision automatically. The point is to reduce repeated manual triage and make the human operator start from context instead of from zero.

What “AI Processing” Means In This Repo

| Capability | What It Does |
| --- | --- |
| healing | suggests or applies safe fixes for common test breakages |
| incident matching | checks whether a failure already matches a known root cause |
| historical priors | uses past confirmed recoveries to inform current confidence |
| cause assessment | blends incident evidence and history into a better verdict |
| recovery workflows | notices when a failure stops repeating and can close the loop in Slack |
| visual fact-checking | reviews screenshots beyond raw pixel comparison for selected flows |

Why This Exists

Without these layers, the team ends up doing the same work over and over:

  • reading the same timeout stack trace again
  • rediscovering the same site redesign issue every week
  • treating an already-fixed failure as a fresh regression
  • failing to explain whether a noisy run is worth action right now

The AI-related workflows try to shrink that repeated work.

Main Building Blocks

| Building Block | Current Role |
| --- | --- |
| EventLogger | creates structured evidence for downstream analysis |
| fix database | stores known fixes and healing outcomes |
| incident store | tracks known recurring failures and their root causes |
| failure history | records confirmed recoveries and recurring patterns |
| incident matcher | checks whether a current failure resembles a known incident |
| failure prior service | adds historical weighting based on past recoveries |
| cause assessor | blends signals into a more useful verdict |
| Slack thread tracker | keeps the recovery lifecycle attached to the original thread |
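The structured evidence that EventLogger produces is what everything downstream consumes. As a minimal sketch (the field names and log shape here are illustrative assumptions, not the repo's actual schema), one failure becomes a single JSONL line:

```typescript
// Hypothetical shape of one structured failure event, roughly as an
// EventLogger-style component might emit it. Field names are assumptions.
interface FailureEvent {
  timestamp: string;  // ISO-8601 time of the failure
  test: string;       // test title or spec file
  category: string;   // e.g. "SELECTOR_NOT_IN_DOM"
  selector?: string;  // selector involved, when known
  url?: string;       // page under test
  message: string;    // truncated error message
}

// Serialize one event as a single JSONL line so downstream consumers
// (incident matcher, healer) can stream-read evidence line by line.
function toJsonlLine(event: FailureEvent): string {
  return JSON.stringify(event);
}

const line = toJsonlLine({
  timestamp: "2026-03-26T05:00:00Z",
  test: "checkout > payment form renders",
  category: "SELECTOR_NOT_IN_DOM",
  selector: "#pay-now",
  url: "https://example.com/checkout",
  message: "locator resolved to 0 elements",
});
```

The one-line-per-event constraint is what makes the log appendable and greppable without parsing the whole file.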

Healing Workflows

The healing commands are still useful, but the repo has moved beyond the old “just rewrite the selector” model.

| Command | Best Used For |
| --- | --- |
| `npm run heal` | general analysis |
| `npm run heal:claude` | interactive Claude-assisted workflow |
| `npm run heal:interactive` | manual operator-led healing |
| `npm run heal:ai` | AI-assisted analysis path |
| `npm run heal:apply` | applying a proposed fix locally |
| `npm run heal:dry` | previewing changes |
| `npm run heal:mr` | packaging a fix for review |

Incident Intelligence

The incident model is one of the biggest upgrades in the system. It changes the question from “what is the error string?” to “have we seen this failure shape before, and what did it turn out to be?”

| Layer | What It Adds |
| --- | --- |
| incident store | durable memory of known failures |
| matcher | deterministic lookup against current failures |
| priors | historical weighting from confirmed recoveries |
| cause assessor | human-friendly verdict language |

This is what makes labels like "Post-fix, watching" or "Infra/CI suspicion" possible without pure guesswork.
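The lookup key for all of this is a failure fingerprint. A minimal sketch, assuming the fingerprint inputs are the failure category, the selector, and the URL path (the real inputs and hashing scheme may differ):

```typescript
import { createHash } from "node:crypto";

// Illustrative fingerprint: a stable hash over the failure's "shape".
// Equivalent failures must map to the same key, so volatile parts of
// the URL (query string, fragment) are deliberately dropped.
function fingerprint(category: string, selector: string, url: string): string {
  const path = new URL(url).pathname; // ignore query-string noise
  const basis = `${category}|${selector}|${path}`;
  return createHash("sha256").update(basis).digest("hex").slice(0, 16);
}

const fp = fingerprint(
  "SELECTOR_NOT_IN_DOM",
  "#pay-now",
  "https://example.com/checkout?session=abc123",
);
```

The key design point is determinism: two runs hitting the same broken selector on the same page produce the same fingerprint, so the incident store can answer "have we seen this before?" with a plain lookup.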

```mermaid
%%{init: {'theme':'base', 'themeVariables': {'primaryColor': '#4a90d9', 'primaryTextColor': '#fff', 'primaryBorderColor': '#2c6fad', 'lineColor': '#555', 'fontFamily': 'sans-serif'}}}%%
flowchart TD
    FAIL["Test failure\n(category + selector + URL)"] --> FP["Compute fingerprint"]
    FP --> INC["Check incident store\ndata/failure-incidents.json"]
    FP --> HIST["Check failure history\nconfirmed recoveries"]
    INC --> CA["Cause assessor\nblend signals"]
    HIST --> CA
    CA --> VERDICT["Verdict label\nConfirmed regression\nPost-fix watching\nLikely flaky\nInfra/CI suspicion\nNeeds confirmation"]
    VERDICT --> SLACK["Slack alert\nAssessment / Why / Next"]
```

The diagram shows how a failure fingerprint is matched against the incident store and recovery history before a human-readable verdict is produced for the Slack alert.
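The blending step can be sketched as a small rule cascade. This is a toy version under assumed signal names; the repo's actual assessor weights evidence more richly, but the shape is the same: known incident evidence first, recovery history second, repetition last, with a safe fallback.

```typescript
type Verdict =
  | "Confirmed regression"
  | "Post-fix, watching"
  | "Likely flaky"
  | "Infra/CI suspicion"
  | "Needs confirmation";

// Hypothetical signal bundle for one fingerprint. Field names are
// illustrative, not the repo's real interfaces.
interface Signals {
  matchedIncident?: { rootCause: "product" | "infra" | "flake" };
  recentlyRecovered: boolean; // a confirmed recovery exists for this fingerprint
  repeatCount: number;        // consecutive runs showing this failure
}

function assess(s: Signals): Verdict {
  // A known incident with an established root cause wins outright.
  if (s.matchedIncident?.rootCause === "infra") return "Infra/CI suspicion";
  if (s.matchedIncident?.rootCause === "flake") return "Likely flaky";
  // A recent confirmed recovery means this may be post-fix noise.
  if (s.recentlyRecovered) return "Post-fix, watching";
  // Sustained repetition without history is the regression signal.
  if (s.repeatCount >= 3) return "Confirmed regression";
  return "Needs confirmation";
}
```

The fallback verdict matters: when no signal is decisive, the honest answer is "Needs confirmation", not a confident guess.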

```mermaid
%%{init: {'theme':'base', 'themeVariables': {'primaryColor': '#4a90d9', 'primaryTextColor': '#fff', 'primaryBorderColor': '#2c6fad', 'lineColor': '#555', 'fontFamily': 'sans-serif'}}}%%
flowchart TD
    FAIL["Test failure logged\nto JSONL"] --> CLASS["Classify category\nSELECTOR_NOT_IN_DOM\nTIMEOUT_ELEMENT etc"]
    CLASS --> AUTO{"Auto-fixable?"}
    AUTO -- No --> SLACK["Slack alert\nmanual review"]
    AUTO -- Yes --> HEAL["Healer runs\nfinds replacement selector\nor adjusts timeout"]
    HEAL --> APPLY["Fix applied\nlocally or MR"]
    APPLY --> VERIFY["Re-run test\nverify fix"]
    VERIFY --> DB["Fix saved\ndata/fixes.json"]
    DB --> LEARN["Improves future\nconfidence scoring"]
```

The diagram shows the healing decision path from failure classification through auto-fix application to the fix database that feeds future confidence scoring.
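The "Auto-fixable?" gate in that flow is essentially an allowlist check: only categories with a known, safe mechanical fix go to the healer; everything else goes to a human. A minimal sketch, using only the two category names visible in the diagram (the repo's actual allowlist is presumably larger):

```typescript
// Categories considered safe to heal without a human in the loop.
// This set is illustrative; only these two names appear in the diagram.
const AUTO_FIXABLE = new Set(["SELECTOR_NOT_IN_DOM", "TIMEOUT_ELEMENT"]);

type Route = "heal" | "slack-manual-review";

// Route a classified failure: mechanical breakages go to the healer,
// everything else is escalated to Slack for manual review.
function route(category: string): Route {
  return AUTO_FIXABLE.has(category) ? "heal" : "slack-manual-review";
}
```

Keeping the gate as an explicit allowlist (rather than "everything except X") is the conservative choice: a new, unknown failure category defaults to human review.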

Recovery Workflows

Recovery replies are the other half of failure alerts: posting the original failure is only part of the job. When a failure disappears after a fix, the system can track that lifecycle and reply in the same Slack thread.

| Recovery Capability | Why It Matters |
| --- | --- |
| thread tracking | keeps follow-up tied to the original context |
| consecutive-pass logic | avoids declaring victory on one lucky pass |
| recovery posting | helps operators close threads instead of leaving silent dead ends |
| history recording | improves future confidence scoring |
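The consecutive-pass logic is simple to state: a failure counts as recovered only after N passing runs in a row, never after a single pass. A minimal sketch, assuming run outcomes arrive as an ordered list of booleans and a threshold of two (both assumptions, not the repo's actual values):

```typescript
// Declare recovery only after `required` consecutive passing runs.
// One green run after a string of red ones is not enough evidence,
// especially for flaky failures.
function isRecovered(results: boolean[], required = 2): boolean {
  if (results.length < required) return false;
  return results.slice(-required).every((passed) => passed);
}
```

The threshold is a trade-off: raising it delays the recovery reply but cuts false "resolved" posts on flaky tests.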

Main commands:

| Command | Purpose |
| --- | --- |
| `npm run slack:reply-resolved` | preview recovery replies |
| `npm run slack:reply-resolved:send` | send recovery replies |
| `npm run incidents:record-recovery` | append confirmed recovery history |

Visual Fact-Checker

Visual fact-checking is separate from the structural screenshot suite. It uses screenshots and AI review to answer questions that raw pixel comparison cannot answer well on its own. For full operational detail, see Visual Fact-Checker.

| Use Case | Why It Helps |
| --- | --- |
| failure triage | explains whether a screenshot looks meaningfully broken |
| post-deploy review | adds human-like interpretation to selected artifacts |
| mobile and PDT review | focuses attention on the failures most worth reading |

Main commands:

| Command | Purpose |
| --- | --- |
| `npm run factcheck` | standard fact-check workflow |
| `npm run factcheck:failures` | focus on failed runs |
| `npm run factcheck:ci` | CI-oriented fact-check run |
| `npm run factcheck:slack` | send fact-check output to Slack |

What AI Processing Does Not Replace

AI is useful here, but it is not a substitute for:

  • understanding the site surface
  • checking live DOM when selectors move
  • reading traces on tricky interaction failures
  • deciding whether a redesign should update a test or the product

It is best viewed as triage acceleration and memory, not as an infallible operator.

When a scheduled run fails, this order usually works best:

  1. read the Slack summary for the human verdict
  2. inspect artifacts or logs for the failing test
  3. check whether the incident store already knows the pattern
  4. decide whether this is regression, flake, or infra noise
  5. use healing only when the failure is actually fixable in test code

| Need | Go To |
| --- | --- |
| failure routing and labels | Failure Categories |
| run alerts and report flows | Reporting |
| logs and structured evidence | Logging System |
| command list | CLI Reference |
| AI screenshot review detail | Visual Fact-Checker |
| dashboards and telemetry | Observability |