# AI Processing
| Key | Value |
|---|---|
| Status | Active |
| Owner | QA Automation |
| Updated | 2026-03-26 |
| Scope | Healing, incident intelligence, recovery logic, and AI-assisted investigation workflows |
AI in PW-Tests is not one monolithic feature. It shows up in several practical places: structured failure analysis, selector healing, incident memory, recovery detection, historical confidence, and visual review. The point is not to make every decision automatically. The point is to reduce repeated manual triage and make the human operator start from context instead of from zero.
## What “AI Processing” Means In This Repo
| Capability | What It Does |
|---|---|
| healing | suggests or applies safe fixes for common test breakages |
| incident matching | checks whether a failure already matches a known root cause |
| historical priors | uses past confirmed recoveries to inform current confidence |
| cause assessment | blends incident evidence and history into a better verdict |
| recovery workflows | notices when a failure stops repeating and can close the loop in Slack |
| visual fact-checking | reviews screenshots beyond raw pixel comparison for selected flows |
## Why This Exists
Without these layers, the team ends up doing the same work over and over:
- reading the same timeout stack trace again
- rediscovering the same site redesign issue every week
- treating an already-fixed failure as a fresh regression
- failing to explain whether a noisy run is worth action right now
The AI-related workflows try to shrink that repeated work.
## Main Building Blocks
| Building Block | Current Role |
|---|---|
| EventLogger | creates structured evidence for downstream analysis |
| fix database | stores known fixes and healing outcomes |
| incident store | tracks known recurring failures and their root causes |
| failure history | records confirmed recoveries and recurring patterns |
| incident matcher | checks whether a current failure resembles a known incident |
| failure prior service | adds historical weighting based on past recoveries |
| cause assessor | blends signals into a more useful verdict |
| Slack thread tracker | keeps the recovery lifecycle attached to the original thread |
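As a hedged sketch of the structured-evidence idea, the snippet below shows what one JSONL failure event might look like. The `FailureEvent` shape and `toJsonlLine` helper are illustrative assumptions, not the repo's actual EventLogger API:

```typescript
// Hypothetical sketch: FailureEvent and toJsonlLine are illustrative,
// not the repo's actual EventLogger interface.
interface FailureEvent {
  timestamp: string;
  test: string;
  category: string; // e.g. SELECTOR_NOT_IN_DOM, TIMEOUT_ELEMENT
  selector?: string;
  url?: string;
}

// Serialize one event as a single JSONL line for downstream analysis.
function toJsonlLine(event: FailureEvent): string {
  return JSON.stringify(event);
}

const line = toJsonlLine({
  timestamp: "2026-03-26T09:00:00Z",
  test: "checkout.spec.ts > pay with card",
  category: "SELECTOR_NOT_IN_DOM",
  selector: "#pay-button",
  url: "https://example.com/checkout",
});
console.log(line);
```

One event per line keeps the log append-only and trivially parseable by the downstream matcher and prior services.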
## Healing Workflows
The healing commands are still useful, but the repo has moved beyond the old “just rewrite the selector” model.
| Command | Best Used For |
|---|---|
| `npm run heal` | general analysis |
| `npm run heal:claude` | interactive Claude-assisted workflow |
| `npm run heal:interactive` | manual operator-led healing |
| `npm run heal:ai` | AI-assisted analysis path |
| `npm run heal:apply` | applying a proposed fix locally |
| `npm run heal:dry` | previewing changes |
| `npm run heal:mr` | packaging a fix for review |
## Incident Intelligence
The incident model is one of the biggest upgrades in the system. It changes the question from “what is the error string?” to “have we seen this failure shape before, and what did it turn out to be?”
| Layer | What It Adds |
|---|---|
| incident store | durable memory of known failures |
| matcher | deterministic lookup against current failures |
| priors | historical weighting from confirmed recoveries |
| cause assessor | human-friendly verdict language |
This is what makes labels like `Post-fix, watching` or `Infra/CI suspicion` possible without pure guesswork.
```mermaid
%%{init: {'theme':'base', 'themeVariables': {'primaryColor': '#4a90d9', 'primaryTextColor': '#fff', 'primaryBorderColor': '#2c6fad', 'lineColor': '#555', 'fontFamily': 'sans-serif'}}}%%
flowchart TD
    FAIL["Test failure\n(category + selector + URL)"] --> FP["Compute fingerprint"]
    FP --> INC["Check incident store\ndata/failure-incidents.json"]
    FP --> HIST["Check failure history\nconfirmed recoveries"]
    INC --> CA["Cause assessor\nblend signals"]
    HIST --> CA
    CA --> VERDICT["Verdict label\nConfirmed regression\nPost-fix watching\nLikely flaky\nInfra/CI suspicion\nNeeds confirmation"]
    VERDICT --> SLACK["Slack alert\nAssessment / Why / Next"]
```

The diagram shows how a failure fingerprint is matched against the incident store and recovery history before a human-readable verdict is produced for the Slack alert.
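The fingerprint-and-blend step above can be sketched in TypeScript. Everything here is an assumption for illustration: the fingerprint fields, the hash truncation, the verdict thresholds, and the function names are not the repo's actual implementation.

```typescript
import { createHash } from "crypto";

// Hypothetical fingerprint: hash the failure "shape" so identical
// breakages collapse to one key. Field choice is an assumption.
function fingerprint(category: string, selector: string, url: string): string {
  // Normalize to the URL path so query-string noise doesn't split
  // identical failures into separate fingerprints.
  const path = new URL(url).pathname;
  return createHash("sha256")
    .update(`${category}|${selector}|${path}`)
    .digest("hex")
    .slice(0, 16);
}

type Verdict =
  | "Confirmed regression"
  | "Post-fix, watching"
  | "Likely flaky"
  | "Infra/CI suspicion"
  | "Needs confirmation";

// Blend incident-store evidence with historical recovery priors.
// Thresholds are illustrative, not tuned values from the repo.
function assessCause(knownIncident: boolean, pastRecoveries: number): Verdict {
  if (knownIncident && pastRecoveries > 0) return "Post-fix, watching";
  if (knownIncident) return "Confirmed regression";
  if (pastRecoveries >= 3) return "Likely flaky";
  return "Needs confirmation";
}
```

The key design point is that the fingerprint is deterministic, so the matcher stays a plain lookup and only the verdict wording involves judgment.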
```mermaid
%%{init: {'theme':'base', 'themeVariables': {'primaryColor': '#4a90d9', 'primaryTextColor': '#fff', 'primaryBorderColor': '#2c6fad', 'lineColor': '#555', 'fontFamily': 'sans-serif'}}}%%
flowchart TD
    FAIL["Test failure logged\nto JSONL"] --> CLASS["Classify category\nSELECTOR_NOT_IN_DOM\nTIMEOUT_ELEMENT etc"]
    CLASS --> AUTO{"Auto-fixable?"}
    AUTO -- No --> SLACK["Slack alert\nmanual review"]
    AUTO -- Yes --> HEAL["Healer runs\nfinds replacement selector\nor adjusts timeout"]
    HEAL --> APPLY["Fix applied\nlocally or MR"]
    APPLY --> VERIFY["Re-run test\nverify fix"]
    VERIFY --> DB["Fix saved\ndata/fixes.json"]
    DB --> LEARN["Improves future\nconfidence scoring"]
```

The diagram shows the healing decision path from failure classification through auto-fix application to the fix database that feeds future confidence scoring.
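The fix database at the end of this flow can be pictured as a list of records like the one below. Every field name here is an assumption for illustration; the actual `data/fixes.json` schema may differ.

```typescript
// Hypothetical sketch of a fix-database record; field names are
// assumptions, not the repo's actual data/fixes.json schema.
interface FixRecord {
  fingerprint: string; // links the fix back to the failure shape
  category: string;    // e.g. SELECTOR_NOT_IN_DOM
  oldSelector: string;
  newSelector: string;
  verified: boolean;   // true once the re-run passed
  appliedAt: string;
}

const record: FixRecord = {
  fingerprint: "a1b2c3d4e5f60708",
  category: "SELECTOR_NOT_IN_DOM",
  oldSelector: "#pay-button",
  newSelector: "[data-testid='pay-button']",
  verified: true,
  appliedAt: "2026-03-26T09:30:00Z",
};

// A verified fix for the same failure shape can raise confidence
// the next time that fingerprint shows up.
function confidenceBoost(fixes: FixRecord[], fp: string): number {
  return fixes.filter((f) => f.fingerprint === fp && f.verified).length;
}
```

Keeping `verified` separate from the fix itself is what lets unconfirmed fixes sit in the database without polluting future confidence scoring.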
## Recovery Workflows
Posting the original failure alert is only half the job; recovery replies are the other half. When a failure disappears after a fix, the system can now track that lifecycle and reply in the same Slack thread.
| Recovery Capability | Why It Matters |
|---|---|
| thread tracking | keeps follow-up tied to the original context |
| consecutive-pass logic | avoids declaring victory on one lucky pass |
| recovery posting | helps operators close threads instead of leaving silent dead ends |
| history recording | improves future confidence scoring |
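The consecutive-pass logic above can be sketched in a few lines. The threshold value and function name are assumptions, not the repo's actual configuration:

```typescript
// Hedged sketch: the threshold and names are assumptions, not repo API.
const REQUIRED_CONSECUTIVE_PASSES = 3;

// Given the most recent run outcomes (oldest first), decide whether the
// failure can be declared recovered. A single lucky pass is not enough;
// the last N outcomes must all be passes.
function isRecovered(outcomes: Array<"pass" | "fail">): boolean {
  const recent = outcomes.slice(-REQUIRED_CONSECUTIVE_PASSES);
  return (
    recent.length === REQUIRED_CONSECUTIVE_PASSES &&
    recent.every((o) => o === "pass")
  );
}
```

Requiring a run of passes rather than one pass is what keeps flaky tests from triggering premature "resolved" replies.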
Main commands:
| Command | Purpose |
|---|---|
| `npm run slack:reply-resolved` | preview recovery replies |
| `npm run slack:reply-resolved:send` | send recovery replies |
| `npm run incidents:record-recovery` | append confirmed recovery history |
## Visual Fact-Checker
Visual fact-checking is separate from the structural screenshot suite. It uses screenshots and AI review to answer questions that raw pixel comparison cannot answer well on its own. For full operational detail, see Visual Fact-Checker.
| Use Case | Why It Helps |
|---|---|
| failure triage | explains whether a screenshot looks meaningfully broken |
| post-deploy review | adds human-like interpretation to selected artifacts |
| mobile and PDT review | focuses attention on the failures most worth reading |
Main commands:
| Command | Purpose |
|---|---|
| `npm run factcheck` | standard fact-check workflow |
| `npm run factcheck:failures` | focus on failed runs |
| `npm run factcheck:ci` | CI-oriented fact-check run |
| `npm run factcheck:slack` | send fact-check output to Slack |
## What AI Processing Does Not Replace
AI is useful here, but it is not a substitute for:
- understanding the site surface
- checking live DOM when selectors move
- reading traces on tricky interaction failures
- deciding whether a redesign should update a test or the product
It is best viewed as triage acceleration and memory, not as an infallible operator.
When a scheduled run fails, this order usually works best:
- read the Slack summary for the human verdict
- inspect artifacts or logs for the failing test
- check whether the incident store already knows the pattern
- decide whether this is regression, flake, or infra noise
- use healing only when the failure is actually fixable in test code
## Related Pages
| Need | Go To |
|---|---|
| failure routing and labels | Failure Categories |
| run alerts and report flows | Reporting |
| logs and structured evidence | Logging System |
| command list | CLI Reference |
| AI screenshot review detail | Visual Fact-Checker |
| dashboards and telemetry | Observability |