# AI Processing
| Key | Value |
|---|---|
| Status | Active |
| Owner | QA Automation |
| Updated | 2026-03-26 |
| Scope | Healing, incident intelligence, recovery logic, and AI-assisted investigation workflows |
AI in PW-Tests is not one monolithic feature. It shows up in several practical places: structured failure analysis, selector healing, incident memory, recovery detection, historical confidence, and visual review. The point is not to make every decision automatically. The point is to reduce repeated manual triage and make the human operator start from context instead of from zero.
## What “AI Processing” Means In This Repo
| Capability | What It Does |
|---|---|
| healing | suggests or applies safe fixes for common test breakages |
| incident matching | checks whether a failure already matches a known root cause |
| historical priors | uses past confirmed recoveries to inform current confidence |
| cause assessment | blends incident evidence and history into a better verdict |
| recovery workflows | notices when a failure stops repeating and can close the loop in Slack |
| visual fact-checking | reviews screenshots beyond raw pixel comparison for selected flows |
## Why This Exists
Without these layers, the team ends up doing the same work over and over:
- reading the same timeout stack trace again
- rediscovering the same site redesign issue every week
- treating an already-fixed failure as a fresh regression
- failing to explain whether a noisy run is worth action right now
The AI-related workflows try to shrink that repeated work.
## Main Building Blocks
| Building Block | Current Role |
|---|---|
| EventLogger | creates structured evidence for downstream analysis |
| fix database | stores known fixes and healing outcomes |
| incident store | tracks known recurring failures and their root causes |
| failure history | records confirmed recoveries and recurring patterns |
| incident matcher | checks whether a current failure resembles a known incident |
| failure prior service | adds historical weighting based on past recoveries |
| cause assessor | blends signals into a more useful verdict |
| Slack thread tracker | keeps the recovery lifecycle attached to the original thread |
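As a hedged sketch of the structured-evidence idea, the snippet below shows what one JSONL failure event might look like. The `FailureEvent` shape and `toJsonlLine` helper are illustrative assumptions, not the repo's actual EventLogger API:

```typescript
// Hypothetical sketch: FailureEvent and toJsonlLine are illustrative,
// not the repo's actual EventLogger interface.
interface FailureEvent {
  timestamp: string;
  test: string;
  category: string; // e.g. SELECTOR_NOT_IN_DOM, TIMEOUT_ELEMENT
  selector?: string;
  url?: string;
}

// Serialize one event as a single JSONL line for downstream analysis.
function toJsonlLine(event: FailureEvent): string {
  return JSON.stringify(event);
}

const line = toJsonlLine({
  timestamp: "2026-03-26T09:00:00Z",
  test: "checkout.spec.ts > pay with card",
  category: "SELECTOR_NOT_IN_DOM",
  selector: "#pay-button",
  url: "https://example.com/checkout",
});
console.log(line);
```

One event per line keeps the log append-only and trivially parseable by the downstream matcher and prior services.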
## Healing Workflows
The healing commands are still useful, but the repo has moved beyond the old “just rewrite the selector” model.
| Command | Best Used For |
|---|---|
| `npm run heal` | general analysis |
| `npm run heal:claude` | interactive Claude-assisted workflow |
| `npm run heal:interactive` | manual operator-led healing |
| `npm run heal:ai` | AI-assisted analysis path |
| `npm run heal:apply` | applying a proposed fix locally |
| `npm run heal:dry` | previewing changes |
| `npm run heal:mr` | packaging a fix for review |
## Incident Intelligence
The incident model is one of the biggest upgrades in the system. It changes the question from “what is the error string?” to “have we seen this failure shape before, and what did it turn out to be?”
| Layer | What It Adds |
|---|---|
| incident store | durable memory of known failures |
| matcher | deterministic lookup against current failures |
| priors | historical weighting from confirmed recoveries |
| cause assessor | human-friendly verdict language |
This is what makes labels like `Post-fix, watching` or `Infra/CI suspicion` possible without pure guesswork.
```mermaid
%%{init: {'theme':'base', 'themeVariables': {'primaryColor': '#4a90d9', 'primaryTextColor': '#fff', 'primaryBorderColor': '#2c6fad', 'lineColor': '#555', 'fontFamily': 'sans-serif'}}}%%
flowchart TD
    FAIL["Test failure\n(category + selector + URL)"] --> FP["Compute fingerprint"]
    FP --> INC["Check incident store\ndata/failure-incidents.json"]
    FP --> HIST["Check failure history\nconfirmed recoveries"]
    INC --> CA["Cause assessor\nblend signals"]
    HIST --> CA
    CA --> VERDICT["Verdict label\nConfirmed regression\nPost-fix watching\nLikely flaky\nInfra/CI suspicion\nNeeds confirmation"]
    VERDICT --> SLACK["Slack alert\nAssessment / Why / Next"]
```

The diagram shows how a failure fingerprint is matched against the incident store and recovery history before a human-readable verdict is produced for the Slack alert.
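The fingerprint-and-blend step above can be sketched in TypeScript. Everything here is an assumption for illustration: the fingerprint fields, the hash truncation, the verdict thresholds, and the function names are not the repo's actual implementation.

```typescript
import { createHash } from "crypto";

// Hypothetical fingerprint: hash the failure "shape" so identical
// breakages collapse to one key. Field choice is an assumption.
function fingerprint(category: string, selector: string, url: string): string {
  // Normalize to the URL path so query-string noise doesn't split
  // identical failures into separate fingerprints.
  const path = new URL(url).pathname;
  return createHash("sha256")
    .update(`${category}|${selector}|${path}`)
    .digest("hex")
    .slice(0, 16);
}

type Verdict =
  | "Confirmed regression"
  | "Post-fix, watching"
  | "Likely flaky"
  | "Infra/CI suspicion"
  | "Needs confirmation";

// Blend incident-store evidence with historical recovery priors.
// Thresholds are illustrative, not tuned values from the repo.
function assessCause(knownIncident: boolean, pastRecoveries: number): Verdict {
  if (knownIncident && pastRecoveries > 0) return "Post-fix, watching";
  if (knownIncident) return "Confirmed regression";
  if (pastRecoveries >= 3) return "Likely flaky";
  return "Needs confirmation";
}
```

The key design point is that the fingerprint is deterministic, so the matcher stays a plain lookup and only the verdict wording involves judgment.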
```mermaid
%%{init: {'theme':'base', 'themeVariables': {'primaryColor': '#4a90d9', 'primaryTextColor': '#fff', 'primaryBorderColor': '#2c6fad', 'lineColor': '#555', 'fontFamily': 'sans-serif'}}}%%
flowchart TD
    FAIL["Test failure logged\nto JSONL"] --> CLASS["Classify category\nSELECTOR_NOT_IN_DOM\nTIMEOUT_ELEMENT etc"]
    CLASS --> AUTO{"Auto-fixable?"}
    AUTO -- No --> SLACK["Slack alert\nmanual review"]
    AUTO -- Yes --> HEAL["Healer runs\nfinds replacement selector\nor adjusts timeout"]
    HEAL --> APPLY["Fix applied\nlocally or MR"]
    APPLY --> VERIFY["Re-run test\nverify fix"]
    VERIFY --> DB["Fix saved\ndata/fixes.json"]
    DB --> LEARN["Improves future\nconfidence scoring"]
```

The diagram shows the healing decision path from failure classification through auto-fix application to the fix database that feeds future confidence scoring.
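The fix database at the end of this flow can be pictured as a list of records like the one below. Every field name here is an assumption for illustration; the actual `data/fixes.json` schema may differ.

```typescript
// Hypothetical sketch of a fix-database record; field names are
// assumptions, not the repo's actual data/fixes.json schema.
interface FixRecord {
  fingerprint: string; // links the fix back to the failure shape
  category: string;    // e.g. SELECTOR_NOT_IN_DOM
  oldSelector: string;
  newSelector: string;
  verified: boolean;   // true once the re-run passed
  appliedAt: string;
}

const record: FixRecord = {
  fingerprint: "a1b2c3d4e5f60708",
  category: "SELECTOR_NOT_IN_DOM",
  oldSelector: "#pay-button",
  newSelector: "[data-testid='pay-button']",
  verified: true,
  appliedAt: "2026-03-26T09:30:00Z",
};

// A verified fix for the same failure shape can raise confidence
// the next time that fingerprint shows up.
function confidenceBoost(fixes: FixRecord[], fp: string): number {
  return fixes.filter((f) => f.fingerprint === fp && f.verified).length;
}
```

Keeping `verified` separate from the fix itself is what lets unconfirmed fixes sit in the database without polluting future confidence scoring.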
## Recovery Workflows
Posting the original failure alert is only half the job; recovery replies are the other half. When a failure disappears after a fix, the system can now track that lifecycle and reply in the same Slack thread.
| Recovery Capability | Why It Matters |
|---|---|
| thread tracking | keeps follow-up tied to the original context |
| consecutive-pass logic | avoids declaring victory on one lucky pass |
| recovery posting | helps operators close threads instead of leaving silent dead ends |
| history recording | improves future confidence scoring |
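The consecutive-pass logic above can be sketched in a few lines. The threshold value and function name are assumptions, not the repo's actual configuration:

```typescript
// Hedged sketch: the threshold and names are assumptions, not repo API.
const REQUIRED_CONSECUTIVE_PASSES = 3;

// Given the most recent run outcomes (oldest first), decide whether the
// failure can be declared recovered. A single lucky pass is not enough;
// the last N outcomes must all be passes.
function isRecovered(outcomes: Array<"pass" | "fail">): boolean {
  const recent = outcomes.slice(-REQUIRED_CONSECUTIVE_PASSES);
  return (
    recent.length === REQUIRED_CONSECUTIVE_PASSES &&
    recent.every((o) => o === "pass")
  );
}
```

Requiring a run of passes rather than one pass is what keeps flaky tests from triggering premature "resolved" replies.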
Main commands:
| Command | Purpose |
|---|---|
| `npm run slack:reply-resolved` | preview recovery replies |
| `npm run slack:reply-resolved:send` | send recovery replies |
| `npm run incidents:record-recovery` | append confirmed recovery history |
## Visual Fact-Checker
Visual fact-checking is separate from the structural screenshot suite. It uses screenshots and AI review to answer questions that raw pixel comparison cannot answer well on its own. For full operational detail, see Visual Fact-Checker.
| Use Case | Why It Helps |
|---|---|
| failure triage | explains whether a screenshot looks meaningfully broken |
| post-deploy review | adds human-like interpretation to selected artifacts |
| mobile and PDT review | focuses attention on the failures most worth reading |
Main commands:
| Command | Purpose |
|---|---|
| `npm run factcheck` | standard fact-check workflow |
| `npm run factcheck:failures` | focus on failed runs |
| `npm run factcheck:ci` | CI-oriented fact-check run |
| `npm run factcheck:slack` | send fact-check output to Slack |
## What AI Processing Does Not Replace
AI is useful here, but it is not a substitute for:
- understanding the site surface
- checking live DOM when selectors move
- reading traces on tricky interaction failures
- deciding whether a redesign should update a test or the product
It is best viewed as triage acceleration and memory, not as an infallible operator.
When a scheduled run fails, this order usually works best:
- read the Slack summary for the human verdict
- inspect artifacts or logs for the failing test
- check whether the incident store already knows the pattern
- decide whether this is regression, flake, or infra noise
- use healing only when the failure is actually fixable in test code
## Related Pages
| Need | Go To |
|---|---|
| failure routing and labels | Failure Categories |
| run alerts and report flows | Reporting |
| logs and structured evidence | Logging System |
| command list | CLI Reference |
| AI screenshot review detail | Visual Fact-Checker |
| dashboards and telemetry | Observability |