# Visual Fact-Checker
| Key | Value |
|---|---|
| Status | Active |
| Owner | QA Automation |
| Updated | 2026-03-26 |
| Scope | AI-powered screenshot review, semantic verdict, Slack delivery, cost controls |
The Visual Fact-Checker is an AI review layer that uses Claude Vision to evaluate screenshots and determine whether a page looks meaningfully broken. It is not a pixel-diff tool. It does not compare against a baseline image. It asks: does this screenshot show something a reader would recognize as broken?
## What It Reviews And What It Does Not
| The Fact-Checker Reviews | The Fact-Checker Does Not Do |
|---|---|
| whether a page has its primary content visible | pixel-level comparison against a stored baseline |
| whether navigation and layout elements are present | layout measurement or coordinate validation |
| whether a consent dialog or error page is blocking content | CSS regression detection |
| whether the page looks like the correct site and page type | mobile responsiveness measurement |
| whether critical above-the-fold areas are intact | accessibility or contrast checking |
## Commands
| Command | When To Use It |
|---|---|
| npm run factcheck | standard run: all suites, desktop viewport |
| npm run factcheck:dry | screenshots only, no Claude API call |
| npm run factcheck:pdt | PDT suite only (post-deploy focus) |
| npm run factcheck:smoke | smoke suite only |
| npm run factcheck:failures | only screenshots from failed tests |
| npm run factcheck:slack | run and send results to Slack |
| npm run factcheck -- --site blesk,auto | specific sites |
| npm run factcheck -- --viewport both | desktop and mobile |
| npm run factcheck:ci | CI mode: failures only, $2 cap, posts to Slack |
| npm run deploy:verify | PDT tests then fact-check failures |
| npm run factcheck:dashboard | open local dashboard at localhost:3002 |
## How To Read The Output
### Verdict Levels
| Verdict | What It Means |
|---|---|
| pass | the screenshot looks correct for the expected page type and content |
| warning | something is unusual: possibly a loading issue, a redirect, or a content edge case |
| fail | the screenshot shows a clear problem: missing content, error page, blocked layout, or wrong page |
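The three verdict levels above lend themselves to a simple tally when post-processing a run. A minimal sketch in Node; the result shape (screenshot, verdict, confidence fields) is an assumption for illustration, not the tool's actual output schema:

```javascript
// Hypothetical shape of per-screenshot results; field names are
// illustrative, not the tool's actual schema.
const results = [
  { screenshot: "blesk-home.png", verdict: "pass", confidence: 0.95 },
  { screenshot: "auto-article.png", verdict: "warning", confidence: 0.62 },
  { screenshot: "blesk-section.png", verdict: "fail", confidence: 0.9 },
];

// Tally verdicts into pass / warning / fail counts.
function tallyVerdicts(results) {
  const counts = { pass: 0, warning: 0, fail: 0 };
  for (const r of results) {
    if (counts[r.verdict] !== undefined) counts[r.verdict] += 1;
  }
  return counts;
}

console.log(tallyVerdicts(results)); // { pass: 1, warning: 1, fail: 1 }
```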
### Confidence
Each verdict includes a confidence level. Low-confidence verdicts (below 70%) are usually caused by:
- login walls or paywalls rendering instead of main content
- redirect interstitials
- partially loaded pages captured too early
Low-confidence verdicts should be reviewed by a human before escalating.
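The 70% threshold can be applied mechanically when triaging a run. A sketch, assuming a confidence field normalized to the 0–1 range (the field names are hypothetical):

```javascript
// Flag verdicts below the 70% confidence threshold for human review.
// The result shape is illustrative, not the tool's actual schema.
const LOW_CONFIDENCE = 0.7;

function needsHumanReview(results) {
  return results.filter((r) => r.confidence < LOW_CONFIDENCE);
}

const results = [
  { screenshot: "paywall.png", verdict: "fail", confidence: 0.55 },
  { screenshot: "home.png", verdict: "pass", confidence: 0.97 },
];

console.log(needsHumanReview(results).map((r) => r.screenshot));
// [ 'paywall.png' ]
```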
## Slack Post Content
When results are sent to Slack, the post includes:
- total screenshots reviewed
- pass / warning / fail counts
- for each fail or warning: what the screenshot showed and why the verdict was assigned
- total cost of the API calls for the run
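Assembling that summary from a set of results is straightforward. The sketch below shows one way to build the message text; the field names, reason strings, and layout are assumptions, not the tool's actual Slack formatting:

```javascript
// Build a Slack-style summary from fact-check results.
// Result shape and message layout are illustrative assumptions.
function buildSlackSummary(results, totalCostUsd) {
  const counts = { pass: 0, warning: 0, fail: 0 };
  const details = [];
  for (const r of results) {
    counts[r.verdict] += 1;
    if (r.verdict !== "pass") {
      // For each fail or warning: what was seen and why.
      details.push(`- ${r.screenshot}: ${r.verdict} (${r.reason})`);
    }
  }
  const lines = [
    `Fact-check: ${results.length} screenshots reviewed`,
    `pass ${counts.pass} / warning ${counts.warning} / fail ${counts.fail}`,
    ...details,
    `Total API cost: $${totalCostUsd.toFixed(2)}`,
  ];
  return lines.join("\n");
}

const message = buildSlackSummary(
  [
    { screenshot: "home.png", verdict: "pass", reason: "" },
    { screenshot: "consent.png", verdict: "fail", reason: "consent dialog blocks content" },
  ],
  0.04
);
console.log(message);
```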
## Dashboard
The local dashboard is available at localhost:3002 when started with npm run factcheck:dashboard. It shows:
- recent run history
- per-screenshot verdict grid
- cost tracking over time
- trend view for verdict distribution
## CI Mode
npm run factcheck:ci is the recommended command for use in CI pipelines. It:
- focuses only on failed test screenshots
- caps API cost at $2.00
- posts results to Slack automatically
- stops gracefully when the cap is reached, posting partial results
Use CI mode in post-deploy or nightly pipelines. Do not run the full uncapped factcheck command in CI.
## Cost Model
| Mode | Cap | Approximate Cost Per Screenshot |
|---|---|---|
| standard | $5.00 | ~$0.01-$0.03 |
| CI mode | $2.00 | same as standard |
| dry run | $0 | no API calls |
Actual cost depends on screenshot size and model token usage. The cap is enforced at the run level: once the cap is reached, remaining screenshots are skipped and the partial results are posted.
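The run-level cap described above amounts to checking accumulated spend before each review and skipping the rest once the budget is exhausted. A minimal sketch; the reviewOne callback and cost figures are stand-ins, not the tool's internals:

```javascript
// Run-level cost cap: review screenshots until the cap is hit,
// then skip the remainder and return partial results.
// reviewOne is a stand-in for the real Claude Vision call.
async function runWithCap(screenshots, capUsd, reviewOne) {
  let spent = 0;
  const reviewed = [];
  const skipped = [];
  for (const shot of screenshots) {
    if (spent >= capUsd) {
      skipped.push(shot); // cap reached: skip gracefully, don't abort
      continue;
    }
    const { verdict, costUsd } = await reviewOne(shot);
    spent += costUsd;
    reviewed.push({ shot, verdict });
  }
  return { reviewed, skipped, spent };
}

// Fake reviewer charging $0.02 per screenshot, for illustration.
const fakeReview = async () => ({ verdict: "pass", costUsd: 0.02 });

runWithCap(["a.png", "b.png", "c.png"], 0.03, fakeReview).then((out) =>
  console.log(out.reviewed.length, out.skipped.length) // 2 1
);
```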
Cost is tracked in data/factcheck-history.json. Review this file if you need to audit spend over time.
## When To Run Each Variant
| Situation | Command |
|---|---|
| test suite just failed and you want to understand what each page looked like | npm run factcheck:failures |
| just deployed and want to verify the site looks correct | npm run deploy:verify |
| setting up CI post-deploy gate | npm run factcheck:ci |
| local investigation of a specific site | npm run factcheck -- --site blesk |
| fast screenshots for offline review (no API spend) | npm run factcheck:dry |
| nightly pipeline with Slack delivery | npm run factcheck:ci |
Do not run the full npm run factcheck (all suites, all sites) as a general regression gate on every commit. The signal is best when scoped to failures or post-deploy moments.
## Output Locations
| Output | Path |
|---|---|
| fact-check HTML reports | test-results/fact-check/reports/ |
| captured screenshots | test-results/fact-check/screenshots/ |
| history and cost log | data/factcheck-history.json |
| dashboard source | scripts/visual-fact-checker/dashboard/ |
## Related Pages
| Need | Go To |
|---|---|
| AI healing and incident intelligence | AI Processing |
| structural visual regression (baseline diff) | Visual Tests |
| Slack delivery setup | Reporting |
| full command list | CLI Reference |