# Visual Fact-Checker
| Key | Value |
|---|---|
| Status | Active |
| Owner | QA Automation |
| Updated | 2026-03-26 |
| Scope | AI-powered screenshot review, semantic verdict, Slack delivery, cost controls |
The Visual Fact-Checker is an AI review layer that uses Claude Vision to evaluate screenshots and determine whether a page looks meaningfully broken. It is not a pixel-diff tool. It does not compare against a baseline image. It asks: does this screenshot show something a reader would recognize as broken?
## What It Reviews And What It Does Not
| The Fact-Checker Reviews | The Fact-Checker Does Not Do |
|---|---|
| whether a page has its primary content visible | pixel-level comparison against a stored baseline |
| whether navigation and layout elements are present | layout measurement or coordinate validation |
| whether a consent dialog or error page is blocking content | CSS regression detection |
| whether the page looks like the correct site and page type | mobile responsiveness measurement |
| whether critical above-the-fold areas are intact | accessibility or contrast checking |
## Commands
| Command | When To Use It |
|---|---|
| npm run factcheck | standard run: all suites, desktop viewport |
| npm run factcheck:dry | screenshots only, no Claude API call |
| npm run factcheck:pdt | PDT suite only (post-deploy focus) |
| npm run factcheck:smoke | smoke suite only |
| npm run factcheck:failures | only screenshots from failed tests |
| npm run factcheck:slack | run and send results to Slack |
| npm run factcheck -- --site blesk,auto | specific sites |
| npm run factcheck -- --viewport both | desktop and mobile |
| npm run factcheck:ci | CI mode: failures only, $2 cap, posts to Slack |
| npm run deploy:verify | PDT tests then fact-check failures |
| npm run factcheck:dashboard | open local dashboard at localhost:3002 |
## How To Read The Output
### Verdict Levels
| Verdict | What It Means |
|---|---|
| pass | the screenshot looks correct for the expected page type and content |
| warning | something is unusual: possibly a loading issue, a redirect, or a content edge case |
| fail | the screenshot shows a clear problem: missing content, error page, blocked layout, or wrong page |
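The three verdict levels above lend themselves to a simple tally when post-processing a run. A minimal sketch in Node; the result shape (screenshot, verdict, confidence fields) is an assumption for illustration, not the tool's actual output schema:

```javascript
// Hypothetical shape of per-screenshot results; field names are
// illustrative, not the tool's actual schema.
const results = [
  { screenshot: "blesk-home.png", verdict: "pass", confidence: 0.95 },
  { screenshot: "auto-article.png", verdict: "warning", confidence: 0.62 },
  { screenshot: "blesk-section.png", verdict: "fail", confidence: 0.9 },
];

// Tally verdicts into pass / warning / fail counts.
function tallyVerdicts(results) {
  const counts = { pass: 0, warning: 0, fail: 0 };
  for (const r of results) {
    if (counts[r.verdict] !== undefined) counts[r.verdict] += 1;
  }
  return counts;
}

console.log(tallyVerdicts(results)); // { pass: 1, warning: 1, fail: 1 }
```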
### Confidence
Each verdict includes a confidence level. Low-confidence verdicts (below 70%) are usually caused by:
- login walls or paywalls rendering instead of main content
- redirect interstitials
- partially loaded pages captured too early
Low-confidence verdicts should be reviewed by a human before escalating.
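The 70% threshold can be applied mechanically when triaging a run. A sketch, assuming a confidence field normalized to the 0–1 range (the field names are hypothetical):

```javascript
// Flag verdicts below the 70% confidence threshold for human review.
// The result shape is illustrative, not the tool's actual schema.
const LOW_CONFIDENCE = 0.7;

function needsHumanReview(results) {
  return results.filter((r) => r.confidence < LOW_CONFIDENCE);
}

const results = [
  { screenshot: "paywall.png", verdict: "fail", confidence: 0.55 },
  { screenshot: "home.png", verdict: "pass", confidence: 0.97 },
];

console.log(needsHumanReview(results).map((r) => r.screenshot));
// [ 'paywall.png' ]
```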
## Slack Post Content
When results are sent to Slack, the post includes:
- total screenshots reviewed
- pass / warning / fail counts
- for each fail or warning: what the screenshot showed and why the verdict was assigned
- total cost of the API calls for the run
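Assembling that summary from a set of results is straightforward. The sketch below shows one way to build the message text; the field names, reason strings, and layout are assumptions, not the tool's actual Slack formatting:

```javascript
// Build a Slack-style summary from fact-check results.
// Result shape and message layout are illustrative assumptions.
function buildSlackSummary(results, totalCostUsd) {
  const counts = { pass: 0, warning: 0, fail: 0 };
  const details = [];
  for (const r of results) {
    counts[r.verdict] += 1;
    if (r.verdict !== "pass") {
      // For each fail or warning: what was seen and why.
      details.push(`- ${r.screenshot}: ${r.verdict} (${r.reason})`);
    }
  }
  const lines = [
    `Fact-check: ${results.length} screenshots reviewed`,
    `pass ${counts.pass} / warning ${counts.warning} / fail ${counts.fail}`,
    ...details,
    `Total API cost: $${totalCostUsd.toFixed(2)}`,
  ];
  return lines.join("\n");
}

const message = buildSlackSummary(
  [
    { screenshot: "home.png", verdict: "pass", reason: "" },
    { screenshot: "consent.png", verdict: "fail", reason: "consent dialog blocks content" },
  ],
  0.04
);
console.log(message);
```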
## Dashboard
The local dashboard is available at localhost:3002 when started with npm run factcheck:dashboard. It shows:
- recent run history
- per-screenshot verdict grid
- cost tracking over time
- trend view for verdict distribution
## CI Mode
npm run factcheck:ci is the recommended command for use in CI pipelines. It:
- focuses only on failed test screenshots
- caps API cost at $2.00
- posts results to Slack automatically
- stops gracefully when the cap is reached, posting partial results
Use CI mode in post-deploy or nightly pipelines. Do not run the full uncapped factcheck command in CI.
## Cost Model
| Mode | Cap | Approximate Cost Per Screenshot |
|---|---|---|
| standard | $5.00 | ~$0.01-$0.03 |
| CI mode | $2.00 | same as standard |
| dry run | $0 | no API calls |
Actual cost depends on screenshot size and model token usage. The cap is enforced at the run level: once the cap is reached, remaining screenshots are skipped and the partial results are posted.
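The run-level cap described above amounts to checking accumulated spend before each review and skipping the rest once the budget is exhausted. A minimal sketch; the reviewOne callback and cost figures are stand-ins, not the tool's internals:

```javascript
// Run-level cost cap: review screenshots until the cap is hit,
// then skip the remainder and return partial results.
// reviewOne is a stand-in for the real Claude Vision call.
async function runWithCap(screenshots, capUsd, reviewOne) {
  let spent = 0;
  const reviewed = [];
  const skipped = [];
  for (const shot of screenshots) {
    if (spent >= capUsd) {
      skipped.push(shot); // cap reached: skip gracefully, don't abort
      continue;
    }
    const { verdict, costUsd } = await reviewOne(shot);
    spent += costUsd;
    reviewed.push({ shot, verdict });
  }
  return { reviewed, skipped, spent };
}

// Fake reviewer charging $0.02 per screenshot, for illustration.
const fakeReview = async () => ({ verdict: "pass", costUsd: 0.02 });

runWithCap(["a.png", "b.png", "c.png"], 0.03, fakeReview).then((out) =>
  console.log(out.reviewed.length, out.skipped.length) // 2 1
);
```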
Cost is tracked in data/factcheck-history.json. Review this file if you need to audit spend over time.
## When To Run Each Variant
| Situation | Command |
|---|---|
| test suite just failed and you want to understand what each page looked like | npm run factcheck:failures |
| just deployed and want to verify the site looks correct | npm run deploy:verify |
| setting up CI post-deploy gate | npm run factcheck:ci |
| local investigation of a specific site | npm run factcheck -- --site blesk |
| fast screenshots for offline review (no API spend) | npm run factcheck:dry |
| nightly pipeline with Slack delivery | npm run factcheck:ci |
Do not run the full npm run factcheck (all suites, all sites) as a general regression gate on every commit. The signal is best when scoped to failures or post-deploy moments.
## Output Locations
| Output | Path |
|---|---|
| fact-check HTML reports | test-results/fact-check/reports/ |
| captured screenshots | test-results/fact-check/screenshots/ |
| history and cost log | data/factcheck-history.json |
| dashboard source | scripts/visual-fact-checker/dashboard/ |
## Related Pages
| Need | Go To |
|---|---|
| AI healing and incident intelligence | AI Processing |
| structural visual regression (baseline diff) | Visual Tests |
| Slack delivery setup | Reporting |
| full command list | CLI Reference |