Visual Fact-Checker

| Key | Value |
| --- | --- |
| Status | Active |
| Owner | QA Automation |
| Updated | 2026-03-26 |
| Scope | AI-powered screenshot review, semantic verdict, Slack delivery, cost controls |

The Visual Fact-Checker is an AI review layer that uses Claude Vision to evaluate screenshots and determine whether a page looks meaningfully broken. It is not a pixel-diff tool. It does not compare against a baseline image. It asks: does this screenshot show something a reader would recognize as broken?

What It Reviews And What It Does Not

| The Fact-Checker Reviews | The Fact-Checker Does Not Do |
| --- | --- |
| whether a page has its primary content visible | pixel-level comparison against a stored baseline |
| whether navigation and layout elements are present | layout measurement or coordinate validation |
| whether a consent dialog or error page is blocking content | CSS regression detection |
| whether the page looks like the correct site and page type | mobile responsiveness measurement |
| whether critical above-the-fold areas are intact | accessibility or contrast checking |

Commands

| Command | When To Use It |
| --- | --- |
| npm run factcheck | standard run: all suites, desktop viewport |
| npm run factcheck:dry | screenshots only, no Claude API call |
| npm run factcheck:pdt | PDT suite only (post-deploy focus) |
| npm run factcheck:smoke | smoke suite only |
| npm run factcheck:failures | only screenshots from failed tests |
| npm run factcheck:slack | run and send results to Slack |
| npm run factcheck -- --site blesk,auto | specific sites only |
| npm run factcheck -- --viewport both | desktop and mobile viewports |
| npm run factcheck:ci | CI mode: failures only, $2 cap, posts to Slack |
| npm run deploy:verify | PDT tests, then fact-check failures |
| npm run factcheck:dashboard | open the local dashboard at localhost:3002 |

How To Read The Output

Verdict Levels

| Verdict | What It Means |
| --- | --- |
| pass | the screenshot looks correct for the expected page type and content |
| warning | something is unusual: possibly a loading issue, a redirect, or a content edge case |
| fail | the screenshot shows a clear problem: missing content, error page, blocked layout, or wrong page |

Confidence

Each verdict includes a confidence level. Low-confidence verdicts (below 70%) are usually caused by:

  • login walls or paywalls rendering instead of main content
  • redirect interstitials
  • partially loaded pages captured too early

Low-confidence verdicts should be reviewed by a human before escalating.
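That triage rule can be sketched as a small filter. This is a hypothetical example: the field names (screenshot, verdict, confidence) are assumptions about the result shape, not the tool's actual schema.

```typescript
type Verdict = "pass" | "warning" | "fail";

interface ScreenshotResult {
  screenshot: string;
  verdict: Verdict;
  confidence: number; // 0..1, as reported with each verdict
}

// Split results into those safe to escalate automatically and those
// that should get a human look first (confidence below 70%).
function triage(results: ScreenshotResult[]) {
  const needsHumanReview = results.filter((r) => r.confidence < 0.7);
  const autoEscalate = results.filter(
    (r) => r.confidence >= 0.7 && r.verdict !== "pass"
  );
  return { needsHumanReview, autoEscalate };
}

const { needsHumanReview, autoEscalate } = triage([
  { screenshot: "home.png", verdict: "fail", confidence: 0.95 },
  { screenshot: "article.png", verdict: "fail", confidence: 0.55 },
]);
console.log(needsHumanReview.length, autoEscalate.length); // 1 1
```

A high-confidence fail goes straight to escalation; the low-confidence fail lands in the human-review bucket regardless of its verdict.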

Slack Post Content

When results are sent to Slack, the post includes:

  • total screenshots reviewed
  • pass / warning / fail counts
  • for each fail or warning: what the screenshot showed and why the verdict was assigned
  • total cost of the API calls for the run
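Composing that post might look like the sketch below. The RunSummary shape and the message layout are assumptions for illustration, not the actual Slack integration.

```typescript
interface RunSummary {
  total: number;
  pass: number;
  warning: number;
  fail: number;
  costUsd: number;
  issues: { screenshot: string; verdict: string; reason: string }[];
}

// Build the plain-text body of the Slack post from a run summary:
// counts first, then one line per fail/warning, then the run cost.
function buildSlackText(run: RunSummary): string {
  const lines = [
    `Fact-check: ${run.total} screenshots reviewed`,
    `pass ${run.pass} / warning ${run.warning} / fail ${run.fail}`,
    ...run.issues.map(
      (i) => `${i.verdict.toUpperCase()} ${i.screenshot}: ${i.reason}`
    ),
    `API cost: $${run.costUsd.toFixed(2)}`,
  ];
  return lines.join("\n");
}
```

Each fail or warning line carries both what the screenshot showed and why the verdict was assigned, matching the list above.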

Dashboard

The local dashboard is available at localhost:3002 when started with npm run factcheck:dashboard. It shows:

  • recent run history
  • per-screenshot verdict grid
  • cost tracking over time
  • trend view for verdict distribution

CI Mode

npm run factcheck:ci is the recommended command for use in CI pipelines. It:

  • focuses only on failed test screenshots
  • caps API cost at $2.00
  • posts results to Slack automatically
  • stops gracefully when the cap is reached, posting partial results

Use CI mode in post-deploy or nightly pipelines. Do not run the full uncapped factcheck command in CI.

Cost Model

| Mode | Cap | Approximate Cost Per Screenshot |
| --- | --- | --- |
| standard | $5.00 | ~$0.01-$0.03 |
| CI mode | $2.00 | ~$0.01-$0.03 |
| dry run | $0 | no API calls |

Actual cost depends on screenshot size and model token usage. The cap is enforced at the run level: once the cap is reached, remaining screenshots are skipped and the partial results are posted.
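The run-level cap behaves roughly like the loop below. This is a minimal sketch, not the tool's implementation; the per-call cost field and the synchronous review callback (the real call is an async Claude API request) are stand-ins.

```typescript
// Review screenshots until the accumulated spend reaches the cap;
// everything after that point is skipped without an API call.
function reviewWithCap(
  screenshots: string[],
  capUsd: number,
  review: (shot: string) => { verdict: string; costUsd: number }
) {
  const reviewed: string[] = [];
  const skipped: string[] = [];
  let spentUsd = 0;
  for (const shot of screenshots) {
    if (spentUsd >= capUsd) {
      skipped.push(shot); // cap reached: skip remaining screenshots
      continue;
    }
    const { costUsd } = review(shot);
    spentUsd += costUsd;
    reviewed.push(shot);
  }
  return { reviewed, skipped, spentUsd };
}
```

With a $2.00 cap and $1.00 per review, four screenshots would yield two reviewed and two skipped, and the partial results would still be posted.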

Cost is tracked in data/factcheck-history.json. Review this file if you need to audit spend over time.

When To Run Each Variant

| Situation | Command |
| --- | --- |
| a test suite just failed and you want to see what each page looked like | npm run factcheck:failures |
| you just deployed and want to verify the site looks correct | npm run deploy:verify |
| setting up a CI post-deploy gate | npm run factcheck:ci |
| local investigation of a specific site | npm run factcheck -- --site blesk |
| fast screenshots for offline review (no API spend) | npm run factcheck:dry |
| nightly pipeline with Slack delivery | npm run factcheck:ci |

Do not run the full npm run factcheck (all suites, all sites) as a general regression gate on every commit. The signal is best when scoped to failures or post-deploy moments.

Output Locations

| Output | Path |
| --- | --- |
| fact-check HTML reports | test-results/fact-check/reports/ |
| captured screenshots | test-results/fact-check/screenshots/ |
| history and cost log | data/factcheck-history.json |
| dashboard source | scripts/visual-fact-checker/dashboard/ |
Related Pages

| Need | Go To |
| --- | --- |
| AI healing and incident intelligence | AI Processing |
| structural visual regression (baseline diff) | Visual Tests |
| Slack delivery setup | Reporting |
| full command list | CLI Reference |