Troubleshooting
| Key | Value |
|---|---|
| Status | Active |
| Owner | QA Automation |
| Updated | 2026-03-26 |
| Scope | Practical triage paths for common local, CI, suite, and integration problems |
This page is written for moments when something is already broken and you want the shortest path to a useful next step.
Start Here
| Symptom | Best First Check |
|---|---|
| one test failed locally | artifacts and local logs |
| scheduled run failed in Slack | investigation thread plus history context |
| many tests failed at once | look for a shared environment cause before fixing tests |
| visual suite exploded | check baseline availability before reading diffs |
| dashboards look empty | run observability or Grafana validation |
| Slack did not post | verify token, webhook, and channel config |
Local Setup Problems
| Problem | Most Likely Cause | Good Next Step |
|---|---|---|
| Playwright will not start | browser install missing | reinstall browsers with npx playwright install |
| commands fail immediately after clone | dependencies not installed | run npm install |
| local run behaves oddly across sites | wrong or stale .env assumptions | check SITE and local overrides |
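A quick way to rule out stale .env assumptions is to confirm that the variables a run depends on are actually set before Playwright starts. The sketch below is a minimal, hypothetical helper: only SITE comes from this page, and the list of required keys would need to match the project's real configuration. It takes the environment as a plain object so it can be tested; in a real script you would pass process.env.

```typescript
// Minimal sketch: report required environment variables that are missing or blank.
// REQUIRED_KEYS is hypothetical; only SITE is named on this page.
const REQUIRED_KEYS: readonly string[] = ["SITE"];

type Env = Record<string, string | undefined>;

function findMissingEnv(env: Env, required: readonly string[] = REQUIRED_KEYS): string[] {
  // A key counts as missing if it is absent or contains only whitespace.
  return required.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

// Usage: pass process.env in a real script; a literal object is used here.
const missing = findMissingEnv({ SITE: " " });
if (missing.length > 0) {
  console.error(`Missing or empty env vars: ${missing.join(", ")}`);
}
```

If SITE is present but the run still behaves inconsistently, the next suspect is a local override file that silently replaces it.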
CI Versus Local Mismatch
| Pattern | Likely Cause |
|---|---|
| passes locally, fails in CI | timing, network, artifact, or baseline difference |
| only fails on one runner | environment instability |
| visual suite only fails in CI | Linux baseline mismatch or missing snapshots |
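A common reason visual tests pass locally but fail in CI is that Playwright stores screenshot baselines per platform. With the default snapshotPathTemplate, a name passed to toHaveScreenshot gets the project name and the OS platform appended, so a baseline recorded on macOS (darwin) does not satisfy a Linux runner. The sketch below reproduces that naming under stated assumptions: the default template is in use, the project name is already filesystem-safe, and a missing extension defaults to .png.

```typescript
// Sketch of Playwright's default screenshot naming, assuming the default
// snapshotPathTemplate ending in {arg}{-projectName}{-snapshotSuffix}{ext},
// where the suffix defaults to the OS platform ("linux", "darwin", "win32").
type Platform = "linux" | "darwin" | "win32";

function expectedBaselineName(arg: string, projectName: string, platform: Platform): string {
  // Split "homepage.png" into "homepage" and ".png"; assume .png if no extension.
  const dot = arg.lastIndexOf(".");
  const base = dot > 0 ? arg.slice(0, dot) : arg;
  const ext = dot > 0 ? arg.slice(dot) : ".png";
  // {-projectName} is omitted entirely when the project has no name.
  const project = projectName ? `-${projectName}` : "";
  return `${base}${project}-${platform}${ext}`;
}

// True when the committed baselines have no file for this platform.
function missingForPlatform(
  committed: readonly string[],
  arg: string,
  projectName: string,
  platform: Platform,
): boolean {
  return !committed.includes(expectedBaselineName(arg, projectName, platform));
}

// Usage: a baseline committed from a Mac is not found on a Linux runner.
const onlyMac = ["homepage-chromium-darwin.png"];
console.log(missingForPlatform(onlyMac, "homepage.png", "chromium", "linux"));
```

When this is the cause, regenerating baselines in a Linux environment that matches CI (for example with npx playwright test --update-snapshots) is usually more reliable than copying files recorded on another OS.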
Suite-Specific Quick Advice
Smoke, Shadow, PDT
| If You See | Think About |
|---|---|
| one selector timeout on one site | recent redesign or selector drift |
| unexpected cross-domain behavior on Blesk-family sites | host guard logic and site assumptions |
| sudden broad failure cluster | shared environment or site-wide issue |
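The difference between a single selector timeout, a site-wide issue, and a shared environment cause is mostly about how failures are distributed across sites. The sketch below groups failures by site and labels the pattern. The Failure shape, the pattern labels, and both thresholds are hypothetical and would need tuning against real run history.

```typescript
// Minimal sketch: decide whether smoke/shadow/PDT failures look isolated,
// site-wide, or shared across sites. Thresholds are hypothetical.
interface Failure {
  site: string;
  test: string;
}

type Pattern = "none" | "isolated" | "site-wide" | "cross-site";

function classifyBySite(failures: readonly Failure[], siteWideMin = 3, crossSiteMin = 3): Pattern {
  if (failures.length === 0) return "none";

  // Count distinct failing tests per site.
  const testsPerSite = new Map<string, Set<string>>();
  for (const f of failures) {
    const tests = testsPerSite.get(f.site) ?? new Set<string>();
    tests.add(f.test);
    testsPerSite.set(f.site, tests);
  }

  // Many sites failing together points at a shared environment cause.
  if (testsPerSite.size >= crossSiteMin) return "cross-site";

  // Many distinct tests failing on one site points at a site-wide issue.
  for (const tests of Array.from(testsPerSite.values())) {
    if (tests.size >= siteWideMin) return "site-wide";
  }

  // Otherwise treat it as isolated, e.g. one selector drifting after a redesign.
  return "isolated";
}
```

Checking the cross-site case first mirrors the advice in Start Here: when many tests fail at once, look for a shared cause before fixing individual tests.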
E2E And User-Flows
| If You See | Think About |
|---|---|
| gallery or video failures | content rotation, fallback logic, lazy loading |
| login or premium failures | auth state, consent overlays, modal interference |
| failures on conditional live surfaces | ephemeral content; not necessarily a regression |
Mobile
| If You See | Think About |
|---|---|
| mobile-only flake | responsive overlays, touch target timing, seed URL choice |
| Safari-specific drift | browser-specific rendering or interaction differences |
| deep-tier failures | performance budget and journey timing before selectors |
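For deep-tier mobile failures, checking the journey against its time budget first helps separate "the page was slow" from "the selector is wrong". The sketch below sums hypothetical per-step timings, compares the total with a budget, and names the slowest step. The step names, timings, and budget are illustrative inputs, not values taken from the suite.

```typescript
// Minimal sketch: check journey timing against a budget before blaming selectors.
// Step names, timings, and the budget are hypothetical inputs.
interface StepTiming {
  step: string;
  ms: number;
}

interface BudgetReport {
  totalMs: number;
  overBudget: boolean;
  slowestStep: string | undefined;
}

function checkJourneyBudget(steps: readonly StepTiming[], budgetMs: number): BudgetReport {
  let totalMs = 0;
  let slowest: StepTiming | undefined;
  for (const s of steps) {
    totalMs += s.ms;
    // Track the single slowest step as the first place to look.
    if (slowest === undefined || s.ms > slowest.ms) slowest = s;
  }
  return { totalMs, overBudget: totalMs > budgetMs, slowestStep: slowest?.step };
}
```

If a journey is already over budget, a selector timeout near its end is more plausibly a symptom of slow earlier steps than of selector drift.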
Content
| If You See | Think About |
|---|---|
| many missing entries in one context | seed URL or context grouping issue |
| one handler failing repeatedly | handler logic before registry data |
| regressions after CMS change | registry and selector freshness |
Visual
| If You See | Think About |
|---|---|
| snapshot does not exist | missing baseline, not visual regression |
| many failures across all sites at once | baseline lifecycle or compare-mode problem |
| one true diff on one site | structural change or intended redesign |
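The visual rows above suggest a fixed order: rule out missing baselines, then check whether diffs are broad, and only then read an individual diff. The sketch below encodes that order. The VisualResult shape, the labels, and the "most sites" ratio are hypothetical.

```typescript
// Minimal sketch: triage a visual run by failure kind and spread across sites.
// The result shape, labels, and broadRatio threshold are hypothetical.
type VisualKind = "pass" | "missing-baseline" | "diff";

interface VisualResult {
  site: string;
  kind: VisualKind;
}

type Triage = "clean" | "missing baseline" | "baseline or compare-mode problem" | "inspect diff";

function triageVisual(results: readonly VisualResult[], broadRatio = 0.5): Triage {
  const sites = new Set(results.map((r) => r.site));
  const failingSites = new Set(results.filter((r) => r.kind !== "pass").map((r) => r.site));
  if (failingSites.size === 0) return "clean";

  // Missing snapshots are a baseline problem, not a visual regression.
  if (results.some((r) => r.kind === "missing-baseline")) return "missing baseline";

  // Diffs on most sites at once suggest the baseline lifecycle or compare mode.
  if (failingSites.size > 1 && failingSites.size / sites.size >= broadRatio) {
    return "baseline or compare-mode problem";
  }

  // A narrow diff is worth reading: structural change or intended redesign.
  return "inspect diff";
}
```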
Integration Problems
Slack
| Symptom | Usual Cause |
|---|---|
| no message posted | missing webhook or bot token |
| wrong channel | env var mismatch |
| thread replies missing | bot-token path unavailable or thread tracking issue |
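The Slack rows map onto two delivery paths: an incoming webhook, which posts to the channel it was created for, and a bot token, which can post to a configured channel and reply in threads. The sketch below infers what reporting can do from configuration. The environment variable names SLACK_WEBHOOK_URL, SLACK_BOT_TOKEN, and SLACK_CHANNEL are hypothetical; the xoxb- bot-token prefix and the hooks.slack.com webhook host are Slack conventions.

```typescript
// Minimal sketch: infer what Slack reporting can do from configuration.
// The env var names are hypothetical; check the project's actual config.
type Env = Record<string, string | undefined>;

interface SlackCapabilities {
  canPost: boolean;
  canThread: boolean;
  problems: string[];
}

function slackCapabilities(env: Env): SlackCapabilities {
  const problems: string[] = [];
  const webhook = env.SLACK_WEBHOOK_URL?.trim() ?? "";
  const token = env.SLACK_BOT_TOKEN?.trim() ?? "";
  const channel = env.SLACK_CHANNEL?.trim() ?? "";

  // Incoming webhook URLs live under hooks.slack.com.
  const webhookOk = webhook.startsWith("https://hooks.slack.com/");
  if (webhook !== "" && !webhookOk) problems.push("SLACK_WEBHOOK_URL does not look like an incoming webhook");

  // Bot tokens start with "xoxb-"; the bot path also needs a target channel.
  const tokenOk = token.startsWith("xoxb-");
  if (token !== "" && !tokenOk) problems.push("SLACK_BOT_TOKEN lacks the xoxb- bot-token prefix");
  if (tokenOk && channel === "") problems.push("SLACK_CHANNEL is empty, so the bot path has no target");

  const botOk = tokenOk && channel !== "";
  if (!webhookOk && !botOk) problems.push("no usable Slack path configured");

  // Thread replies need the bot-token path; a webhook alone cannot thread.
  return { canPost: webhookOk || botOk, canThread: botOk, problems };
}
```

A result with canPost true and canThread false matches the "thread replies missing" row: messages arrive, but only through the webhook path.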
Grafana And OpenSearch
| Symptom | Usual Cause |
|---|---|
| dashboard panels empty | datasource or query drift |
| query validation errors | field-name or index mismatch |
| misleading aggregates | legacy and current data mixed in shared indices |
| OpenSearch write issues | wrong endpoint or missing permissions |
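Field-name mismatches and mixed legacy and current data both show up in index mappings. The sketch below takes the JSON body returned by OpenSearch's GET /<index>/_mapping (one entry per matching index), flattens each mapping into dotted field paths, including object sub-properties and multi-fields such as name.keyword, and reports which queried fields each index lacks. The list of queried fields is a hypothetical input you would copy from the failing panel.

```typescript
// Minimal sketch: find dashboard query fields that are absent from index mappings.
// Input is the body of GET /<index>/_mapping; the queried field list is hypothetical.
interface FieldMapping {
  type?: string;
  properties?: Record<string, FieldMapping>; // object sub-fields
  fields?: Record<string, FieldMapping>; // multi-fields such as "name.keyword"
}

type MappingResponse = Record<string, { mappings: { properties?: Record<string, FieldMapping> } }>;

function collectPaths(props: Record<string, FieldMapping>, prefix: string, out: Set<string>): void {
  for (const [name, def] of Object.entries(props)) {
    const path = prefix ? `${prefix}.${name}` : name;
    out.add(path);
    if (def.properties) collectPaths(def.properties, path, out);
    if (def.fields) collectPaths(def.fields, path, out);
  }
}

// Returns, per index, the queried fields that its mapping does not define.
function missingFields(response: MappingResponse, queried: readonly string[]): Record<string, string[]> {
  const result: Record<string, string[]> = {};
  for (const [index, body] of Object.entries(response)) {
    const paths = new Set<string>();
    collectPaths(body.mappings.properties ?? {}, "", paths);
    result[index] = queried.filter((f) => !paths.has(f));
  }
  return result;
}
```

Running it against an index pattern is a cheap way to spot mixed data: if a field is missing in some indices but not others, aggregates over the whole pattern will quietly combine two shapes of documents.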
A Good Investigation Order
- confirm whether the failure is isolated or clustered
- check artifacts and logs
- check whether history or incident memory already explains it
- decide if the problem is product, test, infra, or configuration
- only then decide whether to fix code, rerun, or just watch
Useful commands for these steps:
| Need | Command |
|---|---|
| recent failures | npm run os:failed |
| fix history | npm run fix:recent |
| health verification | npm run observability:verify:health |
| dashboard query validation | npm run grafana:validate-data |
| local debug run | npm run test:debug |
| UI investigation | npm run test:ui |
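The first step of the investigation order, confirming whether a failure is isolated or clustered, is easier when near-identical errors are grouped together. The sketch below does this with a hypothetical TestFailure shape and deliberately naive normalization rules: it keeps the first line of the error and masks URLs and numbers so that the same failure on different sites or with different timeouts collapses into one signature. It is independent of the npm scripts listed above.

```typescript
// Minimal sketch: group failures by a normalized error signature so that one
// shared cause shows up as one large cluster. Normalization rules are hypothetical.
interface TestFailure {
  test: string;
  message: string;
}

function signature(message: string): string {
  return message
    .split("\n")[0] // the first line usually carries the error itself
    .replace(/https?:\/\/\S+/g, "<url>") // URLs differ per site; mask them before digits
    .replace(/\d+/g, "<n>") // timeouts, ports, counts
    .trim();
}

function clusterFailures(failures: readonly TestFailure[]): Map<string, string[]> {
  const clusters = new Map<string, string[]>();
  for (const f of failures) {
    const key = signature(f.message);
    const tests = clusters.get(key) ?? [];
    tests.push(f.test);
    clusters.set(key, tests);
  }
  return clusters;
}
```

One large cluster spanning many tests points back to the "many tests failed at once" row: look for a shared environment cause before touching individual tests.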
When To Escalate
Escalate sooner when:
- several suites fail at once
- a deploy-critical PDT failure repeats
- dashboards or observability are blind
- robots.txt, sitemap, or URL monitors show broad site damage
- the same incident repeats after a supposed fix
Related Pages