Failure Categories
| Key | Value |
|---|---|
| Status | Active |
| Owner | QA Automation |
| Updated | 2026-03-26 |
| Scope | Symptom categories, root-cause labels, and how to interpret failures without overreacting |
One of the easiest ways to make test operations worse is to mix up symptom and cause. A timeout is not a root cause. A selector miss is not always a test bug. This page separates the categories the system uses so people can interpret failures more accurately.
Two Different Category Layers
| Layer | Purpose |
|---|---|
| symptom category | what the failure looked like technically |
| root-cause label | what we believe actually caused it |
Example:
- symptom category: TIMEOUT_ELEMENT
- root-cause label: PRODUCT.SITE_REDESIGN
That distinction matters because the action is different.
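The two layers can be modeled as separate fields on a failure record, so neither one is ever overwritten by the other. A minimal sketch; the `Failure` class and its field names are hypothetical, not the system's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Failure:
    # Hypothetical record shape; the real pipeline's schema may differ.
    symptom_category: str            # what the failure looked like technically
    root_cause_label: Optional[str]  # what we believe caused it; None until triaged

f = Failure(symptom_category="TIMEOUT_ELEMENT",
            root_cause_label="PRODUCT.SITE_REDESIGN")
```

Keeping the cause nullable makes the workflow explicit: a failure arrives with a symptom and only later earns a root-cause label.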
Common Symptom Categories
| Symptom Category | What It Usually Means |
|---|---|
| SELECTOR_NOT_IN_DOM | expected element is gone or renamed |
| TIMEOUT_ELEMENT | element or action did not complete in time |
| ELEMENT_INTERCEPTED | something blocked an interaction |
| NETWORK_FAILED | navigation or resource request failed |
| HTTP_5XX | backend or edge returned server error |
| HTTP_4XX | page or endpoint not found or not allowed |
| PAGE_CRASHED | browser page crashed |
| CONTENT_MISMATCH | expected content shape does not match live output |
| consent-related categories | overlay or consent handling blocked progress |
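A small guard can keep raw error strings from leaking into the symptom layer. This is a sketch: the constant set mirrors the table above, and the CONSENT_ prefix check is an assumption, not the classifier's real vocabulary:

```python
# Assumed category vocabulary, copied from the table above; the classifier
# owns the authoritative list.
SYMPTOM_CATEGORIES = {
    "SELECTOR_NOT_IN_DOM", "TIMEOUT_ELEMENT", "ELEMENT_INTERCEPTED",
    "NETWORK_FAILED", "HTTP_5XX", "HTTP_4XX",
    "PAGE_CRASHED", "CONTENT_MISMATCH",
}

def is_known_symptom(category: str) -> bool:
    # Consent-related categories vary by site, so accept any CONSENT_* name.
    return category in SYMPTOM_CATEGORIES or category.startswith("CONSENT_")
```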
Common Root-Cause Themes
The incident system attaches root-cause labels that are more human-meaningful than the raw failure text.
| Root-Cause Domain | Typical Meaning |
|---|---|
| PRODUCT.* | the site changed in a real way |
| TEST.* | the test or selector assumption is wrong |
| ENVIRONMENT.* | CI, timing, or network instability |
| CONTENT_DATA.* | content is missing, rotated, or structurally different |
| THIRD_PARTY.* | outside dependency drift |
| RELEASE_CHANGE.* | deploy or release-specific side effect |
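Because root-cause labels are dot-namespaced, the domain is just the first segment, which makes grouping and reporting trivial. A sketch:

```python
def domain_of(label: str) -> str:
    """Return the domain part of a dot-namespaced root-cause label,
    e.g. 'PRODUCT.SITE_REDESIGN' -> 'PRODUCT'."""
    return label.split(".", 1)[0]
```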
How Operators Should Read A Failure
| If You See... | Ask Next... |
|---|---|
| isolated timeout | is this a flaky one-off or a known weak spot? |
| repeated selector miss on one site | did the site redesign or did our selector age out? |
| same failure across unrelated tests | is this really infra or shared environment noise? |
| post-fix repeat | is the fix incomplete, or is this a new symptom of an old issue? |
Confidence Labels In Slack
These are not raw categories. They are operator-facing verdicts built from categories plus context.
| Label | What It Tells The Reader |
|---|---|
| Confirmed regression | strong evidence of a real product or test regression |
| Needs confirmation | not enough signal yet |
| Likely flaky | behavior looks intermittent rather than systemic |
| Infra/CI suspicion | shared environment or runner problem is more likely |
| Post-fix, watching | issue matches something recently fixed and is being monitored |
| Likely same infra event | secondary failure probably belongs to the same environment issue |
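One way to picture how a category plus context rolls up into a verdict. Every rule and threshold below is invented for illustration; the real verdict logic lives in the incident system and weighs more signals than this:

```python
def slack_verdict(symptom: str, repeats: int,
                  site_wide: bool, recently_fixed: bool) -> str:
    """Illustrative roll-up from symptom category plus context to an
    operator-facing verdict. Thresholds here are made up, not production rules."""
    if recently_fixed:
        return "Post-fix, watching"
    if site_wide and symptom == "NETWORK_FAILED":
        return "Infra/CI suspicion"
    if repeats >= 10:
        return "Confirmed regression"
    if repeats <= 1:
        return "Needs confirmation"
    return "Likely flaky"
```

The point of the sketch is the ordering: recency-of-fix and shared-environment checks run before anyone is allowed to call something a regression.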
Why Historical Context Matters
A single failure run can be misleading. History changes interpretation:
- one timeout after a fix may be watch-only
- ten repeats over seven days is a pattern
- a failure with a matching resolved incident is very different from a brand-new unknown
This is why failure history, incident store data, and recovery tracking now matter so much.
| Pattern | Good First Response |
|---|---|
| one failing test, no history, selectors look healthy | rerun and watch |
| several unrelated tests on one site all hit NETWORK_FAILED in one run | treat as infra suspicion first |
| same PDT test repeats after a visible redesign | likely product or test update needed |
| visual suite says snapshot missing | configuration issue, not UI regression |
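The infra-suspicion pattern can be checked mechanically across one run's failures. A sketch; the three-test threshold and the (test, site, symptom) tuple shape are assumptions:

```python
from collections import defaultdict

def infra_suspect_sites(run_failures, min_tests=3):
    """Given (test_name, site, symptom_category) tuples from one run,
    return sites where several distinct tests all hit NETWORK_FAILED.
    The min_tests default is a hypothetical cutoff."""
    tests_by_site = defaultdict(set)
    for test_name, site, symptom in run_failures:
        if symptom == "NETWORK_FAILED":
            tests_by_site[site].add(test_name)
    return {site for site, tests in tests_by_site.items()
            if len(tests) >= min_tests}
```

Counting distinct test names (a set, not a tally) matters: one test retried three times should not look like three unrelated tests failing together.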
What Not To Do
- do not equate timeout with product bug automatically
- do not treat every selector miss as test negligence
- do not post “regression detected” when the suite is actually blocked by missing baselines or infra
- do not bury the real verdict under raw classifier vocabulary
Related Pages