Architecture Overview
| Key | Value |
|---|---|
| Status | Active |
| Owner | QA Automation |
| Updated | 2026-03-26 |
| Scope | System design, data flow, and operational building blocks |
PW-Tests is built around one idea: tests should produce reusable operational evidence, not just a red or green line in CI. That is why the architecture is wider than Playwright itself: a run starts in Playwright, then continues through logging, classification, Slack reporting, historical matching, dashboards, and long-term documentation.
Core Design Principles
| Principle | What It Means In Practice |
|---|---|
| Tests stay focused | Tests verify behavior. They do not decide policy, reporting, or recovery rules on their own. |
| Evidence is reusable | The same run data feeds Slack alerts, OpenSearch, Grafana, reports, and healing workflows. |
| Site-specific behavior lives in configuration | Shared logic stays in framework code; per-site differences stay in config and handlers. |
| Operational clarity matters | The system tries to distinguish regression, flake, infra noise, and already-known incidents. |
| Reliability beats cleverness | Fast health checks run often; deeper suites run when the signal is worth the time. |
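The "site-specific behavior lives in configuration" principle can be sketched as follows. The `SiteConfig` shape, site names, and field names below are illustrative assumptions, not the repo's real schema in src/config/:

```typescript
// Hypothetical sketch: per-site differences live in a config object,
// while shared framework code consumes it generically.
interface SiteConfig {
  baseUrl: string;
  consentSelector: string; // site-specific cookie-consent control
  smokeTimeoutMs: number;  // fast health checks stay fast
}

const sites: Record<string, SiteConfig> = {
  "example-news": {
    baseUrl: "https://news.example.com",
    consentSelector: "#accept-cookies",
    smokeTimeoutMs: 10_000,
  },
  "example-shop": {
    baseUrl: "https://shop.example.com",
    consentSelector: "button[data-consent='accept']",
    smokeTimeoutMs: 15_000,
  },
};

// Shared logic never branches on site names directly; it only reads
// the config for whichever site a run targets.
function resolveSite(name: string): SiteConfig {
  const cfg = sites[name];
  if (!cfg) throw new Error(`Unknown site: ${name}`);
  return cfg;
}
```

The point of the split is that adding a site should mean adding a config entry, not editing framework code.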
High-Level System View
```mermaid
%%{init: {'theme':'base'}}%%
flowchart LR
CI["GitLab schedules and manual runs"] --> PW["Playwright suites"]
PW --> EL["EventLogger and reporter output"]
EL --> ART["Artifacts and local run files"]
EL --> OS["OpenSearch"]
EL --> SL["Slack notifications"]
OS --> GR["Grafana dashboards"]
ART --> AI["Healing, incident matching, reports"]
AI --> SL
AI --> HIST["Failure history and incident stores"]
HIST --> SL
```
Main Subsystems
| Subsystem | What It Owns |
|---|---|
| tests/ | Playwright suites for smoke, PDT, E2E, mobile, content, visual, and more |
| src/core/ | EventLogger, shared test hooks, base helpers, log schema |
| src/config/ | Site configuration, selectors, content registry |
| src/services/ | Incident matching, failure priors, cause assessment, intelligence |
| src/integrations/ | Slack and service integration wrappers |
| src/reporting/ | Reporter pipeline and report data preparation |
| scripts/ci/ | Slack notifiers, reports, recovery replies, merge and CI utilities |
| scripts/monitoring/ | Consent, selector, URL, robots.txt, sitemap, Grafana visual checks |
| observability/ | Grafana dashboard-as-code, OpenSearch setup, Prometheus integration |
| .confluence/ | Confluence publishing flow and formatting rules |
Suite Layout
The system is broader than the original smoke-plus-E2E model. The current platform includes:
| Area | Purpose |
|---|---|
| Smoke | Fast health signal |
| Shadow | Continuous degradation monitoring |
| PDT | Post-deploy confidence checks |
| E2E | Full user journeys |
| User-Flows | Auth, premium, and session-sensitive flows |
| Mobile | Responsive, touch, and mobile-performance tiers |
| Content | Registry-driven content module validation |
| Visual | Structural screenshot comparisons |
| Performance | Core Web Vitals and baseline comparison |
| Ads | Ad placement and rendering checks |
| Events | Analytics event verification |
| Monitors | Consent, selectors, URLs, robots.txt, sitemap health, Grafana visual quality |
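One way to picture how these areas map onto Playwright execution is as a list of project definitions. This is an illustrative sketch only; the names, paths, and parallelism flags are assumptions, not the repo's actual playwright.config.ts:

```typescript
// Hypothetical mapping of suite areas to Playwright-style "projects".
// Paths and flags are invented for illustration.
const projects = [
  { name: "smoke",  testDir: "tests/smoke",  fullyParallel: true },
  { name: "pdt",    testDir: "tests/pdt",    fullyParallel: true },
  { name: "e2e",    testDir: "tests/e2e",    fullyParallel: false }, // sequential for stability
  { name: "mobile", testDir: "tests/mobile", fullyParallel: true },
  { name: "visual", testDir: "tests/visual", fullyParallel: false },
];

// A CI trigger would select one project per run.
function pickProject(name: string) {
  return projects.find((p) => p.name === name);
}
```

The design choice this reflects is the one stated above: fast suites stay parallel and narrow, while stability-sensitive suites run sequentially on purpose.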
How A Run Moves Through The System
1. Scheduling
Runs start from GitLab schedules, manual web triggers, post-deploy workflows, or local development.
2. Test Execution
Playwright executes the relevant suite or project. Some suites are intentionally sequential for stability. Others are narrow and fast by design.
3. Logging
The EventLogger writes structured records locally and, when configured, to OpenSearch. The reporter adds run summaries, screenshot records, and step records used by Grafana and investigation workflows.
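A minimal sketch of the structured-logging idea, assuming one JSONL file per run. The field names (`runId`, `event`, `ts`) and the class shape are illustrative; the real EventLogger and its schema live in src/core/:

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Illustrative record shape; the real log schema lives in src/core/.
interface RunEvent {
  runId: string;
  suite: string;
  event: string;  // e.g. "test.start", "test.fail"
  ts: string;     // ISO timestamp
  detail?: Record<string, unknown>;
}

class EventLoggerSketch {
  constructor(private file: string) {}

  log(e: RunEvent): void {
    // One JSON object per line: easy to tail locally, easy to bulk-index later.
    fs.appendFileSync(this.file, JSON.stringify(e) + "\n");
  }

  read(): RunEvent[] {
    return fs
      .readFileSync(this.file, "utf8")
      .split("\n")
      .filter(Boolean)
      .map((line) => JSON.parse(line));
  }
}

const logFile = path.join(os.tmpdir(), "pw-tests-events.jsonl");
fs.writeFileSync(logFile, ""); // fresh file per run
const logger = new EventLoggerSketch(logFile);
logger.log({ runId: "r1", suite: "smoke", event: "test.fail", ts: new Date().toISOString() });
```

The same records can then serve double duty: read locally for debugging, or shipped to OpenSearch for dashboards.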
4. Classification And Post-Processing
After execution, post-processing scripts can:
- send Slack alerts
- match a failure against known incidents
- look at historical recurrence
- add root-cause confidence
- post recovery replies when an issue stops repeating
- generate weekly or monthly summaries
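The incident-matching and recurrence steps above can be sketched roughly as follows. The record shapes and the matching rule (same test id within a time window) are assumptions for illustration; the real logic lives in src/services/:

```typescript
// Hedged sketch of post-run classification: is this failure already a
// known incident, a recurring flake, or something new worth alerting on?
interface Failure { testId: string; ranAt: number }   // epoch ms
interface Incident { testId: string; open: boolean }

function classify(
  failure: Failure,
  history: Failure[],
  incidents: Incident[],
  windowMs = 7 * 24 * 3600 * 1000, // assumed 7-day recurrence window
): "known-incident" | "recurring" | "new" {
  // An open incident for the same test suppresses a fresh alert.
  if (incidents.some((i) => i.open && i.testId === failure.testId)) {
    return "known-incident";
  }
  const recent = history.filter(
    (h) => h.testId === failure.testId && failure.ranAt - h.ranAt < windowMs,
  );
  return recent.length >= 2 ? "recurring" : "new";
}
```

Each outcome would then drive a different downstream action: a thread reply for known incidents, a flake annotation for recurring ones, a full alert for new ones.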
5. Human Consumption
The same data is then visible in:
- Slack for immediate action
- Grafana for trends and investigation
- Confluence and markdown docs for long-term system memory
Event And Reporting Architecture
| Layer | What It Produces | Who Uses It |
|---|---|---|
| EventLogger | detailed event stream | debugging, AI workflows, telemetry |
| Reporter | test summaries, screenshots, step records | Grafana, Slack, reports |
| Failure history | recurrence over time | humanized notifications, priors |
| Incident store | known resolved or open incidents | root-cause tagging and matching |
| Cause assessor | confidence-weighted verdicts | Slack investigations and future predictive workflows |
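A confidence-weighted verdict of the kind the cause assessor produces might look like this. The signal names, thresholds, and confidence values are invented for illustration; the real assessor lives in src/services/:

```typescript
// Illustrative sketch: combine priors and run signals into a verdict.
interface Signals {
  flakyPrior: number;        // 0..1, from failure history
  infraErrors: boolean;      // e.g. timeouts or DNS failures in the run
  matchedIncident: boolean;  // failure matched a known incident
}

function assessCause(s: Signals): { verdict: string; confidence: number } {
  if (s.matchedIncident) return { verdict: "known-incident", confidence: 0.9 };
  if (s.infraErrors) return { verdict: "infra", confidence: 0.7 };
  if (s.flakyPrior > 0.6) return { verdict: "likely-flaky", confidence: s.flakyPrior };
  return { verdict: "likely-regression", confidence: 1 - s.flakyPrior };
}
```

The key property is that the verdict carries a confidence, so Slack messages can say "likely flaky" instead of asserting certainty.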
Why The Logging Model Changed
Older versions of the repo mixed multiple logging styles and left operators stitching together context by hand. The current model is simpler:
- one event pipeline instead of several overlapping ones
- shared identifiers across reporter and logger records
- local artifacts for direct debugging
- OpenSearch records for dashboards and long-term analysis
This is what makes things like recurrence detection, incident clustering, and recovery replies possible.
Current Operational Features Worth Knowing
| Feature | Why It Matters |
|---|---|
| Humanized Slack failure alerts | Reduces panic and makes triage faster |
| Slack recovery replies | Closes the loop when a failure disappears after a fix |
| Incident store | Prevents the team from rediscovering the same root cause every week |
| Failure priors and cause assessor | Makes “likely flaky” versus “likely regression” more evidence-based |
| Grafana dashboard-as-code | Keeps dashboards reviewable and deployable from git |
| Observability verification scripts | Lets operators check health and data quality without a manual audit |
| Visual fact-checker | Adds AI review on top of screenshot artifacts for selected flows |
Important Data Stores
| Path Or Store | Role |
|---|---|
| test-results/logs/ | Local structured logs |
| test-results/history/ | Run history for recurrence and trend logic |
| data/fixes.json | Fix database |
| data/failure-incidents.json | Known incident registry |
| data/failure-history.json | Confirmed recovery history and prior signals |
| data/slack-threads.json | Slack thread tracking for recovery replies |
| OpenSearch cncqa_tests-* | Human-facing test records for dashboards |
| OpenSearch cncqa_events-* | Machine-facing event records |
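The split between the two OpenSearch index families could be routed with a helper like this. The daily date-suffix format is an assumption; only the `cncqa_tests-*` / `cncqa_events-*` family names come from the table above:

```typescript
// Sketch: route a record to the human-facing or machine-facing index
// family. The YYYY-MM-DD suffix is an assumed rollover convention.
function indexFor(recordType: "test" | "event", date: Date): string {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD
  const family = recordType === "test" ? "cncqa_tests" : "cncqa_events";
  return `${family}-${day}`;
}
```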
Directory Map
| Path | Why It Exists |
|---|---|
| src/ | Shared runtime code |
| tests/ | Playwright suites and helpers |
| scripts/ | Operators’ command-line tools and CI helpers |
| observability/ | Dashboards, mappings, retention helpers |
| docs/wiki/ | Main Confluence wiki source |
| docs/confluence/ | Standalone Confluence pages outside the main wiki tree |
| .confluence/ | Publishing engine and Confluence-specific rules |
If you are new to the repo, this order usually works best:
- Read Test Types to understand the suite landscape.
- Read Logging System to understand what evidence each run leaves behind.
- Read Integrations to understand Slack, Grafana, and OpenSearch.
- Read AI Processing only after the basics make sense.
Practical Takeaway
If you remember one thing, make it this: PW-Tests is not just a collection of Playwright specs. It is a small operational platform. Tests create the evidence, but the value comes from how that evidence is enriched, routed, explained, and reused.