
Architecture Overview

| Key | Value |
|---|---|
| Status | Active |
| Owner | QA Automation |
| Updated | 2026-03-26 |
| Scope | System design, data flow, and operational building blocks |

PW-Tests is built around one idea: tests should produce reusable operational evidence, not just a red or green line in CI. That is why the architecture is wider than Playwright itself. A run starts in Playwright, but it keeps going through logging, classification, Slack reporting, historical matching, dashboards, and long-term documentation.

Core Design Principles

| Principle | What It Means In Practice |
|---|---|
| Tests stay focused | Tests verify behavior. They do not decide policy, reporting, or recovery rules on their own. |
| Evidence is reusable | The same run data feeds Slack alerts, OpenSearch, Grafana, reports, and healing workflows. |
| Site-specific behavior lives in configuration | Shared logic stays in framework code; per-site differences stay in config and handlers. |
| Operational clarity matters | The system tries to distinguish regression, flake, infra noise, and already-known incidents. |
| Reliability beats cleverness | Fast health checks run often; deeper suites run when the signal is worth the time. |

High-Level System View

```mermaid
%%{init: {'theme':'base'}}%%
flowchart LR
    CI["GitLab schedules and manual runs"] --> PW["Playwright suites"]
    PW --> EL["EventLogger and reporter output"]
    EL --> ART["Artifacts and local run files"]
    EL --> OS["OpenSearch"]
    EL --> SL["Slack notifications"]
    OS --> GR["Grafana dashboards"]
    ART --> AI["Healing, incident matching, reports"]
    AI --> SL
    AI --> HIST["Failure history and incident stores"]
    HIST --> SL
```

Main Subsystems

| Subsystem | What It Owns |
|---|---|
| tests/ | Playwright suites for smoke, PDT, E2E, mobile, content, visual, and more |
| src/core/ | EventLogger, shared test hooks, base helpers, log schema |
| src/config/ | Site configuration, selectors, content registry |
| src/services/ | Incident matching, failure priors, cause assessment, intelligence |
| src/integrations/ | Slack and service integration wrappers |
| src/reporting/ | Reporter pipeline and report data preparation |
| scripts/ci/ | Slack notifiers, reports, recovery replies, merge and CI utilities |
| scripts/monitoring/ | Consent, selector, URL, robots.txt, sitemap, Grafana visual checks |
| observability/ | Grafana dashboard-as-code, OpenSearch setup, Prometheus integration |
| .confluence/ | Confluence publishing flow and formatting rules |

Suite Layout

The system is broader than the original smoke-plus-E2E model. The current platform includes:

| Area | Purpose |
|---|---|
| Smoke | Fast health signal |
| Shadow | Continuous degradation monitoring |
| PDT | Post-deploy confidence checks |
| E2E | Full user journeys |
| User-Flows | Auth, premium, and session-sensitive flows |
| Mobile | Responsive, touch, and mobile-performance tiers |
| Content | Registry-driven content module validation |
| Visual | Structural screenshot comparisons |
| Performance | Core Web Vitals and baseline comparison |
| Ads | Ad placement and rendering checks |
| Events | Analytics event verification |
| Monitors | Consent, selectors, URLs, robots.txt, sitemap health, Grafana visual quality |

How A Run Moves Through The System

1. Scheduling

Runs start from GitLab schedules, manual web triggers, post-deploy workflows, or local development.

2. Test Execution

Playwright executes the relevant suite or project. Some suites are intentionally sequential for stability. Others are narrow and fast by design.
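As a hedged sketch of how a suite can be made intentionally sequential, Playwright projects support a `fullyParallel` flag. The suite names and paths below are illustrative assumptions, not the repo's real values; in a real `playwright.config.ts` these objects would sit inside `defineConfig({ projects: [...] })`:

```typescript
// Hypothetical per-suite project entries; names and paths are illustrative.
interface SuiteProject {
  name: string;
  testDir: string;
  fullyParallel: boolean; // false => specs in this project run one after another
}

const projects: SuiteProject[] = [
  { name: "smoke", testDir: "tests/smoke", fullyParallel: true },
  // Session-sensitive journeys stay sequential for stability.
  { name: "e2e", testDir: "tests/e2e", fullyParallel: false },
];

function sequentialSuites(all: SuiteProject[]): string[] {
  return all.filter((p) => !p.fullyParallel).map((p) => p.name);
}

console.log(sequentialSuites(projects)); // [ 'e2e' ]
```

Keeping the parallelism decision in config, rather than inside the specs, matches the "tests stay focused" principle above.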

3. Logging

The EventLogger writes structured records locally and, when configured, to OpenSearch. The reporter adds run summaries, screenshot records, and step records used by Grafana and investigation workflows.
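To make "structured records" concrete, here is a minimal sketch of what an event record might look like. The field names are assumptions for illustration; the actual log schema lives in src/core/:

```typescript
// Illustrative event record shape; the repo's real schema lives in src/core/.
interface TestEvent {
  runId: string;     // shared identifier tying logger and reporter records together
  suite: string;
  testTitle: string;
  event: "test-start" | "step" | "test-end";
  status?: "passed" | "failed";
  timestamp: string; // ISO 8601
}

function makeEvent(
  runId: string,
  suite: string,
  testTitle: string,
  event: TestEvent["event"],
): TestEvent {
  return { runId, suite, testTitle, event, timestamp: new Date().toISOString() };
}

const ev = makeEvent("run-123", "smoke", "homepage loads", "test-start");
console.log(ev.runId, ev.event);
```

The shared `runId` is what lets downstream tooling join a local artifact, an OpenSearch record, and a Slack alert back to the same run.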

4. Classification And Post-Processing

After execution, post-processing scripts can:

  • send Slack alerts
  • match a failure against known incidents
  • look at historical recurrence
  • add root-cause confidence
  • post recovery replies when an issue stops repeating
  • generate weekly or monthly summaries
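A rough sketch of the incident-matching step, reduced to its simplest form. The real logic in src/services/ is more involved; the shapes and the substring-matching approach here are invented for illustration:

```typescript
// Invented incident shape and naive matcher, for illustration only.
interface Incident {
  id: string;
  signature: string; // a distinctive fragment of the failure message
  status: "open" | "resolved";
}

function matchIncident(failureMessage: string, known: Incident[]): Incident | undefined {
  return known.find((i) => failureMessage.includes(i.signature));
}

const known: Incident[] = [
  { id: "INC-7", signature: "Timeout 30000ms exceeded", status: "resolved" },
];

const hit = matchIncident("Error: Timeout 30000ms exceeded waiting for selector", known);
console.log(hit?.id); // INC-7
```

A match against a resolved incident is what allows the alert to say "this looks like a known issue" instead of forcing a fresh investigation.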

5. Human Consumption

The same data is then visible in:

  • Slack for immediate action
  • Grafana for trends and investigation
  • Confluence and markdown docs for long-term system memory

Event And Reporting Architecture

| Layer | What It Produces | Who Uses It |
|---|---|---|
| EventLogger | detailed event stream | debugging, AI workflows, telemetry |
| Reporter | test summaries, screenshots, step records | Grafana, Slack, reports |
| Failure history | recurrence over time | humanized notifications, priors |
| Incident store | known resolved or open incidents | root-cause tagging and matching |
| Cause assessor | confidence-weighted verdicts | Slack investigations and future predictive workflows |
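A toy illustration of what a confidence-weighted verdict can mean in practice. The signals and weights below are assumptions, not the repo's actual model; a real assessor would calibrate weights from the failure priors:

```typescript
// Toy cause assessor: combines simple signals into a weighted verdict.
// Signal names and weights are invented for illustration.
interface Signals {
  recurredRecently: boolean;   // from failure history
  matchesKnownIncident: boolean;
  passedOnRetry: boolean;
}

function assessCause(s: Signals): { verdict: "likely-flaky" | "likely-regression"; confidence: number } {
  let flakyScore = 0;
  if (s.passedOnRetry) flakyScore += 0.5;
  if (s.matchesKnownIncident) flakyScore += 0.3;
  if (!s.recurredRecently) flakyScore += 0.2;
  return flakyScore >= 0.5
    ? { verdict: "likely-flaky", confidence: flakyScore }
    : { verdict: "likely-regression", confidence: 1 - flakyScore };
}

console.log(assessCause({ recurredRecently: true, matchesKnownIncident: false, passedOnRetry: true }));
```

The point is not the exact arithmetic but that each verdict carries a confidence, so Slack messages can hedge honestly instead of asserting a cause.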

Why The Logging Model Changed

Older versions of the repo mixed multiple logging styles and left operators stitching together context by hand. The current model is simpler:

  • one event pipeline instead of several overlapping ones
  • shared identifiers across reporter and logger records
  • local artifacts for direct debugging
  • OpenSearch records for dashboards and long-term analysis

This is what makes things like recurrence detection, incident clustering, and recovery replies possible.
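Recurrence detection, for example, becomes a simple query once identifiers are shared. A hedged sketch, with an assumed history-record shape (real records live under test-results/history/):

```typescript
// Assumed history record shape for illustration.
interface HistoryRecord {
  runId: string;
  testTitle: string;
  status: "passed" | "failed";
}

// A failure "recurs" if the same test failed in at least `threshold` recorded runs.
function isRecurring(history: HistoryRecord[], testTitle: string, threshold = 2): boolean {
  const failures = history.filter((r) => r.testTitle === testTitle && r.status === "failed");
  return failures.length >= threshold;
}

const history: HistoryRecord[] = [
  { runId: "r1", testTitle: "checkout", status: "failed" },
  { runId: "r2", testTitle: "checkout", status: "failed" },
  { runId: "r3", testTitle: "login", status: "passed" },
];

console.log(isRecurring(history, "checkout")); // true
```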

Current Operational Features Worth Knowing

| Feature | Why It Matters |
|---|---|
| Humanized Slack failure alerts | Reduces panic and makes triage faster |
| Slack recovery replies | Closes the loop when a failure disappears after a fix |
| Incident store | Prevents the team from rediscovering the same root cause every week |
| Failure priors and cause assessor | Makes “likely flaky” versus “likely regression” more evidence-based |
| Grafana dashboard-as-code | Keeps dashboards reviewable and deployable from git |
| Observability verification scripts | Lets operators check health and data quality without a manual audit |
| Visual fact-checker | Adds AI review on top of screenshot artifacts for selected flows |

Important Data Stores

| Path Or Store | Role |
|---|---|
| test-results/logs/ | Local structured logs |
| test-results/history/ | Run history for recurrence and trend logic |
| data/fixes.json | Fix database |
| data/failure-incidents.json | Known incident registry |
| data/failure-history.json | Confirmed recovery history and prior signals |
| data/slack-threads.json | Slack thread tracking for recovery replies |
| OpenSearch cncqa_tests-* | Human-facing test records for dashboards |
| OpenSearch cncqa_events-* | Machine-facing event records |
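The `*` in the OpenSearch index names indicates a suffix pattern. Assuming a daily date suffix — a common convention, though the repo's actual rollover scheme may differ — resolving a concrete index name looks like this:

```typescript
// Assumes a daily suffix convention (cncqa_tests-YYYY.MM.DD); the real
// rollover scheme may differ.
function dailyIndex(prefix: "cncqa_tests" | "cncqa_events", date: Date): string {
  const y = date.getUTCFullYear();
  const m = String(date.getUTCMonth() + 1).padStart(2, "0");
  const d = String(date.getUTCDate()).padStart(2, "0");
  return `${prefix}-${y}.${m}.${d}`;
}

console.log(dailyIndex("cncqa_tests", new Date(Date.UTC(2026, 2, 26)))); // cncqa_tests-2026.03.26
```

Dashboards and retention policies can then target the wildcard pattern while writers address one dated index at a time.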

Directory Map

| Path | Why It Exists |
|---|---|
| src/ | Shared runtime code |
| tests/ | Playwright suites and helpers |
| scripts/ | Operators’ command-line tools and CI helpers |
| observability/ | Dashboards, mappings, retention helpers |
| docs/wiki/ | Main Confluence wiki source |
| docs/confluence/ | Standalone Confluence pages outside the main wiki tree |
| .confluence/ | Publishing engine and Confluence-specific rules |

If you are new to the repo, this order usually works best:

  1. Read Test Types to understand the suite landscape.
  2. Read Logging System to understand what evidence each run leaves behind.
  3. Read Integrations to understand Slack, Grafana, and OpenSearch.
  4. Read AI Processing only after the basics make sense.

Practical Takeaway

If you remember one thing, make it this: PW-Tests is not just a collection of Playwright specs. It is a small operational platform. Tests create the evidence, but the value comes from how that evidence is enriched, routed, explained, and reused.