Overview

Czech News Center QA · Built 2026-03-31

PW-Tests

Playwright tests for CNC websites. 10 suites, 7 sites, everything wired into OpenSearch and Grafana.

Pass Rate: 96%

Total Tests: 5000

Sites: 7

Commits (30d): 363

Test Suites: 10

Suite Health at a Glance


Suite	Pass Rate	Passed/Total	Duration
Ads	100%	549/549	6.5s
Content	100%	45/45	9.6s
E2e	96.55%	700/725	9.8s
Events	100%	9/9	11.4s
Mobile	100%	270/270	3.1s
Pdt	96.53%	1950/2020	35.2s
Shadow	98.35%	417/424	1m 14s
Smoke	100%	108/108	6.4s
Unknown	95.83%	506/528	11.0s
User Flows	79.5%	256/322	9.3s
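The 96% headline pass rate can be reproduced from this table. The sketch below assumes the dashboard pools results, dividing total passed by total executed, rather than averaging the per-suite percentages. The `SuiteResult` type and `pooledPassRate` function are hypothetical names used only for illustration.

```typescript
// Assumption: headline pass rate = total passed / total executed across all suites.
interface SuiteResult {
  suite: string;
  passed: number;
  total: number;
}

// Per-suite figures copied from the Suite Health table above.
const suites: SuiteResult[] = [
  { suite: "Ads", passed: 549, total: 549 },
  { suite: "Content", passed: 45, total: 45 },
  { suite: "E2e", passed: 700, total: 725 },
  { suite: "Events", passed: 9, total: 9 },
  { suite: "Mobile", passed: 270, total: 270 },
  { suite: "Pdt", passed: 1950, total: 2020 },
  { suite: "Shadow", passed: 417, total: 424 },
  { suite: "Smoke", passed: 108, total: 108 },
  { suite: "Unknown", passed: 506, total: 528 },
  { suite: "User Flows", passed: 256, total: 322 },
];

function pooledPassRate(results: SuiteResult[]): { passed: number; total: number; rate: number } {
  const passed = results.reduce((sum, r) => sum + r.passed, 0);
  const total = results.reduce((sum, r) => sum + r.total, 0);
  // Guard against an empty run so we never divide by zero.
  const rate = total === 0 ? 0 : (100 * passed) / total;
  return { passed, total, rate };
}

const overall = pooledPassRate(suites);
console.log(`${overall.passed}/${overall.total} = ${overall.rate.toFixed(1)}%`); // 4810/5000 = 96.2%
```

An unweighted mean of the ten per-suite percentages works out to about 96.7%, which would round to 97%. The pooled ratio is therefore the one consistent with the 96% figure, and it also gives large suites such as Pdt proportionally more influence than small ones such as Events.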

Site Health

Overall health per site, aggregated across all suites.

All 98.35%

Auto.cz 93.69%

Blesk.cz 96.07%

E15.cz 97.49%

Isport.cz 97.89%

Opinio.cz 93.42%

Reflex.cz 98.3%

Test Suites

10 suites, each checking something different.


Sites

7 Czech News Center websites under test.

Site Comparison

Site	URL	Consent Platform	Health
All	all	Unknown	98.35%
Auto.cz	www.auto.cz	CPEX	93.69%
Blesk.cz	www.blesk.cz	CPEX	96.07%
E15.cz	www.e15.cz	Didomi	97.49%
Isport.cz	isport.blesk.cz	CPEX	97.89%
Opinio.cz	opinio.cz	CPEX	93.42%
Reflex.cz	www.reflex.cz	Didomi	98.3%

CI Runners

7 projects on 2 GitLab runners

Monthly Development Report

Features, tests added, and infrastructure improvements by month.

Monthly Failure Report

Failure timeline, root causes, fix stories, and unresolved issues.

Architecture

System components, data flow, and directory structure.

Data Flow

Tests feed three parallel paths that all end at Slack.

Project Structure

Core Components

Methodology

Selector priorities, failure categories, and the auto-healing loop.

Selector Strategy

Pick the most stable selector you can.
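The priority levels themselves are not listed in this text. The sketch below assumes an order that puts `data-testid` attributes first, as the test template does, and DOM-position XPath last. Under that assumption, choosing a selector reduces to sorting candidates by rank. `SelectorKind`, `STABILITY_RANK`, and `pickMostStable` are hypothetical names.

```typescript
// Hypothetical stability ranking (lower = preferred). The order is an
// illustrative assumption, not the project's documented list.
type SelectorKind = "testid" | "role" | "text" | "css" | "xpath";

const STABILITY_RANK: Record<SelectorKind, number> = {
  testid: 0, // e.g. [data-testid="header"], as used in the test template
  role: 1,   // ARIA role plus accessible name
  text: 2,   // visible text: breaks when editorial copy changes
  css: 3,    // class names: break on restyles
  xpath: 4,  // DOM position: breaks on almost any layout change
};

interface SelectorCandidate {
  kind: SelectorKind;
  selector: string;
}

// Return the lowest-ranked (most stable) candidate, or undefined for an empty list.
function pickMostStable(candidates: SelectorCandidate[]): SelectorCandidate | undefined {
  return [...candidates].sort((a, b) => STABILITY_RANK[a.kind] - STABILITY_RANK[b.kind])[0];
}

const best = pickMostStable([
  { kind: "css", selector: ".site-header" },
  { kind: "testid", selector: '[data-testid="header"]' },
  { kind: "xpath", selector: "//body/div[1]/header" },
]);
console.log(best?.selector); // [data-testid="header"]
```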

Self-Healing Workflow

Test breaks? The system tries to fix it before anyone has to look.
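The sketch below shows one healing pass under a few assumptions. The loop tries the original selector, then walks a list of fallback candidates, and reports any replacement for review instead of editing the test silently. `healSelector`, `HealResult`, and the fallback list are hypothetical. The probe is injected so the logic runs without a browser; in a real Playwright run it could be `async (s) => (await page.locator(s).count()) > 0`.

```typescript
// Hypothetical outcome of one healing attempt.
type HealResult =
  | { status: "ok"; selector: string }
  | { status: "healed"; original: string; selector: string }
  | { status: "needs-human"; original: string };

// Resolves to true if the selector matches at least one element on the page.
type SelectorProbe = (selector: string) => Promise<boolean>;

async function healSelector(
  original: string,
  fallbacks: string[],
  probe: SelectorProbe,
): Promise<HealResult> {
  if (await probe(original)) return { status: "ok", selector: original };
  for (const candidate of fallbacks) {
    if (await probe(candidate)) {
      // Report the replacement rather than applying it, so a person can
      // review the proposed fix before it lands in the test code.
      return { status: "healed", original, selector: candidate };
    }
  }
  // Nothing matched: escalate to a human.
  return { status: "needs-human", original };
}

// Usage with a fake page that only contains the renamed header.
const onPage = new Set(['[data-testid="main-header"]']);
const fakeProbe: SelectorProbe = async (s) => onPage.has(s);
healSelector('[data-testid="header"]', ["header.site-header", '[data-testid="main-header"]'], fakeProbe)
  .then((result) => console.log(result.status)); // healed
```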

Failure Classification

Every failure gets a category. Some we fix automatically, others need a human.
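The sketch below covers both the Auto-Fixable and the Requires Investigation groups. It assumes rule-based classification: a first-match list of regular expressions over the Playwright error message. The category names, patterns, and the choice of which categories count as auto-fixable are illustrative assumptions, not the project's actual taxonomy.

```typescript
// Hypothetical rules; the first matching rule wins, so order matters.
interface ClassificationRule {
  category: string;
  pattern: RegExp;
  autoFixable: boolean;
}

const RULES: ClassificationRule[] = [
  // Listed before the generic locator rule because both kinds of message mention a locator.
  { category: "strict-mode", pattern: /strict mode violation/i, autoFixable: true },
  { category: "selector-not-found", pattern: /waiting for (locator|getBy)/i, autoFixable: true },
  { category: "network", pattern: /net::ERR_[A-Z_]+/, autoFixable: false },
  { category: "assertion", pattern: /expect\(.*\)\./, autoFixable: false },
];

interface Classification {
  category: string;
  autoFixable: boolean;
}

function classifyFailure(message: string): Classification {
  const rule = RULES.find((r) => r.pattern.test(message));
  // Anything unrecognised goes to a human by default.
  return rule
    ? { category: rule.category, autoFixable: rule.autoFixable }
    : { category: "unknown", autoFixable: false };
}
```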

Auto-Fixable

Requires Investigation

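Test Writing Template

```typescript
import { test, expect } from '@playwright/test';
import { CncSite } from '../src/core';

// Tags in the describe title (@smoke, @blesk) let runs be filtered with --grep.
test.describe('Feature @smoke @blesk', () => {
  let site: CncSite;

  test.beforeEach(async ({ page }) => {
    // Wrap the Playwright page in the project's site helper for Blesk.cz.
    site = new CncSite(test, page, 'blesk');
    // Open the homepage; 'Homepage' appears to be a label used in reporting.
    await site.load('/', 'Homepage');
    // Handle the consent dialog (CPEX on Blesk.cz); `true` presumably accepts it.
    await site.consent(true);
  });

  test('should load homepage', async () => {
    await expect(site.page).toHaveTitle(/Blesk/);
    await site.assertElementVisible('[data-testid="header"]');
  });
});
```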

Observability Stack

OpenSearch stores it, Grafana shows it, Prometheus measures it, Slack yells about it.

Stack Overview

OpenSearch Indices

Index Pattern	Purpose	Retention	Updated By
`cncqa_tests-*`	Test results for Grafana dashboards	90 days	Reporter
`cncqa_events-*`	Detailed events for AI/machine analysis	30 days	EventLogger
`*-YYYY-MM-img`	Failure screenshots (base64)	30 days (ISM)	Reporter
`*-YYYY-MM-cr`	Step records (pw:api traces)	30 days (ISM)	Reporter
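The 30-day (ISM) retention above can be expressed as an OpenSearch Index State Management policy. The sketch below builds such a policy body, assuming a simple two-state lifecycle: indices start in a hot state and are later deleted. The function name and the `*-img` pattern are hypothetical. The resulting JSON would be sent with `PUT _plugins/_ism/policies/<policy_id>`.

```typescript
// Minimal types for the parts of an ISM policy used here.
interface IsmState {
  name: string;
  actions: Array<Record<string, unknown>>;
  transitions: Array<{ state_name: string; conditions?: { min_index_age: string } }>;
}

interface IsmPolicy {
  policy: {
    description: string;
    default_state: string;
    states: IsmState[];
    ism_template: Array<{ index_patterns: string[]; priority: number }>;
  };
}

// Indices start in "hot" and move to "delete" once older than retentionDays,
// where the delete action removes them.
function buildRetentionPolicy(indexPatterns: string[], retentionDays: number, priority = 100): IsmPolicy {
  return {
    policy: {
      description: `Delete indices older than ${retentionDays} days`,
      default_state: "hot",
      states: [
        {
          name: "hot",
          actions: [],
          transitions: [{ state_name: "delete", conditions: { min_index_age: `${retentionDays}d` } }],
        },
        { name: "delete", actions: [{ delete: {} }], transitions: [] },
      ],
      // ism_template attaches the policy to newly created indices that match.
      ism_template: [{ index_patterns: indexPatterns, priority }],
    },
  };
}

const imgPolicy = buildRetentionPolicy(["*-img"], 30);
console.log(JSON.stringify(imgPolicy, null, 2));
```

Because `ism_template` only applies to indices created after the policy exists, any indices that already exist would need the policy attached to them separately.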

Development Report & Timeline

What the team delivered, told in words. Release changelog below.

March 2026 (March 1–31)

Failure Intelligence Pipeline

We moved from simple substring matching to a multi-layered classification engine. Failures are now matched against a structured incident store using weighted scoring across seven dimensions. A historical confidence layer tracks how often each test has failed for each root cause and blends that history with the current evidence, so verdicts improve as more confirmed recovery events accumulate.

Incident store with six root cause domains and seventeen hierarchical tags
Weighted incident matcher scoring fingerprint, site, error category, selector overlap, date window, URL pattern, and message content
Historical prior service computing confidence bands from confirmed recovery events
Cause assessor blending both layers into a final verdict with human-readable explanation
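The matcher and assessor can be sketched as a weighted average over the seven dimensions, followed by a linear blend with the historical prior. The weights, the 0.7 evidence weight, and all function names are hypothetical. The text only states which dimensions are scored and that both layers are blended.

```typescript
// The seven dimensions listed above; each is scored as a similarity in [0, 1].
type Dimension =
  | "fingerprint" | "site" | "errorCategory" | "selectorOverlap"
  | "dateWindow" | "urlPattern" | "messageContent";

type DimensionScores = Record<Dimension, number>;

// Hypothetical weights for illustration only.
const WEIGHTS: Record<Dimension, number> = {
  fingerprint: 3, site: 1, errorCategory: 2, selectorOverlap: 2,
  dateWindow: 1, urlPattern: 1, messageContent: 2,
};

// Weighted average of per-dimension similarity, normalised to [0, 1].
function matchScore(scores: DimensionScores): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const dim of Object.keys(WEIGHTS) as Dimension[]) {
    weighted += WEIGHTS[dim] * scores[dim];
    totalWeight += WEIGHTS[dim];
  }
  return weighted / totalWeight;
}

// Prior: share of this test's confirmed past failures attributed to the candidate cause.
function historicalPrior(causeCount: number, totalConfirmed: number): number {
  return totalConfirmed === 0 ? 0 : causeCount / totalConfirmed;
}

// Linear blend: current evidence dominates, history nudges the verdict.
function blendedConfidence(current: number, prior: number, evidenceWeight = 0.7): number {
  return evidenceWeight * current + (1 - evidenceWeight) * prior;
}

const current = matchScore({
  fingerprint: 1, site: 1, errorCategory: 1, selectorOverlap: 0,
  dateWindow: 1, urlPattern: 0, messageContent: 0.5,
});
console.log(blendedConfidence(current, historicalPrior(4, 5)).toFixed(2)); // 0.71
```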

Slack Alert Transformation

Nightly failure notifications were completely rewritten to communicate in human terms instead of dumping raw failure counts. Each failure now gets a verdict label — confirmed regression, post-fix watching, likely flaky, infra suspicion, or needs confirmation. Failures are clustered by site, and an investigation thread is posted automatically with per-failure breakdowns and next-step recommendations.

Recovery detection posts a confirmation to the original failure thread once a test passes on consecutive runs
Thread state machine manages open, resolved, superseded, and stale failure threads
Weekly report redesigned with executive summary, trend comparison, and root cause breakdown
Escalation contact suggestion appended to alerts based on failure classification
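The thread lifecycle described in this list can be sketched as a small state machine. The sketch rests on three assumptions. First, only open threads change state. Second, two consecutive passes count as a recovery; the real threshold is not stated. Third, the event and type names are hypothetical.

```typescript
type ThreadState = "open" | "resolved" | "superseded" | "stale";

type ThreadEvent =
  | { kind: "pass" }
  | { kind: "fail" }
  | { kind: "newThread" } // a newer alert thread now covers this test
  | { kind: "expire" };   // no activity within a (hypothetical) staleness window

interface FailureThread {
  state: ThreadState;
  consecutivePasses: number;
}

// Hypothetical threshold: passes in a row needed before posting a recovery.
const PASSES_TO_RECOVER = 2;

function transition(thread: FailureThread, event: ThreadEvent): FailureThread {
  // In this sketch, resolved, superseded, and stale are terminal states.
  if (thread.state !== "open") return thread;
  switch (event.kind) {
    case "pass": {
      const passes = thread.consecutivePasses + 1;
      return passes >= PASSES_TO_RECOVER
        ? { state: "resolved", consecutivePasses: passes }
        : { state: "open", consecutivePasses: passes };
    }
    case "fail":
      return { state: "open", consecutivePasses: 0 }; // a failure resets the pass streak
    case "newThread":
      return { ...thread, state: "superseded" };
    case "expire":
      return { ...thread, state: "stale" };
  }
}
```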

Operations & Escalation System

A brand-new escalation system answers the question that classification alone could not: the test failed, QA confirmed it is real — now who do I contact? Three normalized JSON databases map contacts, sites, and routing rules across twelve escalation categories. A resolver module implements strict precedence matching and the portal page presents it all as an interactive workflow and lookup tool.

Seventy-eight CNC sites and twenty-seven contacts seeded from the ownership spreadsheet
Twelve escalation categories from content issues to video player failures
Five-step visual workflow on the portal: Test Fails → QA Triages → Classify → Escalate → Recovery
Build-time validation with eight error checks and five warning checks
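The sketch below shows strict precedence matching, assuming the most specific rule wins. The order it checks is:

1. A rule for the exact category and site.
2. A category-wide rule.
3. The site's owner.
4. A global fallback contact.

The type names, field names, and precedence order are all assumptions, since the actual JSON schemas are not shown here.

```typescript
interface RoutingRule {
  category: string; // one of the escalation categories
  site?: string;    // omitted = the rule applies to every site
  contactId: string;
}

interface Site {
  id: string;
  ownerContactId?: string;
}

// Hypothetical precedence, most specific first.
function resolveContact(
  category: string,
  site: Site,
  rules: RoutingRule[],
  fallbackContactId: string,
): string {
  const exact = rules.find((r) => r.category === category && r.site === site.id);
  if (exact) return exact.contactId;
  const generic = rules.find((r) => r.category === category && r.site === undefined);
  if (generic) return generic.contactId;
  return site.ownerContactId ?? fallbackContactId;
}

// Usage with hypothetical contacts and categories.
const rules: RoutingRule[] = [
  { category: "video-player", contactId: "video-team" },
  { category: "video-player", site: "blesk.cz", contactId: "blesk-video" },
];
console.log(resolveContact("video-player", { id: "blesk.cz" }, rules, "qa-lead")); // blesk-video
console.log(resolveContact("video-player", { id: "e15.cz" }, rules, "qa-lead")); // video-team
```

Putting the exact match first means a site-specific override always beats a category-wide rule, no matter where each rule appears in the file.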

Project Portal

The documentation portal launched with thirty-one pages, CNC brand design, content registry sidebar, and dark mode support. A second version is in progress with a modular build pipeline — separate collectors for git, OpenSearch, CI config, and test results feed into interactive templates that can show live data alongside static documentation.

New pages: escalation matrix, content registry, per-suite operational runbooks
Portal v2 architecture: collectors, derived data, template engine, cache layer

Grafana 2.0 Dashboard Overhaul

All three Grafana dashboards were redesigned. Status got a sites-by-suites matrix with per-cell drill-down links. Investigate replaced its table with a Dynamic Text panel showing formatted errors and stack traces. Trends got proper multi-select variables and fixed data links. The reporter was enhanced with ANSI stripping, normalization, and over five hundred step wrappers across twenty-seven test files.

Resolved the UUID hyphen parsing bug that caused "No data" across Investigate queries
Ruled out the suspected .keyword field mismatch: all indices have proper .keyword sub-fields
Production ISM policies and index templates deployed for retention management
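ANSI stripping and message normalization can be sketched as a handful of regex passes. Here, normalization is assumed to mean masking volatile tokens (UUIDs and millisecond timeouts) and collapsing whitespace, so that repeated failures produce identical text. The reporter's exact rules are not described, and both function names are hypothetical.

```typescript
// Matches CSI escape sequences such as "\u001b[31m" (colour) or "\u001b[2K" (erase line).
const ANSI_PATTERN = /\u001b\[[0-9;?]*[A-Za-z]/g;

// Canonical 8-4-4-4-12 hex UUIDs, case-insensitive.
const UUID_PATTERN = /\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/gi;

function stripAnsi(text: string): string {
  return text.replace(ANSI_PATTERN, "");
}

// Hypothetical normalisation: mask run-specific tokens, then collapse whitespace.
function normalizeError(text: string): string {
  return stripAnsi(text)
    .replace(UUID_PATTERN, "<uuid>")
    .replace(/\b\d+ms\b/g, "<n>ms")
    .replace(/\s+/g, " ")
    .trim();
}

console.log(normalizeError("\u001b[31mTimeout 30000ms exceeded.\u001b[39m\n  run 3f2b8c1e-9a4d-4e2f-8b6a-1c2d3e4f5a6b"));
// Timeout <n>ms exceeded. run <uuid>
```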

Observability & Infrastructure

Several infrastructure issues were found and fixed. OpenSearch indices that had been accumulating indefinitely now rotate monthly with automatic thirty-day cleanup. A twelve-check verification script validates observability health. A Playwright-based Grafana monitor catches dashboard rendering failures that API queries cannot detect.

Fixed a disk-full incident caused by static index names with no retention policy
Unified telemetry: three loggers replaced by a single EventLogger, removing twelve hundred lines of dead code
Visual regression tests rewritten: fifty-four failing tests replaced by eighteen passing tests that run in under thirty seconds
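Monthly rotation comes down to putting the month in the index name. The sketch below assumes the `*-YYYY-MM-img` naming from the OpenSearch indices table. `monthlyIndexName` and the `cncqa` base name are hypothetical.

```typescript
// Build a month-stamped index name such as "cncqa-2026-03-img".
// UTC keeps the name independent of the CI runner's local time zone.
function monthlyIndexName(base: string, suffix: string, date: Date = new Date()): string {
  const year = date.getUTCFullYear();
  const month = String(date.getUTCMonth() + 1).padStart(2, "0");
  return `${base}-${year}-${month}-${suffix}`;
}

console.log(monthlyIndexName("cncqa", "img", new Date(Date.UTC(2026, 2, 31))));
// cncqa-2026-03-img
```

Each month writes to a fresh index, so a retention policy can delete whole past months without touching current data. A single static index cannot be trimmed that way, which is how unbounded growth leads to a full disk.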

Documentation Expansion

The wiki grew from fourteen to sixteen pages. Confluence standalone pages expanded from four to fifteen, adding per-suite operational runbooks. Eight Mermaid diagrams were added or restored. A Slack message formatting guide documents verdict labels, clustered thread style, and wording rules.