PW-Tests
Playwright tests for Czech News Center (CNC) websites: 10 suites across 7 sites, with results wired into OpenSearch and Grafana.
| Suite | Pass Rate | Tests | Duration |
|---|---|---|---|
| Ads | 100% | 549/549 | 6.5s |
| Content | 100% | 45/45 | 9.6s |
| E2e | 96.55% | 700/725 | 9.8s |
| Events | 100% | 9/9 | 11.4s |
| Mobile | 100% | 270/270 | 3.1s |
| Pdt | 96.53% | 1950/2020 | 35.2s |
| Shadow | 98.35% | 417/424 | 1m 14s |
| Smoke | 100% | 108/108 | 6.4s |
| Unknown | 95.83% | 506/528 | 11.0s |
| User Flows | 79.5% | 256/322 | 9.3s |
Test Suites
10 suites, each checking something different. Click a card to dig into details.
Sites
7 Czech News Center websites under test.
| Site | URL | Consent | Health |
|---|---|---|---|
| All | all | Unknown | 98.35% |
| Auto.cz | www.auto.cz | CPEX | 93.69% |
| Blesk.cz | www.blesk.cz | CPEX | 96.07% |
| E15.cz | www.e15.cz | Didomi | 97.49% |
| Isport.cz | isport.blesk.cz | CPEX | 97.89% |
| Opinio.cz | opinio.cz | CPEX | 93.42% |
| Reflex.cz | www.reflex.cz | Didomi | 98.3% |
CI Runners
7 projects running on 2 GitLab runners.
Monthly Development Report
Features, tests added, and infrastructure improvements by month.
Monthly Failure Report
Failure timeline, root causes, fix stories, and unresolved issues.
Architecture
System components, data flow, and directory structure.
Methodology
Selector priorities, failure categories, and the auto-healing loop. Click through each one.
Pick the most stable selector you can. Click each level to see why.
Test breaks? The system tries to fix it before anyone has to look.
Every failure gets a category. Some we fix automatically, others need a human.
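The selector-priority rule above can be sketched as a small helper: given the locator hints a test has for an element, prefer the most stable one first. The ordering shown (test id, then ARIA role, then visible text, then raw CSS) follows common Playwright guidance; the exact priority ladder and the `SelectorCandidates` shape here are assumptions for illustration, not this project's actual code.

```typescript
// Hypothetical selector-priority helper: pick the most stable locator
// available, falling back to increasingly fragile strategies.
interface SelectorCandidates {
  testId?: string; // data-testid attribute, survives styling changes
  role?: string;   // ARIA role, survives DOM restructuring
  text?: string;   // visible text, breaks on copy changes
  css?: string;    // raw CSS, most fragile
}

function bestSelector(c: SelectorCandidates): string | undefined {
  if (c.testId) return `[data-testid="${c.testId}"]`;
  if (c.role) return `role=${c.role}`;
  if (c.text) return `text=${c.text}`;
  return c.css;
}
```

A test id wins even when weaker candidates are also available, so cosmetic refactors do not break the locator.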
Observability Stack
OpenSearch stores it, Grafana shows it, Prometheus measures it, Slack yells about it. Click any node for details.
| Index Pattern | Purpose | Retention | Updated By |
|---|---|---|---|
| cncqa_tests-* | Test results for Grafana dashboards | 90 days | Reporter |
| cncqa_events-* | Detailed events for AI/machine analysis | 30 days | EventLogger |
| *-YYYY-MM-img | Failure screenshots (base64) | 30 days (ISM) | Reporter |
| *-YYYY-MM-cr | Step records (pw:api traces) | 30 days (ISM) | Reporter |
Development Report & Timeline
What the team delivered, told in words. Release changelog below.
We moved from simple substring matching to a multi-layered classification engine. Failures are now matched against a structured incident store using weighted fingerprinting across seven dimensions. A historical confidence layer tracks how often each test has failed for each root cause, blending past patterns with current evidence to produce verdicts that get smarter over time.
- Incident store with six root cause domains and seventeen hierarchical tags
- Weighted incident matcher scoring fingerprint, site, error category, selector overlap, date window, URL pattern, and message content
- Historical prior service computing confidence bands from confirmed recovery events
- Cause assessor blending both layers into a final verdict with human-readable explanation
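The weighted matcher above can be sketched as a scoring function: each of the seven dimensions contributes a fixed weight when it matches, and the sum becomes the match confidence. The interface names, weights, and fourteen-day window here are illustrative assumptions, not the project's actual values.

```typescript
// Hypothetical sketch of weighted incident matching across the seven
// dimensions named in the report. Weights sum to 1.0 and are invented.
interface Incident {
  fingerprint: string;
  site: string;
  errorCategory: string;
  selectors: string[];
  urlPattern: RegExp;
  messageKeywords: string[];
  openedAt: Date;
}

interface Failure {
  fingerprint: string;
  site: string;
  errorCategory: string;
  selectors: string[];
  url: string;
  message: string;
  failedAt: Date;
}

const WEIGHTS = {
  fingerprint: 0.3,
  site: 0.15,
  errorCategory: 0.15,
  selectorOverlap: 0.1,
  dateWindow: 0.1,
  urlPattern: 0.1,
  messageContent: 0.1,
};

const DATE_WINDOW_DAYS = 14; // assumed recency window

function scoreIncident(failure: Failure, incident: Incident): number {
  let score = 0;
  if (failure.fingerprint === incident.fingerprint) score += WEIGHTS.fingerprint;
  if (failure.site === incident.site) score += WEIGHTS.site;
  if (failure.errorCategory === incident.errorCategory) score += WEIGHTS.errorCategory;
  if (failure.selectors.some((s) => incident.selectors.includes(s))) score += WEIGHTS.selectorOverlap;
  const ageDays = (failure.failedAt.getTime() - incident.openedAt.getTime()) / 86_400_000;
  if (ageDays >= 0 && ageDays <= DATE_WINDOW_DAYS) score += WEIGHTS.dateWindow;
  if (incident.urlPattern.test(failure.url)) score += WEIGHTS.urlPattern;
  if (incident.messageKeywords.some((k) => failure.message.includes(k))) score += WEIGHTS.messageContent;
  return score;
}
```

The historical prior layer would then blend this evidence score with how often the same test has failed for the same root cause before.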
Nightly failure notifications were completely rewritten to communicate in human terms instead of dumping raw failure counts. Each failure now gets a verdict label — confirmed regression, post-fix watching, likely flaky, infra suspicion, or needs confirmation. Failures are clustered by site, and an investigation thread is posted automatically with per-failure breakdowns and next-step recommendations.
- Recovery detection posts confirmation to original failure thread when a test passes consecutively
- Thread state machine manages open, resolved, superseded, and stale failure threads
- Weekly report redesigned with executive summary, trend comparison, and root cause breakdown
- Escalation contact suggestion appended to alerts based on failure classification
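The thread state machine mentioned above can be sketched as a transition table. The four state names come from the report; the event names and which transitions are legal are assumptions for illustration.

```typescript
// Minimal sketch of the failure-thread state machine. Events and
// transition rules are hypothetical; only the state names are documented.
type ThreadState = "open" | "resolved" | "superseded" | "stale";
type ThreadEvent = "recovered" | "newIncident" | "aged" | "reopened";

const TRANSITIONS: Record<ThreadState, Partial<Record<ThreadEvent, ThreadState>>> = {
  open: { recovered: "resolved", newIncident: "superseded", aged: "stale" },
  resolved: { reopened: "open" },
  superseded: {}, // terminal: a newer thread owns the failure
  stale: { reopened: "open" },
};

function nextState(state: ThreadState, event: ThreadEvent): ThreadState {
  // Events with no defined transition leave the thread where it is.
  return TRANSITIONS[state][event] ?? state;
}
```

Encoding the transitions as data rather than branching logic makes illegal transitions impossible to express and keeps the Slack-posting code a thin layer over state changes.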
A brand-new escalation system answers the question that classification alone could not: the test failed, QA confirmed it is real — now who do I contact? Three normalized JSON databases map contacts, sites, and routing rules across twelve escalation categories. A resolver module implements strict precedence matching and the portal page presents it all as an interactive workflow and lookup tool.
- Seventy-eight CNC sites and twenty-seven contacts seeded from the ownership spreadsheet
- Twelve escalation categories from content issues to video player failures
- Five-step visual workflow on the portal: Test Fails → QA Triages → Classify → Escalate → Recovery
- Build-time validation with eight error checks and five warning checks
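The resolver's strict precedence matching could look roughly like this: an exact (site, category) rule beats a site-specific wildcard, which beats a category-wide rule, which beats the global default. The field names and the four-tier ordering are assumptions for illustration, not the actual routing schema.

```typescript
// Hypothetical strict-precedence resolver over normalized routing rules.
interface RoutingRule {
  site: string;     // site hostname, or "*" for any site
  category: string; // escalation category, or "*" for any category
  contact: string;
}

function resolveContact(rules: RoutingRule[], site: string, category: string): string | undefined {
  // Tiers are checked in order; the first tier with a matching rule wins.
  const precedence = [
    (r: RoutingRule) => r.site === site && r.category === category,
    (r: RoutingRule) => r.site === site && r.category === "*",
    (r: RoutingRule) => r.site === "*" && r.category === category,
    (r: RoutingRule) => r.site === "*" && r.category === "*",
  ];
  for (const matches of precedence) {
    const rule = rules.find(matches);
    if (rule) return rule.contact;
  }
  return undefined;
}
```

Because the tiers are tried strictly in order, rule order within the JSON database never matters, which keeps the build-time validation simple.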
The documentation portal launched with thirty-one pages, CNC brand design, content registry sidebar, and dark mode support. A second version is in progress with a modular build pipeline — separate collectors for git, OpenSearch, CI config, and test results feed into interactive templates that can show live data alongside static documentation.
- New pages: escalation matrix, content registry, per-suite operational runbooks
- Portal v2 architecture: collectors, derived data, template engine, cache layer
All three Grafana dashboards were redesigned. Status got a sites-by-suites matrix with per-cell drill-down links. Investigate replaced its table with a Dynamic Text panel showing formatted errors and stack traces. Trends got proper multi-select variables and fixed data links. The reporter was enhanced with ANSI stripping, normalization, and over five hundred step wrappers across twenty-seven test files.
- Resolved the UUID hyphen parsing bug that caused "No data" across Investigate queries
- Debunked the .keyword field mismatch — all indices have proper sub-fields
- Production ISM policies and index templates deployed for retention management
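The ANSI stripping mentioned above amounts to removing terminal color codes from Playwright error output before indexing, so Grafana panels render clean text. This is an illustrative helper, not the reporter's actual implementation; the pattern below covers the common SGR color sequences.

```typescript
// Illustrative ANSI stripper: removes SGR escape sequences (e.g. \x1b[31m)
// that Playwright embeds in error messages and stack traces.
const ANSI_PATTERN = /\x1b\[[0-9;]*m/g;

function stripAnsi(text: string): string {
  return text.replace(ANSI_PATTERN, "");
}
```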
Several infrastructure issues were found and fixed. OpenSearch indices that had been accumulating indefinitely now rotate monthly with automatic thirty-day cleanup. A twelve-check verification script validates observability health. A Playwright-based Grafana monitor catches dashboard rendering failures that API queries cannot detect.
- Fixed disk-full incident caused by static index names without retention
- Unified telemetry: three loggers replaced by single EventLogger, removing twelve hundred lines of dead code
- Visual regression suite rewritten: fifty-four failing tests replaced by eighteen that pass in under thirty seconds
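The monthly rotation described above can be sketched in two small functions: one builds the month-stamped index name matching the cncqa_* patterns from the observability table, the other decides whether an index's month is past the thirty-day retention window. The cutoff logic is an assumption for illustration; in production this is enforced by ISM policies, not application code.

```typescript
// Sketch of monthly index rotation with thirty-day cleanup.
function monthlyIndexName(base: string, date: Date): string {
  const month = String(date.getUTCMonth() + 1).padStart(2, "0");
  return `${base}-${date.getUTCFullYear()}-${month}`;
}

function isExpired(indexDate: Date, now: Date, retentionDays = 30): boolean {
  const ageDays = (now.getTime() - indexDate.getTime()) / 86_400_000;
  return ageDays > retentionDays;
}
```

Writing to a fresh index each month is what makes cleanup a cheap index delete instead of a slow document-by-document purge, which is exactly what the static index names prevented.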
Wiki grew from fourteen to sixteen pages. Confluence standalone pages expanded from four to fifteen with per-suite operational runbooks. Eight mermaid diagrams were added or restored. A Slack message formatting guide documents verdict labels, clustered thread style, and wording rules.
Release Changelog
Per-version details. Expand any version for the full list.
Documentation
Browse project documentation pages.