Observability and Dashboards

| Key | Value |
| --- | --- |
| Status | Active |
| Owner | QA Automation |
| Updated | 2026-03-26 |
| Scope | OpenSearch, Grafana, Prometheus, Alertmanager: setup, access, dashboards, and common issues |

The observability stack turns raw test results into queryable records and visual dashboards. This page covers how to operate the stack, access Grafana, deploy dashboards, and resolve common issues.

Stack Components And Ports

| Component | Port | Role |
| --- | --- | --- |
| OpenSearch | 9200 | structured log and test record storage |
| OpenSearch Dashboards | 5601 | ad-hoc log search and query UI |
| Grafana | 3000 (local) | primary dashboards for status, investigation, and trends |
| Prometheus | 9090 | time-series metrics |
| Alertmanager | 9093 | alert routing to Slack |

In production, Grafana is at https://grafana.measure.aws.cnci.tech. OpenSearch is accessed via the Grafana proxy, not directly.

Starting And Stopping The Stack

| Command | What It Does |
| --- | --- |
| npm run observability:start | start all stack components via Docker Compose |
| npm run observability:stop | stop all components |
| npm run observability:status | check which components are running |
| npm run observability:logs | tail combined stack logs |

The stack requires Docker. All configuration lives under observability/.

Grafana Access

Production

URL: https://grafana.measure.aws.cnci.tech

Authentication: service account token (GRAFANA_SERVICE_ACCOUNT_TOKEN). The token is stored in /Users/petr/CNC/pw-tests-beta/.env and is also set as a CI variable.

Local

URL: http://localhost:3000

Start with npm run observability:start. Default login is admin/admin unless changed in the local provisioning config.

OpenSearch Access

OpenSearch is not accessed directly in the primary write path. It is accessed via the Grafana proxy using GRAFANA_SERVICE_ACCOUNT_TOKEN.

| User | Proxy URL | Use Case |
| --- | --- | --- |
| cnc_writer | https://grafana.measure.aws.cnci.tech/api/datasources/proxy/uid/opensearch-pw-tests | reporter write path, search queries |
| v1admin | https://grafana.measure.aws.cnci.tech/api/datasources/proxy/38 | ISM policies, index templates, admin operations |

Set OPENSEARCH_URL to the appropriate proxy URL for the operation you are running.
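
Because every OpenSearch call goes through the Grafana datasource proxy, a request is essentially the proxy URL plus an OpenSearch path, authenticated with the service account token. The sketch below assumes the token is sent in the standard `Authorization: Bearer` header that Grafana service account tokens use. `buildProxyRequest` and the `_search` path are illustrative and are not functions or calls taken from the repo.

```typescript
// Hypothetical helper, not part of the repo. Builds the URL and headers
// for an OpenSearch call routed through the Grafana datasource proxy.
interface ProxyRequest {
  url: string;
  headers: Record<string, string>;
}

function buildProxyRequest(
  proxyUrl: string, // value of OPENSEARCH_URL (a Grafana proxy URL)
  token: string, // value of GRAFANA_SERVICE_ACCOUNT_TOKEN
  path: string, // OpenSearch path, e.g. "cncqa_tests-*/_search"
): ProxyRequest {
  // Join base and path without doubling or dropping the slash between them.
  const base = proxyUrl.replace(/\/+$/, "");
  const suffix = path.replace(/^\/+/, "");
  return {
    url: `${base}/${suffix}`,
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
  };
}

const exampleRequest = buildProxyRequest(
  "https://grafana.measure.aws.cnci.tech/api/datasources/proxy/uid/opensearch-pw-tests/",
  "example-token", // placeholder; never hard-code a real token
  "/cncqa_tests-*/_search",
);
console.log(exampleRequest.url);
```

Keeping URL construction in one place makes it harder to mix up the cnc_writer and v1admin proxy paths, since only the `proxyUrl` argument changes between them.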

Key Dashboards

| Dashboard | What It Shows | When To Use It |
| --- | --- | --- |
| Status | current pass rates by site and suite, recent run outcomes | morning check, after a deploy |
| Investigate | failure breakdown by error category, per-site drill-down, selector failures | when a run has failures and you want context |
| Trends | 14-day pass rate history, flaky test candidates, recurrence patterns | weekly review, before writing a report |

OpenSearch Indices

| Index Pattern | Contents |
| --- | --- |
| cncqa_tests-* | test summaries, pass/fail records per test, screenshot references |
| cncqa_events-* | detailed per-action event records from EventLogger |

Grafana dashboards query cncqa_tests-* by default. cncqa_events-* is used for deep AI-assisted debugging workflows.

Deploying And Updating Dashboards

Dashboards are not edited in the Grafana UI. They are defined in TypeScript and generated to JSON.

| Path | Purpose |
| --- | --- |
| observability/grafana/suite/ | TypeScript dashboard definitions |
| observability/grafana/generated/ | generated JSON (output of generate step) |

| Command | What It Does |
| --- | --- |
| npm run grafana:deploy | generate TypeScript dashboards to JSON and deploy to Grafana |
| npm run grafana:validate-data | run query validation: checks that panels return data |
| npm run monitor:grafana | browser-based visual check of all dashboard panels |

After making any change to TypeScript dashboard files, run npm run grafana:deploy to push the update. Never edit the generated JSON directly.
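
The define-in-TypeScript, generate-to-JSON flow can be pictured roughly as follows. This is only a sketch: the `PanelDef` and `DashboardDef` types, the `toGrafanaJson` function, and the example panel are hypothetical and do not reflect the actual definitions under observability/grafana/suite/. The output fields (`uid`, `title`, `panels`, `gridPos`, `targets`) follow the general Grafana dashboard JSON model.

```typescript
// Hypothetical types; the real definitions in observability/grafana/suite/ may differ.
interface PanelDef {
  title: string;
  type: "stat" | "timeseries" | "table";
  datasourceUid: string;
  query: string; // query string sent to the data source
}

interface DashboardDef {
  uid: string;
  title: string;
  panels: PanelDef[];
}

// Convert a typed definition into a plain object shaped like Grafana's
// dashboard JSON, laying panels out two per row.
function toGrafanaJson(def: DashboardDef): Record<string, unknown> {
  return {
    uid: def.uid,
    title: def.title,
    panels: def.panels.map((panel, index) => ({
      id: index + 1,
      title: panel.title,
      type: panel.type,
      datasource: { uid: panel.datasourceUid },
      // Grafana's grid is 24 columns wide; each panel takes half a row.
      gridPos: { x: (index % 2) * 12, y: Math.floor(index / 2) * 8, w: 12, h: 8 },
      targets: [{ refId: "A", query: panel.query }],
    })),
  };
}

const exampleDashboard: DashboardDef = {
  uid: "example-status", // illustrative uid, not a real dashboard
  title: "Status (example)",
  panels: [
    { title: "Pass rate by site", type: "stat", datasourceUid: "opensearch-pw-tests", query: "*" },
  ],
};

// A generate step would write output like this to a JSON file.
console.log(JSON.stringify(toGrafanaJson(exampleDashboard), null, 2));
```

Typed definitions let the compiler catch a misspelled panel type before deploy, which is one reason to treat the generated JSON as build output rather than something to edit by hand.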

Retention Setup

Index retention is managed via OpenSearch ISM (Index State Management) policies. These define rollover and deletion rules.

| Command | What It Does |
| --- | --- |
| npm run os:setup-retention | create ISM policies and index templates |
| npm run os:stats | show current index sizes and shard counts |

Retention setup requires v1admin access. Set both OPENSEARCH_URL (to the v1admin proxy) and GRAFANA_TOKEN before running.

Example:

    GRAFANA_TOKEN=$GRAFANA_SERVICE_ACCOUNT_TOKEN OPENSEARCH_URL=.../proxy/38 npm run os:setup-retention
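
For orientation, an ISM policy with rollover and deletion rules has roughly the shape below. This is a hedged sketch, not the policy that os:setup-retention creates: the description text, the 1-day rollover age, the 30-day retention, the template priority, and the `buildRetentionPolicy` helper are all assumptions chosen for illustration. The field names follow the OpenSearch ISM policy format.

```typescript
// Hypothetical helper: builds an ISM policy body for one index pattern.
// Retention values are illustrative, not the project's real settings.
function buildRetentionPolicy(indexPattern: string, retentionDays: number) {
  return {
    policy: {
      description: `Roll over daily, delete after ${retentionDays} days (example)`,
      default_state: "hot",
      states: [
        {
          name: "hot",
          // Roll the write index over once it is a day old.
          actions: [{ rollover: { min_index_age: "1d" } }],
          // Move to the delete state once the index exceeds the retention age.
          transitions: [
            { state_name: "delete", conditions: { min_index_age: `${retentionDays}d` } },
          ],
        },
        {
          name: "delete",
          actions: [{ delete: {} }],
          transitions: [],
        },
      ],
      // Attach the policy automatically to new indices matching the pattern.
      ism_template: [{ index_patterns: [indexPattern], priority: 100 }],
    },
  };
}

const examplePolicy = buildRetentionPolicy("cncqa_tests-*", 30);
console.log(JSON.stringify(examplePolicy, null, 2));
```

A body like this is normally sent with PUT to `_plugins/_ism/policies/<policy_id>`, here through the v1admin proxy. The rollover action also depends on a rollover alias being configured for the index, which is typically done in the index template.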

Querying Failures

| Command | What It Does |
| --- | --- |
| npm run os:failed | recent failures from OpenSearch |
| npm run os:failed blesk | failures for a specific site |

These queries use the cnc_writer proxy path and require GRAFANA_SERVICE_ACCOUNT_TOKEN and OPENSEARCH_URL.
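
The kind of query behind os:failed can be sketched as a bool filter against the cncqa_tests-* pattern. The field names (`status`, `site`, `@timestamp`), the `failed` status value, the result size, and the `buildFailedQuery` helper are assumptions for illustration; check the actual log schema before relying on any of them. The optional site filter accepts several spellings of one site, since V1 and beta records name sites differently.

```typescript
// Hypothetical query builder; field names and values are assumed,
// not taken from the project's schema.
function buildFailedQuery(sinceHours: number, siteNames: string[] = []) {
  const filters: Array<Record<string, unknown>> = [
    { term: { status: "failed" } },
    // Date math: only records newer than now minus N hours.
    { range: { "@timestamp": { gte: `now-${sinceHours}h` } } },
  ];
  if (siteNames.length > 0) {
    // A terms filter matches any of the listed values, e.g. ["blesk", "blesk.cz"].
    filters.push({ terms: { site: siteNames } });
  }
  return {
    size: 50,
    sort: [{ "@timestamp": { order: "desc" } }],
    query: { bool: { filter: filters } },
  };
}

console.log(JSON.stringify(buildFailedQuery(24, ["blesk", "blesk.cz"]), null, 2));
```

If `site` or `status` is mapped as a text field, the filters would need the `.keyword` sub-field instead (for example `site.keyword`), which older indices may not have.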

Common Issues

| Issue | What To Check |
| --- | --- |
| OpenSearch not starting locally | check if port 9200 is already in use; run npm run observability:status |
| Grafana showing no data | verify OPENSEARCH_URL is set to the correct proxy URL; confirm the index pattern matches cncqa_tests-* |
| aggregation queries failing on some shards | old indices (pre-2025) may lack .keyword sub-fields; this is expected for historical data |
| V1 and beta records mixing in dashboards | V1 records use blesk.cz as the site name; beta records use blesk; queries may need to handle both forms |
| events index empty in production | cncqa_events-* requires OPENSEARCH_URL to be configured in the CI environment where tests run |
| dashboard deploy fails | check that GRAFANA_SERVICE_ACCOUNT_TOKEN and GRAFANA_URL are set |
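
Several of these issues come down to an unset environment variable, so a small preflight check can save a failed run. The sketch below groups the variables this page names by operation; the `missingEnv` helper and the operation names are hypothetical and are not part of the repo's scripts.

```typescript
type Operation = "deploy" | "query" | "retention";

// Variables each operation needs, as listed on this page.
const REQUIRED_ENV: Record<Operation, readonly string[]> = {
  deploy: ["GRAFANA_SERVICE_ACCOUNT_TOKEN", "GRAFANA_URL"],
  query: ["GRAFANA_SERVICE_ACCOUNT_TOKEN", "OPENSEARCH_URL"],
  retention: ["GRAFANA_TOKEN", "OPENSEARCH_URL"],
};

// Hypothetical helper: returns the required variables that are unset or blank.
function missingEnv(
  operation: Operation,
  env: Record<string, string | undefined>,
): string[] {
  return REQUIRED_ENV[operation].filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

// In a real script, process.env would be passed instead of this literal.
const missingForDeploy = missingEnv("deploy", { GRAFANA_URL: "http://localhost:3000" });
console.log(missingForDeploy); // reports GRAFANA_SERVICE_ACCOUNT_TOKEN as missing
```

Passing the environment in as an argument, rather than reading it inside the function, keeps the check easy to exercise with different inputs.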

Key Environment Variables

| Variable | Purpose |
| --- | --- |
| OPENSEARCH_URL | write endpoint; set to Grafana proxy in CI |
| GRAFANA_URL | Grafana base URL |
| GRAFANA_SERVICE_ACCOUNT_TOKEN | authenticates all proxy operations |
| PROMETHEUS_PUSHGATEWAY_URL | metrics push endpoint for CI runners |

| Need | Go To |
| --- | --- |
| EventLogger and log schema | Logging System |
| Slack and GitLab wiring | Integrations |
| full observability wiki page | Observability |
| Grafana query patterns | .claude/docs/grafana-patterns.md in the repo |