Observability and Dashboards
| Key | Value |
|---|---|
| Status | Active |
| Owner | QA Automation |
| Updated | 2026-03-26 |
| Scope | OpenSearch, Grafana, Prometheus, Alertmanager — setup, access, dashboards, and common issues |
The observability stack turns raw test results into queryable records and visual dashboards. This page covers how to operate the stack, access Grafana, deploy dashboards, and resolve common issues.
Stack Components And Ports
| Component | Port | Role |
|---|---|---|
| OpenSearch | 9200 | structured log and test record storage |
| OpenSearch Dashboards | 5601 | ad-hoc log search and query UI |
| Grafana | 3000 (local) | primary dashboards for status, investigation, and trends |
| Prometheus | 9090 | time-series metrics |
| Alertmanager | 9093 | alert routing to Slack |
In production, Grafana is at https://grafana.measure.aws.cnci.tech. OpenSearch is accessed via the Grafana proxy, not directly.
Starting And Stopping The Stack
| Command | What It Does |
|---|---|
| npm run observability:start | start all stack components via Docker Compose |
| npm run observability:stop | stop all components |
| npm run observability:status | check which components are running |
| npm run observability:logs | tail combined stack logs |
The stack requires Docker. All configuration lives under observability/.
Grafana Access
Production
URL: https://grafana.measure.aws.cnci.tech
Authentication: service account token (GRAFANA_SERVICE_ACCOUNT_TOKEN). The token is stored in /Users/petr/CNC/pw-tests-beta/.env and is also configured as a CI variable.
Local
URL: http://localhost:3000
Start with npm run observability:start. Default login is admin/admin unless changed in the local provisioning config.
OpenSearch Access
The primary write path does not connect to OpenSearch directly. Requests go through the Grafana proxy and authenticate with GRAFANA_SERVICE_ACCOUNT_TOKEN.
| User | Proxy URL | Use Case |
|---|---|---|
| cnc_writer | https://grafana.measure.aws.cnci.tech/api/datasources/proxy/uid/opensearch-pw-tests | reporter write path, search queries |
| v1admin | https://grafana.measure.aws.cnci.tech/api/datasources/proxy/38 | ISM policies, index templates, admin operations |
Set OPENSEARCH_URL to the appropriate proxy URL for the operation you are running.
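To illustrate how the two proxy paths differ in practice, the sketch below picks a proxy URL by role and builds the bearer header that Grafana accepts for service account tokens. The `ProxyRole` type and the `proxyRequest` helper are hypothetical names used only for illustration; the repo's own scripts may be structured differently. The URLs are taken from the table above.

```typescript
// Hypothetical helper: names are illustrative, not taken from the repo.
type ProxyRole = "cnc_writer" | "v1admin";

const GRAFANA_BASE = "https://grafana.measure.aws.cnci.tech";

// Proxy paths copied from the access table.
const PROXY_PATHS: Record<ProxyRole, string> = {
  cnc_writer: "/api/datasources/proxy/uid/opensearch-pw-tests",
  v1admin: "/api/datasources/proxy/38",
};

interface ProxyRequest {
  url: string;
  headers: Record<string, string>;
}

// Builds the URL and headers for an OpenSearch call routed through Grafana.
function proxyRequest(role: ProxyRole, path: string, token: string): ProxyRequest {
  if (token.length === 0) {
    throw new Error("GRAFANA_SERVICE_ACCOUNT_TOKEN is empty");
  }
  const suffix = path.charAt(0) === "/" ? path : "/" + path;
  return {
    url: GRAFANA_BASE + PROXY_PATHS[role] + suffix,
    headers: {
      Authorization: "Bearer " + token,
      "Content-Type": "application/json",
    },
  };
}
```

For example, `proxyRequest("v1admin", "_plugins/_ism/policies", token)` would target the ISM policy API through the admin proxy, while search requests would use the `cnc_writer` role.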
Key Dashboards
| Dashboard | What It Shows | When To Use It |
|---|---|---|
| Status | current pass rates by site and suite, recent run outcomes | morning check, after a deploy |
| Investigate | failure breakdown by error category, per-site drill-down, selector failures | when a run has failures and you want context |
| Trends | 14-day pass rate history, flaky test candidates, recurrence patterns | weekly review, before writing a report |
OpenSearch Indices
| Index Pattern | Contents |
|---|---|
| cncqa_tests-* | test summaries, pass/fail records per test, screenshot references |
| cncqa_events-* | detailed per-action event records from EventLogger |
Grafana dashboards query cncqa_tests-* by default. cncqa_events-* is used for deep AI-assisted debugging workflows.
Deploying And Updating Dashboards
Dashboards are not edited in the Grafana UI. They are defined in TypeScript and generated to JSON.
| Path | Purpose |
|---|---|
| observability/grafana/suite/ | TypeScript dashboard definitions |
| observability/grafana/generated/ | generated JSON (output of generate step) |
| Command | What It Does |
|---|---|
| npm run grafana:deploy | generate TypeScript dashboards to JSON and deploy to Grafana |
| npm run grafana:validate-data | run query validation: checks that panels return data |
| npm run monitor:grafana | browser-based visual check of all dashboard panels |
After making any change to TypeScript dashboard files, run npm run grafana:deploy to push the update. Never edit the generated JSON directly.
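The define-then-generate flow can be pictured with a small sketch. The types below cover only a tiny subset of the Grafana dashboard JSON model, and the `trends` object and `generate` function are hypothetical. The actual definitions under observability/grafana/suite/ and the real generate step may look quite different.

```typescript
// Minimal subset of the Grafana dashboard JSON model; not the repo's actual types.
interface Panel {
  id: number;
  type: string;
  title: string;
  gridPos: { h: number; w: number; x: number; y: number };
}

interface Dashboard {
  uid: string;
  title: string;
  time: { from: string; to: string };
  panels: Panel[];
}

// Hypothetical definition loosely modelled on the Trends dashboard (14-day window).
const trends: Dashboard = {
  uid: "trends-example",
  title: "Trends",
  time: { from: "now-14d", to: "now" },
  panels: [
    {
      id: 1,
      type: "timeseries",
      title: "Pass rate by site",
      gridPos: { h: 8, w: 24, x: 0, y: 0 },
    },
  ],
};

// Stand-in for the generate step: serializes a definition to dashboard JSON.
function generate(dashboard: Dashboard): string {
  return JSON.stringify(dashboard, null, 2);
}
```

Keeping the definition typed means a misspelled field fails at compile time instead of producing a dashboard that silently renders wrong, which is one reason to avoid hand-editing the generated JSON.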
Retention Setup
Index retention is managed via OpenSearch ISM (Index State Management) policies. These define rollover and deletion rules.
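For orientation, an ISM policy body has roughly the shape sketched below: a list of states, each with actions and transitions, plus an `ism_template` that attaches the policy to matching indices. The `retentionPolicy` function name and the example ages are placeholders, not the values that npm run os:setup-retention applies.

```typescript
// Sketch of an ISM policy body. Ages are placeholders, not the project's settings.
function retentionPolicy(indexPattern: string, rolloverAfter: string, deleteAfter: string) {
  return {
    policy: {
      description: "Roll over " + indexPattern + " after " + rolloverAfter +
        ", delete after " + deleteAfter,
      default_state: "hot",
      states: [
        {
          name: "hot",
          // Rollover also needs a rollover alias on the index, which is outside this sketch.
          actions: [{ rollover: { min_index_age: rolloverAfter } }],
          transitions: [
            { state_name: "delete", conditions: { min_index_age: deleteAfter } },
          ],
        },
        {
          name: "delete",
          actions: [{ delete: {} }],
          transitions: [],
        },
      ],
      // Attaches the policy to newly created indices that match the pattern.
      ism_template: { index_patterns: [indexPattern], priority: 100 },
    },
  };
}

// Example with illustrative values only.
const examplePolicy = retentionPolicy("cncqa_events-*", "1d", "30d");
```

A body like this would be sent with a PUT to the `_plugins/_ism/policies/<policy_id>` endpoint through the v1admin proxy.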
| Command | What It Does |
|---|---|
| npm run os:setup-retention | create ISM policies and index templates |
| npm run os:stats | show current index sizes and shard counts |
Retention setup requires v1admin access. Before running, set OPENSEARCH_URL to the v1admin proxy URL and set GRAFANA_TOKEN to the value of GRAFANA_SERVICE_ACCOUNT_TOKEN.
Example:
GRAFANA_TOKEN=$GRAFANA_SERVICE_ACCOUNT_TOKEN OPENSEARCH_URL=.../proxy/38 npm run os:setup-retention
Querying Failures
| Command | What It Does |
|---|---|
| npm run os:failed | recent failures from OpenSearch |
| npm run os:failed blesk | failures for a specific site |
These queries use the cnc_writer proxy path and require GRAFANA_SERVICE_ACCOUNT_TOKEN and OPENSEARCH_URL.
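A failure query of this kind might look roughly like the sketch below, which builds an OpenSearch `_search` request against cncqa_tests-*. The field names (`status`, `site`, `@timestamp`) and the `failedTestsQuery` helper are assumptions about the record schema, not confirmed by this page. Check the Logging System page for the actual fields.

```typescript
// Field names (status, site, @timestamp) are assumptions about the record schema.
function failedTestsQuery(site?: string, size = 20) {
  const filters: Array<Record<string, unknown>> = [
    { term: { "status.keyword": "failed" } },
  ];
  if (site !== undefined) {
    // Narrow to one site, e.g. "blesk".
    filters.push({ term: { "site.keyword": site } });
  }
  return {
    // Appended to OPENSEARCH_URL (the cnc_writer proxy URL).
    path: "/cncqa_tests-*/_search",
    body: {
      size,
      // Newest records first.
      sort: [{ "@timestamp": { order: "desc" } }],
      query: { bool: { filter: filters } },
    },
  };
}
```

Calling `failedTestsQuery("blesk", 5)` would produce a request body for the five most recent failures on one site. Queries that rely on `.keyword` sub-fields may fail on older indices that lack them.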
Common Issues
| Issue | What To Check |
|---|---|
| OpenSearch not starting locally | check if port 9200 is already in use; run npm run observability:status |
| Grafana showing no data | verify OPENSEARCH_URL is set to the correct proxy URL; confirm the index pattern matches cncqa_tests-* |
| aggregation queries failing on some shards | old indices (pre-2025) may lack .keyword sub-fields; this is expected for historical data |
| V1 and beta records mixing in dashboards | V1 records use blesk.cz as the site name; beta records use blesk; queries may need to handle both forms |
| events index empty in production | cncqa_events-* requires OPENSEARCH_URL to be configured in the CI environment where tests run |
| dashboard deploy fails | check that GRAFANA_SERVICE_ACCOUNT_TOKEN and GRAFANA_URL are set |
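For the mixed V1 and beta records, one possible approach is to expand a site name into both forms before filtering. The sketch below assumes that V1 names are the beta name plus a `.cz` suffix, as with blesk.cz and blesk; that rule, the `site` field name, and both helper names are hypothetical and may not hold for every site.

```typescript
// Assumption: the V1 site name is the beta name plus ".cz" (blesk.cz vs blesk).
function siteVariants(site: string): string[] {
  const beta = site.replace(/\.cz$/, "");
  return [beta, beta + ".cz"];
}

// Builds a terms filter that matches either form. The "site" field name is assumed.
function siteFilter(site: string) {
  return { terms: { "site.keyword": siteVariants(site) } };
}
```

Either input form yields the same pair, so callers do not need to know which naming scheme a given record uses.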
Key Environment Variables
| Variable | Purpose |
|---|---|
| OPENSEARCH_URL | write endpoint; set to Grafana proxy in CI |
| GRAFANA_URL | Grafana base URL |
| GRAFANA_SERVICE_ACCOUNT_TOKEN | authenticates all proxy operations |
| PROMETHEUS_PUSHGATEWAY_URL | metrics push endpoint for CI runners |
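Since several of the common issues come down to an unset variable, a small pre-flight check can help. The sketch below maps commands to the variables this page says they need; the `REQUIRED` table and `missingVars` helper are hypothetical and not part of the repo.

```typescript
type Env = Record<string, string | undefined>;

// Requirements as stated on this page; the mapping itself is illustrative.
const REQUIRED: Record<string, string[]> = {
  "grafana:deploy": ["GRAFANA_URL", "GRAFANA_SERVICE_ACCOUNT_TOKEN"],
  "os:failed": ["OPENSEARCH_URL", "GRAFANA_SERVICE_ACCOUNT_TOKEN"],
  "os:setup-retention": ["OPENSEARCH_URL", "GRAFANA_TOKEN"],
};

// Returns the names of required variables that are unset or empty.
function missingVars(command: string, env: Env): string[] {
  const required = REQUIRED[command] || [];
  return required.filter((name) => !env[name]);
}
```

In a Node script this could be called as `missingVars("grafana:deploy", process.env)`, failing early with a clear message when the returned list is not empty.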
Related Pages
| Need | Go To |
|---|---|
| EventLogger and log schema | Logging System |
| Slack and GitLab wiring | Integrations |
| full observability wiki page | Observability |
| Grafana query patterns | .claude/docs/grafana-patterns.md in the repo |