Monitoring

Zen Mesh provides built-in Prometheus metrics and Grafana dashboards for observability.

Built-in Metrics

Data Plane Metrics

Metric                                   Type       Description
zen_ingester_events_total                Counter    Total events received
zen_ingester_delivery_duration_seconds   Histogram  End-to-end delivery latency
zen_ingester_deliveries_failed_total     Counter    Failed delivery attempts
zen_ingester_dlq_size                    Gauge      Current dead letter queue depth
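
As a rough illustration of how the data plane metrics might be queried, the sketch below builds two PromQL expressions (a delivery failure ratio and a delivery latency quantile) and wraps them in URLs for the Prometheus instant-query HTTP endpoint. The Prometheus address, the 5-minute window, and the helper names are assumptions made for this example. The `_bucket` suffix follows the standard Prometheus histogram convention, and dividing failed attempts by received events is only one plausible way to define a failure ratio.

```python
from urllib.parse import urlencode

# Assumed Prometheus address; replace with the one used in your cluster.
PROMETHEUS_URL = "http://localhost:9090"


def failure_ratio_query(window: str = "5m") -> str:
    """PromQL: failed delivery attempts per received event over `window`."""
    return (
        f"sum(rate(zen_ingester_deliveries_failed_total[{window}]))"
        f" / sum(rate(zen_ingester_events_total[{window}]))"
    )


def latency_quantile_query(quantile: float = 0.99, window: str = "5m") -> str:
    """PromQL: delivery latency quantile estimated from histogram buckets."""
    return (
        f"histogram_quantile({quantile}, "
        f"sum by (le) (rate(zen_ingester_delivery_duration_seconds_bucket[{window}])))"
    )


def query_url(expr: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build a URL for the Prometheus instant-query endpoint (/api/v1/query)."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


if __name__ == "__main__":
    print(query_url(failure_ratio_query()))
    print(query_url(latency_quantile_query()))
```

The generated expressions can also be pasted directly into the Prometheus expression browser or a Grafana panel.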

Edge Plane Metrics

Metric                              Type     Description
zen_agent_enrollment_status         Gauge    Agent enrollment state (0=unenrolled, 1=enrolled)
zen_egress_connections_active       Gauge    Active mTLS connections
zen_egress_events_delivered_total   Counter  Events delivered to local targets

Infrastructure Metrics

Metric                                 Type       Description
zen_lock_decryptions_total             Counter    Secret decryption operations
zen_lock_decryption_duration_seconds   Histogram  Decryption latency

Grafana Dashboards

Zen Mesh includes pre-built Grafana dashboards:

  • Overview: Cluster health, delivery rates, error rates
  • Per-Cluster: Individual cluster enrollment, agent status, egress throughput
  • Delivery: Per-destination success rates, latency percentiles, retry counts

Import the dashboards from the helm-charts repository.

Alerts

Built-in Prometheus alert rules cover:

  • Agent disconnection (cluster offline)
  • Delivery failure rate exceeding threshold
  • DLQ depth exceeding limit
  • Certificate rotation failures
  • Backpressure activation
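
The built-in rules are Prometheus alert rules, and their thresholds are not listed here. Purely to illustrate the two threshold-style conditions above (delivery failure rate and DLQ depth), here is a minimal sketch of the comparison logic. The threshold values, alert names, and function name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical values; the actual thresholds of the built-in rules are not given here.
    max_failure_ratio: float = 0.05
    max_dlq_depth: int = 1000


def firing_alerts(events_rate: float, failed_rate: float, dlq_depth: int,
                  t: Thresholds = Thresholds()) -> list[str]:
    """Return the names of the illustrative alerts whose condition holds."""
    alerts = []
    # Guard against division by zero when no events are flowing.
    ratio = failed_rate / events_rate if events_rate > 0 else 0.0
    if ratio > t.max_failure_ratio:
        alerts.append("DeliveryFailureRateHigh")  # hypothetical alert name
    if dlq_depth > t.max_dlq_depth:
        alerts.append("DLQDepthHigh")  # hypothetical alert name
    return alerts


if __name__ == "__main__":
    # 10 failures/s against 100 events/s, with 2000 entries in the DLQ.
    print(firing_alerts(events_rate=100.0, failed_rate=10.0, dlq_depth=2000))
```

In a real deployment these comparisons would be PromQL expressions evaluated by Prometheus itself; the Python version only makes the conditions concrete.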

Health Endpoints

Each component exposes a /healthz endpoint:

kubectl get --raw /api/v1/namespaces/zen-mesh/services/zen-agent/proxy/healthz

Responses include component version, uptime, and last successful heartbeat.
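
A health response could be checked programmatically along the following lines. This is a sketch under stated assumptions: the response is taken to be JSON, and the field names (`version`, `uptime`, `last_heartbeat`), the RFC 3339 timestamp format, and the 60-second freshness limit are all hypothetical and should be adjusted to the actual payload.

```python
import json
from datetime import datetime, timedelta, timezone


def heartbeat_is_fresh(body: str, now: datetime,
                       max_age: timedelta = timedelta(seconds=60)) -> bool:
    """True if the last heartbeat in a health response is within `max_age` of `now`."""
    payload = json.loads(body)
    # Assumed field name and timestamp format; adjust to the real response.
    raw = payload["last_heartbeat"].replace("Z", "+00:00")
    last = datetime.fromisoformat(raw)
    return now - last <= max_age


# Placeholder payload for illustration only, not real component output.
sample = json.dumps({
    "version": "0.0.0",
    "uptime": "1h",
    "last_heartbeat": "2024-01-01T00:00:30Z",
})
now = datetime(2024, 1, 1, 0, 1, 0, tzinfo=timezone.utc)
print(heartbeat_is_fresh(sample, now))  # True: the heartbeat is 30 seconds old
```

Passing `now` explicitly keeps the check deterministic and easy to test; in practice it would be `datetime.now(timezone.utc)`.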