
Gateway API Migration Readiness

Status: Planning (inventory and modeling only). No runtime migration performed.
Last Updated: 2026-05-24
Related Evidence: gateway_api_migration_readiness.json


Purpose

This page documents the readiness state for migrating Zen Mesh from Kubernetes Ingress (networking.k8s.io/v1) to Gateway API. It is planning and inventory only — no runtime traffic cutover has been performed.

Why Gateway API

Gateway API provides:

  • Role-based routing — cluster operators, app developers, and security teams each get their own RBAC scope
  • Multi-tenant gateway sharing — better resource isolation than shared Ingress
  • Protocol-native routing — GRPCRoute, TCPRoute, TLSRoute beyond HTTP
  • Portable cross-provider config — less vendor lock-in than nginx-specific annotations

Current State

All Zen Mesh routing currently uses Kubernetes Ingress with the nginx ingress class.

Hostname Ownership

| Host | Plane | Owner | Ingress Template |
|---|---|---|---|
| internal control-plane host (not publicly reachable) | Control Plane | Hermes (UI) | ingress-frontend.yaml |
| api.zen-mesh.io | Control Plane | nanobot | ingress-api.yaml |
| ingest.zen-mesh.io | Data Plane | nanobot | ingress-webhook.yaml |
| platform.zen-mesh.io | Control Plane | nanobot | customer-api-ingress.yaml, mcp-ingress.yaml |
| m2m.zenmesh.io | Control Plane | nanobot | ingress-m2m.yaml |

Route Inventory

19 routes inventoried across control plane and data plane:

  • 14 SaaS control-plane routes — frontend, BFF, back API, health, metrics, Stripe, billing, WebSocket
  • 5 data-plane / platform routes — webhook ingestion, gRPC health, customer API, MCP, M2M

TLS Ownership

TLS certificates are managed by cert-manager, except where an internal route uses the static zenmesh-tls secret:

  • Public routes: letsencrypt-prod ClusterIssuer
  • Internal routes: zenmesh-tls static secret or cert-manager
  • mTLS paths: cert-manager CA issuer for SVID/client certs
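For orientation, a public route in the current state might look roughly like the manifest below. This is a minimal sketch, not a copy of ingress-api.yaml: the resource name, namespace, Secret name, path, backend Service, and port are all hypothetical. Only the host, the nginx ingress class, and the letsencrypt-prod ClusterIssuer come from the inventory above.

```yaml
# Illustrative current-state Ingress; names marked "hypothetical" are assumptions.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress                      # hypothetical name
  namespace: zen-mesh                    # hypothetical namespace
  annotations:
    # cert-manager issues the certificate from this ClusterIssuer.
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.zen-mesh.io
      secretName: api-zen-mesh-io-tls    # hypothetical Secret name
  rules:
    - host: api.zen-mesh.io
      http:
        paths:
          - path: /                      # hypothetical path
            pathType: Prefix
            backend:
              service:
                name: api                # hypothetical Service
                port:
                  number: 8080           # hypothetical port
```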

Target State

Migration target is Gateway API v1.0+ with:

  • 5 Gateway resources (gateway-api, gateway-app, gateway-dp, gateway-platform, gateway-m2m)
  • 19 HTTPRoute/GRPCRoute resources matching current Ingress paths
  • Full TLS parity with cert-manager
  • Rate limiting, timeouts, and backend protocol preserved

Planned Gateways

| Gateway | Hosts | Routes | Status |
|---|---|---|---|
| gateway-api | api.zen-mesh.io | 12 control-plane API routes | planned |
| gateway-app | internal control-plane host (not publicly reachable) | 5 frontend/BFF routes | planned |
| gateway-dp | ingest.zen-mesh.io, api.zen-mesh.io | 2 data-plane routes | planned |
| gateway-platform | platform.zen-mesh.io | 2 platform API routes | planned |
| gateway-m2m | m2m.zenmesh.io | 1 M2M route | planned |
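As a rough illustration of the target shape, one planned Gateway and a single attached HTTPRoute could be modeled as follows. This is a sketch under assumptions, not a deployed or validated manifest: the namespace, GatewayClass name, Secret name, route name, path, backend Service, and port are hypothetical. It also assumes cert-manager's Gateway API support is enabled, so that the annotation on the Gateway results in the listener certificate being issued.

```yaml
# Illustrative target-state model; not applied to any cluster.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: gateway-api
  namespace: zen-mesh                    # hypothetical namespace
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  gatewayClassName: example-class        # hypothetical; depends on the chosen controller
  listeners:
    - name: https
      hostname: api.zen-mesh.io
      port: 443
      protocol: HTTPS
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: api-zen-mesh-io-tls    # hypothetical Secret name
      allowedRoutes:
        namespaces:
          from: Same                     # only routes in the Gateway's namespace may attach
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-route                        # hypothetical route name
  namespace: zen-mesh
spec:
  parentRefs:
    - name: gateway-api
      sectionName: https                 # attach to the HTTPS listener only
  hostnames:
    - api.zen-mesh.io
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /                     # hypothetical path
      backendRefs:
        - name: api                      # hypothetical Service
          port: 8080                     # hypothetical port
```

Splitting the listener (Gateway) from the routing rules (HTTPRoute) is what allows the role-based ownership described under "Why Gateway API": operators can own the Gateway while application teams own the routes.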

What Is Validated

  • ✅ All 19 Ingress routes inventoried with hosts, paths, TLS, security requirements
  • ✅ Candidate Gateway API resources modeled (19 HTTPRoute/GRPCRoute)
  • ✅ Plane/layer classification complete (CONTROL_PLANE, DATA_PLANE; L1-L3)
  • ✅ TLS ownership classified per route
  • ✅ Public/internal and customer-visible classification complete
  • ✅ Rollback requirements defined for all customer-facing routes
  • ✅ Security requirements captured (mTLS, HMAC, TLS, rate limiting)
  • ✅ Controller evaluation: NGINX Gateway Fabric, GKE Gateway Controller, Envoy Gateway
  • ✅ CRD readiness assessed: Gateway API v1 CRDs are not yet installed in the clusters
  • ✅ All 9 gateway validators PASS

Controller Selection

Status: Recommended (not installed, not deployed)

After evaluating 9 controller candidates across all planes, the current per-plane recommendations are:

| Plane | Recommended | Fallback | Rejected |
|---|---|---|---|
| SaaS / Control Plane (GKE) | GKE Gateway Controller | Envoy Gateway | Istio, Contour, Cilium, NGINX Gateway Fabric |
| Data Plane | Envoy Gateway | GKE Gateway Controller | NGINX Gateway Fabric (no GRPCRoute), Istio, Contour, Cilium |
| Edge Plane | Envoy Gateway | NGINX Gateway Fabric | GKE (cloud-only), Istio, Cilium, Traefik, Kong |
| Private Data Plane | Deferred | Envoy Gateway (tentative) | All deferred |
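In Gateway API, the controller choice per plane surfaces as the GatewayClass that each Gateway references. A minimal sketch for the Envoy Gateway planes is shown below. The class name is hypothetical. The controllerName is the value Envoy Gateway documents for its controller, but it should be confirmed against the installed release.

```yaml
# Illustrative GatewayClass for planes where Envoy Gateway is recommended.
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: envoy-gateway                    # hypothetical class name
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
```

On GKE no such resource should need to be authored: once the Gateway API is enabled on the cluster, the managed controller provides its own GatewayClasses, and a Gateway references one of those by name.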

Why Envoy Gateway for Data Plane and Edge

  • GRPCRoute — native support for gRPC routes (required for data-plane ingester health)
  • Cloud-agnostic — no GKE/AWS dependency for edge-plane
  • Production maturity: Envoy Gateway is GA, and its parent project Envoy is CNCF graduated
  • Full Gateway API v1.0+ — Gateway, HTTPRoute, GRPCRoute, TLSRoute, TCPRoute, BackendTLSPolicy

Why GKE Gateway Controller for SaaS

  • Managed — lowest operational overhead on GKE
  • GA — production-ready, no controller deployment needed
  • Lock-in acceptable — SaaS already GKE-dependent

What's Next for Controllers

  1. Install Envoy Gateway in k3d sandbox for local validation
  2. Enable GKE Gateway API on sandbox GKE cluster
  3. Validate GRPCRoute with zen-cluster ingester
  4. Test TLS/cert-manager integration
  5. Verify rate limiting parity with nginx annotations
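For step 5, one way rate limiting could be expressed with Envoy Gateway is a BackendTrafficPolicy attached to a route. The sketch below is an assumption-laden starting point, not a verified equivalent of the current nginx annotations: the policy name, target route name, namespace, and limit values are hypothetical, and the schema should be checked against the Envoy Gateway release that is actually installed.

```yaml
# Illustrative Envoy Gateway rate limit; values are placeholders.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: api-rate-limit                   # hypothetical policy name
  namespace: zen-mesh                    # hypothetical namespace
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: api-route                    # hypothetical HTTPRoute name
  rateLimit:
    type: Local                          # enforced independently by each Envoy instance
    local:
      rules:
        - limit:
            requests: 10                 # hypothetical limit
            unit: Second
```

The parity check should look at semantics as well as numbers. The ingress-nginx limit annotations apply per client IP, whereas a local Envoy limit without client selectors applies to all traffic on each proxy instance, so identical numbers can behave differently.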

See Controller Selection Decision for full rationale.

What Is NOT Yet Migrated

  • ❌ No Gateway API resources created in any cluster
  • ❌ No runtime traffic cutover
  • ❌ No DNS changes
  • ❌ No cert rotation performed
  • ❌ No Ingress removal
  • ❌ No GRPCRoute support verified (requires Gateway API v1.2+)
  • ❌ No BackendTLSPolicy for HTTPS backend-protocol routes

Blockers

| Blocker | Routes Affected | Resolution |
|---|---|---|
| GRPCRoute requires Gateway API v1.2+ | grpc-ingester-healthz | Install v1.2 CRDs; verify Envoy Gateway support |
| BackendTLSPolicy needed for HTTPS backend | m2m-api | Model BackendTLSPolicy; verify Envoy Gateway support |
| No controller installed in sandbox | All | Install Envoy Gateway in k3d sandbox for local validation |
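For the BackendTLSPolicy blocker, the resource to be modeled could look roughly like this. It is a sketch under stated assumptions: the apiVersion depends on which Gateway API CRD release is installed (v1alpha3 is used here as an assumption), and the namespace, Service name, CA ConfigMap, and hostname are hypothetical.

```yaml
# Illustrative BackendTLSPolicy for an HTTPS backend; not yet modeled or verified.
apiVersion: gateway.networking.k8s.io/v1alpha3   # assumption: match the installed CRD version
kind: BackendTLSPolicy
metadata:
  name: m2m-api-backend-tls              # hypothetical policy name
  namespace: zen-mesh                    # hypothetical namespace
spec:
  targetRefs:
    - group: ""                          # core API group, i.e. a Service
      kind: Service
      name: m2m-api                      # hypothetical Service name
  validation:
    caCertificateRefs:
      - group: ""
        kind: ConfigMap
        name: m2m-backend-ca             # hypothetical CA bundle
    hostname: m2m-api.zen-mesh.svc.cluster.local   # hypothetical; must match the backend certificate
```

This takes over the role of the nginx backend-protocol setting for HTTPS backends: the gateway originates TLS to the Service and validates the backend certificate against the referenced CA.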

Non-Claims

  • No runtime migration performed
  • No production-live Gateway API claim
  • No DNS cutover
  • No cert rotation performed
  • No traffic proof
  • No app UI freshness claim
  • No Ingress removal
  • No Gateway API runtime cutover