cmd/{containerboot,k8s-operator},kube: add preshutdown hook for egress PG proxies
This change is part of work towards minimizing downtime during update
rollouts of egress ProxyGroup replicas.
This change:
- updates the containerboot health check logic to return the Pod IP in a response
header, if set
- always runs the health check for egress PG proxies
- updates ClusterIP Services created for PG egress endpoints to include
the health check endpoint
- implements a preshutdown endpoint in proxies. The preshutdown endpoint
logic waits until, for each currently configured egress service, the ClusterIP
Service's health check endpoint no longer returns responses from the
shutting-down Pod (determined by looking at the new Pod IP header).
- ensures that kubelet is configured to call the preshutdown endpoint
This reduces the chance that, during an update, a replica is terminated while
cluster traffic is still being routed to it via the ClusterIP Service because
kube proxy has not yet updated its routing rules.
This is not a perfect check: in practice, it only verifies that the kube proxy
on the node where the proxy runs has updated its rules; kube proxies on other
nodes may still be lagging. However, overall this might be good enough.
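Because a single response from another replica does not prove that this Pod has been removed from the node's routing rules, the check has to ping the endpoint several times. One way to choose that count is sketched below as an illustration only, not as the method this change uses: assuming each request is independently routed to one of `n` backends uniformly at random (actual kube proxy behaviour depends on its mode), the chance that `k` pings all miss this Pod is `(1-1/n)^k`, so the smallest `k` with `(1-1/n)^k <= eps` bounds the false "safe" rate by `eps`. The helper `pingsNeeded` is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// pingsNeeded returns the smallest k >= 1 with (1-1/n)^k <= eps: the number
// of requests after which, if each request independently picks one of n
// backends uniformly at random, the chance that every request missed one
// particular backend is at most eps. eps must be in (0, 1).
func pingsNeeded(n int, eps float64) int {
	if n <= 1 {
		return 1 // a single backend answers every request
	}
	pMiss := 1 - 1/float64(n)
	k := int(math.Ceil(math.Log(eps) / math.Log(pMiss)))
	if k < 1 {
		k = 1
	}
	return k
}

func main() {
	// 3 backends on the node, accepting a 1% chance of a false "safe".
	fmt.Println(pingsNeeded(3, 0.01)) // 12
}
```

With three backends, 11 pings would still leave about a 1.2% chance of never hitting this Pod, while 12 brings it under 1%.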
The preshutdown logic is disabled if users have configured a custom health check
port via the TS_LOCAL_ADDR_PORT env var. This change emits a warning in that case,
and in future, setting that env var for operator proxies might be disallowed (users
shouldn't need to configure it for a Pod directly).
This is backwards compatible with earlier proxy versions.
Updates tailscale/tailscale#14326
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
		http.Error(w, fmt.Sprintf("error determining the number of times health check endpoint should be pinged: %v", err), http.StatusInternalServerError)
		return
	}
	ep.waitTillSafeToShutdown(r.Context(), cfgs, hp)
}
// waitTillSafeToShutdown looks up all egress targets configured to be proxied via this instance and, for each target
// whose configuration includes a health check endpoint, pings the endpoint until none of the responses
// come from this instance or until the HTTP request times out. In practice, the endpoint will be a Kubernetes Service
// for which one of the backends would normally be this Pod. When this Pod is being deleted, the operator should have
// removed it from the Service backends, and eventually kube proxy routing rules should be updated to no longer route
// traffic for the Service to this Pod.
	if cfgs == nil || len(*cfgs) == 0 { // avoid sleeping if no services are configured
		return
	}
	log.Printf("Ensuring that cluster traffic for egress targets is no longer routed via this Pod...")
	wg := syncs.WaitGroup{}
	for s, cfg := range *cfgs {
		hep := cfg.HealthCheckEndpoint
		if hep == "" {
			log.Printf("Tailnet target %q does not have a cluster healthcheck specified, unable to verify if cluster traffic for the target is still routed via this Pod", s)
			continue
		}
		svc := s
		wg.Go(func() {
			log.Printf("Ensuring that cluster traffic is no longer routed to %q via this Pod...", svc)
			tsoperator.SetProxyGroupCondition(pg, tsapi.ProxyGroupReady, metav1.ConditionFalse, reasonProxyGroupCreating, "the ProxyGroup's ProxyClass default-pc is not yet in a ready state, waiting...", 0, cl, zl.Sugar())