Commit Graph

26 Commits (50bf32a0ba13935273e200d52b9327821f25efc5)

Author SHA1 Message Date
Tom Proctor 74d4652144
cmd/{containerboot,k8s-operator},k8s-operator: new options to expose user metrics (#14035)
containerboot:

Adds 3 new environment variables for containerboot, `TS_LOCAL_ADDR_PORT` (default
`"${POD_IP}:9002"`), `TS_METRICS_ENABLED` (default `false`), and `TS_DEBUG_ADDR_PORT`
(default `""`), to configure metrics and debug endpoints. In a follow-up PR, the
health check endpoint will be updated to use the `TS_LOCAL_ADDR_PORT` if
`TS_HEALTHCHECK_ADDR_PORT` hasn't been set.

Users previously only had access to internal debug metrics (which are unstable
and not recommended) via passing the `--debug` flag to tailscaled, but can now
set `TS_METRICS_ENABLED=true` to expose the stable metrics documented at
https://tailscale.com/kb/1482/client-metrics at `/metrics` on the addr/port
specified by `TS_LOCAL_ADDR_PORT`.

Users can also now configure a debug endpoint more directly via the
`TS_DEBUG_ADDR_PORT` environment variable. This is not recommended for production
use, but exposes an internal set of debug metrics and pprof endpoints.

operator:

The `ProxyClass` CRD's `.spec.metrics.enable` field now enables serving the
stable user metrics documented at https://tailscale.com/kb/1482/client-metrics
at `/metrics` on the same "metrics" container port that debug metrics were
previously served on. To smooth the transition for anyone relying on the way the
operator previously consumed this field, we also _temporarily_ serve tailscaled's
internal debug metrics on the same `/debug/metrics` path as before, until 1.82.0
when debug metrics will be turned off by default even if `.spec.metrics.enable`
is set. At that point, anyone who wishes to continue using the internal debug
metrics (not recommended) will need to set the new `ProxyClass` field
`.spec.statefulSet.pod.tailscaleContainer.debug.enable`.

Users who wish to opt out of the transitional behaviour, where enabling
`.spec.metrics.enable` also enables debug metrics, can set
`.spec.statefulSet.pod.tailscaleContainer.debug.enable` to false (recommended).

Separately but related, the operator will no longer specify a host port for the
"metrics" container port definition. This caused scheduling conflicts when k8s
needs to schedule more than one proxy per node, and was not necessary for allowing
the pod's port to be exposed to prometheus scrapers.

Updates #11292

---------

Co-authored-by: Kristoffer Dalby <kristoffer@tailscale.com>
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
6 days ago
Irbe Krumina b9ecc50ce3
cmd/k8s-operator,k8s-operator,kube/kubetypes: add an option to configure app connector via Connector spec (#13950)
* cmd/k8s-operator,k8s-operator,kube/kubetypes: add an option to configure app connector via Connector spec

Updates tailscale/tailscale#11113

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2 weeks ago
Irbe Krumina 1103044598
cmd/k8s-operator,k8s-operator: add topology spread constraints to ProxyClass (#13959)
Now when we have HA for egress proxies, it makes sense to support topology
spread constraints that would allow users to define more complex
topologies of how proxy Pods need to be deployed in relation with other
Pods/across regions etc.

Updates tailscale/tailscale#13406

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
4 weeks ago
Tom Proctor 36cb2e4e5f
cmd/k8s-operator,k8s-operator: use default ProxyClass if set for ProxyGroup (#13720)
The default ProxyClass can be set via helm chart or env var, and applies
to all proxies that do not otherwise have an explicit ProxyClass set.
This ensures proxies created by the new ProxyGroup CRD are consistent
with the behaviour of existing proxies

Nearby but unrelated changes:

* Fix up double error logs (controller runtime logs returned errors)
* Fix a couple of variable names

Updates #13406

Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2 months ago
Irbe Krumina 7f016baa87
cmd/k8s-operator,k8s-operator: create ConfigMap for egress services + small fixes for egress services (#13715)
cmd/k8s-operator, k8s-operator: create ConfigMap for egress services + small reconciler fixes

Updates tailscale/tailscale#13406

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2 months ago
Tom Proctor e48cddfbb3
cmd/{containerboot,k8s-operator},k8s-operator,kube: add ProxyGroup controller (#13684)
Implements the controller for the new ProxyGroup CRD, designed for
running proxies in a high availability configuration. Each proxy gets
its own config and state Secret, and its own tailscale node ID.

We are currently mounting all of the config secrets into the container,
but will stop mounting them and instead read them directly from the kube
API once #13578 is implemented.

Updates #13406

Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2 months ago
Irbe Krumina e8bb5d1be5
cmd/{k8s-operator,containerboot},k8s-operator,kube: reconcile ExternalName Services for ProxyGroup (#13635)
Adds a new reconciler that reconciles ExternalName Services that define a
tailnet target that should be exposed to cluster workloads on a ProxyGroup's
proxies.
The reconciler ensures that for each such service, the config mounted to
the proxies is updated with the tailnet target definition and that
and EndpointSlice and ClusterIP Service are created for the service.

Adds a new reconciler that ensures that as proxy Pods become ready to route
traffic to a tailnet target, the EndpointSlice for the target is updated
with the Pods' endpoints.

Updates tailscale/tailscale#13406

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2 months ago
Tom Proctor cab2e6ea67
cmd/k8s-operator,k8s-operator: add ProxyGroup CRD (#13591)
The ProxyGroup CRD specifies a set of N pods which will each be a
tailnet device, and will have M different ingress or egress services
mapped onto them. It is the mechanism for specifying how highly
available proxies need to be. This commit only adds the definition, no
controller loop, and so it is not currently functional.

This commit also splits out TailnetDevice and RecorderTailnetDevice
into separate structs because the URL field is specific to recorders,
but we want a more generic struct for use in the ProxyGroup status field.

Updates #13406

Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2 months ago
Cameron Stokes 65c26357b1
cmd/k8s-operator, k8s-operator: fix outdated kb links (#13585)
updates #13583

Signed-off-by: Cameron Stokes <cameron@tailscale.com>
2 months ago
Tom Proctor 98f4dd9857
cmd/k8s-operator,k8s-operator,kube: Add TSRecorder CRD + controller (#13299)
cmd/k8s-operator,k8s-operator,kube: Add TSRecorder CRD + controller

Deploys tsrecorder images to the operator's cluster. S3 storage is
configured via environment variables from a k8s Secret. Currently
only supports a single tsrecorder replica, but I've tried to take early
steps towards supporting multiple replicas by e.g. having a separate
secret for auth and state storage.

Example CR:

```yaml
apiVersion: tailscale.com/v1alpha1
kind: Recorder
metadata:
  name: rec
spec:
  enableUI: true
```

Updates #13298

Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
3 months ago
Irbe Krumina c5623e0471
go.{mod,sum},tstest/tools,k8s-operator,cmd/k8s-operator: autogenerate CRD API docs (#12884)
Re-instates the functionality that generates CRD API docs, but using
a different library as the one we were using earlier seemed to have
some issues with its Git history.
Also regenerates the docs (make kube-generate-all).

Updates tailscale/tailscale#12859

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
4 months ago
Tom Proctor 3099323976
cmd/k8s-operator,k8s-operator,go.{mod,sum}: publish proxy status condition for annotated services (#12463)
Adds a new TailscaleProxyReady condition type for use in corev1.Service
conditions.

Also switch our CRDs to use metav1.Condition instead of
ConnectorCondition. The Go structs are seralized identically, but it
updates some descriptions and validation rules. Update k8s
controller-tools and controller-runtime deps to fix the documentation
generation for metav1.Condition so that it excludes comments and
TODOs.

Stop expecting the fake client to populate TypeMeta in tests. See
kubernetes-sigs/controller-runtime#2633 for details of the change.

Finally, make some minor improvements to validation for service hostnames.

Fixes #12216

Co-authored-by: Irbe Krumina <irbe@tailscale.com>
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
5 months ago
Irbe Krumina 807934f00c
cmd/k8s-operator,k8s-operator: allow proxies accept advertized routes. (#12388)
Add a new .spec.tailscale.acceptRoutes field to ProxyClass,
that can be optionally set to true for the proxies to
accept routes advertized by other nodes on tailnet (equivalent of
setting --accept-routes to true).

Updates tailscale/tailscale#12322,tailscale/tailscale#10684

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
6 months ago
Irbe Krumina 53d9cac196
k8s-operator/apis/v1alpha1,cmd/k8s-operator/deploy/examples: update DNSConfig description (#11971)
Also removes hardcoded image repo/tag from example DNSConfig resource
as the operator now knows how to default those.

Updates tailscale/tailscale#11019

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
6 months ago
Tom Proctor 23e26e589f
cmd/k8s-operator,k8s-opeerator: include Connector's MagicDNS name and tailnet IPs in status (#12359)
Add new fields TailnetIPs and Hostname to Connector Status. These
contain the addresses of the Tailscale node that the operator created
for the Connector to aid debugging.

Fixes #12214

Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
6 months ago
Irbe Krumina 3a6d3f1a5b
cmd/k8s-operator,k8s-operator,go.{mod,sum}: make individual proxy images/image pull policies configurable (#11928)
cmd/k8s-operator,k8s-operator,go.{mod,sum}: make individual proxy images/image pull policies configurable

Allow to configure images and image pull policies for individual proxies
via ProxyClass.Spec.StatefulSet.Pod.{TailscaleContainer,TailscaleInitContainer}.Image,
and ProxyClass.Spec.StatefulSet.Pod.{TailscaleContainer,TailscaleInitContainer}.ImagePullPolicy
fields.
Document that we have images in ghcr.io on the relevant Helm chart fields.

Updates tailscale/tailscale#11675

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
6 months ago
Irbe Krumina cd633a7252
cmd/k8s-operator/deploy,k8s-operator: document that metrics are unstable (#11979)
Updates#11292

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
7 months ago
Irbe Krumina 19b31ac9a6
cmd/{k8s-operator,k8s-nameserver},k8s-operator: update nameserver config with records for ingress/egress proxies (#11019)
cmd/k8s-operator: optionally update dnsrecords Configmap with DNS records for proxies.

This commit adds functionality to automatically populate
DNS records for the in-cluster ts.net nameserver
to allow cluster workloads to resolve MagicDNS names
associated with operator's proxies.

The records are created as follows:
* For tailscale Ingress proxies there will be
a record mapping the MagicDNS name of the Ingress
device and each proxy Pod's IP address.
* For cluster egress proxies, configured via
tailscale.com/tailnet-fqdn annotation, there will be
a record for each proxy Pod, mapping
the MagicDNS name of the exposed
tailnet workload to the proxy Pod's IP.

No records will be created for any other proxy types.
Records will only be created if users have configured
the operator to deploy an in-cluster ts.net nameserver
by applying tailscale.com/v1alpha1.DNSConfig.

It is user's responsibility to add the ts.net nameserver
as a stub nameserver for ts.net DNS names.
https://kubernetes.io/docs/tasks/administer-cluster/dns-custom-nameservers/#configuration-of-stub-domain-and-upstream-nameserver-using-coredns
https://cloud.google.com/kubernetes-engine/docs/how-to/kube-dns#upstream_nameservers

See also https://github.com/tailscale/tailscale/pull/11017

Updates tailscale/tailscale#10499

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
7 months ago
Irbe Krumina 44aa809cb0
cmd/{k8s-nameserver,k8s-operator},k8s-operator: add a kube nameserver, make operator deploy it (#11919)
* cmd/k8s-nameserver,k8s-operator: add a nameserver that can resolve ts.net DNS names in cluster.

Adds a simple nameserver that can respond to A record queries for ts.net DNS names.
It can respond to queries from in-memory records, populated from a ConfigMap
mounted at /config. It dynamically updates its records as the ConfigMap
contents changes.
It will respond with NXDOMAIN to queries for any other record types
(AAAA to be implemented in the future).
It can respond to queries over UDP or TCP. It runs a miekg/dns
DNS server with a single registered handler for ts.net domain names.
Queries for other domain names will be refused.

The intended use of this is:
1) to allow non-tailnet cluster workloads to talk to HTTPS tailnet
services exposed via Tailscale operator egress over HTTPS
2) to allow non-tailnet cluster workloads to talk to workloads in
the same cluster that have been exposed to tailnet over their
MagicDNS names but on their cluster IPs.

DNSConfig CRD can be used to configure
the operator to deploy kube nameserver (./cmd/k8s-nameserver) to cluster.

Updates tailscale/tailscale#10499

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
7 months ago
Irbe Krumina df8f40905b
cmd/k8s-operator,k8s-operator: optionally serve tailscaled metrics on Pod IP (#11699)
Adds a new .spec.metrics field to ProxyClass to allow users to optionally serve
client metrics (tailscaled --debug) on <Pod-IP>:9001.
Metrics cannot currently be enabled for proxies that egress traffic to tailnet
and for Ingress proxies with tailscale.com/experimental-forward-cluster-traffic-via-ingress annotation
(because they currently forward all cluster traffic to their respective backends).

The assumption is that users will want to have these metrics enabled
continuously to be able to monitor proxy behaviour (as opposed to enabling
them temporarily for debugging). Hence we expose them on Pod IP to make it
easier to consume them i.e via Prometheus PodMonitor.

Updates tailscale/tailscale#11292

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
7 months ago
Lee Briggs 14ac41febc
cmd/k8s-operator,k8s-operator: proxyclass affinity (#11862)
add ability to set affinity rules to proxyclass

Updates#11861

Signed-off-by: Lee Briggs <lee@leebriggs.co.uk>
7 months ago
Irbe Krumina 26f9bbc02b
cmd/k8s-operator,k8s-operator: document tailscale.com Custom Resource Definitions better. (#11665)
Updates tailscale/tailscale#10880

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
8 months ago
Irbe Krumina 38fb23f120
cmd/k8s-operator,k8s-operator: allow users to configure proxy env vars via ProxyClass (#11743)
Adds new ProxyClass.spec.statefulSet.pod.{tailscaleContainer,tailscaleInitContainer}.Env field
that allow users to provide key, value pairs that will be set as env vars for the respective containers.
Allow overriding all containerboot env vars,
but warn that this is not supported and might break (in docs + a warning when validating ProxyClass).

Updates tailscale/tailscale#10709

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
8 months ago
Irbe Krumina 231e44e742
Revert "cmd/{k8s-nameserver,k8s-operator},k8s-operator: add a kube nameserver, make operator deploy it (#11017)" (#11669)
Temporarily reverting this PR to avoid releasing
half finished featue.

This reverts commit 9e2f58f846.

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
8 months ago
Irbe Krumina 9e2f58f846
cmd/{k8s-nameserver,k8s-operator},k8s-operator: add a kube nameserver, make operator deploy it (#11017)
* cmd/k8s-nameserver,k8s-operator: add a nameserver that can resolve ts.net DNS names in cluster.

Adds a simple nameserver that can respond to A record queries for ts.net DNS names.
It can respond to queries from in-memory records, populated from a ConfigMap
mounted at /config. It dynamically updates its records as the ConfigMap
contents changes.
It will respond with NXDOMAIN to queries for any other record types
(AAAA to be implemented in the future).
It can respond to queries over UDP or TCP. It runs a miekg/dns
DNS server with a single registered handler for ts.net domain names.
Queries for other domain names will be refused.

The intended use of this is:
1) to allow non-tailnet cluster workloads to talk to HTTPS tailnet
services exposed via Tailscale operator egress over HTTPS
2) to allow non-tailnet cluster workloads to talk to workloads in
the same cluster that have been exposed to tailnet over their
MagicDNS names but on their cluster IPs.

Updates tailscale/tailscale#10499

Signed-off-by: Irbe Krumina <irbe@tailscale.com>

* cmd/k8s-operator/deploy/crds,k8s-operator: add DNSConfig CustomResource Definition

DNSConfig CRD can be used to configure
the operator to deploy kube nameserver (./cmd/k8s-nameserver) to cluster.

Signed-off-by: Irbe Krumina <irbe@tailscale.com>

* cmd/k8s-operator,k8s-operator: optionally reconcile nameserver resources

Adds a new reconciler that reconciles DNSConfig resources.
If a DNSConfig is deployed to cluster,
the reconciler creates kube nameserver resources.
This reconciler is only responsible for creating
nameserver resources and not for populating nameserver's records.

Signed-off-by: Irbe Krumina <irbe@tailscale.com>

* cmd/{k8s-operator,k8s-nameserver}: generate DNSConfig CRD for charts, append to static manifests

Signed-off-by: Irbe Krumina <irbe@tailscale.com>

---------

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
8 months ago
Irbe Krumina 45d27fafd6
cmd/k8s-operator,k8s-operator,go.{mod,sum},tstest/tools: add Tailscale Kubernetes operator API docs (#11246)
Add logic to autogenerate CRD docs.
.github/workflows/kubemanifests.yaml CI workflow will fail if the doc is out of date with regard to the current CRDs.
Docs can be refreshed by running make kube-generate-all.

Updates tailscale/tailscale#11023

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
9 months ago