Commit Graph

8408 Commits (efdfd547979fc09ea30d96bf31fcc06cadc538f3)
 

Author SHA1 Message Date
Tom Proctor efdfd54797
cmd/k8s-operator: avoid port collision with metrics endpoint (#14185)
When the operator enables metrics on a proxy, it uses the port 9001,
and in the near future it will start using 9002 for the debug endpoint
as well. Make sure we don't choose ports from a range that includes
9001 so that we never clash. Setting TS_SOCKS5_SERVER, TS_HEALTHCHECK_ADDR_PORT,
TS_OUTBOUND_HTTP_PROXY_LISTEN, and PORT could also open arbitrary ports,
so we will need to document that users should not choose ports from the
10000-11000 range for those settings.

Updates #13406

Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
1 day ago
Irbe Krumina 9f9063e624
cmd/k8s-operator,k8s-operator,go.mod: optionally create ServiceMonitor (#14248)
* cmd/k8s-operator,k8s-operator,go.mod: optionally create ServiceMonitor

Adds a new spec.metrics.serviceMonitor field to ProxyClass.
If that's set to true (and metrics are enabled), the operator
will create a Prometheus ServiceMonitor for each proxy to which
the ProxyClass applies.
Additionally, create a metrics Service for each proxy that has
metrics enabled.

Updates tailscale/tailscale#11292

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2 days ago
Irbe Krumina eabb424275
cmd/k8s-operator,docs/k8s: run tun mode proxies in privileged containers (#14262)
We were previously relying on unintended behaviour by runc where
all containers where by default given read/write/mknod permissions
for tun devices.
This behaviour was removed in https://github.com/opencontainers/runc/pull/3468
and released in runc 1.2.
Containerd container runtime, used by Docker and majority of Kubernetes distributions
bumped runc to 1.2 in 1.7.24 https://github.com/containerd/containerd/releases/tag/v1.7.24
thus breaking our reference tun mode Tailscale Kubernetes manifests and Kubernetes
operator proxies.

This PR changes the all Kubernetes container configs that run Tailscale in tun mode
to privileged. This should not be a breaking change because all these containers would
run in a Pod that already has a privileged init container.

Updates tailscale/tailscale#14256
Updates tailscale/tailscale#10814

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2 days ago
KevinLiang10 3f54572539 IPN: Update ServeConfig to accept configuration for Services.
This commit updates ServeConfig to allow configuration to Services (VIPServices for now) via Serve.
The scope of this commit is only adding the Services field to ServeConfig. The field doesn't actually
allow packet flowing yet. The purpose of this commit is to unblock other work on k8s end.

Updates #22953

Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com>
2 days ago
Brad Fitzpatrick 8d0c690f89 net/netcheck: clean up ICMP probe AddrPort lookup
Fixes #14200

Change-Id: Ib086814cf63dda5de021403fe1db4fb2a798eaae
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 days ago
Tom Proctor 24095e4897
cmd/containerboot: serve health on local endpoint (#14246)
* cmd/containerboot: serve health on local endpoint

We introduced stable (user) metrics in #14035, and `TS_LOCAL_ADDR_PORT`
with it. Rather than requiring users to specify a new addr/port
combination for each new local endpoint they want the container to
serve, this combines the health check endpoint onto the local addr/port
used by metrics if `TS_ENABLE_HEALTH_CHECK` is used instead of
`TS_HEALTHCHECK_ADDR_PORT`.

`TS_LOCAL_ADDR_PORT` now defaults to binding to all interfaces on 9002
so that it works more seamlessly and with less configuration in
environments other than Kubernetes, where the operator always overrides
the default anyway. In particular, listening on localhost would not be
accessible from outside the container, and many scripted container
environments do not know the IP address of the container before it's
started. Listening on all interfaces allows users to just set one env
var (`TS_ENABLE_METRICS` or `TS_ENABLE_HEALTH_CHECK`) to get a fully
functioning local endpoint they can query from outside the container.

Updates #14035, #12898

Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
3 days ago
Brad Fitzpatrick a68efe2088 cmd/checkmetrics: add command for checking metrics against kb
This commit adds a command to validate that all the metrics that
are registring in the client are also present in a path or url.

It is intended to be ran from the KB against the latest version of
tailscale.

Updates tailscale/corp#24066
Updates tailscale/corp#22075

Co-Authored-By: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
3 days ago
Irbe Krumina 13faa64c14
cmd/k8s-operator: always set stateful filtering to false (#14216)
Updates tailscale/tailscale#12108

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
5 days ago
Irbe Krumina 44c8892c18
Makefile,./build_docker.sh: update kube operator image build target name (#14251)
Updates tailscale/corp#24540
Updates tailscale/tailscale#12914

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
5 days ago
Irbe Krumina f8587e321e
cmd/k8s-operator: fix port name change bug for egress ProxyGroup proxies (#14247)
Ensure that the ExternalName Service port names are always synced to the
ClusterIP Service, to fix a bug where if users created a Service with
a single unnamed port and later changed to 1+ named ports, the operator
attempted to apply an invalid multi-port Service with an unnamed port.
Also, fixes a small internal issue where not-yet Service status conditons
were lost on a spec update.

Updates tailscale/tailscale#10102

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
6 days ago
Kristoffer Dalby 61dd2662ec tsnet: remove flaky test marker from metrics
Updates #13420

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
7 days ago
Kristoffer Dalby caba123008 wgengine/magicsock: packet/bytes metrics should not count disco
Updates #13420

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
7 days ago
Kristoffer Dalby 225d8f5a88 tsnet: validate sent data in metrics test
Updates #13420

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
7 days ago
Kristoffer Dalby e55899386b tsnet: split bytes and routes metrics tests
Updates #13420

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
7 days ago
Kristoffer Dalby 06d929f9ac tsnet: send less data in metrics integration test
this commit reduced the amount of data sent in the metrics
data integration test from 10MB to 1MB.

On various machines 10MB was quite flaky, while 1MB has not failed
once on 10000 runs.

Updates #13420

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
7 days ago
Kristoffer Dalby 41e56cedf8 health: move health metrics test to health_test
Updates #13420

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
7 days ago
Joe Tsai bac3af06f5
logtail: avoid bytes.Buffer allocation (#11858)
Re-use a pre-allocated bytes.Buffer struct and
shallow the copy the result of bytes.NewBuffer into it
to avoid allocating the struct.

Note that we're only reusing the bytes.Buffer struct itself
and not the underling []byte temporarily stored within it.

Updates #cleanup
Updates tailscale/corp#18514
Updates golang/go#67004

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
1 week ago
Anton Tolchanov bb80f14ff4 ipn/localapi: count localapi requests to metric endpoints
Updates tailscale/corp#22075

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
1 week ago
Andrew Dunham e87b71ec3c control/controlhttp: set *health.Tracker in tests
Observed during another PR:
    https://github.com/tailscale/tailscale/actions/runs/12040045880/job/33569141807

Updates #cleanup

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I9e0f49a35485fa2e097892737e5e3c95bf775a90
1 week ago
Nick Khyl a62f7183e4 cmd/tailscale/cli: fix format string
Updates #12687

Signed-off-by: Nick Khyl <nickk@tailscale.com>
1 week ago
Mario Minardi 26de518413
ipn/ipnlocal: only check CanUseExitNode if we are attempting to use one (#14230)
In https://github.com/tailscale/tailscale/pull/13726 we added logic to
`checkExitNodePrefsLocked` to error out on platforms where using an
exit node is unsupported in order to give users more obvious feedback
than having this silently fail downstream.

The above change neglected to properly check whether the device in
question was actually trying to use an exit node when doing the check
and was incorrectly returning an error on any calls to
`checkExitNodePrefsLocked` on platforms where using an exit node is not
supported as a result.

This change remedies this by adding a check to see whether the device is
attempting to use an exit node before doing the `CanUseExitNode` check.

Updates https://github.com/tailscale/corp/issues/24835

Signed-off-by: Mario Minardi <mario@tailscale.com>
1 week ago
James Tucker 4d33f30f91 net/netmon: improve panic reporting from #14202
I was hoping we'd catch an example input quickly, but the reporter had
rebooted their machine and it is no longer exhibiting the behavior. As
such this code may be sticking around quite a bit longer and we might
encounter other errors, so include the panic in the log entry.

Updates #14201
Updates #14202
Updates golang/go#70528

Signed-off-by: James Tucker <james@tailscale.com>
1 week ago
Nick Khyl 788121f475 docs/windows/policy: update ADMX policy definitions to reflect the syspolicy settings
We add a policy definition for the AllowedSuggestedExitNodes syspolicy setting, allowing admins
to configure a list of exit node IDs to be used as a pool for automatic suggested exit node selection.

We update definitions for policy settings configurable on both a per-user and per-machine basis,
such as UI customizations, to specify class="Both".

Lastly, we update the help text for existing policy definitions to include a link to the KB article
as the last line instead of in the first paragraph.

Updates #12687
Updates tailscale/corp#19681

Signed-off-by: Nick Khyl <nickk@tailscale.com>
1 week ago
Irbe Krumina ba3523fc3f
cmd/containerboot: preserve headers of metrics endpoints responses (#14204)
Updates tailscale/tailscale#11292

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2 weeks ago
James Tucker f6431185b0 net/netmon: catch ParseRIB panic to gather buffer data
Updates #14201
Updates golang/go#70528

Signed-off-by: James Tucker <james@tailscale.com>
2 weeks ago
Nick Khyl 36b7449fea ipn/ipnlocal: rebuild allowed suggested exit nodes when syspolicy changes
In this PR, we update LocalBackend to rebuild the set of allowed suggested exit nodes whenever
the AllowedSuggestedExitNodes syspolicy setting changes. Additionally, we request a new suggested
exit node when this occurs, enabling its use if the ExitNodeID syspolicy setting is set to auto:any.

Updates #12687

Signed-off-by: Nick Khyl <nickk@tailscale.com>
2 weeks ago
Nick Khyl 3353f154bb control/controlclient: use the most recent syspolicy.MachineCertificateSubject value
This PR removes the sync.Once wrapper around retrieving the MachineCertificateSubject policy
setting value, ensuring the most recent version is always used if it changes after the service starts.

Although this policy setting is used by a very limited number of customers, recent support escalations have highlighted issues caused by outdated or incorrect policy values being applied.

Updates #12687

Signed-off-by: Nick Khyl <nickk@tailscale.com>
2 weeks ago
Nick Khyl eb3cd32911 ipn/ipnlocal: update ipn.Prefs when there's a change in syspolicy settings
In this PR, we update ipnlocal.NewLocalBackend to subscribe to policy change notifications
and reapply syspolicy settings to the current profile's ipn.Prefs whenever a change occurs.

Updates #12687

Signed-off-by: Nick Khyl <nickk@tailscale.com>
2 weeks ago
Nick Khyl 2ab66d9698 ipn/ipnlocal: move syspolicy handling from setExitNodeID to applySysPolicy
This moves code that handles ExitNodeID/ExitNodeIP syspolicy settings
from (*LocalBackend).setExitNodeID to applySysPolicy.

Updates #12687

Signed-off-by: Nick Khyl <nickk@tailscale.com>
2 weeks ago
Nick Khyl 7c8f663d70 cmd/tailscaled: log SCM interactions if the policy setting is enabled at the time of interaction
This updates the syspolicy.LogSCMInteractions check to run at the time of an interaction,
just before logging a message, instead of during service startup. This ensures the most
recent policy setting is used if it has changed since the service started.

Updates #12687

Signed-off-by: Nick Khyl <nickk@tailscale.com>
2 weeks ago
Nick Khyl 50bf32a0ba cmd/tailscaled: flush DNS if FlushDNSOnSessionUnlock is true upon receiving a session change notification
In this PR, we move the syspolicy.FlushDNSOnSessionUnlock check from service startup
to when a session change notification is received. This ensures that the most recent policy
setting value is used if it has changed since the service started.

We also plan to handle session change notifications for unrelated reasons
and need to decouple notification subscriptions from DNS anyway.

Updates #12687
Updates tailscale/corp#18342

Signed-off-by: Nick Khyl <nickk@tailscale.com>
2 weeks ago
Nick Khyl 8e5cfbe4ab util/syspolicy/rsop: reduce policyReloadMinDelay and policyReloadMaxDelay when in tests
These delays determine how soon syspolicy change callbacks are invoked after a policy setting is updated
in a policy source. For tests, we shorten these delays to minimize unnecessary wait times. This adjustment
only affects tests that subscribe to policy change notifications and modify policy settings after they have
already been set. Initial policy settings are always available immediately without delay.

Updates #12687

Signed-off-by: Nick Khyl <nickk@tailscale.com>
2 weeks ago
Nick Khyl 462e1fc503 ipn/{ipnlocal,localapi}, wgengine/netstack: call (*LocalBackend).Shutdown when tests that create them complete
We have several places where LocalBackend instances are created for testing, but they are rarely shut down
when the tests that created them exit.

In this PR, we update newTestLocalBackend and similar functions to use testing.TB.Cleanup(lb.Shutdown)
to ensure LocalBackend instances are properly shut down during test cleanup.

Updates #12687

Signed-off-by: Nick Khyl <nickk@tailscale.com>
2 weeks ago
Tom Proctor 74d4652144
cmd/{containerboot,k8s-operator},k8s-operator: new options to expose user metrics (#14035)
containerboot:

Adds 3 new environment variables for containerboot, `TS_LOCAL_ADDR_PORT` (default
`"${POD_IP}:9002"`), `TS_METRICS_ENABLED` (default `false`), and `TS_DEBUG_ADDR_PORT`
(default `""`), to configure metrics and debug endpoints. In a follow-up PR, the
health check endpoint will be updated to use the `TS_LOCAL_ADDR_PORT` if
`TS_HEALTHCHECK_ADDR_PORT` hasn't been set.

Users previously only had access to internal debug metrics (which are unstable
and not recommended) via passing the `--debug` flag to tailscaled, but can now
set `TS_METRICS_ENABLED=true` to expose the stable metrics documented at
https://tailscale.com/kb/1482/client-metrics at `/metrics` on the addr/port
specified by `TS_LOCAL_ADDR_PORT`.

Users can also now configure a debug endpoint more directly via the
`TS_DEBUG_ADDR_PORT` environment variable. This is not recommended for production
use, but exposes an internal set of debug metrics and pprof endpoints.

operator:

The `ProxyClass` CRD's `.spec.metrics.enable` field now enables serving the
stable user metrics documented at https://tailscale.com/kb/1482/client-metrics
at `/metrics` on the same "metrics" container port that debug metrics were
previously served on. To smooth the transition for anyone relying on the way the
operator previously consumed this field, we also _temporarily_ serve tailscaled's
internal debug metrics on the same `/debug/metrics` path as before, until 1.82.0
when debug metrics will be turned off by default even if `.spec.metrics.enable`
is set. At that point, anyone who wishes to continue using the internal debug
metrics (not recommended) will need to set the new `ProxyClass` field
`.spec.statefulSet.pod.tailscaleContainer.debug.enable`.

Users who wish to opt out of the transitional behaviour, where enabling
`.spec.metrics.enable` also enables debug metrics, can set
`.spec.statefulSet.pod.tailscaleContainer.debug.enable` to false (recommended).

Separately but related, the operator will no longer specify a host port for the
"metrics" container port definition. This caused scheduling conflicts when k8s
needs to schedule more than one proxy per node, and was not necessary for allowing
the pod's port to be exposed to prometheus scrapers.

Updates #11292

---------

Co-authored-by: Kristoffer Dalby <kristoffer@tailscale.com>
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2 weeks ago
Irbe Krumina c59ab6baac
cmd/k8s-operator/deploy: ensure that operator can write kube state Events (#14177)
A small follow-up to #14112- ensures that the operator itself can emit
Events for its kube state store changes.

Updates tailscale/tailscale#14080

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2 weeks ago
Andrea Gottardo e3c6ca43d3
cli: present risk warning when setting up app connector on macOS (#14181) 2 weeks ago
Brad Fitzpatrick 0c8c7c0f90 net/tsaddr: include test input in test failure output
https://go.dev/wiki/CodeReviewComments#useful-test-failures

(Previously it was using subtests with names including the input, but
 once those went away, there was no context left)

Updates #14169

Change-Id: Ib217028183a3d001fe4aee58f2edb746b7b3aa88
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 weeks ago
Andrew Dunham af4c3a4a1b cmd/tailscale/cli: create netmon in debug ts2021
Otherwise we'll see a panic if we hit the dnsfallback code and try to
call NewDialer with a nil NetMon.

Updates #14161

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I81c6e72376599b341cb58c37134c2a948b97cf5f
2 weeks ago
Brad Fitzpatrick 70d1241ca6 util/fastuuid: delete unused package
Its sole user was deleted in 02cafbe1ca.

And it has no public users: https://pkg.go.dev/tailscale.com/util/fastuuid?tab=importedby

And nothing in other Tailsale repos that I can find.

Updates tailscale/corp#24721

Change-Id: I8755770a255a91c6c99f596e6d10c303b3ddf213
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 weeks ago
Brad Fitzpatrick 02cafbe1ca tsweb: change RequestID format to have a date in it
So we can locate them in logs more easily.

Updates tailscale/corp#24721

Change-Id: Ia766c75608050dde7edc99835979a6e9bb328df2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 weeks ago
James Scott ebaf33a80c
net/tsaddr: extract IsTailscaleIPv4 from IsTailscaleIP (#14169)
Extracts tsaddr.IsTailscaleIPv4 out of tsaddr.IsTailscaleIP.

This will allow for checking valid Tailscale assigned IPv4 addresses
without checking IPv6 addresses.

Updates #14168
Updates tailscale/corp#24620

Signed-off-by: James Scott <jim@tailscale.com>
2 weeks ago
Irbe Krumina ebeb5da202
cmd/k8s-operator,kube/kubeclient,docs/k8s: update rbac to emit events + small fixes (#14164)
This is a follow-up to #14112 where our internal kube client was updated
to allow it to emit Events - this updates our sample kube manifests
and tsrecorder manifest templates so they can benefit from this functionality.

Updates tailscale/tailscale#14080

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2 weeks ago
James Stocker 303a4a1dfb
Make the deployment of an IngressClass optional, default to true (#14153)
Fixes tailscale/tailscale#14152
Signed-off-by: James Stocker jamesrstocker@gmail.com

Co-authored-by: James Stocker <james.stocker@intenthq.co.uk>
2 weeks ago
Anton Tolchanov 9f33aeb649 wgengine/filter: actually use the passed CapTestFunc [capver 109]
Initial support for SrcCaps was added in 5ec01bf but it was not actually
working without this.

Updates #12542

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2 weeks ago
Aaron Klotz 48343ee673 util/winutil/s4u: fix token handle leak
Fixes #14156

Signed-off-by: Aaron Klotz <aaron@tailscale.com>
2 weeks ago
Brad Fitzpatrick 810da91a9e version: fix earlier test/wording mistakes
Updates #14069

Change-Id: I1d2fd8a8ab6591af11bfb83748b94342a8ac718f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 weeks ago
Brad Fitzpatrick d62baa45e6 version: validate Long format on Android builds
Updates #14069

Change-Id: I134a90db561dacc4b1c1c66ccadac135b5d64cf3
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 weeks ago
License Updater bb3d0cae5f licenses: update license notices
Signed-off-by: License Updater <noreply+license-updater@tailscale.com>
2 weeks ago
Irbe Krumina 00517c8189
kube/{kubeapi,kubeclient},ipn/store/kubestore,cmd/{containerboot,k8s-operator}: emit kube store Events (#14112)
Adds functionality to kube client to emit Events.
Updates kube store to emit Events when tailscaled state has been loaded, updated or if any errors where
encountered during those operations.
This should help in cases where an error related to state loading/updating caused the Pod to crash in a loop-
unlike logs of the originally failed container instance, Events associated with the Pod will still be
accessible even after N restarts.

Updates tailscale/tailscale#14080

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2 weeks ago
Brad Fitzpatrick da70a84a4b ipn/ipnlocal: fix build, remove another Notify.BackendLogID reference that crept in
I merged 5cae7c51bf (removing Notify.BackendLogID) and 93db503565
(adding another reference to Notify.BackendLogID) that didn't have merge
conflicts, but didn't compile together.

This removes the new reference, fixing the build.

Updates #14129

Change-Id: I9bb68efd977342ea8822e525d656817235039a66
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 weeks ago