In this PR, we refactor the LocalBackend extension system, moving from direct callbacks to a more organized extension host model.
Specifically, we:
- Extract interface and callback types used by packages extending LocalBackend functionality into a new ipn/ipnext package.
- Define ipnext.Host as a new interface that bridges extensions with LocalBackend.
It enables extensions to register callbacks and interact with LocalBackend in a concurrency-safe, well-defined, and controlled way.
- Move existing callback registration and invocation code from ipnlocal.LocalBackend into a new type called ipnlocal.ExtensionHost,
implementing ipnext.Host.
- Improve docs for existing types and methods while adding docs for the new interfaces.
- Add test coverage for both the extracted and the new code.
- Remove ipn/desktop.SessionManager from tsd.System since ipn/desktop is now self-contained.
- Update existing extensions (e.g., ipn/auditlog and ipn/desktop) to use the new interfaces where appropriate.
We're not introducing new callback and hook types (e.g., for ipn.Prefs changes) just yet, nor are we enhancing current callbacks,
such as by improving conflict resolution when more than one extension tries to influence profile selection via a background profile resolver.
These further improvements will be submitted separately.
Updates #12614
Updates tailscale/corp#27645
Updates tailscale/corp#26435
Updates tailscale/corp#18342
Signed-off-by: Nick Khyl <nickk@tailscale.com>
This flag is currently no-op and hidden. The flag does round trip
through the related pref. Subsequent commits will tie them to
net/udprelay.Server. There is no corresponding "tailscale up" flag,
enabling/disabling of the relay server will only be supported via
"tailscale set".
This is a string flag in order to support disablement via empty string
as a port value of 0 means "enable the server and listen on a random
unused port". Disablement via empty string also follows existing flag
convention, e.g. advertise-routes.
Early internal discussions settled on "tailscale set --relay="<port>",
but the author felt this was too ambiguous around client vs server, and
may cause confusion in the future if we add related flags.
Updates tailscale/corp#27502
Signed-off-by: Jordan Whited <jordan@tailscale.com>
`tailscale set` was created to set preferences, which used to be
overloaded into `tailscale up`. To move people over to the new
command, `up` was supposed to be frozen and no new preference flags
would be added. But people forgot, there was no test to warn them, and
so new flags were added anyway.
TestUpFlagSetIsFrozen complains when new flags are added to
`tailscale up`. It doesn’t try all combinations of GOOS, but since
the CI builds in every OS, the pull-request tests should cover this.
Updates #15460
Signed-off-by: Simon Law <sfllaw@sfllaw.ca>
Ensure no services are advertised as part of shutting down tailscaled.
Prefs are only edited if services are currently advertised, and they're
edited we wait for control's ~15s (+ buffer) delay to failover.
Note that editing prefs will trigger a synchronous write to the state
Secret, so it may fail to persist state if the ProxyGroup is getting
scaled down and therefore has its RBAC deleted at the same time, but that
failure doesn't stop prefs being updated within the local backend,
doesn't affect connectivity to control, and the state Secret is
about to get deleted anyway, so the only negative side effect is a harmless
error log during shutdown. Control still learns that the node is no
longer advertising the service and triggers the failover.
Note that the first version of this used a PreStop lifecycle hook, but
that only supports GET methods and we need the shutdown to trigger side
effects (updating prefs) so it didn't seem appropriate to expose that
functionality on a GET endpoint that's accessible on the k8s network.
Updates tailscale/corp#24795
Change-Id: I0a9a4fe7a5395ca76135ceead05cbc3ee32b3d3c
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
As IPv4 and IPv6 end up with different MSS and different congestion
control strategies, proxying between them can really amplify TCP
meltdown style conditions in many real world network conditions, such as
with higher latency, some loss, etc.
Attempt to match up the protocols, otherwise pick a destination address
arbitrarily. Also shuffle the target address to spread load across
upstream load balancers.
Updates #15367
Signed-off-by: James Tucker <james@tailscale.com>
The test suite had grown to about 20s on my machine, but it doesn't
do much taxing work so was a good candidate to parallelise. Now runs
in under 2s on my machine.
Updates #cleanup
Change-Id: I2fcc6be9ca226c74c0cb6c906778846e959492e4
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
The earlier #15534 prevent some dup string flags. This does it for all
flag types.
Updates #6813
Change-Id: Iec2871448394ea9a5b604310bdbf7b499434bf01
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Some CLI flags support multiple values separated by commas. These flags
are intended to be declared only once and will silently ignore subsequent
instances. This will now throw an error if multiple instances of advertise-tags
and advertise-routes are detected.
Fixes#6813
Signed-off-by: Jason O'Donnell <2160810+jasonodonnell@users.noreply.github.com>
Ensure that the upstream is always queried, so that if upstream is going
to NXDOMAIN natc will also return NXDOMAIN rather than returning address
allocations.
At this time both IPv4 and IPv6 are still returned if upstream has a
result, regardless of upstream support - this is ~ok as we're proxying.
Rewrite the tests to be once again slightly closer to integration tests,
but they're still very rough and in need of a refactor.
Further refactors are probably needed implementation side too, as this
removed rather than added units.
Updates #15367
Signed-off-by: James Tucker <james@tailscale.com>
This adds netx.DialFunc, unifying a type we have a bazillion other
places, giving it now a nice short name that's clickable in
editors, etc.
That highlighted that my earlier move (03b47a55c7) of stuff from
nettest into netx moved too much: it also dragged along the memnet
impl, meaning all users of netx.DialFunc who just wanted netx for the
type definition were instead also pulling in all of memnet.
So move the memnet implementation netx.Network into memnet, a package
we already had.
Then use netx.DialFunc in a bunch of places. I'm sure I missed some.
And plenty remain in other repos, to be updated later.
Updates tailscale/corp#27636
Change-Id: I7296cd4591218e8624e214f8c70dab05fb884e95
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
I added yet another one in 6d117d64a2 but that new one is at the
best place int he dependency graph and has the best name, so let's use
that one for everything possible.
types/lazy can't use it for circular dependency reasons, so unexport
that copy at least.
Updates #cleanup
Change-Id: I25db6b6a0d81dbb8e89a0a9080c7f15cbf7aa770
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Make the perPeerState objects able to function independently without a
shared reference to the connector.
We don't currently change the values from connector that perPeerState
uses at runtime. Explicitly copying them at perPeerState creation allows
us to, for example, put the perPeerState into a consensus algorithm in
the future.
Updates #14667
Signed-off-by: Fran Bull <fran@tailscale.com>
So we can link tailscale and tailscaled together into one.
Updates #5794
Change-Id: I9a8b793c64033827e4188931546cbd64db55982e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Avoid the unbounded runtime during random allocation, if random
allocation fails after a first pass at random through the provided
ranges, pick the next free address by walking through the allocated set.
The new ipx utilities provide a bitset based allocation pool, good for
small to moderate ranges of IPv4 addresses as used in natc.
Updates #15367
Signed-off-by: James Tucker <james@tailscale.com>
For hooking up websocket VM clients to natlab.
Updates #13038
Change-Id: Iaf728b9146042f3d0c2d3a5e25f178646dd10951
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Re-enable HA Ingress again that was disabled for 1.82 release.
This reverts commit fea74a60d5.
Updates tailscale/corp#24795
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Not all platforms have hardlinks, or not easily.
This lets a "tailscale" wrapper script set an environment variable
before calling tailscaled.
Updates #2233
Change-Id: I9eccc18651e56c106f336fcbbd0fd97a661d312e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In this PR, we update ipnlocal.LocalBackend to allow registering callbacks for control client creation
and profile changes. We also allow to register ipnauth.AuditLogFunc to be called when an auditable
action is attempted.
We then use all this to invert the dependency between the auditlog and ipnlocal packages and make
the auditlog functionality optional, where it only registers its callbacks via ipnlocal-provided hooks
when the auditlog package is imported.
We then underscore-import it when building tailscaled for Windows, and we'll explicitly
import it when building xcode/ipn-go-bridge for macOS. Since there's no default log-store
location for macOS, we'll also need to call auditlog.SetStoreFilePath to specify where
pending audit logs should be persisted.
Fixes#15394
Updates tailscale/corp#26435
Updates tailscale/corp#27012
Signed-off-by: Nick Khyl <nickk@tailscale.com>
The default values for `tailscale up` and `tailscale set` are supposed
to agree for all common flags. But they don’t for `--accept-routes`
on Windows and from the Mac OS App Store, because `tailscale up`
computes this value based on the operating system:
user@host:~$ tailscale up --help 2>&1 | grep -A1 accept-routes
--accept-dns, --accept-dns=false
accept DNS configuration from the admin panel (default true)
user@host:~$ tailscale set --help 2>&1 | grep -A1 accept-routes
--accept-dns, --accept-dns=false
accept DNS configuration from the admin panel
Luckily, `tailscale set` uses `ipn.MaskedPrefs`, so the default values
don’t logically matter. But someone will get the wrong idea if they
trust the `tailscale set --help` documentation.
In addition, `ipn.Prefs.RouteAll` defaults to true so it disagrees
with both of the flags above.
This patch makes `--accept-routes` use the same logic for in both
commands by hoisting the logic that was buried in `cmd/tailscale/cli`
to `ipn.Prefs.DefaultRouteAll`. Then, all three of defaults can agree.
Fixes: #15319
Signed-off-by: Simon Law <sfllaw@sfllaw.ca>
The default values for `tailscale up` and `tailscale set` are supposed
to agree on all common flags. But they don’t for `--accept-dns`:
user@host:~$ tailscale up --help 2>&1 | grep -A1 accept-dns
--accept-dns, --accept-dns=false
accept DNS configuration from the admin panel (default true)
user@host:~$ tailscale set --help 2>&1 | grep -A1 accept-dns
--accept-dns, --accept-dns=false
accept DNS configuration from the admin panel
Luckily, `tailscale set` uses `ipn.MaskedPrefs`, so the default values
don’t logically matter. But someone will get the wrong idea if they
trust the `tailscale set --help` documentation.
This patch makes `--accept-dns` default to true in both commands and
also introduces `TestSetDefaultsMatchUpDefaults` to prevent any future
drift.
Fixes: #15319
Signed-off-by: Simon Law <sfllaw@sfllaw.ca>
Temporarily make sure that the HA Ingress reconciler does not run,
as we do not want to release this to stable just yet.
Updates tailscale/corp#24795
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
This is a very dumb fix as it has an unbounded worst case runtime. IP
allocation needs to be done in a more sane way in a follow-up.
Updates #15367
Signed-off-by: James Tucker <james@tailscale.com>
cmd/{k8s-operator,containerboot}: check TLS cert before advertising VIPService
- Ensures that Ingress status does not advertise port 443 before
TLS cert has been issued
- Ensure that Ingress backends do not advertise a VIPService
before TLS cert has been issued, unless the service also
exposes port 80
Updates tailscale/corp#24795
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
These tests aren't perfect, nor is this complete coverage, but this is a
set of coverage that is at least stable.
Updates #15367
Signed-off-by: James Tucker <james@tailscale.com>
Switch from using the Comment field to a ts-scoped annotation for
tracking which operators are cooperating over ownership of a
VIPService.
Updates tailscale/corp#24795
Change-Id: I72d4a48685f85c0329aa068dc01a1a3c749017bf
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
cmd/k8s-operator,k8s-operator: allow using LE staging endpoint for Ingress
Allow to optionally use LetsEncrypt staging endpoint to issue
certs for Ingress/HA Ingress, so that it is easier to
experiment with initial Ingress setup without hiting rate limits.
Updates tailscale/corp#24795
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
This adds a portable way to do a raw LocalAPI request without worrying
about the Unix-vs-macOS-vs-Windows ways of hitting the LocalAPI server.
(It was already possible but tedious with 'tailscale debug local-creds')
Updates tailscale/corp#24690
Change-Id: I0828ca55edaedf0565c8db192c10f24bebb95f1b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
There was a flaky failure case where renaming a TLS hostname for an
ingress might leave the old hostname dangling in tailscaled config. This
happened when the proxygroup reconciler loop had an outdated resource
version of the config Secret in its cache after the
ingress-pg-reconciler loop had very recently written it to delete the
old hostname. As the proxygroup reconciler then did a patch, there was
no conflict and it reinstated the old hostname.
This commit updates the patch to an update operation so that if the
resource version is out of date it will fail with an optimistic lock
error. It also checks for equality to reduce the likelihood that we make
the update API call in the first place, because most of the time the
proxygroup reconciler is not even making an update to the Secret in the
case that the hostname has changed.
Updates tailscale/corp#24795
Change-Id: Ie23a97440063976c9a8475d24ab18253e1f89050
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
cmd/k8s-operator: configure HA Ingress replicas to share certs
Creates TLS certs Secret and RBAC that allows HA Ingress replicas
to read/write to the Secret.
Configures HA Ingress replicas to run in read-only mode.
Updates tailscale/corp#24795
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Update the HA Ingress controller to wait until it sees AdvertisedServices
config propagated into at least 1 Pod's prefs before it updates the status
on the Ingress, to ensure the ProxyGroup Pods are ready to serve traffic
before indicating that the Ingress is ready
Updates tailscale/corp#24795
Change-Id: I1b8ce23c9e312d08f9d02e48d70bdebd9e1a4757
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Allows the use of tsweb without pulling in all of the heavy prometheus
client libraries, protobuf and so on.
Updates #15160
Signed-off-by: David Anderson <dave@tailscale.com>
This PR adds some custom logic for reading and writing
kube store values that are TLS certs and keys:
1) when store is initialized, lookup additional
TLS Secrets for this node and if found, load TLS certs
from there
2) if the node runs in certs 'read only' mode and
TLS cert and key are not found in the in-memory store,
look those up in a Secret
3) if the node runs in certs 'read only' mode, run
a daily TLS certs reload to memory to get any
renewed certs
Updates tailscale/corp#24795
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
When the Ingress is updated to a new hostname, the controller does not
currently clean up the old VIPService from control. Fix this up to parse
the ownership comment correctly and write a test to enforce the improved
behaviour
Updates tailscale/corp#24795
Change-Id: I792ae7684807d254bf2d3cc7aa54aa04a582d1f5
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
This adds support for using ACL Grants to configure a role for the
auto-provisioned user.
Fixestailscale/corp#14567
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
cmd/containerboot: manage HA Ingress TLS certs from containerboot
When ran as HA Ingress node, containerboot now can determine
whether it should manage TLS certs for the HA Ingress replicas
and call the LocalAPI cert endpoint to ensure initial issuance
and renewal of the shared TLS certs.
Updates tailscale/corp#24795
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
updates tailscale/corp#26435
Adds client support for sending audit logs to control via /machine/audit-log.
Specifically implements audit logging for user initiated disconnections.
This will require further work to optimize the peristant storage and exclusion
via build tags for mobile:
tailscale/corp#27011tailscale/corp#27012
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
Ensure that the src address for a connection is one of the primary
addresses assigned by Tailscale. Not, for example, a virtual IP address.
Updates #14667
Signed-off-by: Fran Bull <fran@tailscale.com>
Add support for Cross-Origin XHR requests to the openid-configuration
endpoint to enable clients like Grafana's auto-population of OIDC setup
data from its contents.
Updates https://github.com/tailscale/tailscale/issues/10263
Signed-off-by: Patrick O'Doherty <patrick@tailscale.com>
natc itself can't immediately fix the problem, but it can more correctly
error that return bad addresses.
Updates tailscale/corp#26968
Signed-off-by: James Tucker <james@tailscale.com>
For people who can't use LetsEncrypt because it's banned.
Per https://github.com/tailscale/tailscale/issues/11776#issuecomment-2520955317
This does two things:
1) if you run derper with --certmode=manual and --hostname=$IP_ADDRESS
we previously permitted, but now we also:
* auto-generate the self-signed cert for you if it doesn't yet exist on disk
* print out the derpmap configuration you need to use that
self-signed cert
2) teaches derp/derphttp's derp dialer to verify the signature of
self-signed TLS certs, if so declared in the existing
DERPNode.CertName field, which previously existed for domain fronting,
separating out the dial hostname from how certs are validates,
so it's not overloaded much; that's what it was meant for.
Fixes#11776
Change-Id: Ie72d12f209416bb7e8325fe0838cd2c66342c5cf
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
cmd/k8s-operator: ensure HA Ingress can operate in multicluster mode.
Update the owner reference mechanism so that:
- if during HA Ingress resource creation, a VIPService
with some other operator's owner reference is already found,
just update the owner references to add one for this operator
- if during HA Ingress deletion, the VIPService is found to have owner
reference(s) from another operator, don't delete the VIPService, just
remove this operator's owner reference
- requeue after HA Ingress reconciles that resulted in VIPService updates,
to guard against overwrites due to concurrent operations from different
clusters.
Updates tailscale/corp#24795
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Use secure constant time comparisons for the client ID and secret values
during the allowRelyingParty authorization check.
Updates #cleanup
Signed-off-by: Patrick O'Doherty <patrick@tailscale.com>
Now that packets flow for VIPServices, the last piece needed to start
serving them from a ProxyGroup is config to tell the proxy Pods which
services they should advertise.
Updates tailscale/corp#24795
Change-Id: Ic7bbeac8e93c9503558107bc5f6123be02a84c77
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
We are soon going to start assigning shared-in nodes a CGNAT IPv4 in the Hello tailnet when necessary, the same way that normal node shares assign a new IPv4 on conflict.
But Hello wants to display the node's native IPv4, the one it uses in its own tailnet. That IPv4 isn't available anywhere in the netmap today, because it's not normally needed for anything.
We are going to start sending that native IPv4 in the peer node CapMap, only for Hello's netmap responses. This change enables Hello to display that native IPv4 instead, when available.
Updates tailscale/corp#25393
Change-Id: I87480b6d318ab028b41ef149eb3ba618bd7f1e08
Signed-off-by: Brian Palmer <brianp@tailscale.com>
* go.toolchain.branch: update to Go 1.24
Updates #15015
Change-Id: I29c934ec17e60c3ac3264f30fbbe68fc21422f4d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
* cmd/testwrapper: fix for go1.24
Updates #15015
Signed-off-by: Paul Scott <paul@tailscale.com>
* go.mod,Dockerfile: bump to Go 1.24
Also bump golangci-lint to a version that was built with 1.24
Updates #15015
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
---------
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Paul Scott <paul@tailscale.com>
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
Co-authored-by: Paul Scott <paul@tailscale.com>
Co-authored-by: Andrew Lytvynov <awly@tailscale.com>
There's nothing about it on
https://github.com/multipath-tcp/mptcp_net-next/issues/ but empirically
MPTCP doesn't support this option on awly's kernel 6.13.2 and in GitHub
actions.
Updates #15015
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
This will help debug unexpected issues encountered by consumers of the gitops-pusher.
Updates tailscale/corp#26664
Signed-off-by: Percy Wegmann <percy@tailscale.com>
Even after we remove the deprecated API, we will want to maintain a minimal
API for internal use, in order to avoid importing the external
tailscale.com/client/tailscale/v2 package. This shim exposes only the necessary
parts of the deprecated API for internal use, which gains us the following:
1. It removes deprecation warnings for internal use of the API.
2. It gives us an inventory of which parts we will want to keep for internal use.
Updates tailscale/corp#22748
Signed-off-by: Percy Wegmann <percy@tailscale.com>
testwrapper doesn't work with Go 1.24 and the coverage support is
making it harder to debug.
Updates #15015
Updates tailscale/corp#26659
Change-Id: I0125e881d08c92f1ecef88b57344f6bbb571b569
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In this PR, we enable the registration of LocalBackend extensions to exclude code specific to certain
platforms or environments. We then introduce desktopSessionsExt, which is included only in Windows builds
and only if the ts_omit_desktop_sessions tag is disabled for the build. This extension tracks desktop sessions
and switches to (or remains on) the appropriate profile when a user signs in or out, locks their screen,
or disconnects a remote session.
As desktopSessionsExt requires an ipn/desktop.SessionManager, we register it with tsd.System
for the tailscaled subprocess on Windows.
We also fix a bug in the sessionWatcher implementation where it attempts to close a nil channel on stop.
Updates #14823
Updates tailscale/corp#26247
Signed-off-by: Nick Khyl <nickk@tailscale.com>
Introduce new TaildropTargetStatus in PeerStatus
Refactor getTargetStableID to solely rely on Status() instead of calling FileTargets(). This removes a possible race condition between the two calls and provides more detailed failure information if a peer can't receive files.
Updates tailscale/tailscale#14393
Signed-off-by: kari-ts <kari@tailscale.com>
Bart has had some substantial improvements in internal representation,
update functions, and other optimizations to reduce memory usage and
improve runtime performance.
Updates tailscale/corp#26353
Signed-off-by: James Tucker <james@tailscale.com>
These tunings reduced memory usage while the implementation was
struggling with earlier bugs, but will no longer be necessary after
those bugs are addressed.
Depends #14933
Depends #14934
Updates #9707
Updates #10408
Updates tailscale/corp#24483
Updates tailscale/corp#25169
Signed-off-by: James Tucker <james@tailscale.com>
Cubic performs better than Reno in higher BDP scenarios, and enables the
use of the hystart++ implementation contributed by Coder. This improves
throughput on higher BDP links with a much faster ramp.
gVisor is bumped as well for some fixes related to send queue processing
and RTT tracking.
Updates #9707
Updates #10408
Updates #12393
Updates tailscale/corp#24483
Updates tailscale/corp#25169
Signed-off-by: James Tucker <james@tailscale.com>
Incorrect disabled support for not having a mesh key in
d5316a4fbb
Allow for no mesh key to be set.
Fixes#14928
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
Since dynamic reload of setec is not supported
in derper at this time, close the server after
the secret is loaded.
Updates tailscale/corp#25756
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
It was moved in f57fa3cbc3.
Updates tailscale/corp#22748
Change-Id: I19f965e6bded1d4c919310aa5b864f2de0cd6220
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
A previous PR accidentally logged the key as part
of an error. Remove logging of the key.
Add log print for Setec store steup.
Updates tailscale/corp#25756
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
Add setec secret support for derper.
Support dev mode via env var, and setec via secrets URL.
For backwards compatibility use setec load from file also.
Updates tailscale/corp#25756
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
This change:
- reinstates the HA Ingress controller that was disabled for 1.80 release
- fixes the API calls to manage VIPServices as the API was changed
- triggers the HA Ingress reconciler on ProxyGroup changes
Updates tailscale/tailscale#24795
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Many places that need to work with node/peer capabilities end up with a
something-View and need to either reimplement the helper code or make an
expensive copy. We have the machinery to easily handle this now.
Updates #cleanup
Change-Id: Ic3f55be329f0fc6c178de26b34359d0e8c6ca5fc
Signed-off-by: Adrian Dewhurst <adrian@tailscale.com>
The upstream crypto package now supports sending banners at any time during
authentication, so the Tailscale fork of crypto/ssh is no longer necessary.
github.com/tailscale/golang-x-crypto is still needed for some custom ACME
autocert functionality.
tempfork/gliderlabs is still necessary because of a few other customizations,
mostly related to TTY handling.
Originally implemented in 46fd4e58a2,
which was reverted in b60f6b849a to
keep the change out of v1.80.
Updates #8593
Signed-off-by: Percy Wegmann <percy@tailscale.com>
tailscaled's ipn package writes a collection of keys to state after
authenticating to control, but one at a time. If containerboot happens
to send a SIGTERM signal to tailscaled in the middle of writing those
keys, it may shut down with an inconsistent state Secret and never
recover. While we can't durably fix this with our current single-use
auth keys (no atomic operation to auth + write state), we can reduce
the window for this race condition by checking for partial state
before sending SIGTERM to tailscaled. Best effort only.
Updates #14080
Change-Id: I0532d51b6f0b7d391e538468bd6a0a80dbe1d9f7
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
The HA Ingress functionality is not actually doing anything
valuable yet, so don't run the controller in 1.80 release yet.
Updates tailscale/tailscale#24795
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
This change builds on top of #14436 to ensure minimum downtime during egress ProxyGroup update rollouts:
- adds a readiness gate for ProxyGroup replicas that prevents kubelet from marking
the replica Pod as ready before a corresponding readiness condition has been added
to the Pod
- adds a reconciler that reconciles egress ProxyGroup Pods and, for each that is not ready,
if cluster traffic for relevant egress endpoints is routed via this Pod- if so add the
readiness condition to allow kubelet to mark the Pod as ready.
During the sequenced StatefulSet update rollouts kubelet does not restart
a Pod before the previous replica has been updated and marked as ready, so
ensuring that a replica is not marked as ready allows to avoid a temporary
post-update situation where all replicas have been restarted, but none of the
new ones are yet set up as an endpoint for the egress service, so cluster traffic is dropped.
Updates tailscale/tailscale#14326
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
This reverts commit 46fd4e58a2.
We don't want to include this in 1.80 yet, but can add it back post 1.80.
Updates #8593
Signed-off-by: Percy Wegmann <percy@tailscale.com>
Fixes the configfile reload logic- if the tailscale capver can not
yet be determined because the device info is not yet written to the
state Secret, don't assume that the proxy is pre-110.
Updates tailscale/tailscale#13032
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
cmd/{containerboot,k8s-operator},kube: add preshutdown hook for egress PG proxies
This change is part of work towards minimizing downtime during update
rollouts of egress ProxyGroup replicas.
This change:
- updates the containerboot health check logic to return Pod IP in headers,
if set
- always runs the health check for egress PG proxies
- updates ClusterIP Services created for PG egress endpoints to include
the health check endpoint
- implements preshutdown endpoint in proxies. The preshutdown endpoint
logic waits till, for all currently configured egress services, the ClusterIP
Service health check endpoint is no longer returned by the shutting-down Pod
(by looking at the new Pod IP header).
- ensures that kubelet is configured to call the preshutdown endpoint
This reduces the possibility that, as replicas are terminated during an update,
a replica gets terminated to which cluster traffic is still being routed via
the ClusterIP Service because kube proxy has not yet updated routig rules.
This is not a perfect check as in practice, it only checks that the kube
proxy on the node on which the proxy runs has updated rules. However, overall
this might be good enough.
The preshutdown logic is disabled if users have configured a custom health check
port via TS_LOCAL_ADDR_PORT env var. This change throws a warnign if so and in
future setting of that env var for operator proxies might be disallowed (as users
shouldn't need to configure this for a Pod directly).
This is backwards compatible with earlier proxy versions.
Updates tailscale/tailscale#14326
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
The upstream crypto package now supports sending banners at any time during
authentication, so the Tailscale fork of crypto/ssh is no longer necessary.
github.com/tailscale/golang-x-crypto is still needed for some custom ACME
autocert functionality.
tempfork/gliderlabs is still necessary because of a few other customizations,
mostly related to TTY handling.
Updates #8593
Signed-off-by: Percy Wegmann <percy@tailscale.com>
We've been maintaining temporary dev forks of golang.org/x/crypto/{acme,ssh}
in https://github.com/tailscale/golang-x-crypto instead of using
this repo's tempfork directory as we do with other packages. The reason we were
doing that was because x/crypto/ssh depended on x/crypto/ssh/internal/poly1305
and I hadn't noticed there are forwarding wrappers already available
in x/crypto/poly1305. It also depended internal/bcrypt_pbkdf but we don't use that
so it's easy to just delete that calling code in our tempfork/ssh.
Now that our SSH changes have been upstreamed, we can soon unfork from SSH.
That leaves ACME remaining.
This change copies our tailscale/golang-x-crypto/acme code to
tempfork/acme but adds a test that our vendored copied still matches
our tailscale/golang-x-crypto repo, where we can continue to do
development work and rebases with upstream. A comment on the new test
describes the expected workflow.
While we could continue to just import & use
tailscale/golang-x-crypto/acme, it seems a bit nicer to not have that
entire-fork-of-x-crypto visible at all in our transitive deps and the
questions that invites. Showing just a fork of an ACME client is much
less scary. It does add a step to the process of hacking on the ACME
client code, but we do that approximately never anyway, and the extra
step is very incremental compared to the existing tedious steps.
Updates #8593
Updates #10238
Change-Id: I8af4378c04c1f82e63d31bf4d16dba9f510f9199
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The c2n handling code was using the Go httptest package's
ResponseRecorder code but that's in a test package which brings in
Go's test certs, etc.
This forks the httptest recorder type into its own package that only
has the recorder and adds a test that we don't re-introduce a
dependency on httptest.
Updates #12614
Change-Id: I3546f49972981e21813ece9064cc2be0b74f4b16
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The hiding of internal packages has hidden things I wanted to see a
few times now. Stop hiding them. This makes depaware.txt output a bit
longer, but not too much. Plus we only really look at it with diffs &
greps anyway; it's not like anybody reads the whole thing.
Updates #12614
Change-Id: I868c89eeeddcaaab63e82371651003629bc9bda8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We had the debug packet capture code + Lua dissector in the CLI + the
iOS app. Now we don't, with tests to lock it in.
As a bonus, tailscale.com/net/packet and tailscale.com/net/flowtrack
no longer appear in the CLI's binary either.
A new build tag ts_omit_capture disables the packet capture code and
was added to build_dist.sh's --extra-small mode.
Updates #12614
Change-Id: I79b0628c0d59911bd4d510c732284d97b0160f10
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Some natc instances have been observed with excessive memory growth,
dominant in gvisor buffers. It is likely that the connection buffers are
sticking around for too long due to the default long segment time, and
uptuned buffer size applied by default in wgengine/netstack. Apply
configurations in natc specifically which are a better match for the
natc use case, most notably a 5s maximum segment lifetime.
Updates tailscale/corp#25169
Signed-off-by: James Tucker <james@tailscale.com>
The timeout still defaults to 2 seconds, but can now be changed via command-line flag.
Updates tailscale/corp#26045
Signed-off-by: Percy Wegmann <percy@tailscale.com>
3dabea0fc2 added some docs with inconsistent usage docs.
This fixes them, and adds a test.
It also adds some other tests and fixes other verb tense
inconsistencies.
Updates tailscale/corp#25278
Change-Id: I94c2a8940791bddd7c35c1c3d5fb791a317370c2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In v1.78, we started acquiring the GP lock when reading policy settings. This led to a deadlock during
Tailscale installation via Group Policy Software Installation because the GP engine holds the write lock
for the duration of policy processing, which in turn waits for the installation to complete, which in turn
waits for the service to enter the running state.
In this PR, we prevent the acquisition of GP locks (aka EnterCriticalPolicySection) during service startup
and update the Windows Registry-based util/syspolicy/source.PlatformPolicyStore to handle this failure
gracefully. The GP lock is somewhat optional; it’s safe to read policy settings without it, but acquiring
the lock is recommended when reading multiple values to prevent the Group Policy engine from modifying
settings mid-read and to avoid inconsistent results.
Fixes#14416
Signed-off-by: Nick Khyl <nickk@tailscale.com>
Still behind the same ts_omit_tap build tag.
See #14738 for background on the pattern.
Updates #12614
Change-Id: I03fb3d2bf137111e727415bd8e713d8568156ecc
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The new ProxyGroup-based Ingress reconciler is causing a fatal log at
startup because it has the same name as the existing Ingress reconciler.
Explicitly name both to ensure they have unique names that are consistent
with other explicitly named reconcilers.
Updates #14583
Change-Id: Ie76e3eaf3a96b1cec3d3615ea254a847447372ea
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
This pulls out the Wake-on-LAN (WoL) code out into its own package
(feature/wakeonlan) that registers itself with various new hooks
around tailscaled.
Then a new build tag (ts_omit_wakeonlan) causes the package to not
even be linked in the binary.
Ohter new packages include:
* feature: to just record which features are loaded. Future:
dependencies between features.
* feature/condregister: the package with all the build tags
that tailscaled, tsnet, and the Tailscale Xcode project
extension can empty (underscore) import to load features
as a function of the defined build tags.
Future commits will move of our "ts_omit_foo" build tags into this
style.
Updates #12614
Change-Id: I9c5378dafb1113b62b816aabef02714db3fc9c4a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Updates tailscale/corp#25278
Adds definitions for new CLI commands getting added in v1.80. Refactors some pre-existing CLI commands within the `configure` tree to clean up code.
Signed-off-by: Andrea Gottardo <andrea@gottardo.me>
Rather than using a string everywhere and needing to clarify that the
string should have the svc: prefix, create a separate type for Service
names.
Updates tailscale/corp#24607
Change-Id: I720e022f61a7221644bb60955b72cacf42f59960
Signed-off-by: Adrian Dewhurst <adrian@tailscale.com>
We previously baked in the LetsEncrypt x509 root CA for our tlsdial
package.
This moves that out into a new "bakedroots" package and is now also
shared by ipn/ipnlocal's cert validation code (validCertPEM) that
decides whether it's time to fetch a new cert.
Otherwise, a machine without LetsEncrypt roots locally in its system
roots is unable to use tailscale cert/serve and fetch certs.
Fixes#14690
Change-Id: Ic88b3bdaabe25d56b9ff07ada56a27e3f11d7159
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Both @agottardo and I tripped over this today.
Updates #cleanup
Change-Id: I64380a03bfc952b9887b1512dbcadf26499ff1cd
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
cmd/k8s-operator: add logic to parse L7 Ingresses in HA mode
- Wrap the Tailscale API client used by the Kubernetes Operator
into a client that knows how to manage VIPServices.
- Create/Delete VIPServices and update serve config for L7 Ingresses
for ProxyGroup.
- Ensure that ingress ProxyGroup proxies mount serve config from a shared ConfigMap.
Updates tailscale/corp#24795
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Most users should not run into this because it's set in the helm chart
and the deploy manifest, but if namespace is not set we get confusing
authz errors because the kube client tries to fetch some namespaced resources
as though they're cluster-scoped and reports permission denied. Try to
detect namespace from the default projected volume, and otherwise fatal.
Fixes #cleanup
Change-Id: I64b34191e440b61204b9ad30bbfa117abbbe09c3
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
As we look to add github.com/prometheus/client_golang/prometheus to
more parts of the codebase, lock in that we don't use it in tailscaled,
primarily for binary size reasons.
Updates #12614
Change-Id: I03c100d12a05019a22bdc23ce5c4df63d5a03ec6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
I moved the actual rename into separate, GOOS-specific files. On
non-Windows, we do a simple os.Rename. On Windows, we first try
ReplaceFile with a fallback to os.Rename if the target file does
not exist.
ReplaceFile is the recommended way to rename the file in this use case,
as it preserves attributes and ACLs set on the target file.
Updates #14428
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
The --mesh-with flag now supports the specification of hostname tuples like
derp1a.tailscale.com/derp1a-vpc.tailscale.com, which instructs derp to mesh
with host 'derp1a.tailscale.com' but dial TCP connections to 'derp1a-vpc.tailscale.com'.
For backwards compatibility, --mesh-with still supports individual hostnames.
The logic which attempts to auto-discover '[host]-vpc.tailscale.com' dial hosts
has been removed.
Updates tailscale/corp#25653
Signed-off-by: Percy Wegmann <percy@tailscale.com>
This finishes the work started in #14616.
Updates #8632
Change-Id: I4dc07d45b1e00c3db32217c03b21b8b1ec19e782
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In this PR, we add a generic views.ValuePointer type that can be used as a view for pointers
to basic types and struct types that do not require deep cloning and do not have corresponding
view types. Its Get/GetOk methods return stack-allocated shallow copies of the underlying value.
We then update the cmd/viewer codegen to produce getters that return either concrete views
when available or ValuePointer views when not, for pointer fields in generated view types.
This allows us to avoid unnecessary allocations compared to returning pointers to newly
allocated shallow copies.
Updates #14570
Signed-off-by: Nick Khyl <nickk@tailscale.com>
This amends commit b7e48058c8.
That commit broke all documented ways of starting Tailscale on gokrazy:
https://gokrazy.org/packages/tailscale/ — both Option A (tailscale up)
and Option B (tailscale up --auth-key) rely on the tailscale CLI working.
I verified that the tailscale CLI just prints it help when started
without arguments, i.e. it does not stay running and is not restarted.
I verified that the tailscale CLI successfully exits when started with
tailscale up --auth-key, regardless of whether the node has joined
the tailnet yet or not.
I verified that the tailscale CLI successfully waits and exits when
started with tailscale up, as expected.
fixes https://github.com/gokrazy/gokrazy/issues/286
Signed-off-by: Michael Stapelberg <michael@stapelberg.de>
sync.OnceValue and slices.Compact were both added in Go 1.21.
cmp.Or was added in Go 1.22.
Updates #8632
Updates #11058
Change-Id: I89ba4c404f40188e1f8a9566c8aaa049be377754
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
cmd/containerboot: load containerboot serve config that does not contain HTTPS endpoint in tailnets with HTTPS disabled
Fixes an issue where, if a tailnet has HTTPS disabled, no serve config
set via TS_SERVE_CONFIG was loaded, even if it does not contain an HTTPS endpoint.
Now for tailnets with HTTPS disabled serve config provided to containerboot is considered invalid
(and therefore not loaded) only if there is an HTTPS endpoint defined in the config.
Fixestailscale/tailscale#14495
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
cmd/{k8s-operator,containerboot}: reload tailscaled configfile when its contents have changed
Instead of restarting the Kubernetes Operator proxies each time
tailscaled config has changed, this dynamically reloads the configfile
using the new reload endpoint.
Older annotation based mechanism will be supported till 1.84
to ensure that proxy versions prior to 1.80 keep working with
operator 1.80 and newer.
Updates tailscale/tailscale#13032
Updates tailscale/corp#24795
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
* cmd/k8s-operator,k8s-operator: allow users to set custom labels for the optional ServiceMonitor
Updates tailscale/tailscale#14381
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Currently this does not yet do anything apart from creating
the ProxyGroup resources like StatefulSet.
Updates tailscale/corp#24795
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
These erroneously blocked a recent PR, which I fixed by simply
re-running CI. But we might as well fix them anyway.
These are mostly `printf` to `print` and a couple of `!=` to `!Equal()`
Updates #cleanup
Signed-off-by: Will Norris <will@tailscale.com>
It was failing about an unaccepted risk ("mac-app-connector") because
it was checking runtime.GOOS ("darwin") instead of the test's env.goos
string value ("linux", which doesn't have the warning).
Fixes#14544
Change-Id: I470d86a6ad4bb18e1dd99d334538e56556147835
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Importing the ~deprecated golang.org/x/exp/maps as "xmaps" to not
shadow the std "maps" was getting ugly.
And using slices.Collect on an iterator is verbose & allocates more.
So copy (x)maps.Keys+Values into our slicesx package instead.
Updates #cleanup
Updates #12912
Updates #14514 (pulled out of that change)
Change-Id: I5e68d12729934de93cf4a9cd87c367645f86123a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
On Linux, systray.SetTitle actually seems to set the tooltip on all
desktops I've tested on. But on macOS, it actually does set a title
that is always displayed in the systray area next to the icon. This
change should properly set the tooltip across platforms.
Updates #1708
Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
Move a number of global state vars into the Menu struct, keeping things
better encapsulated. The systray package still relies on its own global
state, so only a single Menu instance can run at a time.
Move a lot of the initialization logic out of onReady, in particular
fetching the latest tailscale state. Instead, populate the state before
calling systray.Run, which fixes a timing issue in GNOME (#14477).
This change also creates a separate bgContext for actions not tied menu
item clicks. Because we have to rebuild the entire menu regularly, we
cancel that context as needed, which can cancel subsequent updateState
calls.
Also exit cleanly on SIGINT and SIGTERM.
Updates #1708Fixes#14477
Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
Refactor code to set app icon and title as part of rebuild, rather than
separately in eventLoop. This fixes several cases where they weren't
getting updated properly. This change also makes use of the new exit
node icons.
Updates #1708
Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
restructure tsLogo to allow setting a mask to be used when drawing the
logo dots, as well as add an overlay icon, such as the arrow when
connected to an exit node.
The icon is still renders as white on black, but this change also
prepare for doing a black on white version, as well a fully transparent
icon. I don't know if we can consistently determine which to use, so
this just keeps the single icon for now.
Updates #1708
Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
metrics.LabelMap grows slightly more heavy, needing a lock to ensure
proper ordering for newly initialized ShardedInt values. An Add method
enables callers to use .Add for both expvar.Int and syncs.ShardedInt
values, but retains the original behavior of defaulting to initializing
expvar.Int values.
Updates tailscale/corp#25450
Co-Authored-By: Andrew Dunham <andrew@du.nham.ca>
Signed-off-by: James Tucker <james@tailscale.com>
- rebuild menu when prefs change outside of systray, such as setting an
exit node
- refactor onClick handler code
- compare lowercase country name, the same as macOS and Windows (now
sorts Ukraine before USA)
- fix "connected / disconnected" menu items on stopped status
- prevent nil pointer on "This Device" menu item
Updates #1708
Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
This commit builds the exit node menu including the recommended exit
node, if available, as well as tailnet and mullvad exit nodes.
This does not yet update the menu based on changes in exit node outside
of the systray app, which will come later. This also does not include
the ability to run as an exit node.
Updates #1708
Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
The new menu delay added to fix libdbusmenu systrays causes problems
with KDE. Given the state of wildly varying systray implementations, I
suspect we may need more desktop-specific hacks, so I'm setting this up
to accommodate that.
Updates #1708
Updates #14431
Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
Bring UI closer to macOS and windows:
- split login and tailnet name over separate lines
- render profile picture (with very simple caching)
- use checkbox to indicate active profile. I've not found any desktops
that can't render checkboxes, so I'd like to explore other options
if needed.
Updates #1708
Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
Some notification managers crop the application icon to a circle, so
ensure we have enough padding to account for that.
Updates #1708
Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
This new type of probe sends DERP packets sized similarly to CallMeMaybe packets
at a rate of 10 packets per second. It records the round-trip times in a Prometheus
histogram. It also keeps track of how many packets are dropped. Packets that fail to
arrive within 5 seconds are considered dropped.
Updates tailscale/corp#24522
Signed-off-by: Percy Wegmann <percy@tailscale.com>
The go-httpstat package has a data race when used with connections that
are performing happy-eyeballs connection setups as we are in the DERP
client. There is a long-stale PR upstream to address this, however
revisiting the purpose of this code suggests we don't really need
httpstat here.
The code populates a latency table that may be used to compare to STUN
latency, which is a lightweight RTT check. Switching out the reported
timing here to simply the request HTTP request RTT avoids the
problematic package.
Fixestailscale/corp#25095
Signed-off-by: James Tucker <james@tailscale.com>
This is the start of an integration/e2e test suite for the tailscale operator.
It currently only tests two major features, ingress proxy and API server proxy,
but we intend to expand it to cover more features over time. It also only
supports manual runs for now. We intend to integrate it into CI checks in a
separate update when we have planned how to securely provide CI with the secrets
required for connecting to a test tailnet.
Updates #12622
Change-Id: I31e464bb49719348b62a563790f2bc2ba165a11b
Co-authored-by: Irbe Krumina <irbe@tailscale.com>
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
A method on kc was called unconditionally, even if was not initialized,
leading to a nil pointer dereference when TS_SERVE_CONFIG was set
outside Kubernetes.
Add a guard symmetric with other uses of the kubeClient.
Fixes#14354.
Signed-off-by: Bjorn Neergaard <bjorn@neersighted.com>
Make argparsing use flag for adding a new
parameter that requires parsing.
Enforce a read timeout deadline waiting for response
from the stun server provided in the args. Otherwise
the program will never exit.
Fixes#14267
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
OAuth clients that were used to generate an auth_key previously
specified the scope 'device'. 'device' is not an actual scope,
the real scope is 'devices'. The resulting OAuth token ended up
including all scopes from the specified OAuth client, so the code
was able to successfully create auth_keys.
It's better not to hardcode a scope here anyway, so that we have
the flexibility of changing which scope(s) are used in the future
without having to update old clients.
Since the qualifier never actually did anything, this commit simply
removes it.
Updates tailscale/corp#24934
Signed-off-by: Percy Wegmann <percy@tailscale.com>
If previousEtag is empty, then we assume control ACLs were not modified
manually and push the local ACLs. Instead, we defaulted to localEtag
which would be different if local ACLs were different from control.
AFAIK this was always buggy, but never reported?
Fixes#14295
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
Every so often, the ProxyGroup and other controllers lose an optimistic locking race
with other controllers that update the objects they create. Stop treating
this as an error event, and instead just log an info level log line for it.
Fixes#14072
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
This provides an interface for a user to force a preferred DERP outcome
for all future netchecks that will take precedence unless the forced
region is unreachable.
The option does not persist and will be lost when the daemon restarts.
Updates tailscale/corp#18997
Updates tailscale/corp#24755
Signed-off-by: James Tucker <james@tailscale.com>
cmd/containerboot,kube/kubetypes,cmd/k8s-operator: detect if Ingress is created in a tailnet that has no HTTPS
This attempts to make Kubernetes Operator L7 Ingress setup failures more explicit:
- the Ingress resource now only advertises HTTPS endpoint via status.ingress.loadBalancer.hostname when/if the proxy has succesfully loaded serve config
- the proxy attempts to catch cases where HTTPS is disabled for the tailnet and logs a warning
Updates tailscale/tailscale#12079
Updates tailscale/tailscale#10407
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
cmd/k8s-operator/deploy/chart: allow reading OAuth creds from a CSI driver's volume and annotating operator's Service account
Updates #14264
Signed-off-by: Oliver Rahner <o.rahner@dke-data.com>
When the operator enables metrics on a proxy, it uses the port 9001,
and in the near future it will start using 9002 for the debug endpoint
as well. Make sure we don't choose ports from a range that includes
9001 so that we never clash. Setting TS_SOCKS5_SERVER, TS_HEALTHCHECK_ADDR_PORT,
TS_OUTBOUND_HTTP_PROXY_LISTEN, and PORT could also open arbitrary ports,
so we will need to document that users should not choose ports from the
10000-11000 range for those settings.
Updates #13406
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
* cmd/k8s-operator,k8s-operator,go.mod: optionally create ServiceMonitor
Adds a new spec.metrics.serviceMonitor field to ProxyClass.
If that's set to true (and metrics are enabled), the operator
will create a Prometheus ServiceMonitor for each proxy to which
the ProxyClass applies.
Additionally, create a metrics Service for each proxy that has
metrics enabled.
Updates tailscale/tailscale#11292
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
We were previously relying on unintended behaviour by runc where
all containers where by default given read/write/mknod permissions
for tun devices.
This behaviour was removed in https://github.com/opencontainers/runc/pull/3468
and released in runc 1.2.
Containerd container runtime, used by Docker and majority of Kubernetes distributions
bumped runc to 1.2 in 1.7.24 https://github.com/containerd/containerd/releases/tag/v1.7.24
thus breaking our reference tun mode Tailscale Kubernetes manifests and Kubernetes
operator proxies.
This PR changes the all Kubernetes container configs that run Tailscale in tun mode
to privileged. This should not be a breaking change because all these containers would
run in a Pod that already has a privileged init container.
Updates tailscale/tailscale#14256
Updates tailscale/tailscale#10814
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
* cmd/containerboot: serve health on local endpoint
We introduced stable (user) metrics in #14035, and `TS_LOCAL_ADDR_PORT`
with it. Rather than requiring users to specify a new addr/port
combination for each new local endpoint they want the container to
serve, this combines the health check endpoint onto the local addr/port
used by metrics if `TS_ENABLE_HEALTH_CHECK` is used instead of
`TS_HEALTHCHECK_ADDR_PORT`.
`TS_LOCAL_ADDR_PORT` now defaults to binding to all interfaces on 9002
so that it works more seamlessly and with less configuration in
environments other than Kubernetes, where the operator always overrides
the default anyway. In particular, listening on localhost would not be
accessible from outside the container, and many scripted container
environments do not know the IP address of the container before it's
started. Listening on all interfaces allows users to just set one env
var (`TS_ENABLE_METRICS` or `TS_ENABLE_HEALTH_CHECK`) to get a fully
functioning local endpoint they can query from outside the container.
Updates #14035, #12898
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
This commit adds a command to validate that all the metrics that
are registring in the client are also present in a path or url.
It is intended to be ran from the KB against the latest version of
tailscale.
Updates tailscale/corp#24066
Updates tailscale/corp#22075
Co-Authored-By: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Ensure that the ExternalName Service port names are always synced to the
ClusterIP Service, to fix a bug where if users created a Service with
a single unnamed port and later changed to 1+ named ports, the operator
attempted to apply an invalid multi-port Service with an unnamed port.
Also, fixes a small internal issue where not-yet Service status conditons
were lost on a spec update.
Updates tailscale/tailscale#10102
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
This updates the syspolicy.LogSCMInteractions check to run at the time of an interaction,
just before logging a message, instead of during service startup. This ensures the most
recent policy setting is used if it has changed since the service started.
Updates #12687
Signed-off-by: Nick Khyl <nickk@tailscale.com>
In this PR, we move the syspolicy.FlushDNSOnSessionUnlock check from service startup
to when a session change notification is received. This ensures that the most recent policy
setting value is used if it has changed since the service started.
We also plan to handle session change notifications for unrelated reasons
and need to decouple notification subscriptions from DNS anyway.
Updates #12687
Updates tailscale/corp#18342
Signed-off-by: Nick Khyl <nickk@tailscale.com>
containerboot:
Adds 3 new environment variables for containerboot, `TS_LOCAL_ADDR_PORT` (default
`"${POD_IP}:9002"`), `TS_METRICS_ENABLED` (default `false`), and `TS_DEBUG_ADDR_PORT`
(default `""`), to configure metrics and debug endpoints. In a follow-up PR, the
health check endpoint will be updated to use the `TS_LOCAL_ADDR_PORT` if
`TS_HEALTHCHECK_ADDR_PORT` hasn't been set.
Users previously only had access to internal debug metrics (which are unstable
and not recommended) via passing the `--debug` flag to tailscaled, but can now
set `TS_METRICS_ENABLED=true` to expose the stable metrics documented at
https://tailscale.com/kb/1482/client-metrics at `/metrics` on the addr/port
specified by `TS_LOCAL_ADDR_PORT`.
Users can also now configure a debug endpoint more directly via the
`TS_DEBUG_ADDR_PORT` environment variable. This is not recommended for production
use, but exposes an internal set of debug metrics and pprof endpoints.
operator:
The `ProxyClass` CRD's `.spec.metrics.enable` field now enables serving the
stable user metrics documented at https://tailscale.com/kb/1482/client-metrics
at `/metrics` on the same "metrics" container port that debug metrics were
previously served on. To smooth the transition for anyone relying on the way the
operator previously consumed this field, we also _temporarily_ serve tailscaled's
internal debug metrics on the same `/debug/metrics` path as before, until 1.82.0
when debug metrics will be turned off by default even if `.spec.metrics.enable`
is set. At that point, anyone who wishes to continue using the internal debug
metrics (not recommended) will need to set the new `ProxyClass` field
`.spec.statefulSet.pod.tailscaleContainer.debug.enable`.
Users who wish to opt out of the transitional behaviour, where enabling
`.spec.metrics.enable` also enables debug metrics, can set
`.spec.statefulSet.pod.tailscaleContainer.debug.enable` to false (recommended).
Separately but related, the operator will no longer specify a host port for the
"metrics" container port definition. This caused scheduling conflicts when k8s
needs to schedule more than one proxy per node, and was not necessary for allowing
the pod's port to be exposed to prometheus scrapers.
Updates #11292
---------
Co-authored-by: Kristoffer Dalby <kristoffer@tailscale.com>
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
A small follow-up to #14112- ensures that the operator itself can emit
Events for its kube state store changes.
Updates tailscale/tailscale#14080
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Otherwise we'll see a panic if we hit the dnsfallback code and try to
call NewDialer with a nil NetMon.
Updates #14161
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I81c6e72376599b341cb58c37134c2a948b97cf5f
So we can locate them in logs more easily.
Updates tailscale/corp#24721
Change-Id: Ia766c75608050dde7edc99835979a6e9bb328df2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This is a follow-up to #14112 where our internal kube client was updated
to allow it to emit Events - this updates our sample kube manifests
and tsrecorder manifest templates so they can benefit from this functionality.
Updates tailscale/tailscale#14080
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Adds functionality to kube client to emit Events.
Updates kube store to emit Events when tailscaled state has been loaded, updated or if any errors where
encountered during those operations.
This should help in cases where an error related to state loading/updating caused the Pod to crash in a loop-
unlike logs of the originally failed container instance, Events associated with the Pod will still be
accessible even after N restarts.
Updates tailscale/tailscale#14080
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Limit spamming GUIs with boring updates to once in 3 seconds, unless
the notification is relatively interesting and the GUI should update
immediately.
This is basically @barnstar's #14119 but with the logic moved to be
per-watch-session (since the bit is per session), rather than
globally. And this distinguishes notable Notify messages (such as
state changes) and makes them send immediately.
Updates tailscale/corp#24553
Change-Id: I79cac52cce85280ce351e65e76ea11e107b00b49
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We currently annotate pods with a hash of the tailscaled config so that
we can trigger pod restarts whenever it changes. However, the hash
updates more frequently than is necessary causing more restarts than is
necessary. This commit removes two causes; scaling up/down and removing
the auth key after pods have initially authed to control. However, note
that pods will still restart on scale-up/down because of the updated set
of volumes mounted into each pod. Hopefully we can fix that in a planned
follow-up PR.
Updates #13406
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
This gets close to all of the remaining ones.
Updates #12912
Change-Id: I9c672bbed2654a6c5cab31e0cbece6c107d8c6fa
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Or unless the new "ts_debug_websockets" build tag is set.
Updates #1278
Change-Id: Ic4c4f81c1924250efd025b055585faec37a5491d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Otherwise all the clients only using control/controlhttp for the
ts2021 HTTP client were also pulling in WebSocket libraries, as the
server side always needs to speak websockets, but only GOOS=js clients
speak it.
This doesn't yet totally remove the websocket dependency on Linux because
Linux has a envknob opt-in to act like GOOS=js for manual testing and force
the use of WebSockets for DERP only (not control). We can put that behind
a build tag in a future change to eliminate the dep on all GOOSes.
Updates #1278
Change-Id: I4f60508f4cad52bf8c8943c8851ecee506b7ebc9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Some environments would like to remove Tailscale SSH support for the
binary for various reasons when not needed (either for peace of mind,
or the ~1MB of binary space savings).
Updates tailscale/corp#24454
Updates #1278
Updates #12614
Change-Id: Iadd6c5a393992c254b5dc9aa9a526916f96fd07a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Sets a custom hostinfo app type for ProxyGroup replicas, similarly
to how we do it for all other Kubernetes Operator managed components.
Updates tailscale/tailscale#13406,tailscale/corp#22920
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
- Basic description of DERP
If configured to do so, also show
- Mailto link to security@tailscale.com
- Link to Tailscale Security Policies
- Link to Tailscale Acceptable Use Policy
Updates tailscale/corp#24092
Signed-off-by: Percy Wegmann <percy@tailscale.com>
This adds a new generic result type (motivated by golang/go#70084) to
try it out, and uses it in the new lineutil package (replacing the old
lineread package), changing that package to return iterators:
sometimes over []byte (when the input is all in memory), but sometimes
iterators over results of []byte, if errors might happen at runtime.
Updates #12912
Updates golang/go#70084
Change-Id: Iacdc1070e661b5fb163907b1e8b07ac7d51d3f83
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Thanks to @davidbuzz for raising the issue in #13973.
Fixes#8272Fixes#13973
Change-Id: Ic413e14d34c82df3c70a97e591b90316b0b4946b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
A filesystem was plumbed into netstack in 993acf4475
but hasn't been used since 2d5d6f5403. Remove it.
Noticed while rebasing a Tailscale fork elsewhere.
Updates tailscale/corp#16827
Change-Id: Ib76deeda205ffe912b77a59b9d22853ebff42813
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In this PR, we add the tailscale syspolicy command with two subcommands: list, which displays
policy settings, and reload, which forces a reload of those settings. We also update the LocalAPI
and LocalClient to facilitate these additions.
Updates #12687
Signed-off-by: Nick Khyl <nickk@tailscale.com>
Now when we have HA for egress proxies, it makes sense to support topology
spread constraints that would allow users to define more complex
topologies of how proxy Pods need to be deployed in relation with other
Pods/across regions etc.
Updates tailscale/tailscale#13406
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
- `tailscale metrics print`: to show metric values in console
- `tailscale metrics write`: to write metrics to a file (with a tempfile
& rename dance, which is atomic on Unix).
Also, remove the `TS_DEBUG_USER_METRICS` envknob as we are getting
more confident in these metrics.
Updates tailscale/corp#22075
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
In this PR, we update the syspolicy package to utilize syspolicy/rsop under the hood,
and remove syspolicy.CachingHandler, syspolicy.windowsHandler and related code
which is no longer used.
We mark the syspolicy.Handler interface and RegisterHandler/SetHandlerForTest functions
as deprecated, but keep them temporarily until they are no longer used in other repos.
We also update the package to register setting definitions for all existing policy settings
and to register the Registry-based, Windows-specific policy stores when running on Windows.
Finally, we update existing internal and external tests to use the new API and add a few more
tests and benchmarks.
Updates #12687
Signed-off-by: Nick Khyl <nickk@tailscale.com>
This allows us to print the time that a netcheck was run, which is
useful in debugging.
Updates #10972
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Id48d30d4eb6d5208efb2b1526a71d83fe7f9320b
It had bit-rotted likely during the transition to vector io in
76389d8baf. Tested on Ubuntu 24.04
by creating a netns and doing the DHCP dance to get an IP.
Updates #2589
Signed-off-by: Maisem Ali <maisem@tailscale.com>
In f77821fd63 (released in v1.72.0), we made the client tell a DERP server
when the connection was not its ideal choice (the first node in its region).
But we didn't do anything with that information until now. This adds a
metric about how many such connections are on a given derper, and also
adds a bit to the PeerPresentFlags bitmask so watchers can identify
(and rebalance) them.
Updates tailscale/corp#372
Change-Id: Ief8af448750aa6d598e5939a57c062f4e55962be
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Updates tailscale/tailscale#13839
Adds a new blockblame package which can detect common MITM SSL certificates used by network appliances. We use this in `tlsdial` to display a dedicated health warning when we cannot connect to control, and a network appliance MITM attack is detected.
Signed-off-by: Andrea Gottardo <andrea@gottardo.me>
Adds logic to `checkExitNodePrefsLocked` to return an error when
attempting to use exit nodes on a platform where this is not supported.
This mirrors logic that was added to error out when trying to use `ssh`
on an unsupported platform, and has very similar semantics.
Fixes https://github.com/tailscale/tailscale/issues/13724
Signed-off-by: Mario Minardi <mario@tailscale.com>
* updates to LocalBackend require metrics to be passed in which are now initialized
* os.MkdirTemp isn't supported in wasm/js so we simply return empty
string for logger
* adds a UDP dialer which was missing and led to the dialer being
incompletely initialized
Fixes#10454 and #8272
Signed-off-by: Christian <christian@devzero.io>
For a customer that wants to run their own DERP prober, let's add a
/healthz endpoint that can be used to monitor derpprobe itself.
Updates #6526
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Iba315c999fc0b1a93d8c503c07cc733b4c8d5b6b
This helps better distinguish what is generating activity to the
Tailscale public API.
Updates tailscale/corp#23838
Signed-off-by: Percy Wegmann <percy@tailscale.com>
We were using google/uuid in two places and that brought in database/sql/driver.
We didn't need it in either place.
Updates #13760
Updates tailscale/corp#20099
Change-Id: Ieed32f1bebe35d35f47ec5a2a429268f24f11f1f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
cmd/k8s-operator,k8s-operator/apis: set a readiness condition on egress Services
Set a readiness condition on ExternalName Services that define a tailnet target
to route cluster traffic to via a ProxyGroup's proxies. The condition
is set to true if at least one proxy is currently set up to route.
Updates tailscale/tailscale#13406
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
We don't need to error out and continuously reconcile if ProxyClass
has not (yet) been created, once it gets created the ProxyGroup
reconciler will get triggered.
Updates tailscale/tailscale#13406
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Ensure that .status.podIPs is used to select Pod's IP
in all reconcilers.
Updates tailscale/tailscale#13406
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
As discussed in #13684, base the ProxyGroup's proxy definitions on the same
scaffolding as the existing proxies, as defined in proxy.yaml
Updates #13406
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Currently egress Services for ProxyGroup only work for Pods and Services
with IPv4 addresses. Ensure that it works on dual stack clusters by reading
proxy Pod's IP from the .status.podIPs list that always contains both
IPv4 and IPv6 address (if the Pod has them) rather than .status.podIP that
could contain IPv6 only for a dual stack cluster.
Updates tailscale/tailscale#13406
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
The default ProxyClass can be set via helm chart or env var, and applies
to all proxies that do not otherwise have an explicit ProxyClass set.
This ensures proxies created by the new ProxyGroup CRD are consistent
with the behaviour of existing proxies
Nearby but unrelated changes:
* Fix up double error logs (controller runtime logs returned errors)
* Fix a couple of variable names
Updates #13406
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Rearrange conditionals to reduce indentation and make it a bit easier to read
the logic. Also makes some error message updates for better consistency
with the recent decision around capitalising resource names and the
upcoming addition of config secrets.
Updates #cleanup
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Implements the controller for the new ProxyGroup CRD, designed for
running proxies in a high availability configuration. Each proxy gets
its own config and state Secret, and its own tailscale node ID.
We are currently mounting all of the config secrets into the container,
but will stop mounting them and instead read them directly from the kube
API once #13578 is implemented.
Updates #13406
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Adds a new reconciler that reconciles ExternalName Services that define a
tailnet target that should be exposed to cluster workloads on a ProxyGroup's
proxies.
The reconciler ensures that for each such service, the config mounted to
the proxies is updated with the tailnet target definition and that
and EndpointSlice and ClusterIP Service are created for the service.
Adds a new reconciler that ensures that as proxy Pods become ready to route
traffic to a tailnet target, the EndpointSlice for the target is updated
with the Pods' endpoints.
Updates tailscale/tailscale#13406
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
The AddSNATRuleForDst rule was adding a new rule each time it was called including:
- if a rule already existed
- if a rule matching the destination, but with different desired source already existed
This was causing issues especially for the in-progress egress HA proxies work,
where the rules are now refreshed more frequently, so more redundant rules
were being created.
This change:
- only creates the rule if it doesn't already exist
- if a rule for the same dst, but different source is found, delete it
- also ensures that egress proxies refresh firewall rules
if the node's tailnet IP changes
Updates tailscale/tailscale#13406
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
In prep for a future port 80 MITM fix, make the 'debug ts2021' command
retry once after a failure to give it a chance to pick a new strategy.
Updates #13597
Change-Id: Icb7bad60cbf0dbec78097df4a00e9795757bc8e4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
* cmd/containerboot,kube,util/linuxfw: configure kube egress proxies to route to 1+ tailnet targets
This commit is first part of the work to allow running multiple
replicas of the Kubernetes operator egress proxies per tailnet service +
to allow exposing multiple tailnet services via each proxy replica.
This expands the existing iptables/nftables-based proxy configuration
mechanism.
A proxy can now be configured to route to one or more tailnet targets
via a (mounted) config file that, for each tailnet target, specifies:
- the target's tailnet IP or FQDN
- mappings of container ports to which cluster workloads will send traffic to
tailnet target ports where the traffic should be forwarded.
Example configfile contents:
{
"some-svc": {"tailnetTarget":{"fqdn":"foo.tailnetxyz.ts.net","ports"{"tcp:4006:80":{"protocol":"tcp","matchPort":4006,"targetPort":80},"tcp:4007:443":{"protocol":"tcp","matchPort":4007,"targetPort":443}}}}
}
A proxy that is configured with this config file will configure firewall rules
to route cluster traffic to the tailnet targets. It will then watch the config file
for updates as well as monitor relevant netmap updates and reconfigure firewall
as needed.
This adds a bunch of new iptables/nftables functionality to make it easier to dynamically update
the firewall rules without needing to restart the proxy Pod as well as to make
it easier to debug/understand the rules:
- for iptables, each portmapping is a DNAT rule with a comment pointing
at the 'service',i.e:
-A PREROUTING ! -i tailscale0 -p tcp -m tcp --dport 4006 -m comment --comment "some-svc:tcp:4006 -> tcp:80" -j DNAT --to-destination 100.64.1.18:80
Additionally there is a SNAT rule for each tailnet target, to mask the source address.
- for nftables, a separate prerouting chain is created for each tailnet target
and all the portmapping rules are placed in that chain. This makes it easier
to look up rules and delete services when no longer needed.
(nftables allows hooking a custom chain to a prerouting hook, so no extra work
is needed to ensure that the rules in the service chains are evaluated).
The next steps will be to get the Kubernetes Operator to generate
the configfile and ensure it is mounted to the relevant proxy nodes.
Updates tailscale/tailscale#13406
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
The operator creates a non-reusable auth key for each of
the cluster proxies that it creates and puts in the tailscaled
configfile mounted to the proxies.
The proxies are always tagged, and their state is persisted
in a Kubernetes Secret, so their node keys are expected to never
be regenerated, so that they don't need to re-auth.
Some tailnet configurations however have seen issues where the auth
keys being left in the tailscaled configfile cause the proxies
to end up in unauthorized state after a restart at a later point
in time.
Currently, we have not found a way to reproduce this issue,
however this commit removes the auth key from the config once
the proxy can be assumed to have logged in.
If an existing, logged-in proxy is upgraded to this version,
its redundant auth key will be removed from the conffile.
If an existing, logged-in proxy is downgraded from this version
to a previous version, it will work as before without re-issuing key
as the previous code did not enforce that a key must be present.
Updates tailscale/tailscale#13451
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
The ProxyGroup CRD specifies a set of N pods which will each be a
tailnet device, and will have M different ingress or egress services
mapped onto them. It is the mechanism for specifying how highly
available proxies need to be. This commit only adds the definition, no
controller loop, and so it is not currently functional.
This commit also splits out TailnetDevice and RecorderTailnetDevice
into separate structs because the URL field is specific to recorders,
but we want a more generic struct for use in the ProxyGroup status field.
Updates #13406
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
this commit changes usermetrics to be non-global, this is a building
block for correct metrics if a go process runs multiple tsnets or
in tests.
Updates #13420
Updates tailscale/corp#22075
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Updates tailscale/tailscale#13326
Adds a CLI subcommand to perform DNS queries using the internal DNS forwarder and observe its internals (namely, which upstream resolvers are being used).
Signed-off-by: Andrea Gottardo <andrea@gottardo.me>
containerboot's main.go had grown to well over 1000 lines with
lots of disparate bits of functionality. This commit is pure copy-
paste to group related functionality outside of the main function
into its own set of files. Everything is still in the main package
to keep the diff incremental and reviewable.
Updates #cleanup
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
cmd/k8s-operator,k8s-operator,kube: Add TSRecorder CRD + controller
Deploys tsrecorder images to the operator's cluster. S3 storage is
configured via environment variables from a k8s Secret. Currently
only supports a single tsrecorder replica, but I've tried to take early
steps towards supporting multiple replicas by e.g. having a separate
secret for auth and state storage.
Example CR:
```yaml
apiVersion: tailscale.com/v1alpha1
kind: Recorder
metadata:
name: rec
spec:
enableUI: true
```
Updates #13298
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Rename kube/{types,client,api} -> kube/{kubetypes,kubeclient,kubeapi}
so that we don't need to rename the package on each import to
convey that it's kubernetes specific.
Updates#cleanup
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Further split kube package into kube/{client,api,types}. This is so that
consumers who only need constants/static types don't have to import
the client and api bits.
Updates#cleanup
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
When tailscaled restarts and our watch connection goes down, we get
stuck in an infinite loop printing `ipnbus error: EOF` (which ended up
consuming all the disk space on my laptop via the log file). Instead,
handle errors in `watchIPNBus` and reconnect after a short delay.
Updates #1708
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
Updates tailscale/tailscale#13326
This PR begins implementing a `tailscale dns` command group in the Tailscale CLI. It provides an initial implementation of `tailscale dns status` which dumps the state of the internal DNS forwarder.
Two new endpoints were added in LocalAPI to support the CLI functionality:
- `/netmap`: dumps a copy of the last received network map (because the CLI shouldn't have to listen to the ipn bus for a copy)
- `/dns-osconfig`: dumps the OS DNS configuration (this will be very handy for the UI clients as well, as they currently do not display this information)
My plan is to implement other subcommands mentioned in tailscale/tailscale#13326, such as `query`, in later PRs.
Signed-off-by: Andrea Gottardo <andrea@gottardo.me>
We've added more probe targets recently which has resulted in more
timeouts behind restrictive NATs in localized testing that don't
like how many flows we are creating at once. Not so much an issue
for datacenter or cloud-hosted deployments.
Updates tailscale/corp#22114
Signed-off-by: Jordan Whited <jordan@tailscale.com>
* cmd/k8s-operator,k8s-operator/sessonrecording: ensure CastHeader contains terminal size
For tsrecorder to be able to play session recordings, the recording's
CastHeader must have '.Width' and '.Height' fields set to non-zero.
Kubectl (or whoever is the client that initiates the 'kubectl exec'
session recording) sends the terminal dimensions in a resize message that
the API server proxy can intercept, however that races with the first server
message that we need to record.
This PR ensures we wait for the terminal dimensions to be processed from
the first resize message before any other data is sent, so that for all
sessions with terminal attached, the header of the session recording
contains the terminal dimensions and the recording can be played by tsrecorder.
Updates tailscale/tailscale#19821
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Previously, despite what the commit said, we were using a raw IP socket
that was *not* an AF_PACKET socket, and thus was subject to the host
firewall rules. Switch to using a real AF_PACKET socket to actually get
the functionality we want.
Updates #13140
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: If657daeeda9ab8d967e75a4f049c66e2bca54b78
Currently, we use PermitRead/PermitWrite/PermitCert permission flags to determine which operations are allowed for a LocalAPI client.
These checks are performed when localapi.Handler handles a request. Additionally, certain operations (e.g., changing the serve config)
requires the connected user to be a local admin. This approach is inherently racey and is subject to TOCTOU issues.
We consider it to be more critical on Windows environments, which are inherently multi-user, and therefore we prevent more than one
OS user from connecting and utilizing the LocalBackend at the same time. However, the same type of issues is also applicable to other
platforms when switching between profiles that have different OperatorUser values in ipn.Prefs.
We'd like to allow more than one Windows user to connect, but limit what they can see and do based on their access rights on the device
(e.g., an local admin or not) and to the currently active LoginProfile (e.g., owner/operator or not), while preventing TOCTOU issues on Windows
and other platforms. Therefore, we'd like to pass an actor from the LocalAPI to the LocalBackend to represent the user performing the operation.
The LocalBackend, or the profileManager down the line, will then check the actor's access rights to perform a given operation on the device
and against the current (and/or the target) profile.
This PR does not change the current permission model in any way, but it introduces the concept of an actor and includes some preparatory
work to pass it around. Temporarily, the ipnauth.Actor interface has methods like IsLocalSystem and IsLocalAdmin, which are only relevant
to the current permission model. It also lacks methods that will actually be used in the new model. We'll be adding these gradually in the next
PRs and removing the deprecated methods and the Permit* flags at the end of the transition.
Updates tailscale/corp#18342
Signed-off-by: Nick Khyl <nickk@tailscale.com>
This commit adds a new usermetric package and wires
up metrics across the tailscale client.
Updates tailscale/corp#22075
Co-authored-by: Anton Tolchanov <anton@tailscale.com>
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
There were a few places it could get wedged (notably the dial without
a timeout).
And add a knob for verbose debug logs.
And keep two idle connections always.
Updates #13038
Change-Id: I952ad182d7111481d97a83c12aa2ff4bfdc55fe8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We have several checked type assertions to *types.Named in both cmd/cloner and cmd/viewer.
As Go 1.23 updates the go/types package to produce Alias type nodes for type aliases,
these type assertions no longer work as expected unless the new behavior is disabled
with gotypesalias=0.
In this PR, we add codegen.NamedTypeOf(t types.Type), which functions like t.(*types.Named)
but also unrolls type aliases. We then use it in place of type assertions in the cmd/cloner and
cmd/viewer packages where appropriate.
We also update type switches to include *types.Alias alongside *types.Named in relevant cases,
remove *types.Struct cases when switching on types.Type.Underlying and update the tests
with more cases where type aliases can be used.
Updates #13224
Updates #12912
Signed-off-by: Nick Khyl <nickk@tailscale.com>
Go 1.23 updates the go/types package to produce Alias type nodes for type aliases, unless disabled with gotypesalias=0.
This new default behavior breaks codegen.LookupMethod, which uses checked type assertions to types.Named and
types.Interface, as only named types and interfaces have methods.
In this PR, we update codegen.LookupMethod to perform method lookup on the right-hand side of the alias declaration
and clearly switch on the supported type nodes types. We also improve support for various edge cases, such as when an alias
is used as a type parameter constraint, and add tests for the LookupMethod function.
Additionally, we update cmd/viewer/tests to include types with aliases used in type fields and generic type constraints.
Updates #13224
Updates #12912
Signed-off-by: Nick Khyl <nickk@tailscale.com>
The natlab Test Agent (tta) still had its old log streaming hack in
place where it dialed out to anything on TCP port 124 and those logs
were streamed to the host running the tests. But we'd since added gokrazy
syslog streaming support, which made that redundant.
So remove all the port 124 stuff. And then make sure we log to stderr
so gokrazy logs it to syslog.
Also, keep the first 1MB of logs in memory in tta too, exported via
localhost:8034/logs for interactive debugging. That was very useful
during debugging when I added IPv6 support. (which is coming in future
PRs)
Updates #13038
Change-Id: Ieed904a704410b9031d5fd5f014a73412348fa7f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Otherwise you get "Access denied: watch IPN bus access denied, must
set ipn.NotifyNoPrivateKeys when not running as admin/root or
operator".
This lets a non-operator at least start the app and see the status, even
if they can't change everything. (the web UI is unaffected by operator)
A future change can add a LocalAPI call to check permissions and guide
people through adding a user as an operator (perhaps the web client
can do that?)
Updates #1708
Change-Id: I699e035a251b4ebe14385102d5e7a2993424c4b7
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This adds a systray app for linux, similar to the apps for macOS and
windows. There are already a number of community-developed systray apps,
but most of them are either long abandoned, are built for a specific
desktop environment, or simply wrap the tailscale CLI.
This uses fyne.io/systray (a fork of github.com/getlantern/systray)
which uses newer D-Bus specifications to render the tray icon and menu.
This results in a pretty broad support for modern desktop environments.
This initial commit lacks a number of features like profile switching,
device listing, and exit node selection. This is really focused on the
application structure, the interaction with LocalAPI, and some system
integration pieces like the app icon, notifications, and the clipboard.
Updates #1708
Signed-off-by: Will Norris <will@tailscale.com>
After the upstream PR is merged, we can point directly at github.com/vishvananda/netlink
and retire github.com/tailscale/netlink.
See https://github.com/vishvananda/netlink/pull/1006
Updates #12298
Signed-off-by: Percy Wegmann <percy@tailscale.com>
In prep for updating to new staticcheck required for Go 1.23.
Updates #12912
Change-Id: If77892a023b79c6fa798f936fc80428fd4ce0673
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
To avoid dig vs nslookup vs $X availability issues between
OSes/distros. And to be in Go, to match the resolver we use.
Updates #13038
Change-Id: Ib7e5c351ed36b5470a42cbc230b8f27eed9a1bf8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In Tailnet Lock, there is an implicit limit on the number of rotation
signatures that can be chained before the signature becomes too long.
This program helps tailnet admins to identify nodes that have signatures
with long chains and prints commands to re-sign those node keys with a
fresh direct signature. It's a temporary mitigation measure, and we will
remove this tool as we design and implement a long-term approach for
rotation signatures.
Example output:
```
2024/08/20 18:25:03 Self: does not need re-signing
2024/08/20 18:25:03 Visible peers with valid signatures:
2024/08/20 18:25:03 Peer xxx2.yy.ts.net. (100.77.192.34) nodeid=nyDmhiZiGA11KTM59, current signature kind=direct: does not need re-signing
2024/08/20 18:25:03 Peer xxx3.yy.ts.net. (100.84.248.22) nodeid=ndQ64mDnaB11KTM59, current signature kind=direct: does not need re-signing
2024/08/20 18:25:03 Peer xxx4.yy.ts.net. (100.85.253.53) nodeid=nmZfVygzkB21KTM59, current signature kind=rotation: chain length 4, printing command to re-sign
tailscale lock sign nodekey:530bddbfbe69e91fe15758a1d6ead5337aa6307e55ac92dafad3794f8b3fc661 tlpub:4bf07597336703395f2149dce88e7c50dd8694ab5bbde3d7c2a1c7b3e231a3c2
```
To support this, the NetworkLockStatus localapi response now includes
information about signatures of all peers rather than just the invalid
ones. This is not displayed by default in `tailscale lock status`, but
will be surfaced in `tailscale lock status --json`.
Updates #13185
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
In 2f27319baf we disabled GRO due to a
data race around concurrent calls to tstun.Wrapper.Write(). This commit
refactors GRO to be thread-safe, and re-enables it on Linux.
This refactor now carries a GRO type across tstun and netstack APIs
with a lifetime that is scoped to a single tstun.Wrapper.Write() call.
In 25f0a3fc8f we used build tags to
prevent importation of gVisor's GRO package on iOS as at the time we
believed it was contributing to additional memory usage on that
platform. It wasn't, so this commit simplifies and removes those
build tags.
Updates tailscale/corp#22353
Updates tailscale/corp#22125
Updates #6816
Signed-off-by: Jordan Whited <jordan@tailscale.com>
cmd/k8s-operator/deploy: replace wildcards in Kubernetes Operator RBAC role definitions with verbs
fixes: #13168
Signed-off-by: Pierig Le Saux <pierig@n3xt.io>
Coder has just adopted nhooyr/websocket which unfortunately changes the import path.
`github.com/coder/coder` imports `tailscale.com/net/wsconn` which was still pointing
to `nhooyr.io/websocket`, but this change updates it.
See https://coder.com/blog/websocket
Updates #13154
Change-Id: I3dec6512472b14eae337ae22c5bcc1e3758888d5
Signed-off-by: Kyle Carberry <kyle@carberry.com>
This PR modifies viewTypeForContainerType to use the last type parameter of a container type
as the value type, enabling the implementation of map-like container types where the second-to-last
(usually first) type parameter serves as the key type.
It also adds a MapContainer type to test the code generation.
Updates #12736
Signed-off-by: Nick Khyl <nickk@tailscale.com>
cmd/k8s-operator,k8s-operator/sessionrecording: support recording WebSocket sessions
Kubernetes currently supports two streaming protocols, SPDY and WebSockets.
WebSockets are replacing SPDY, see
https://github.com/kubernetes/enhancements/issues/4006.
We were currently only supporting SPDY, erroring out if session
was not SPDY and relying on the kube's built-in SPDY fallback.
This PR:
- adds support for parsing contents of 'kubectl exec' sessions streamed
over WebSockets
- adds logic to distinguish 'kubectl exec' requests for a SPDY/WebSockets
sessions and call the relevant handler
Updates tailscale/corp#19821
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Co-authored-by: Tom Proctor <tomhjp@users.noreply.github.com>
Add functionality to optionally serve a health check endpoint
(off by default).
Users can enable health check endpoint by setting
TS_HEALTHCHECK_ADDR_PORT to [<addr>]:<port>.
Containerboot will then serve an unauthenticatd HTTP health check at
/healthz at that address. The health check returns 200 OK if the
node has at least one tailnet IP address, else returns 503.
Updates tailscale/tailscale#12898
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
... rather than abusing the generic tsapp.
Per discussion in https://github.com/gokrazy/gokrazy/pull/275
It also means we can remove stuff we don't need, like ntp or randomd.
Updates #13038
Change-Id: Iccf579c354bd3b5025d05fa1128e32f1d5bde4e4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The change in b7e48058c8 was too loose; it also captured the CLI
being run as a child process under cmd/tta.
Updates #13038
Updates #1866
Change-Id: Id410b87132938dd38ed4dd3959473c5d0d242ff5
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
* cmd/k8s-operator: fix DNS reconciler for dual-stack clusters
This fixes a bug where DNS reconciler logic was always assuming
that no more than one EndpointSlice exists for a Service.
In fact, there can be multiple, for example, in dual-stack
clusters, but also in other cases this is valid (as per kube docs).
This PR:
- allows for multiple EndpointSlices
- picks out the ones for IPv4 family
- deduplicates addresses
Updates tailscale/tailscale#13056
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Co-authored-by: Tom Proctor <tomhjp@users.noreply.github.com>
Package setting contains types for defining and representing policy settings.
It facilitates the registration of setting definitions using Register and RegisterDefinition,
and the retrieval of registered setting definitions via Definitions and DefinitionOf.
This package is intended for use primarily within the syspolicy package hierarchy,
and added in a preparation for the next PRs.
Updates #12687
Signed-off-by: Nick Khyl <nickk@tailscale.com>
In particular, tests showing that #3824 works. But that test doesn't
actually work yet; it only gets a DERP connection. (why?)
Updates #13038
Change-Id: Ie1fd1b6a38d4e90fae7e72a0b9a142a95f0b2e8f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
getConns() is now responsible for returning both stable and unstable
conns. conn and measureFn are now passed together via connAndMeasureFn.
newConnAndMeasureFn() is responsible for constructing them.
TCP measurement timeouts are adjusted to more closely match netcheck.
Updates tailscale/corp#22114
Signed-off-by: Jordan Whited <jordan@tailscale.com>
This adds a new NodeAgentClient type that can be used to
invoke the LocalAPI using the LocalClient instead of
handcrafted URLs. However, there are certain cases where
it does make sense for the node agent to provide more
functionality than whats possible with just the LocalClient,
as such it also exposes a http.Client to make requests directly.
Signed-off-by: Maisem Ali <maisem@tailscale.com>
And don't make guests under vnet/natlab upload to logcatcher,
as there won't be a valid cert anyway.
Updates #13038
Change-Id: Ie1ce0139788036b8ecc1804549a9b5d326c5fef5
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
'stun' has been removed from metric names and replaced with a protocol
label. This refactor is preparation work for HTTPS & ICMP support.
Updates tailscale/corp#22114
Signed-off-by: Jordan Whited <jordan@tailscale.com>
In a situation when manual edits are made on the admin panel, around the
GitOps process, the pusher will be stuck if `--fail-on-manual-edits` is
set, as expected.
To recover from this, there are 2 options:
1. revert the admin panel changes to get back in sync with the code
2. check in the manual edits to code
The former will work well, since previous and local ETags will match
control ETag again. The latter will still fail, since local and control
ETags match, but previous does not.
For this situation, check the local ETag against control first and
ignore previous when things are already in sync.
Updates https://github.com/tailscale/corp/issues/22177
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
For cases where users want to be extra careful about not overwriting
manual changes, add a flag to hard-fail. This is only useful if the etag
cache is persistent or otherwise reliable. This flag should not be used
in ephemeral CI workers that won't persist the cache.
Updates https://github.com/tailscale/corp/issues/22177
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
* cmd/tsidp: add funnel support
Updates #10263.
Signed-off-by: Naman Sood <mail@nsood.in>
* look past funnel-ingress-node to see who we're authenticating
Signed-off-by: Naman Sood <mail@nsood.in>
* fix comment typo
Signed-off-by: Naman Sood <mail@nsood.in>
* address review feedback, support Basic auth for /token
Turns out you need to support Basic auth if you do client ID/secret
according to OAuth.
Signed-off-by: Naman Sood <mail@nsood.in>
* fix typos
Signed-off-by: Naman Sood <mail@nsood.in>
* review fixes
Signed-off-by: Naman Sood <mail@nsood.in>
* remove debugging log
Signed-off-by: Naman Sood <mail@nsood.in>
* add comments, fix header
Signed-off-by: Naman Sood <mail@nsood.in>
---------
Signed-off-by: Naman Sood <mail@nsood.in>
During review of #8644 the `recover-compromised-key` command was renamed
to `revoke-key`, but the old name remained in some messages printed by
the command.
Fixestailscale/corp#19446
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
This commit implements TCP GRO for packets being written to gVisor on
Linux. Windows support will follow later. The wireguard-go dependency is
updated in order to make use of newly exported IP checksum functions.
gVisor is updated in order to make use of newly exported
stack.PacketBuffer GRO logic.
TCP throughput towards gVisor, i.e. TUN write direction, is dramatically
improved as a result of this commit. Benchmarks show substantial
improvement, sometimes as high as 2x. High bandwidth-delay product
paths remain receive window limited, bottlenecked by gVisor's default
TCP receive socket buffer size. This will be addressed in a follow-on
commit.
The iperf3 results below demonstrate the effect of this commit between
two Linux computers with i5-12400 CPUs. There is roughly ~13us of round
trip latency between them.
The first result is from commit 57856fc without TCP GRO.
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 4.77 GBytes 4.10 Gbits/sec 20 sender
[ 5] 0.00-10.00 sec 4.77 GBytes 4.10 Gbits/sec receiver
The second result is from this commit with TCP GRO.
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 10.6 GBytes 9.14 Gbits/sec 20 sender
[ 5] 0.00-10.00 sec 10.6 GBytes 9.14 Gbits/sec receiver
Updates #6816
Signed-off-by: Jordan Whited <jordan@tailscale.com>
This commit implements TCP GSO for packets being read from gVisor on
Linux. Windows support will follow later. The wireguard-go dependency is
updated in order to make use of newly exported GSO logic from its tun
package.
A new gVisor stack.LinkEndpoint implementation has been established
(linkEndpoint) that is loosely modeled after its predecessor
(channel.Endpoint). This new implementation supports GSO of monster TCP
segments up to 64K in size, whereas channel.Endpoint only supports up to
32K. linkEndpoint will also be required for GRO, which will be
implemented in a follow-on commit.
TCP throughput from gVisor, i.e. TUN read direction, is dramatically
improved as a result of this commit. Benchmarks show substantial
improvement through a wide range of RTT and loss conditions, sometimes
as high as 5x.
The iperf3 results below demonstrate the effect of this commit between
two Linux computers with i5-12400 CPUs. There is roughly ~13us of round
trip latency between them.
The first result is from commit 57856fc without TCP GSO.
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 2.51 GBytes 2.15 Gbits/sec 154 sender
[ 5] 0.00-10.00 sec 2.49 GBytes 2.14 Gbits/sec receiver
The second result is from this commit with TCP GSO.
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 12.6 GBytes 10.8 Gbits/sec 6 sender
[ 5] 0.00-10.00 sec 12.6 GBytes 10.8 Gbits/sec receiver
Updates #6816
Signed-off-by: Jordan Whited <jordan@tailscale.com>
fixes tailscale#12968
The dns manager cleanup func was getting passed a nil
health tracker, which will panic. Fixed to pass it
the system health tracker.
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
cmd/k8s-operator,k8s-operator/sessionrecording,sessionrecording,ssh/tailssh: refactor session recording functionality
Refactor SSH session recording functionality (mostly the bits related to
Kubernetes API server proxy 'kubectl exec' session recording):
- move the session recording bits used by both Tailscale SSH
and the Kubernetes API server proxy into a shared sessionrecording package,
to avoid having the operator to import ssh/tailssh
- move the Kubernetes API server proxy session recording functionality
into a k8s-operator/sessionrecording package, add some abstractions
in preparation for adding support for a second streaming protocol (WebSockets)
Updates tailscale/corp#19821
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Re-instates the functionality that generates CRD API docs, but using
a different library as the one we were using earlier seemed to have
some issues with its Git history.
Also regenerates the docs (make kube-generate-all).
Updates tailscale/tailscale#12859
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Updates tailscale/tailscale#1634
This PR introduces a new `captive-portal-detected` Warnable which is set to an unhealthy state whenever a captive portal is detected on the local network, preventing Tailscale from connecting.
ipn/ipnlocal: fix captive portal loop shutdown
Change-Id: I7cafdbce68463a16260091bcec1741501a070c95
net/captivedetection: fix mutex misuse
ipn/ipnlocal: ensure that we don't fail to start the timer
Change-Id: I3e43fb19264d793e8707c5031c0898e48e3e7465
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Signed-off-by: Andrea Gottardo <andrea@gottardo.me>
This adds support for container-like types such as Container[T] that
don't explicitly specify a view type for T. Instead, a package implementing
a container type should also implement and export a ContainerView[T, V] type
and a ContainerViewOf(*Container[T]) ContainerView[T, V] function, which
returns a view for the specified container, inferring the element view type V
from the element type T.
Updates #12736
Signed-off-by: Nick Khyl <nickk@tailscale.com>
Some users run "tailscale cert" in a cron job to renew their
certificates on disk. The time until the next cron job run may be long
enough for the old cert to expire with our default heristics.
Add a `--min-validity` flag which ensures that the returned cert is
valid for at least the provided duration (unless it's longer than the
cert lifetime set by Let's Encrypt).
Updates #8725
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
Remove fybrik.io/crdoc dependency as it is causing issues for folks attempting
to vendor tailscale using GOPROXY=direct.
This means that the CRD API docs in ./k8s-operator/api.md will no longer
be generated- I am going to look at replacing it with another tool
in a follow-up.
Updates tailscale/tailscale#12859
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
To match the format of exit node suggestions and ensure that the result
is not ambiguous, relax exit node CLI selection to permit using a FQDN
including the trailing dot.
Updates #12618
Change-Id: I04b9b36d2743154aa42f2789149b2733f8555d3f
Signed-off-by: Adrian Dewhurst <adrian@tailscale.com>
This adds support for generic types and interfaces to our cloner and viewer codegens.
It updates these packages to determine whether to make shallow or deep copies based
on the type parameter constraints. Additionally, if a template parameter or an interface
type has View() and Clone() methods, we'll use them for getters and the cloner of the
owning structure.
Updates #12736
Signed-off-by: Nick Khyl <nickk@tailscale.com>
This adds a package with GP-related functions and types to be used in the future PRs.
It also updates nrptRuleDatabase to use the new package instead of its own gpNotificationWatcher implementation.
Updates #12687
Signed-off-by: Nick Khyl <nickk@tailscale.com>
While `clientupdate.Updater` won't be able to apply updates on macsys,
we use `clientupdate.CanAutoUpdate` to gate the EditPrefs endpoint in
localAPI. We should allow the GUI client to set AutoUpdate.Apply on
macsys for it to properly get reported to the control plane. This also
allows the tailnet-wide default for auto-updates to propagate to macsys
clients.
Updates https://github.com/tailscale/corp/issues/21339
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
cmd/k8s-operator,ssh/tailssh,tsnet: optionally record kubectl exec sessions
The Kubernetes operator's API server proxy, when it receives a request
for 'kubectl exec' session now reads 'RecorderAddrs', 'EnforceRecorder'
fields from tailcfg.KubernetesCapRule.
If 'RecorderAddrs' is set to one or more addresses (of a tsrecorder instance(s)),
it attempts to connect to those and sends the session contents
to the recorder before forwarding the request to the kube API
server. If connection cannot be established or fails midway,
it is only allowed if 'EnforceRecorder' is not true (fail open).
Updates tailscale/corp#19821
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Co-authored-by: Maisem Ali <maisem@tailscale.com>
cmd/containerboot,cmd/k8s-operator: enable IPv6 for fqdn egress proxies
Don't skip installing egress forwarding rules for IPv6 (as long as the host
supports IPv6), and set headless services `ipFamilyPolicy` to
`PreferDualStack` to optionally enable both IP families when possible. Note
that even with `PreferDualStack` set, testing a dual-stack GKE cluster with
the default DNS setup of kube-dns did not correctly set both A and
AAAA records for the headless service, and instead only did so when
switching the cluster DNS to Cloud DNS. For both IPv4 and IPv6 to work
simultaneously in a dual-stack cluster, we require headless services to
return both A and AAAA records.
If the host doesn't support IPv6 but the FQDN specified only has IPv6
addresses available, containerboot will exit with error code 1 and an
error message because there is no viable egress route.
Fixes#12215
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
This change expands the `exit-node list -filter` command to display all
location based exit nodes for the filtered country. This allows users
to switch to alternative servers when our recommended exit node is not
working as intended.
This change also makes the country filter matching case insensitive,
e.g. both USA and usa will work.
Updates #12698
Signed-off-by: Charlotte Brandhorst-Satzkorn <charlotte@tailscale.com>
In hopes it'll be found more.
Updates tailscale/corp#20844
Change-Id: Ic92ee9908f45b88f8770de285f838333f9467465
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
A few other minor language updates.
Updates tailscale/corp#20844
Change-Id: Idba85941baa0e2714688cc8a4ec3e242e7d1a362
Signed-off-by: James Tucker <james@tailscale.com>
The exit node suggestion CLI command was written with the assumption
that it's possible to provide a stableid on the command line, but this
is incorrect. Instead, it will now emit the name of the exit node.
Fixes#12618
Change-Id: Id7277f395b5fca090a99b0d13bfee7b215bc9802
Signed-off-by: Adrian Dewhurst <adrian@tailscale.com>
stunstamp now sends data to Prometheus via remote write, and Prometheus
can serve the same data. Retaining and cleaning up old data in sqlite
leads to long probing pauses, and it's not worth investing more effort
to optimize the schema and/or concurrency model.
Updates tailscale/corp#20344
Signed-off-by: Jordan Whited <jordan@tailscale.com>
PeerPresentFlags was added in 5ffb2668ef but wasn't plumbed through to
the RunConnectionLoop. Rather than add yet another parameter (as
IP:port was added earlier), pass in the raw PeerPresentMessage and
PeerGoneMessage struct values, which are the same things, plus two
fields: PeerGoneReasonType for gone and the PeerPresentFlags from
5ffb2668ef.
Updates tailscale/corp#17816
Change-Id: Ib19d9f95353651ada90656071fc3656cf58b7987
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This adds the ability to "peek" at the value of a SyncValue, so that
it's possible to observe a value without computing this.
Updates tailscale/corp#17122
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Co-authored-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Change-Id: I06f88c22a1f7ffcbc7ff82946335356bb0ef4622
This is implemented via GetBestInterfaceEx. Should we encounter errors
or fail to resolve a valid, non-Tailscale interface, we fall back to
returning the index for the default interface instead.
Fixes#12551
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
Timeouts could already be identified as NaN values on
stunstamp_derp_stun_rtt_ns, but we can't use NaN effectively with
promql to visualize them. So, this commit adds a timeouts metric that
we can use with rate/delta/etc promql functions.
Updates tailscale/corp#20689
Signed-off-by: Jordan Whited <jordan@tailscale.com>
This actually performs a Noise request in the 'debug ts2021' command,
instead of just exiting once we've dialed a connection. This can help
debug certain forms of captive portals and deep packet inspection that
will allow a connection, but will RST the connection when trying to send
data on the post-upgraded TCP connection.
Updates #1634
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I1e46ca9c9a0751c55f16373a6a76cdc24fec1f18
So that it can be later used in the 'tailscale debug ts2021' function in
the CLI, to aid in debugging captive portals/WAFs/etc.
Updates #1634
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Iec9423f5e7570f2c2c8218d27fc0902137e73909
Updates tailscale/corp#20969
Right now, when netcheck starts, it asks tailscaled for a copy of the DERPMap. If it doesn't have one, it makes a HTTPS request to controlplane.tailscale.com to fetch one.
This will always fail if you're on a network with a captive portal actively blocking HTTPS traffic. The code appears to hang entirely because the http.Client doesn't have a Timeout set. It just sits there waiting until the request succeeds or fails.
This adds a timeout of 10 seconds, and logs more details about the status of the HTTPS request.
Signed-off-by: Andrea Gottardo <andrea@gottardo.me>
This is useful during maintenance as a method for shedding home client
load.
Updates tailscale/corp#20689
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Adds a new TailscaleProxyReady condition type for use in corev1.Service
conditions.
Also switch our CRDs to use metav1.Condition instead of
ConnectorCondition. The Go structs are seralized identically, but it
updates some descriptions and validation rules. Update k8s
controller-tools and controller-runtime deps to fix the documentation
generation for metav1.Condition so that it excludes comments and
TODOs.
Stop expecting the fake client to populate TypeMeta in tests. See
kubernetes-sigs/controller-runtime#2633 for details of the change.
Finally, make some minor improvements to validation for service hostnames.
Fixes#12216
Co-authored-by: Irbe Krumina <irbe@tailscale.com>
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Previously, we were registering TCP and UDP connections in the same map,
which could result in erroneously removing a mapping if one of the two
connections completes while the other one is still active.
Add a "proto string" argument to these functions to avoid this.
Additionally, take the "proto" argument in LocalAPI, and plumb that
through from the CLI and add a new LocalClient method.
Updates tailscale/corp#20600
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I35d5efaefdfbf4721e315b8ca123f0c8af9125fb
* cmd/containerboot: store device ID before setting up proxy routes.
For containerboot instances whose state needs to be stored
in a Kubernetes Secret, we additonally store the device's
ID, FQDN and IPs.
This is used, between other, by the Kubernetes operator,
who uses the ID to delete the device when resources need
cleaning up and writes the FQDN and IPs on various kube
resource statuses for visibility.
This change shifts storing device ID earlier in the proxy setup flow,
to ensure that if proxy routing setup fails,
the device can still be deleted.
Updates tailscale/tailscale#12146
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
* code review feedback
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
---------
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
We do not support specific version updates or track switching on macOS.
Do not populate the flag to avoid confusion.
Updates #cleanup
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
This moves NewContainsIPFunc from tsaddr to new ipset package.
And wgengine/filter types gets split into wgengine/filter/filtertype,
so netmap (and thus the CLI, etc) doesn't need to bring in ipset,
bart, etc.
Then add a test making sure the CLI deps don't regress.
Updates #1278
Change-Id: Ia246d6d9502bbefbdeacc4aef1bed9c8b24f54d5
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
NewContainsIPFunc was previously documented as performing poorly if
there were many netip.Prefixes to search over. As such, we never it used it
in such cases.
This updates it to use bart at a certain threshold (over 6 prefixes,
currently), at which point the bart lookup overhead pays off.
This is currently kinda useless because we're not using it. But now we
can and get wins elsewhere. And we can remove the caveat in the docs.
goos: darwin
goarch: arm64
pkg: tailscale.com/net/tsaddr
│ before │ after │
│ sec/op │ sec/op vs base │
NewContainsIPFunc/empty-8 2.215n ± 11% 2.239n ± 1% +1.08% (p=0.022 n=10)
NewContainsIPFunc/cidr-list-1-8 17.44n ± 0% 17.59n ± 6% +0.89% (p=0.000 n=10)
NewContainsIPFunc/cidr-list-2-8 27.85n ± 0% 28.13n ± 1% +1.01% (p=0.000 n=10)
NewContainsIPFunc/cidr-list-3-8 36.05n ± 0% 36.56n ± 13% +1.41% (p=0.000 n=10)
NewContainsIPFunc/cidr-list-4-8 43.73n ± 0% 44.38n ± 1% +1.50% (p=0.000 n=10)
NewContainsIPFunc/cidr-list-5-8 51.61n ± 2% 51.75n ± 0% ~ (p=0.101 n=10)
NewContainsIPFunc/cidr-list-10-8 95.65n ± 0% 68.92n ± 0% -27.94% (p=0.000 n=10)
NewContainsIPFunc/one-ip-8 4.466n ± 0% 4.469n ± 1% ~ (p=0.491 n=10)
NewContainsIPFunc/two-ip-8 8.002n ± 1% 7.997n ± 4% ~ (p=0.697 n=10)
NewContainsIPFunc/three-ip-8 27.98n ± 1% 27.75n ± 0% -0.82% (p=0.012 n=10)
geomean 19.60n 19.07n -2.71%
Updates #12486
Change-Id: I2e2320cc4384f875f41721374da536bab995c1ce
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This abstraction provides a nicer way to work with
maps of slices without having to write out three long type
params.
This also allows it to provide an AsMap implementation which
copies the map and the slices at least.
Updates tailscale/corp#20910
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Without this rule, Windows 8.1 and newer devices issue parallel DNS requests to DNS servers
associated with all network adapters, even when "Override local DNS" is enabled and/or
a Mullvad exit node is being used, resulting in DNS leaks.
This also adds "disable-local-dns-override-via-nrpt" nodeAttr that can be used to disable
the new behavior if needed.
Fixestailscale/corp#20718
Signed-off-by: Nick Khyl <nickk@tailscale.com>
S4U logons do not automatically load the associated user profile. In this
PR we add UserProfile to handle that part. Windows docs indicate that
we should try to resolve a remote profile path when present, so we attempt
to do so when the local computer is joined to a domain.
Updates #12383
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
This commit introduces a userspace program for managing an experimental
eBPF XDP STUN server program. derp/xdp contains the eBPF pseudo-C along
with a Go pkg for loading it and exporting its metrics.
cmd/xdpderper is a package main user of derp/xdp.
Updates tailscale/corp#20689
Signed-off-by: Jordan Whited <jordan@tailscale.com>
When we're starting child processes on Windows that are CLI programs that
don't need to output to a console, we should pass in DETACHED_PROCESS as a
CreationFlag on SysProcAttr. This prevents the OS from even creating a console
for the child (and paying the associated time/space penalty for new conhost
processes). This is more efficient than letting the OS create the console
window and then subsequently trying to hide it, which we were doing at a few
callsites.
Fixes#12270
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
Add a new TS_EXPERIMENTAL_ENABLE_FORWARDING_OPTIMIZATIONS env var
that can be set for tailscale/tailscale container running as
a subnet router or exit node to enable UDP GRO forwarding
for improved performance.
See https://tailscale.com/kb/1320/performance-best-practices#linux-optimizations-for-subnet-routers-and-exit-nodes
This is currently considered an experimental approach;
the configuration support is partially to allow further experimentation
with containerized environments to evaluate the performance
improvements.
Updates tailscale/tailscale#12295
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Return empty response and NOERROR for AAAA record queries
for DNS names for which we have an A record.
This is to allow for callers that might be first sending an AAAA query and then,
if that does not return a response, follow with an A record query.
Previously we were returning NOTIMPL that caused some callers
to potentially not follow with an A record query or misbehave in different ways.
Also return NXDOMAIN for AAAA record queries for names
that we DO NOT have an A record for to ensure that the callers
do not follow up with an A record query.
Returning an empty response and NOERROR is the behaviour
that RFC 4074 recommends:
https://datatracker.ietf.org/doc/html/rfc4074
Updates tailscale/tailscale#12321
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
AllocateContiguousBuffer is for allocating structs with trailing buffers
containing additional data. It is to be used for various Windows structures
containing pointers to data located immediately after the struct.
SetNTString performs in-place setting of windows.NTString and
windows.NTUnicodeString.
Updates #12383
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
This PR is in prep of adding logic to control to be able to parse
tailscale.com/cap/kubernetes grants in control:
- moves the type definition of PeerCapabilityKubernetes cap to a location
shared with control.
- update the Kubernetes cap rule definition with fields for granting
kubectl exec session recording capabilities.
- adds a convenience function to produce tailcfg.RawMessage from an
arbitrary cap rule and a test for it.
An example grant defined via ACLs:
"grants": [{
"src": ["tag:eng"],
"dst": ["tag:k8s-operator"],
"app": {
"tailscale.com/cap/kubernetes": [{
"recorder": ["tag:my-recorder"]
“enforceRecorder”: true
}],
},
}
]
This grant enforces `kubectl exec` sessions from tailnet clients,
matching `tag:eng` via API server proxy matching `tag:k8s-operator`
to be recorded and recording to be sent to a tsrecorder instance,
matching `tag:my-recorder`.
The type needs to be shared with control because we want
control to parse this cap and resolve tags to peer IPs.
Updates tailscale/corp#19821
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Add a new .spec.tailscale.acceptRoutes field to ProxyClass,
that can be optionally set to true for the proxies to
accept routes advertized by other nodes on tailnet (equivalent of
setting --accept-routes to true).
Updates tailscale/tailscale#12322,tailscale/tailscale#10684
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Also removes hardcoded image repo/tag from example DNSConfig resource
as the operator now knows how to default those.
Updates tailscale/tailscale#11019
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Add new fields TailnetIPs and Hostname to Connector Status. These
contain the addresses of the Tailscale node that the operator created
for the Connector to aid debugging.
Fixes#12214
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
cmd/k8s-operator,k8s-operator,go.{mod,sum}: make individual proxy images/image pull policies configurable
Allow to configure images and image pull policies for individual proxies
via ProxyClass.Spec.StatefulSet.Pod.{TailscaleContainer,TailscaleInitContainer}.Image,
and ProxyClass.Spec.StatefulSet.Pod.{TailscaleContainer,TailscaleInitContainer}.ImagePullPolicy
fields.
Document that we have images in ghcr.io on the relevant Helm chart fields.
Updates tailscale/tailscale#11675
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
To make it easier for people to monitor their custom DERP fleet.
Updates tailscale/corp#20654
Change-Id: Id8af22936a6d893cc7b6186d298ab794a2672524
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Evaluation of remote write errors was using errors.Is() where it should
have been using errors.As().
Updates tailscale/corp#20344
Signed-off-by: Jordan Whited <jordan@tailscale.com>
This is done in preparation for adding kubectl
session recording rules to this capability grant that will need to
be unmarshalled by control, so will also need to be
in a shared location.
Updates tailscale/corp#19821
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
cmd/k8s-operator/deploy/chart: Support image 'repo' or 'repository' keys in helm values
Fixes#12100
Signed-off-by: Michael Long <michaelongdev@gmail.com>
This adds a new prototype `cmd/natc` which can be used
to expose a services/domains to the tailnet.
It requires the user to specify a set of IPv4 prefixes
from the CGNAT range. It advertises these as normal subnet
routes. It listens for DNS on the first IP of the first range
provided to it.
When it gets a DNS query it allocates an IP for that domain
from the v4 range. Subsequent connections to the assigned IP
are then tcp proxied to the domain.
It is marked as a WIP prototype and requires the use of the
`TAILSCALE_USE_WIP_CODE` env var.
Updates tailscale/corp#20503
Signed-off-by: Maisem Ali <maisem@tailscale.com>
stunstamp timestamping includes userspace and SO_TIMESTAMPING kernel
timestamping where available. Measurements are written locally to a
sqlite DB, exposed over an HTTP API, and written to prometheus
via remote-write protocol.
Updates tailscale/corp#20344
Signed-off-by: Jordan Whited <jordan@tailscale.com>
This adds a new ListenPacket function on tsnet.Server
which acts mostly like `net.ListenPacket`.
Unlike `Server.Listen`, this requires listening on a
specific IP and does not automatically listen on both
V4 and V6 addresses of the Server when the IP is unspecified.
To test this, it also adds UDP support to tsdial.Dialer.UserDial
and plumbs it through the localapi. Then an associated test
to make sure the UDP functionality works from both sides.
Updates #12182
Signed-off-by: Maisem Ali <maisem@tailscale.com>
This fixes an issue where, on containerized environments an upgrade
1.66.3 -> 1.66.4 failed with default containerboot configuration.
This was because containerboot by default runs 'tailscale up'
that requires all previously set flags to be explicitly provided
on subsequent runs and we explicitly set --stateful-filtering
to true on 1.66.3, removed that settingon 1.66.4.
Updates tailscale/tailscale#12307
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Co-authored-by: Andrew Lytvynov <awly@tailscale.com>