Commit Graph

8238 Commits (5b7303817eb5fe9d63b05a129be33d36d50301c7)
 

Author SHA1 Message Date
Joe Tsai 5b7303817e
syncs: allocate map with Map.WithLock (#13755)
One primary purpose of WithLock is to mutate the underlying map.
However, this can lead to a panic if it happens to be nil.
Thus, always allocate a map before passing it to f.

Updates tailscale/corp#11038

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2 weeks ago
Brad Fitzpatrick c763b7a7db syncs: delete Map.Range, update callers to iterators
Updates #11038

Change-Id: I2819fed896cc4035aba5e4e141b52c12637373b1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 weeks ago
Percy Wegmann 2cadb80fb2 util/vizerror: add WrapWithMessage
Thus new function allows constructing vizerrors that combine a message
appropriate for display to users with a wrapped underlying error.

Updates tailscale/corp#23781

Signed-off-by: Percy Wegmann <percy@tailscale.com>
2 weeks ago
Joe Tsai 910b4e8e6a
syncs: add iterators to Map (#13739)
Add Keys, Values, and All to iterate over
all keys, values, and entries, respectively.

Updates #11038

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2 weeks ago
Irbe Krumina 89ee6bbdae
cmd/k8s-operator,k8s-operator/apis: set a readiness condition on egress Services for ProxyGroup (#13746)
cmd/k8s-operator,k8s-operator/apis: set a readiness condition on egress Services

Set a readiness condition on ExternalName Services that define a tailnet target
to route cluster traffic to via a ProxyGroup's proxies. The condition
is set to true if at least one proxy is currently set up to route.

Updates tailscale/tailscale#13406

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2 weeks ago
Brad Fitzpatrick 94c79659fa types/views: add iterators to the three Map view types
Their callers using Range are all kinda clunky feeling. Iterators
should make them more readable.

Updates #12912

Change-Id: I93461eba8e735276fda4a8558a4ae4bfd6c04922
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 weeks ago
Irbe Krumina f6d4d03355
cmd/k8s-operator: don't error out if ProxyClass for ProxyGroup not found. (#13736)
We don't need to error out and continuously reconcile if ProxyClass
has not (yet) been created, once it gets created the ProxyGroup
reconciler will get triggered.

Updates tailscale/tailscale#13406

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2 weeks ago
Irbe Krumina 60011e73b8
cmd/k8s-operator: fix Pod IP selection (#13743)
Ensure that .status.podIPs is used to select Pod's IP
in all reconcilers.

Updates tailscale/tailscale#13406

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2 weeks ago
Nick Khyl da40609abd util/syspolicy, ipn: add "tailscale debug component-logs" support
Fixes #13313
Fixes #12687

Signed-off-by: Nick Khyl <nickk@tailscale.com>
2 weeks ago
Nick Khyl 29cf59a9b4 util/syspolicy/setting: update Snapshot to use Go 1.23 iterators
Updates #12912
Updates #12687

Signed-off-by: Nick Khyl <nickk@tailscale.com>
2 weeks ago
Tom Proctor 07c157ee9f
cmd/k8s-operator: base ProxyGroup StatefulSet on common proxy.yaml definition (#13714)
As discussed in #13684, base the ProxyGroup's proxy definitions on the same
scaffolding as the existing proxies, as defined in proxy.yaml

Updates #13406

Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2 weeks ago
Tom Proctor 83efadee9f
kube/egressservices: improve egress ports config readability (#13722)
Instead of converting our PortMap struct to a string during marshalling
for use as a key, convert the whole collection of PortMaps to a list of
PortMap objects, which improves the readability of the JSON config while
still keeping the data structure we need in the code.

Updates #13406

Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2 weeks ago
Brad Fitzpatrick 841eaacb07 net/sockstats: quiet some log spam in release builds
Updates #13731

Change-Id: Ibee85426827ebb9e43a1c42a9c07c847daa50117
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 weeks ago
Irbe Krumina 861dc3631c
cmd/{k8s-operator,containerboot},kube/egressservices: fix Pod IP check for dual stack clusters (#13721)
Currently egress Services for ProxyGroup only work for Pods and Services
with IPv4 addresses. Ensure that it works on dual stack clusters by reading
proxy Pod's IP from the .status.podIPs list that always contains both
IPv4 and IPv6 address (if the Pod has them) rather than .status.podIP that
could contain IPv6 only for a dual stack cluster.

Updates tailscale/tailscale#13406

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2 weeks ago
Andrew Dunham 8ee7f82bf4 net/netcheck: don't panic if a region has no Nodes
Updates #13728

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I1e8319d6b2da013ae48f15113b30c9333e69cc0b
2 weeks ago
Tom Proctor 36cb2e4e5f
cmd/k8s-operator,k8s-operator: use default ProxyClass if set for ProxyGroup (#13720)
The default ProxyClass can be set via helm chart or env var, and applies
to all proxies that do not otherwise have an explicit ProxyClass set.
This ensures proxies created by the new ProxyGroup CRD are consistent
with the behaviour of existing proxies

Nearby but unrelated changes:

* Fix up double error logs (controller runtime logs returned errors)
* Fix a couple of variable names

Updates #13406

Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2 weeks ago
Tom Proctor cba2e76568
cmd/containerboot: simplify k8s setup logic (#13627)
Rearrange conditionals to reduce indentation and make it a bit easier to read
the logic. Also makes some error message updates for better consistency
with the recent decision around capitalising resource names and the
upcoming addition of config secrets.

Updates #cleanup

Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2 weeks ago
dependabot[bot] 866714a894
.github: Bump github/codeql-action from 3.26.9 to 3.26.11 (#13710)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.26.9 to 3.26.11.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](461ef6c76d...6db8d6351f)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2 weeks ago
dependabot[bot] 266c14d6ca
.github: Bump actions/cache from 4.0.2 to 4.1.0 (#13711)
Bumps [actions/cache](https://github.com/actions/cache) from 4.0.2 to 4.1.0.
- [Release notes](https://github.com/actions/cache/releases)
- [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
- [Commits](0c45773b62...2cdf405574)

---
updated-dependencies:
- dependency-name: actions/cache
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2 weeks ago
Nick Hill 9a73462ea4 types/lazy: add DeferredInit type
It is sometimes necessary to defer initialization steps until the first actual usage
or until certain prerequisites have been met. For example, policy setting and
policy source registration should not occur during package initialization.
Instead, they should be deferred until the syspolicy package is actually used.
Additionally, any errors should be properly handled and reported, rather than
causing a panic within the package's init function.

In this PR, we add DeferredInit, to facilitate the registration and invocation
of deferred initialization functions.

Updates #12687

Signed-off-by: Nick Hill <mykola.khyl@gmail.com>
2 weeks ago
Brad Fitzpatrick f3de4e96a8 derp: fix omitted word in comment
Fix comment just added in 38f236c725.

Updates tailscale/corp#23668
Updates #cleanup

Change-Id: Icbe112e24fcccf8c61c759c631ad09f3e5480547
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 weeks ago
Irbe Krumina 7f016baa87
cmd/k8s-operator,k8s-operator: create ConfigMap for egress services + small fixes for egress services (#13715)
cmd/k8s-operator, k8s-operator: create ConfigMap for egress services + small reconciler fixes

Updates tailscale/tailscale#13406

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2 weeks ago
Brad Fitzpatrick 38f236c725 derp: add server metric for batch write sizes
Updates tailscale/corp#23668

Change-Id: Ie6268c4035a3b29fd53c072c5793e4cbba93d031
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 weeks ago
Erisa A c588c36233
types/key: use tlpub: in error message (#13707)
Fixes tailscale/corp#19442

Signed-off-by: Erisa A <erisa@tailscale.com>
2 weeks ago
Brad Fitzpatrick cb10eddc26 tool/gocross: fix argument order to find
To avoid warning:

    find: warning: you have specified the global option -maxdepth after the argument -type, but global options are not positional, i.e., -maxdepth affects tests specified before it as well as those specified after it.  Please specify global options before other arguments.

Fixes tailscale/corp#23689

Change-Id: I91ee260b295c552c0a029883d5e406733e081478
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 weeks ago
Tom Proctor e48cddfbb3
cmd/{containerboot,k8s-operator},k8s-operator,kube: add ProxyGroup controller (#13684)
Implements the controller for the new ProxyGroup CRD, designed for
running proxies in a high availability configuration. Each proxy gets
its own config and state Secret, and its own tailscale node ID.

We are currently mounting all of the config secrets into the container,
but will stop mounting them and instead read them directly from the kube
API once #13578 is implemented.

Updates #13406

Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2 weeks ago
Brad Fitzpatrick 1005cbc1e4 tailscaleroot: panic if tailscale_go build tag but Go toolchain mismatch
Fixes #13527

Change-Id: I05921969a84a303b60d1b3b9227aff9865662831
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 weeks ago
Brad Fitzpatrick c48cc08de2 wgengine: stop conntrack log spam about Canonical net probes
Like we do for the ones on iOS.

As a bonus, this removes a caller of tsaddr.IsTailscaleIP which we
want to revamp/remove soonish.

Updates #13687

Change-Id: Iab576a0c48e9005c7844ab52a0aba5ba343b750e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 weeks ago
Andrew Dunham 12f1bc7c77 envknob: support disk-based envknobs on the macsys build
Per my investigation just now, the $HOME environment variable is unset
on the macsys (standalone macOS GUI) variant, but the current working
directory is valid. Look for the environment variable file in that
location in addition to inside the home directory.

Updates #3707

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I481ae2e0d19b316244373e06865e3b5c3a9f3b88
3 weeks ago
Patrick O'Doherty 4ad3f01225
safeweb: allow passing http.Server in safeweb.Config (#13688)
Extend safeweb.Config with the ability to pass a http.Server that
safeweb will use to server traffic.

Updates corp#8207

Signed-off-by: Patrick O'Doherty <patrick@tailscale.com>
3 weeks ago
kari-ts 8fdffb8da0
hostinfo: update SetPackage doc with new Android values (#13537)
Fixes tailscale/corp#23283

Signed-off-by: kari-ts <kari@tailscale.com>
3 weeks ago
Erisa A f30d85310c
cmd/tailscale/cli: don't print disablement secrets if init fails (#13673)
* cmd/tailscale/cli: don't print disablement secrets if init fails

Fixes tailscale/corp#11355

Signed-off-by: Erisa A <erisa@tailscale.com>

* cmd/tailscale/cli: changes from code review

Signed-off-by: Erisa A <erisa@tailscale.com>

* cmd/tailscale/cli: small grammar change

Signed-off-by: Erisa A <erisa@tailscale.com>

---------

Signed-off-by: Erisa A <erisa@tailscale.com>
3 weeks ago
Irbe Krumina e8bb5d1be5
cmd/{k8s-operator,containerboot},k8s-operator,kube: reconcile ExternalName Services for ProxyGroup (#13635)
Adds a new reconciler that reconciles ExternalName Services that define a
tailnet target that should be exposed to cluster workloads on a ProxyGroup's
proxies.
The reconciler ensures that for each such service, the config mounted to
the proxies is updated with the tailnet target definition and that
and EndpointSlice and ClusterIP Service are created for the service.

Adds a new reconciler that ensures that as proxy Pods become ready to route
traffic to a tailnet target, the EndpointSlice for the target is updated
with the Pods' endpoints.

Updates tailscale/tailscale#13406

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
3 weeks ago
Irbe Krumina 9bd158cc09
cmd/containerboot,util/linuxfw: create a SNAT rule for dst/src only once, clean up if needed (#13658)
The AddSNATRuleForDst rule was adding a new rule each time it was called including:
- if a rule already existed
- if a rule matching the destination, but with different desired source already existed

This was causing issues especially for the in-progress egress HA proxies work,
where the rules are now refreshed more frequently, so more redundant rules
were being created.

This change:
- only creates the rule if it doesn't already exist
- if a rule for the same dst, but different source is found, delete it
- also ensures that egress proxies refresh firewall rules
if the node's tailnet IP changes

Updates tailscale/tailscale#13406

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
3 weeks ago
Patrick O'Doherty a3c6a3a34f
safeweb: add StrictTransportSecurityOptions config (#13679)
Add the ability to specify Strict-Transport-Security options in response
to BrowserMux HTTP requests in safeweb.

Updates https://github.com/tailscale/corp/issues/23375

Signed-off-by: Patrick O'Doherty <patrick@tailscale.com>
3 weeks ago
Brad Fitzpatrick dc60c8d786 ssh/tailssh: pass window size pixels in IoctlSetWinsize events
Fixes #13669

Change-Id: Id44cfbb83183f1bbcbdc38c29238287b9d288707
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 weeks ago
Andrea Gottardo 58c6bc2991 logpolicy: force TLS 1.3 handshake
Updates tailscale/tailscale#3363

We know `log.tailscale.io` supports TLS 1.3, so we can enforce its usage in the client to shake some bytes off the TLS handshake each time a connection is opened to upload logs.

Signed-off-by: Andrea Gottardo <andrea@gottardo.me>
3 weeks ago
Brad Fitzpatrick 5f88b65764 wgengine/netstack: check userspace ping success on Windows
Hacky temporary workaround until we do #13654 correctly.

Updates #13654

Change-Id: I764eaedbb112fb3a34dddb89572fec1b2543fd4a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 weeks ago
Brad Fitzpatrick 1f8eea53a8 control/controlclient: include HTTP status string in error message too
Not just its code.

Updates tailscale/corp#23584

Change-Id: I8001a675372fe15da797adde22f04488d8683448
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 weeks ago
Brad Fitzpatrick 6f694da912 wgengine/magicsock: avoid log spam from ReceiveFunc on shutdown
The new logging in 2dd71e64ac is spammy at shutdown:

    Receive func ReceiveIPv6 exiting with error: *net.OpError, read udp [::]:38869: raw-read udp6 [::]:38869: use of closed network connection
    Receive func ReceiveIPv4 exiting with error: *net.OpError, read udp 0.0.0.0:36123: raw-read udp4 0.0.0.0:36123: use of closed network connection

Skip it if we're in the process of shutting down.

Updates #10976

Change-Id: I4f6d1c68465557eb9ffe335d43d740e499ba9786
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 weeks ago
Naman Sood 09ec2f39b5
tailcfg: add func to check for known valid ServiceProtos (#13668)
Updates tailscale/corp#23574.

Signed-off-by: Naman Sood <mail@nsood.in>
3 weeks ago
Brad Fitzpatrick 383120c534 ipn/ipnlocal: don't run portlist code unless service collection is on
We were selectively uploading it, but we were still gathering it,
which can be a waste of CPU.

Also remove a bunch of complexity that I don't think matters anymore.

And add an envknob to force service collection off on a single node,
even if the tailnet policy permits it.

Fixes #13463

Change-Id: Ib6abe9e29d92df4ffa955225289f045eeeb279cf
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 weeks ago
Nick Khyl d837e0252f wf/firewall: allow link-local multicast for permitted local routes when the killswitch is on on Windows
When an Exit Node is used, we create a WFP rule to block all inbound and outbound traffic,
along with several rules to permit specific types of traffic. Notably, we allow all inbound and
outbound traffic to and from LocalRoutes specified in wgengine/router.Config. The list of allowed
routes always includes routes for internal interfaces, such as loopback and virtual Hyper-V/WSL2
interfaces, and may also include LAN routes if the "Allow local network access" option is enabled.
However, these permitting rules do not allow link-local multicast on the corresponding interfaces.
This results in broken mDNS/LLMNR, and potentially other similar issues, whenever an exit node is used.

In this PR, we update (*wf.Firewall).UpdatePermittedRoutes() to create rules allowing outbound and
inbound link-local multicast traffic to and from the permitted IP ranges, partially resolving the mDNS/LLMNR
and *.local name resolution issue.

Since Windows does not attempt to send mDNS/LLMNR queries if a catch-all NRPT rule is present,
it is still necessary to disable the creation of that rule using the disable-local-dns-override-via-nrpt nodeAttr.

Updates #13571

Signed-off-by: Nick Khyl <nickk@tailscale.com>
3 weeks ago
Brad Fitzpatrick b8af93310a tstest: add the start of a testing wishlist
Of tests we wish we could easily add. One day.

Updates #13038

Change-Id: If44646f8d477674bbf2c9a6e58c3cd8f94a4e8df
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 weeks ago
Andrea Gottardo 6de6ab015f
net/dns: tweak DoH timeout, limit MaxConnsPerHost, require TLS 1.3 (#13564)
Updates tailscale/tailscale#6148

This is the result of some observations we made today with @raggi. The DNS over HTTPS client currently doesn't cap the number of connections it uses, either in-use or idle. A burst of DNS queries will open multiple connections. Idle connections remain open for 30 seconds (this interval is defined in the dohTransportTimeout constant). For DoH providers like NextDNS which send keep-alives, this means the cellular modem will remain up more than expected to send ACKs if any keep-alives are received while a connection remains idle during those 30 seconds. We can set the IdleConnTimeout to 10 seconds to ensure an idle connection is terminated if no other DNS queries come in after 10 seconds. Additionally, we can cap the number of connections to 1. This ensures that at all times there is only one open DoH connection, either active or idle. If idle, it will be terminated within 10 seconds from the last query.

We also observed all the DoH providers we support are capable of TLS 1.3. We can force this TLS version to reduce the number of packets sent/received each time a TLS connection is established.

Signed-off-by: Andrea Gottardo <andrea@gottardo.me>
3 weeks ago
Brad Fitzpatrick a01b545441 control/control{client,http}: don't noise dial localhost:443 in http-only tests
1eaad7d3de regressed some tests in another repo that were starting up
a control server on `http://127.0.0.1:nnn`. Because there was no https
running, and because of a bug in 1eaad7d3de (which ended up checking
the recently-dialed-control check twice in a single dial call), we
ended up forcing only the use of TLS dials in a test that only had
plaintext HTTP running.

Instead, plumb down support for explicitly disabling TLS fallbacks and
use it only when running in a test and using `http` scheme control
plane URLs to 127.0.0.1 or localhost.

This fixes the tests elsewhere.

Updates #13597

Change-Id: I97212ded21daf0bd510891a278078daec3eebaa6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 weeks ago
Brad Fitzpatrick 6b03e18975 control/controlhttp: rename a param from addr to optAddr for clarity
And update docs.

Updates #cleanup
Updates #13597 (tangentially; noted this cleanup while debugging)

Change-Id: I62440294c78b0bb3f5673be10318dd89af1e1bfe
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 weeks ago
Brad Fitzpatrick f49d218cfe net/dnscache: don't fall back to an IPv6 dial if we don't have IPv6
I noticed while debugging a test failure elsewhere that our failure
logs (when verbosity is cranked up) were uselessly attributing dial
failures to failure to dial an invalid IP address (this IPv6 address
we didn't have), rather than showing me the actual IPv4 connection
failure.

Updates #13597 (tangentially)

Change-Id: I45ffbefbc7e25ebfb15768006413a705b941dae5
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 weeks ago
Brad Fitzpatrick 30f0fa95d9 control/controlclient: bound ReportHealthChange context lifetime to Direct client's
Fixes #13651

Change-Id: I8154d3cc0ca40fe7a0223b26ae2e77e8d6ba874b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 weeks ago
Andrea Gottardo ed1ac799c8
net/captivedetection: set Timeout on net.Dialer (#13613)
Updates tailscale/tailscale#1634
Updates tailscale/tailscale#13265

Captive portal detection uses a custom `net.Dialer` in its `http.Client`. This custom Dialer ensures that the socket is bound specifically to the Wi-Fi interface. This is crucial because without it, if any default routes are set, the outgoing requests for detecting a captive portal would bypass Wi-Fi and go through the default route instead.

The Dialer did not have a Timeout property configured, so the default system timeout was applied. This caused issues in #13265, where we attempted to make captive portal detection requests over an IPsec interface used for Wi-Fi Calling. The call to `connect()` would fail and remain blocked until the system timeout (approximately 1 minute) was reached.

In #13598, I simply excluded the IPsec interface from captive portal detection. This was a quick and safe mitigation for the issue. This PR is a follow-up to make the process more robust, by setting a 3 seconds timeout on any connection establishment on any interface (this is the same timeout interval we were already setting on the HTTP client).

Signed-off-by: Andrea Gottardo <andrea@gottardo.me>
3 weeks ago