9970 Commits (main)
 

Author SHA1 Message Date
Tom Proctor f8cd07fb8a .github: make cigocacher script more robust
We got a flake in https://github.com/tailscale/tailscale/actions/runs/19867229792/job/56933249360
but it's not obvious to me where it failed. Make it more robust and
print out more useful error messages for next time.

Updates tailscale/corp#10808

Change-Id: I9ca08ea1103b9ad968c9cc0c42a493981ea62435
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
19 hours ago
Brad Fitzpatrick b8c58ca7c1 wgengine: fix TSMP/ICMP callback leak
Fixes #18112

Change-Id: I85d5c482b01673799d51faeb6cb0579903597502
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
20 hours ago
Gesa Stupperich 536188c1b5 tsnet: enable node registration via federated identity
Updates: tailscale.com/corp#34148

Signed-off-by: Gesa Stupperich <gesa@tailscale.com>
22 hours ago
Joe Tsai 957a443b23
cmd/netlogfmt: allow empty --resolve-addrs flag (#18103)
Updates tailscale/corp#33352

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
22 hours ago
Raj Singh bd5c50909f
scripts/installer: add TAILSCALE_VERSION environment variable (#18014)
Add support for pinning specific Tailscale versions during installation
via the TAILSCALE_VERSION environment variable.

Example usage:
  curl -fsSL https://tailscale.com/install.sh | TAILSCALE_VERSION=1.88.4 sh

Fixes #17776

Signed-off-by: Raj Singh <raj@tailscale.com>
23 hours ago
Tom Proctor 22a815b6d2 tool: bump binaryen wasm optimiser version 111 -> 125
111 is 3 years old, and there have been a lot of speed improvements
since then. We run wasm-opt twice as part of the CI wasm job, and it
currently takes about 3 minutes each time. With 125, it takes ~40
seconds, a 4.5x speed-up.

Updates #cleanup

Change-Id: I671ae6cefa3997a23cdcab6871896b6b03e83a4f
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
24 hours ago
License Updater 8976b34cb8 licenses: update license notices
Signed-off-by: License Updater <noreply+license-updater@tailscale.com>
24 hours ago
Naasir 77dcdc223e cleanup: fix typos across multiple files
Does not affect code.

Updates #cleanup

Signed-off-by: Naasir <yoursdeveloper@protonmail.com>
1 day ago
Tom Proctor ece6e27f39 .github,cmd/cigocacher: use cigocacher for windows
Implements a new disk put function for cigocacher that does not cause
locking issues on Windows when there are multiple processes reading and
writing the same files concurrently. Integrates cigocacher into test.yml
for Windows where we are running on larger runners that support
connecting to private Azure vnet resources where cigocached is hosted.

Updates tailscale/corp#10808

Change-Id: I0d0e9b670e49e0f9abf01ff3d605cd660dd85ebb
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
1 day ago
Tom Proctor 97f1fd6d48 .github: only save cache on main
The cache artifacts from a full run of test.yml are 14GB. Only save
artifacts from the main branch to ensure we don't thrash too much. Most
branches should get decent performance with a hit from recent main.

Fixes tailscale/corp#34739

Change-Id: Ia83269d878e4781e3ddf33f1db2f21d06ea2130f
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
1 day ago
Shaikh Naasir 37b4dd047f
k8s-operator: Fix typos in egress-pod-readiness.go
Updates #cleanup

Signed-off-by: Alex Chan <alexc@tailscale.com>
2 days ago
Alex Chan bd12d8f12f cmd/tailscale/cli: soften the warning on `--force-reauth` for seamless
Thanks to seamless key renewal, you can now do a force-reauth without
losing your connection in all circumstances. We softened the interactive
warning (see #17262) so let's soften the help text as well.

Updates https://github.com/tailscale/corp/issues/32429

Signed-off-by: Alex Chan <alexc@tailscale.com>
2 days ago
Anton Tolchanov 34dff57137 feature/posture: log method and full URL for posture identity requests
Updates tailscale/corp#34676

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2 days ago
Fernando Serboncini f36eb81e61
cmd/k8s-operator: fix populateTLSSecret on tests (#18088)
The call for populateTLSSecret was broken between PRs.

Updates #cleanup

Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
5 days ago
Fernando Serboncini 7c5c02b77a
cmd/k8s-operator: add support for tailscale.com/http-redirect (#17596)
* cmd/k8s-operator: add support for tailscale.com/http-redirect

The k8s-operator now supports a tailscale.com/http-redirect annotation
on Ingress resources. When enabled, it creates port 80 handlers that
redirect to the equivalent HTTPS location.

Fixes #11252

Signed-off-by: Fernando Serboncini <fserb@tailscale.com>

* Fix for permanent redirect

Signed-off-by: Fernando Serboncini <fserb@tailscale.com>

* lint

Signed-off-by: Fernando Serboncini <fserb@tailscale.com>

* warn for redirect+endpoint

Signed-off-by: Fernando Serboncini <fserb@tailscale.com>

* tests

Signed-off-by: Fernando Serboncini <fserb@tailscale.com>

---------

Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
5 days ago
Mario Minardi 411cee0dc9 .github/workflows: only run golang ci lint when go files have changed
Restrict running the golangci-lint workflow to when the workflow file
itself or a .go file, go.mod, or go.sum have actually been modified.

Updates #cleanup

Signed-off-by: Mario Minardi <mario@tailscale.com>
6 days ago
dependabot[bot] b40272e767 build(deps): bump braces from 3.0.2 to 3.0.3 in /client/web
Bumps [braces](https://github.com/micromatch/braces) from 3.0.2 to 3.0.3.
- [Changelog](https://github.com/micromatch/braces/blob/master/CHANGELOG.md)
- [Commits](https://github.com/micromatch/braces/compare/3.0.2...3.0.3)

---
updated-dependencies:
- dependency-name: braces
  dependency-version: 3.0.3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
6 days ago
dependabot[bot] 22bdf34a00 build(deps): bump cross-spawn from 7.0.3 to 7.0.6 in /client/web
Bumps [cross-spawn](https://github.com/moxystudio/node-cross-spawn) from 7.0.3 to 7.0.6.
- [Changelog](https://github.com/moxystudio/node-cross-spawn/blob/master/CHANGELOG.md)
- [Commits](https://github.com/moxystudio/node-cross-spawn/compare/v7.0.3...v7.0.6)

---
updated-dependencies:
- dependency-name: cross-spawn
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
6 days ago
dependabot[bot] c0c0d45114 build(deps-dev): bump vitest from 1.3.1 to 1.6.1 in /client/web
Bumps [vitest](https://github.com/vitest-dev/vitest/tree/HEAD/packages/vitest) from 1.3.1 to 1.6.1.
- [Release notes](https://github.com/vitest-dev/vitest/releases)
- [Commits](https://github.com/vitest-dev/vitest/commits/v1.6.1/packages/vitest)

---
updated-dependencies:
- dependency-name: vitest
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
6 days ago
dependabot[bot] 3e2476ec13 build(deps-dev): bump vite from 5.1.7 to 5.4.21 in /client/web
Bumps [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) from 5.1.7 to 5.4.21.
- [Release notes](https://github.com/vitejs/vite/releases)
- [Changelog](https://github.com/vitejs/vite/blob/v5.4.21/packages/vite/CHANGELOG.md)
- [Commits](https://github.com/vitejs/vite/commits/v5.4.21/packages/vite)

---
updated-dependencies:
- dependency-name: vite
  dependency-version: 5.4.21
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
6 days ago
dependabot[bot] 9500689bc1 build(deps): bump js-yaml from 4.1.0 to 4.1.1 in /client/web
Bumps [js-yaml](https://github.com/nodeca/js-yaml) from 4.1.0 to 4.1.1.
- [Changelog](https://github.com/nodeca/js-yaml/blob/master/CHANGELOG.md)
- [Commits](https://github.com/nodeca/js-yaml/compare/4.1.0...4.1.1)

---
updated-dependencies:
- dependency-name: js-yaml
  dependency-version: 4.1.1
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
6 days ago
Mario Minardi 9cc07bf9c0 .github/workflows: skip draft PRs for request review workflows
Skip the "request review" workflows for PRs that are in draft to reduce
noise / skip adding reviewers to PRs that are intentionally marked as
not ready to review.

Updates #cleanup

Signed-off-by: Mario Minardi <mario@tailscale.com>
7 days ago
Brad Fitzpatrick 74ed589042 syncs: add means of declaring locking assumptions for debug mode validation
Updates #17852

Change-Id: I42a64a990dcc8f708fa23a516a40731a19967aba
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
7 days ago
Jonathan Nobels 3f9f0ed93c
VERSION.txt: this is v1.93.0 (#18074)
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
7 days ago
James Tucker 5ee0c6bf1d derp/derpserver: add a unique sender cardinality estimate
Adds an observation point that may identify potentially abusive traffic
patterns at outlier values.

Updates tailscale/corp#24681

Signed-off-by: James Tucker <james@tailscale.com>
7 days ago
Andrew Lytvynov 9eff8a4503
feature/tpm: return opening errors from both /dev/tpmrm0 and /dev/tpm0 (#18071)
This might help users diagnose why TPM access is failing for tpmrm0.

Fixes #18026

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
7 days ago
Brad Fitzpatrick 8af7778ce0 util/execqueue: don't hold mutex in RunSync
We don't hold q.mu while running normal ExecQueue.Add funcs, so we
shouldn't in RunSync either. Otherwise code it calls can't shut down
the queue, as seen in #18052.
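
A minimal sketch of the pattern, using a hypothetical queue type rather
than the real ExecQueue: state is read under the mutex, the mutex is
released, and only then does the callback run, so the callback is free
to call Shutdown.

```go
package main

import (
	"fmt"
	"sync"
)

// queue is a hypothetical stand-in for illustration; it is not the
// real util/execqueue type.
type queue struct {
	mu     sync.Mutex
	closed bool
}

// Shutdown marks the queue closed. It needs q.mu.
func (q *queue) Shutdown() {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.closed = true
}

// RunSync runs f unless the queue is closed. q.mu is released before
// f runs, so f may call Shutdown without deadlocking.
func (q *queue) RunSync(f func()) bool {
	q.mu.Lock()
	closed := q.closed
	q.mu.Unlock()
	if closed {
		return false
	}
	f() // q.mu is not held here
	return true
}

func main() {
	q := &queue{}
	ran := q.RunSync(func() { q.Shutdown() })
	fmt.Println(ran, q.RunSync(func() {}))
}
```

Had RunSync kept q.mu locked across f(), the nested Shutdown call would
block forever, since sync.Mutex is not reentrant.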

Updates #18052

Co-authored-by: Nick Khyl <nickk@tailscale.com>
Change-Id: Ic5e53440411eca5e9fabac7f4a68a9f6ef026de1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
7 days ago
Alex Chan b7658a4ad2 tstest/integration: add integration test for Tailnet Lock
This patch adds an integration test for Tailnet Lock, checking that a node can't
talk to peers in the tailnet until it becomes signed.

This patch also introduces a new package `tstest/tkatest`, which has some helpers
for constructing a mock control server that responds to TKA requests. This allows
us to reduce boilerplate in the IPN tests.

Updates tailscale/corp#33599

Signed-off-by: Alex Chan <alexc@tailscale.com>
1 week ago
Jordan Whited 824027305a cmd/tailscale/cli,ipn,all: make peer relay server port a *uint16
In preparation for exposing its configuration via ipn.ConfigVAlpha,
change {Masked}Prefs.RelayServerPort from *int to *uint16. This takes a
defensive stance against invalid inputs at JSON decode time.

'tailscale set --relay-server-port' is currently the only input to this
pref, and has always sanitized input to fit within a uint16.
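
As a rough illustration of the decode-time check, encoding/json rejects
numbers that do not fit the destination type. The prefs struct below is
a hypothetical stand-in, not the real ipn.Prefs.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// prefs is a hypothetical struct with a single *uint16 port field.
type prefs struct {
	RelayServerPort *uint16 `json:",omitempty"`
}

// decodePrefs unmarshals data and reports any decode error.
func decodePrefs(data string) (prefs, error) {
	var p prefs
	err := json.Unmarshal([]byte(data), &p)
	return p, err
}

func main() {
	inputs := []string{
		`{"RelayServerPort": 40000}`, // fits in a uint16
		`{"RelayServerPort": 70000}`, // overflows a uint16
		`{"RelayServerPort": -1}`,    // negative, not a valid uint16
	}
	for _, in := range inputs {
		_, err := decodePrefs(in)
		fmt.Printf("%s -> rejected=%v\n", in, err != nil)
	}
}
```

With an *int field, the last two inputs would decode without error and
the range check would have to happen somewhere later.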

Updates tailscale/corp#34591

Signed-off-by: Jordan Whited <jordan@tailscale.com>
1 week ago
Sachin Iyer 53476ce872 ipn/serve: validate service paths in HasPathHandler
Fixes #17839

Signed-off-by: Sachin Iyer <siyer@detail.dev>
1 week ago
Claus Lensbøl c54d243690
net/tstun: add TSMPDiscoAdvertisement to TSMPPing (#17995)
Adds a new type of TSMP message for advertising disco keys
to/from a peer, and implements the advertising triggered by a TSMP ping.

Needed as part of the effort to cache the netmap and still let clients
connect without control being reachable.

Updates #12639

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Co-authored-by: James Tucker <james@tailscale.com>
1 week ago
Alex Chan b38dd1ae06 ipn/ipnlocal: don't panic if there are no suitable exit nodes
In suggestExitNodeLocked, if no exit node candidates have a home DERP or
valid location info, `bestCandidates` is an empty slice. This slice is
passed to `selectNode` (`randomNode` in prod):

```go
func randomNode(nodes views.Slice[tailcfg.NodeView], …) tailcfg.NodeView {
	…
	return nodes.At(rand.IntN(nodes.Len()))
}
```

An empty slice becomes a call to `rand.IntN(0)`, which panics.

This patch changes the behaviour so that, if we've filtered out all the
candidates before calling `selectNode`, we reset the list and then pick
from any of the available candidates.

This patch also updates our tests to give us more coverage of `randomNode`,
so we can spot other potential issues.
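
A simplified sketch of the fallback, using plain string slices instead
of `views.Slice[tailcfg.NodeView]`. The name `pickNode` and its
signature are hypothetical and only illustrate the guard:

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

// pickNode is a hypothetical helper. It picks randomly from preferred,
// falling back to all when filtering left preferred empty. ok is false
// only when there is nothing to pick from at all.
func pickNode(preferred, all []string) (node string, ok bool) {
	candidates := preferred
	if len(candidates) == 0 {
		candidates = all // reset to every available candidate
	}
	if len(candidates) == 0 {
		return "", false // never call rand.IntN(0), which panics
	}
	return candidates[rand.IntN(len(candidates))], true
}

func main() {
	all := []string{"exit-a", "exit-b"}
	node, ok := pickNode(nil, all)
	fmt.Println(node, ok)
}
```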

Updates #17661

Change-Id: I63eb5e4494d45a1df5b1f4b1b5c6d5576322aa72
Signed-off-by: Alex Chan <alexc@tailscale.com>
1 week ago
Fran Bull f4a4bab105 tsconsensus: skip integration tests in CI
There is an issue to add non-integration tests: #18022

Fixes #15627 #16340

Signed-off-by: Fran Bull <fran@tailscale.com>
1 week ago
Brad Fitzpatrick ac0b15356d tailcfg, control/controlclient: start moving MapResponse.DefaultAutoUpdate to a nodeattr
And fix up the TestAutoUpdateDefaults integration tests as they
weren't testing reality: the DefaultAutoUpdate is supposed to only be
relevant on the first MapResponse in the stream, but the tests weren't
testing that. They were instead injecting a 2nd+ MapResponse.

This changes the test control server to add a hook to modify the first
map response, and then makes the test control when the node goes up
and down to make new map responses.

Also, the test now runs on macOS where the auto-update feature being
disabled would've previously t.Skipped the whole test.

Updates #11502

Change-Id: If2319bd1f71e108b57d79fe500b2acedbc76e1a6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 week ago
Simon Law 848978e664
ipn/ipnlocal: test traffic-steering when feature is not enabled (#17997)
In PR tailscale/corp#34401, the `traffic-steering` feature flag does
not automatically enable traffic steering for all nodes. Instead, an
admin must add the `traffic-steering` node attribute to each client
node that they want opted-in.

For backwards compatibility with older clients, tailscale/corp#34401
strips out the `traffic-steering` node attribute if the feature flag
is not enabled, even if it is set in the policy file. This lets us
safely disable the feature flag.

This PR adds a missing test case for suggested exit nodes that have no
priority.

Updates tailscale/corp#34399

Signed-off-by: Simon Law <sfllaw@tailscale.com>
1 week ago
Nick Khyl 7073f246d3 ipn/ipnlocal: do not call controlclient.Client.Shutdown with b.mu held
This fixes a regression in #17804 that caused a deadlock.

Updates #18052

Signed-off-by: Nick Khyl <nickk@tailscale.com>
1 week ago
David Bond d4821cdc2f
cmd/k8s-operator: allow HA ingresses to be deleted when VIP service does not exist (#18050)
This commit fixes a bug in our HA ingress reconciler where ingress resources would
be stuck in a deleting state should their associated VIP service be deleted within
control.

The reconciliation loop would check for the existence of the VIP service and if not
found perform no additional cleanup steps. The code has been modified to continue
onwards even if the VIP service is not found.

Fixes: https://github.com/tailscale/tailscale/issues/18049

Signed-off-by: David Bond <davidsbond93@gmail.com>
1 week ago
Simon Law 9c3a2aa797
ipn/ipnlocal: replace log.Printf with logf (#18045)
Updates #cleanup

Signed-off-by: Simon Law <sfllaw@tailscale.com>
1 week ago
Jordan Whited 7426eca163 cmd/tailscale,feature/relayserver,ipn: add relay-server-static-endpoints set flag
Updates tailscale/corp#31489
Updates #17791

Signed-off-by: Jordan Whited <jordan@tailscale.com>
1 week ago
Jordan Whited 755309c04e net/udprelay: use blake2s-256 MAC for handshake challenge
This commit replaces crypto/rand challenge generation with a blake2s-256
MAC. This enables the peer relay server to respond to multiple forward
disco.BindUDPRelayEndpoint messages per handshake generation without
sacrificing the proof of IP ownership properties of the handshake.

Responding to multiple forward disco.BindUDPRelayEndpoint messages per
handshake generation improves client address/path selection where
lowest client->server path/addr one-way delay does not necessarily
equate to lowest client<->server round trip delay.

It also improves situations where outbound traffic is filtered
independent of input, and the first reply
disco.BindUDPRelayEndpointChallenge message is dropped on the reply
path, but a later reply using a different source would make it through.

Reduction in serverEndpoint state saves 112 bytes per instance, trading
for slightly more expensive crypto ops: 277ns/op vs 321ns/op on an M1
MacBook Pro.

Updates tailscale/corp#34414

Signed-off-by: Jordan Whited <jordan@tailscale.com>
1 week ago
Tom Proctor 6637003cc8 cmd/cigocacher,go.mod: add cigocacher cmd
Adds cmd/cigocacher as the client to cigocached for Go caching over
HTTP. The HTTP cache is best-effort only, and builds will fall back to
disk-only cache if it's not available, much like regular builds.

Not yet used in CI; that will follow in another PR once we have runners
available in this repo with the right network setup for reaching
cigocached.

Updates tailscale/corp#10808

Change-Id: I13ae1a12450eb2a05bd9843f358474243989e967
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
1 week ago
Andrew Dunham 698eecda04 ipn/ipnlocal: fix panic in driveTransport on network error
When the underlying transport returns a network error, the RoundTrip
method returns (nil, error). The defer was attempting to access resp
without checking if it was nil first, causing a panic. Fix this by
checking for nil in the defer.

Also changes driveTransport.tr from *http.Transport to http.RoundTripper
and adds a test.

Fixes #17306

Signed-off-by: Andrew Dunham <andrew@tailscale.com>
Change-Id: Icf38a020b45aaa9cfbc1415d55fd8b70b978f54c
1 week ago
Andrew Dunham a20cdb5c93 tstest/integration/testcontrol: de-flake TestUserMetricsRouteGauges
SetSubnetRoutes was not sending update notifications to nodes when their
approved routes changed, causing nodes to not fetch updated netmaps with
PrimaryRoutes populated. This resulted in TestUserMetricsRouteGauges
flaking because it waited for PrimaryRoutes to be set, which only happened
if the node happened to poll for other reasons.

Now send updateSelfChanged notification to affected nodes so they fetch
an updated netmap immediately.

Fixes #17962

Signed-off-by: Andrew Dunham <andrew@tailscale.com>
1 week ago
Andrew Dunham 16587746ed portlist,tstest: skip tests on kernels with /proc/net/tcp regression
Linux kernel versions 6.6.102-104 and 6.12.42-45 have a regression
in /proc/net/tcp that causes seek operations to fail with "illegal seek".
This breaks portlist tests on these kernels.

Add kernel version detection for Linux systems and a SkipOnKernelVersions
helper to tstest. Use it to skip affected portlist tests on the broken
kernel versions.
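
A rough sketch of the version check, with the affected ranges taken
from the text above. parseKernelVersion and hasProcNetTCPRegression
are hypothetical names and do not reflect the actual tstest helper.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

type kernelVersion struct{ major, minor, patch int }

// parseKernelVersion parses the leading "X.Y.Z" of a release string
// such as "6.6.103-generic". It is a hypothetical helper.
func parseKernelVersion(release string) (kernelVersion, error) {
	core := release
	if i := strings.IndexAny(core, "-+ "); i >= 0 {
		core = core[:i] // drop distro suffixes
	}
	parts := strings.Split(core, ".")
	if len(parts) < 3 {
		return kernelVersion{}, fmt.Errorf("unexpected kernel release %q", release)
	}
	var nums [3]int
	for i := range nums {
		n, err := strconv.Atoi(parts[i])
		if err != nil {
			return kernelVersion{}, fmt.Errorf("parsing %q: %w", release, err)
		}
		nums[i] = n
	}
	return kernelVersion{nums[0], nums[1], nums[2]}, nil
}

// patchRange is an inclusive range of patch levels in one stable series.
type patchRange struct{ major, minor, lo, hi int }

// affected lists the ranges named above: 6.6.102-104 and 6.12.42-45.
var affected = []patchRange{{6, 6, 102, 104}, {6, 12, 42, 45}}

func hasProcNetTCPRegression(v kernelVersion) bool {
	for _, r := range affected {
		if v.major == r.major && v.minor == r.minor && v.patch >= r.lo && v.patch <= r.hi {
			return true
		}
	}
	return false
}

func main() {
	for _, rel := range []string{"6.6.103-generic", "6.6.105", "6.12.42"} {
		v, err := parseKernelVersion(rel)
		fmt.Println(rel, hasProcNetTCPRegression(v), err)
	}
}
```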

Thanks to philiptaron for the list of kernels with the issue and fix.

Updates #16966

Signed-off-by: Andrew Dunham <andrew@tailscale.com>
2 weeks ago
Nick Khyl 1ccece0f78 util/eventbus: use unbounded event queues for DeliveredEvents in subscribers
Bounded DeliveredEvent queues reduce memory usage, but they can deadlock under load.
Two common scenarios trigger deadlocks when the number of events published in a short
period exceeds twice the queue capacity (there's a PublishedEvent queue of the same size):
 - a subscriber tries to acquire the same mutex as held by a publisher, or
 - a subscriber for A events publishes B events

Avoiding these scenarios is not practical and would limit eventbus usefulness and reduce its adoption,
pushing us back to callbacks and other legacy mechanisms. These deadlocks already occurred in customer
devices, dev machines, and tests. They also make it harder to identify and fix slow subscribers and similar
issues we have been seeing recently.

Choosing an arbitrarily large fixed queue capacity would only mask the problem. A client running
on a sufficiently large and complex customer environment can exceed any meaningful constant limit,
since event volume depends on the number of peers and other factors. Behavior also changes
based on scheduling of publishers and subscribers by the Go runtime, OS, and hardware, as the issue
is essentially a race between publishers and subscribers. Additionally, on lower-end devices,
an unreasonably high constant capacity is practically the same as using unbounded queues.

Therefore, this PR changes the event queue implementation to be unbounded by default.
The PublishedEvent queue keeps its existing capacity of 16 items, while subscribers'
DeliveredEvent queues become unbounded.
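
To illustrate the trade-off, here is a minimal sketch of an unbounded
FIFO queue. The unboundedQueue type is hypothetical and much simpler
than the eventbus implementation; it only shows that Add never blocks,
however far the consumer falls behind.

```go
package main

import (
	"fmt"
	"sync"
)

// unboundedQueue is a hypothetical mutex-guarded FIFO with no capacity
// limit. Add never blocks; memory grows with the backlog instead.
type unboundedQueue[T any] struct {
	mu    sync.Mutex
	items []T
}

// Add appends v and returns immediately, even with no active consumer.
func (q *unboundedQueue[T]) Add(v T) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.items = append(q.items, v)
}

// Pop removes and returns the oldest item, if any.
func (q *unboundedQueue[T]) Pop() (v T, ok bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.items) == 0 {
		return v, false
	}
	v = q.items[0]
	q.items = q.items[1:]
	return v, true
}

// Len reports the current backlog size.
func (q *unboundedQueue[T]) Len() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return len(q.items)
}

func main() {
	var q unboundedQueue[int]
	// A burst far larger than a 16-item bounded queue could absorb.
	for i := 0; i < 1000; i++ {
		q.Add(i)
	}
	first, _ := q.Pop()
	fmt.Println(first, q.Len())
}
```

A bounded channel in the same position would block the producer once
full, which is where the deadlocks described above come from.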

This change fixes known deadlocks and makes the system stable under load,
at the cost of higher potential memory usage, including cases where a queue grows
during an event burst and does not shrink when load decreases.

Further improvements can be implemented in the future as needed.

Fixes #17973
Fixes #18012

Signed-off-by: Nick Khyl <nickk@tailscale.com>
2 weeks ago
Jordan Whited 9245c7131b feature/relayserver: don't publish from within a subscribe fn goroutine
Updates #17830

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2 weeks ago
Claus Lensbøl e7f5ca1d5e
wgengine/userspace: run link change subscribers in eventqueue (#18024)
Updates #17996

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2 weeks ago
Nick Khyl 3780f25d51 util/eventbus: add tests for a subscriber publishing events
As of 2025-11-20, publishing more events than the eventbus's
internal queues can hold may deadlock if a subscriber tries
to publish events itself.

This commit adds a test that demonstrates this deadlock,
and skips it until the bug is fixed.

Updates #18012

Signed-off-by: Nick Khyl <nickk@tailscale.com>
2 weeks ago
Nick Khyl 016ccae2da util/eventbus: add tests for a subscriber trying to acquire the same mutex as a publisher
As of 2025-11-20, publishing more events than the eventbus's
internal queues can hold may deadlock if a subscriber tries
to acquire a mutex that can also be held by a publisher.

This commit adds a test that demonstrates this deadlock,
and skips it until the bug is fixed.

Updates #17973

Signed-off-by: Nick Khyl <nickk@tailscale.com>
2 weeks ago
Alex Chan ce95bc77fb tka: don't panic if no clock set in tka.Mem
This is causing confusing panics in tailscale/corp#34485. We'll keep
using the tka.ChonkMem constructor as much as we can, but don't panic
if you create a tka.Mem directly -- we know what the sensible thing is.

Updates #cleanup

Signed-off-by: Alex Chan <alexc@tailscale.com>

Change-Id: I49309f5f403fc26ce4f9a6cf0edc8eddf6a6f3a4
2 weeks ago