Commit Graph

1495 Commits (main)

Author SHA1 Message Date
Percy Wegmann c8e912896e wgengine/router: consolidate routes before reconfiguring router for mobile clients
This helps reduce memory pressure on tailnets with large numbers
of routes.

Updates tailscale/corp#19332

Signed-off-by: Percy Wegmann <percy@tailscale.com>
2 days ago
Irbe Krumina 3af0f526b8
cmd{containerboot,k8s-operator},util/linuxfw: support ExternalName Services (#11802)
* cmd/containerboot,util/linuxfw: support proxy backends specified by DNS name

Adds support for optionally configuring containerboot to proxy
traffic to backends configured by passing TS_EXPERIMENTAL_DEST_DNS_NAME env var
to containerboot.
Containerboot will periodically (every 10 minutes) attempt to resolve
the DNS name and ensure that all traffic sent to the node's
tailnet IP gets forwarded to the resolved backend IP addresses.

Currently:
- if the firewall mode is iptables, traffic will be load balanced
accross the backend IP addresses using round robin. There are
no health checks for whether the IPs are reachable.
- if the firewall mode is nftables traffic will only be forwarded
to the first IP address in the list. This is to be improved.

* cmd/k8s-operator: support ExternalName Services

 Adds support for exposing endpoints, accessible from within
a cluster to the tailnet via DNS names using ExternalName Services.
This can be done by annotating the ExternalName Service with
tailscale.com/expose: "true" annotation.
The operator will deploy a proxy configured to route tailnet
traffic to the backend IPs that service.spec.externalName
resolves to. The backend IPs must be reachable from the operator's
namespace.

Updates tailscale/tailscale#10606

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2 days ago
Nick Khyl 9e1c86901b wgengine\router: fix the Tailscale-In firewall rule to work on domain networks
The Network Location Awareness service identifies networks authenticated against
an Active Directory domain and categorizes them as "Domain Authenticated".
This includes the Tailscale network if a Domain Controller is reachable through it.

If a network is categories as NLM_NETWORK_CATEGORY_DOMAIN_AUTHENTICATED,
it is not possible to override its category, and we shouldn't attempt to do so.
Additionally, our Windows Firewall rules should be compatible with both private
and domain networks.

This fixes both issues.

Fixes #11813

Signed-off-by: Nick Khyl <nickk@tailscale.com>
6 days ago
Brad Fitzpatrick 03d5d1f0f9 wgengine/magicsock: disable portmapper in tunchan-faked tests
Most of the magicsock tests fake the network, simulating packets going
out and coming in. There's no reason to actually hit your router to do
UPnP/NAT-PMP/PCP during in tests. But while debugging thousands of
iterations of tests to deflake some things, I saw it slamming my
router. This stops that.

Updates #11762

Change-Id: I59b9f48f8f5aff1fa16b4935753d786342e87744
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 week ago
Brad Fitzpatrick 7c1d6e35a5 all: use Go 1.22 range-over-int
Updates #11058

Change-Id: I35e7ef9b90e83cac04ca93fd964ad00ed5b48430
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 week ago
Brad Fitzpatrick 0fba9e7570 cmd/tailscale/cli: prevent concurrent Start calls in 'up'
Seems to deflake tstest/integration tests. I can't reproduce it
anymore on one of my VMs that was consistently flaking after a dozen
runs before. Now I can run hundreds of times.

Updates #11649
Fixes #7036

Change-Id: I2f7d4ae97500d507bdd78af9e92cd1242e8e44b8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 week ago
Brad Fitzpatrick dd6c76ea24 ipn: remove unused Options.LegacyMigrationPrefs
I'm on a mission to simplify LocalBackend.Start and its locking
and deflake some tests.

I noticed this hasn't been used since March 2023 when it was removed
from the Windows client in corp 66be796d33c.

So, delete.

Updates #11649

Change-Id: I40f2cb75fb3f43baf23558007655f65a8ec5e1b2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 week ago
Claire Wang 9171b217ba
cmd/tailscale, ipn/ipnlocal: add suggest exit node CLI option (#11407)
Updates tailscale/corp#17516

Signed-off-by: Claire Wang <claire@tailscale.com>
1 week ago
Charlotte Brandhorst-Satzkorn 449f46c207
wgengine/magicsock: rebind/restun if a syscall.EPERM error is returned (#11711)
We have seen in macOS client logs that the "operation not permitted", a
syscall.EPERM error, is being returned when traffic is attempted to be
sent. This may be caused by security software on the client.

This change will perform a rebind and restun if we receive a
syscall.EPERM error on clients running darwin. Rebinds will only be
called if we haven't performed one specifically for an EPERM error in
the past 5 seconds.

Updates #11710

Signed-off-by: Charlotte Brandhorst-Satzkorn <charlotte@tailscale.com>
1 week ago
Brad Fitzpatrick 952e06aa46 wgengine/router: don't attempt route cleanup on Synology
Trying to run iptables/nftables on Synology pauses for minutes with
lots of errors and ultimately does nothing as it's not used and we
lack permissions.

This fixes a regression from db760d0bac (#11601) that landed
between Synology testing on unstable 1.63.110 and 1.64.0 being cut.

Fixes #11737

Change-Id: Iaf9563363b8e45319a9b6fe94c8d5ffaecc9ccef
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 week ago
James Tucker a2eb1c22b0 wgengine/magicsock: allow disco communication without known endpoints
Just because we don't have known endpoints for a peer does not mean that
the peer should become unreachable. If we know the peers key, it should
be able to call us, then we can talk back via whatever path it called us
on. First step - don't drop the packet in this context.

Updates tailscale/corp#19106

Signed-off-by: James Tucker <james@tailscale.com>
2 weeks ago
James Tucker db760d0bac cmd/tailscaled: move cleanup to an implicit action during startup
This removes a potentially increased boot delay for certain boot
topologies where they block on ExecStartPre that may have socket
activation dependencies on other system services (such as
systemd-resolved and NetworkManager).

Also rename cleanup to clean up in affected/immediately nearby places
per code review commentary.

Fixes #11599

Signed-off-by: James Tucker <james@tailscale.com>
2 weeks ago
Maisem Ali 3f4c5daa15 wgengine/netstack: remove SubnetRouterWrapper
It was used when we only supported subnet routers on linux
and would nil out the SubnetRoutes slice as no other router
worked with it, but now we support subnet routers on ~all platforms.

The field it was setting to nil is now only used for network logging
and nowhere else, so keep the field but drop the SubnetRouterWrapper
as it's not useful.

Updates #cleanup

Change-Id: Id03f9b6ec33e47ad643e7b66e07911945f25db79
Signed-off-by: Maisem Ali <maisem@tailscale.com>
3 weeks ago
James Tucker 6e334e64a1 net/netcheck,wgengine/magicsock: align DERP frame receive time heuristics
The netcheck package and the magicksock package coordinate via the
health package, but both sides have time based heuristics through
indirect dependencies. These were misaligned, so the implemented
heuristic aimed at reducing DERP moves while there is active traffic
were non-operational about 3/5ths of the time.

It is problematic to setup a good test for this integration presently,
so instead I added comment breadcrumbs along with the initial fix.

Updates #8603

Signed-off-by: James Tucker <james@tailscale.com>
3 weeks ago
Joonas Kuorilehto fe0cfec4ad wgengine/router: enable ip forwarding on gokrazy
Only on Gokrazy, set sysctls to enable IP forwarding so subnet routing
and advertised exit node works.

Fixes #11405

Signed-off-by: Joonas Kuorilehto <joneskoo@derbian.fi>
3 weeks ago
Percy Wegmann 853e3e29a0 wgengine/router: provide explicit hook to signal Android when VPN needs to be reconfigured
This allows clients to avoid establishing their VPN multiple times when
both routes and DNS are changing in rapid succession.

Updates tailscale/corp#18928

Signed-off-by: Percy Wegmann <percy@tailscale.com>
3 weeks ago
Charlotte Brandhorst-Satzkorn 93618a3518
tailscale: update tailfs functions and vars to use drive naming (#11597)
This change updates all tailfs functions and the majority of the tailfs
variables to use the new drive naming.

Updates tailscale/corp#16827

Signed-off-by: Charlotte Brandhorst-Satzkorn <charlotte@tailscale.com>
3 weeks ago
Charlotte Brandhorst-Satzkorn 14683371ee
tailscale: update tailfs file and package names (#11590)
This change updates the tailfs file and package names to their new
naming convention.

Updates #tailscale/corp#16827

Signed-off-by: Charlotte Brandhorst-Satzkorn <charlotte@tailscale.com>
3 weeks ago
Irbe Krumina 5fb721d4ad
util/linuxfw,wgengine/router: skip IPv6 firewall configuration in partial iptables mode (#11546)
We have hosts that support IPv6, but not IPv6 firewall configuration
in iptables mode.
We also have hosts that have some support for IPv6 firewall
configuration in iptables mode, but do not have iptables filter table.
We should:
- configure ip rules for all hosts that support IPv6
- only configure firewall rules in iptables mode if the host
has iptables filter table.

Updates tailscale/tailscale#11540

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
4 weeks ago
Brad Fitzpatrick a36cfb4d3d tailcfg, ipn/ipnlocal, wgengine/magicsock: add only-tcp-443 node attr
Updates tailscale/corp#17879

Change-Id: I0dc305d147b76c409cf729b599a94fa723aef0e0
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 month ago
James Tucker 3f7313dbdb util/linuxfw,wgengine/router: enable IPv6 configuration when netfilter is disabled
Updates #11434

Signed-off-by: James Tucker <james@tailscale.com>
1 month ago
Joe Tsai 85febda86d
all: use zstdframe where sensible (#11491)
Use the zstdframe package where sensible instead of plumbing
around our own zstd.Encoder just for stateless operations.

This causes logtail to have a dependency on zstd,
but that's arguably okay since zstd support is implicit
to the protocol between a client and the logging service.
Also, virtually every caller to logger.NewLogger was
manually setting up a zstd.Encoder anyways,
meaning that zstd was functionally always a dependency.

Updates #cleanup
Updates tailscale/corp#18514

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
1 month ago
Brad Fitzpatrick 5d1c72f76b wgengine/magicsock: don't use endpoint debug ringbuffer on mobile.
Save some memory.

Updates tailscale/corp#18514

Change-Id: Ibcaf3c6d8e5cc275c81f04141d0f176e2249509b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 month ago
Andrew Dunham 6da1dc84de wgengine: fix logger data race in tests
Observed in:
    https://github.com/tailscale/tailscale/actions/runs/8350904950/job/22858266932?pr=11463

Updates #11226

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I9b57db4b34b6ad91d240cd9fa7e344fc0376d52d
1 month ago
Andrew Dunham 7429e8912a wgengine/netstack: fix bug with duplicate SYN packets in client limit
This fixes a bug that was introduced in #11258 where the handling of the
per-client limit didn't properly account for the fact that the gVisor
TCP forwarder will return 'true' to indicate that it's handled a
duplicate SYN packet, but not launch the handler goroutine.

In such a case, we neither decremented our per-client limit in the
wrapper function, nor did we do so in the handler function, leading to
our per-client limit table slowly filling up without bound.

Fix this by doing the same duplicate-tracking logic that the TCP
forwarder does so we can detect such cases and appropriately decrement
our in-flight counter.

Updates tailscale/corp#12184

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ib6011a71d382a10d68c0802593f34b8153d06892
2 months ago
Andrew Dunham f072d017bd wgengine/magicsock: don't change DERP home when not connected to control
This pretty much always results in an outage because peers won't
discover our new home region and thus won't be able to establish
connectivity.

Updates tailscale/corp#18095

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ic0d09133f198b528dd40c6383b16d7663d9d37a7
2 months ago
Andrew Dunham 62cf83eb92 go.mod: bump gvisor
The `stack.PacketBufferPtr` type no longer exists; replace it with
`*stack.PacketBuffer` instead.

Updates #8043

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ib56ceff09166a042aa3d9b80f50b2aa2d34b3683
2 months ago
Andrew Dunham 4338db28f7 wgengine/magicsock: prefer link-local addresses to private ones
Since link-local addresses are definitionally more likely to be a direct
(lower-latency, more reliable) connection than a non-link-local private
address, give those a bit of a boost when selecting endpoints.

Updates #8097

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I93fdeb07de55ba39ba5fcee0834b579ca05c2a4e
2 months ago
Brad Fitzpatrick f18f591bc6 wgengine: plumb the PeerByKey from wgengine to magicsock
This was just added in 69f4b459 which doesn't yet use it. This still
doesn't yet use it. It just pushes it down deeper into magicsock where
it'll used later.

Updates #7617

Change-Id: If2f8fd380af150ffc763489e1ff4f8ca2899fac6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Percy Wegmann 2d5d6f5403 ipn,wgengine: only intercept TailFS traffic on quad 100
This fixes a regression introduced with 993acf4 and released in
v1.60.0.

The regression caused us to intercept all userspace traffic to port
8080 which prevented users from exposing their own services to their
tailnet at port 8080.

Now, we only intercept traffic to port 8080 if it's bound for
100.100.100.100 or fd7a:115c:a1e0::53.

Fixes #11283

Signed-off-by: Percy Wegmann <percy@tailscale.com>
(cherry picked from commit 17cd0626f3)
2 months ago
Brad Fitzpatrick 69f4b4595a wgengine{,/wgint}: add wgint.Peer wrapper type, add to wgengine.Engine
This adds a method to wgengine.Engine and plumbed down into magicsock
to add a way to get a type-safe Tailscale-safe wrapper around a
wireguard-go device.Peer that only exposes methods that are safe for
Tailscale to use internally.

It also removes HandshakeAttempts from PeerStatusLite that was just
added as it wasn't needed yet and is now accessible ala cart as needed
from the Peer type accessor.

None of this is used yet.

Updates #7617

Change-Id: I07be0c4e6679883e6eeddf8dbed7394c9e79c5f4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Brad Fitzpatrick b4ff9a578f wgengine: rename local variable from 'found' to conventional 'ok'
Updates #cleanup

Change-Id: I799dc86ea9e4a3a949592abdd8e74282e7e5d086
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Brad Fitzpatrick a8a525282c wgengine: use slices.Clone in two places
Updates #cleanup

Change-Id: I1cb30efb6d09180e82b807d6146f37897ef99307
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Brad Fitzpatrick 74b8985e19 ipn/ipnstate, wgengine: make PeerStatusLite.LastHandshake zero Time means none
... rather than 1970. Code was using IsZero against the 1970 team
(which isn't a zero value), but fortunately not anywhere that seems to
have mattered.

Updates #cleanup

Change-Id: I708a3f2a9398aaaedc9503678b4a8a311e0e019e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Andrew Dunham 3dd8ae2f26 net/tstun: fix spelling of "WireGuard"
Updates #cleanup

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ida7e30f4689bc18f5f7502f53a0adb5ac3c7981a
2 months ago
Andrew Dunham c5abbcd4b4 wgengine/netstack: add a per-client limit for in-flight TCP forwards
This is a fun one. Right now, when a client is connecting through a
subnet router, here's roughly what happens:

1. The client initiates a connection to an IP address behind a subnet
   router, and sends a TCP SYN
2. The subnet router gets the SYN packet from netstack, and after
   running through acceptTCP, starts DialContext-ing the destination IP,
   without accepting the connection¹
3. The client retransmits the SYN packet a few times while the dial is
   in progress, until either...
4. The subnet router successfully establishes a connection to the
   destination IP and sends the SYN-ACK back to the client, or...
5. The subnet router times out and sends a RST to the client.
6. If the connection was successful, the client ACKs the SYN-ACK it
   received, and traffic starts flowing

As a result, the notification code in forwardTCP never notices when a
new connection attempt is aborted, and it will wait until either the
connection is established, or until the OS-level connection timeout is
reached and it aborts.

To mitigate this, add a per-client limit on how many in-flight TCP
forwarding connections can be in-progress; after this, clients will see
a similar behaviour to the global limit, where new connection attempts
are aborted instead of waiting. This prevents a single misbehaving
client from blocking all other clients of a subnet router by ensuring
that it doesn't starve the global limiter.

Also, bump the global limit again to a higher value.

¹ We can't accept the connection before establishing a connection to the
remote server since otherwise we'd be opening the connection and then
immediately closing it, which breaks a bunch of stuff; see #5503 for
more details.

Updates tailscale/corp#12184

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I76e7008ddd497303d75d473f534e32309c8a5144
2 months ago
Brad Fitzpatrick 1cf85822d0 ipn/ipnstate, wgengine/wgint: add handshake attempts accessors
Not yet used. This is being made available so magicsock/wgengine can
use it to ignore certain sends (UDP + DERP) later on at least mobile,
letting wireguard-go think it's doing its full attempt schedule, but
we can cut it short conditionally based on what we know from the
control plane.

Updates #7617

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Change-Id: Ia367cf6bd87b2aeedd3c6f4989528acdb6773ca7
2 months ago
Brad Fitzpatrick eb28818403 wgengine: make pendOpen time later, after dup check
Otherwise on OS retransmits, we'd make redundant timers in Go's timer
heap that upon firing just do nothing (well, grab a mutex and check a
map and see that there's nothing to do).

Updates #cleanup

Change-Id: Id30b8b2d629cf9c7f8133a3f7eca5dc79e81facb
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Brad Fitzpatrick 219efebad4 wgengine: reduce critical section
No need to hold wgLock while using the device to LookupPeer;
that has its own mutex already.

Updates #cleanup

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Change-Id: Ib56049fcc7163cf5a2c2e7e12916f07b4f9d67cb
2 months ago
Nick Khyl 7ef1fb113d cmd/tailscaled, ipn/ipnlocal, wgengine: shutdown tailscaled if wgdevice is closed
Tailscaled becomes inoperative if the Tailscale Tunnel wintun adapter is abruptly removed.
wireguard-go closes the device in case of a read error, but tailscaled keeps running.
This adds detection of a closed WireGuard device, triggering a graceful shutdown of tailscaled.
It is then restarted by the tailscaled watchdog service process.

Fixes #11222

Signed-off-by: Nick Khyl <nickk@tailscale.com>
2 months ago
Anton Tolchanov cd9cf93de6 wgengine/netstack: expose TCP forwarder drops via clientmetrics
- add a clientmetric with a counter of TCP forwarder drops due to the
  max attempts;
- fix varz metric types, as they are all counters.

Updates #8210

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2 months ago
Brad Fitzpatrick e1bd7488d0 all: remove LenIter, use Go 1.22 range-over-int instead
Updates #11058
Updates golang/go#65685

Change-Id: Ibb216b346e511d486271ab3d84e4546c521e4e22
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Brad Fitzpatrick 6ad6d6b252 wgengine/wglog: add TS_DEBUG_RAW_WGLOG envknob for raw wg logs
Updates #7617 (part of debugging it)

Change-Id: I1bcbdcf0f929e3bcf83f244b1033fd438aa6dac1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Brad Fitzpatrick 8b9474b06a wgengine/wgcfg: don't send UAPI to disable keep-alives on new peers
That's already the default. Avoid the overhead of writing it on one
side and reading it on the other to do nothing.

Updates #cleanup (noticed while researching something else)

Change-Id: I449c88a022271afb9be5da876bfaf438fe5d3f58
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
James Tucker 131f9094fd wgengine/wglog: quieten WireGuard logs for allowedips
An increasing number of users have very large subnet route
configurations, which can produce very large amounts of log data when
WireGuard is reconfigured. The logs don't contain the actual routes, so
they're largely useless for diagnostics, so we'll just suppress them.

Fixes tailscale/corp#17532

Signed-off-by: James Tucker <james@tailscale.com>
2 months ago
Jason Barnett 4d668416b8 wgengine/router: fix ip rule restoration
Fixes #10857

Signed-off-by: Jason Barnett <J@sonBarnett.com>
2 months ago
Aaron Klotz f7acbefbbb wgengine/router: make the Windows ifconfig implementation reuse existing MibIPforwardRow2 when possible
Looking at profiles, we spend a lot of time in winipcfg.LUID.DeleteRoute
looking up the routing table entry for the provided RouteData.

But we already have the row! We previously obtained that data via the full
table dump we did in getInterfaceRoutes. We can make this a lot faster by
hanging onto a reference to the wipipcfg.MibIPforwardRow2 and executing
the delete operation directly on that.

Fixes #11123

Signed-off-by: Aaron Klotz <aaron@tailscale.com>
2 months ago
Percy Wegmann c42a4e407a tailfs: listen for local clients only on 100.100.100.100
FileSystemForLocal was listening on the node's Tailscale address,
which potentially exposes the user's view of TailFS shares to other
Tailnet users. Remote nodes should connect to exported shares via
the peerapi.

This removes that code so that FileSystemForLocal is only avaialable
on 100.100.100.100:8080.

Updates tailscale/corp#16827

Signed-off-by: Percy Wegmann <percy@tailscale.com>
2 months ago
Percy Wegmann ddcffaef7a tailfs: disable TailFSForLocal via policy
Adds support for node attribute tailfs:access. If this attribute is
not present, Tailscale will not accept connections to the local TailFS
server at 100.100.100.100:8080.

Updates tailscale/corp#16827

Signed-off-by: Percy Wegmann <percy@tailscale.com>
3 months ago
Percy Wegmann abab0d4197 tailfs: clean up naming and package structure
- Restyles tailfs -> tailFS
- Defines interfaces for main TailFS types
- Moves implemenatation of TailFS into tailfsimpl package

Updates tailscale/corp#16827

Signed-off-by: Percy Wegmann <percy@tailscale.com>
3 months ago