Commit Graph

546 Commits (0926954cf5866f9c2d1bd908d1218d86161adece)

Author SHA1 Message Date
Andrew Dunham e107977f75 wgengine/magicsock: disable SIO_UDP_NETRESET on Windows
By default, Windows sets the SIO_UDP_CONNRESET and SIO_UDP_NETRESET
options on created UDP sockets. These behaviours make the UDP socket
ICMP-aware; when the system gets an ICMP message (e.g. an "ICMP Port
Unreachable" message, in the case of SIO_UDP_CONNRESET), it will cause
the underlying UDP socket to throw an error. Confusingly, this can occur
even on reads, if the same UDP socket is used to write a packet that
triggers this response.

The Go runtime disabled the SIO_UDP_CONNRESET behavior in 3114bd6, but
did not change SIO_UDP_NETRESET–probably because that socket option
isn't documented particularly well.

Various other networking projects seem to disable this behaviour, such as
the Godot game engine (godotengine/godot#22332) and the Eclipse TCF
agent (link below). Others appear to work around this by ignoring the
error returned (anacrolix/dht#16, among others).

For now, until it's clear whether this ends up in the upstream Go
implementation or not, let's also disable the SIO_UDP_NETRESET in a
similar manner to SIO_UDP_CONNRESET.

Eclipse TCF agent: https://gitlab.eclipse.org/eclipse/tcf/tcf.agent/-/blob/master/agent/tcf/framework/mdep.c

Updates #10976
Updates golang/go#68614

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I70a2f19855f8dec1bfb82e63f6d14fc4a22ed5c3
1 month ago
Brad Fitzpatrick 2dd71e64ac wgengine/magicsock: log when a ReceiveFunc fails
Updates #10976

Change-Id: I86d30151a25c7d42ed36e273fb207873f4acfdb4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 month ago
Jordan Whited a93dc6cdb1
wgengine/magicsock: refactor batchingUDPConn to batchingConn interface (#13042)
This commit adds a batchingConn interface, and renames batchingUDPConn
to linuxBatchingConn. tryUpgradeToBatchingConn() may return a platform-
specific implementation of batchingConn. So far only a Linux
implementation of this interface exists, but this refactor is being
done in anticipation of a Windows implementation.

Updates tailscale/corp#21874

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2 months ago
Andrew Dunham 9939374c48 wgengine/magicsock: use cloud metadata to get public IPs
Updates #12774

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I1661b6a2da7966ab667b075894837afd96f4742f
2 months ago
Irbe Krumina 57856fc0d5
ipn,wgengine/magicsock: allow setting static node endpoints via tailscaled configfile (#12882)
wgengine/magicsock,ipn: allow setting static node endpoints via tailscaled config file.

Adds a new StaticEndpoints field to tailscaled config
that can be used to statically configure the endpoints
that the node advertises. This field will replace
TS_DEBUG_PRETENDPOINTS env var that can be used to achieve the same.

Additionally adds some functionality that ensures that endpoints
are updated when configfile is reloaded.

Also, refactor configuring/reconfiguring components to use the
same functionality when configfile is parsed the first time or
subsequent times (after reload). Previously a configfile reload
did not result in resetting of prefs. Now it does, but does not yet
tell the relevant components to consume the new prefs. This is to
be done in a follow-up.

Updates tailscale/tailscale#12578


Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2 months ago
Brad Fitzpatrick 808b4139ee wgengine/magicsock: use wireguard-go/conn.PeerAwareEndpoint
If we get a non-disco presumably-wireguard-encrypted UDP packet from
an IP:port we don't recognize, rather than drop the packet, give it to
WireGuard anyway and let WireGuard try to figure out who it's from and
tell us.

This uses the new hook added in https://github.com/tailscale/wireguard-go/pull/27

Updates tailscale/corp#20732

Change-Id: I5c61a40143810592f9efac6c12808a87f924ecf2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Lee Briggs b546a6e758 wgengine/magicsock: allow a CSV list for pretendpoint
Load Balancers often have more than one ingress IP, so allowing us to
add multiple means we can offer multiple options.

Updates #12578

Change-Id: I4aa49a698d457627d2f7011796d665c67d4c7952
Signed-off-by: Lee Briggs <lee@leebriggs.co.uk>
2 months ago
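Note on b546a6e758 (CSV list for pretendpoint): a minimal sketch of the kind of parsing that commit message describes, assuming the value is a comma-separated list of ip:port strings. The helper name parseEndpointList and the sample addresses are hypothetical and are not taken from the Tailscale source.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// parseEndpointList is a hypothetical helper: it splits a comma-separated
// string of ip:port values and parses each one into a netip.AddrPort.
func parseEndpointList(csv string) ([]netip.AddrPort, error) {
	var eps []netip.AddrPort
	for _, field := range strings.Split(csv, ",") {
		field = strings.TrimSpace(field)
		if field == "" {
			continue // tolerate empty input and trailing commas
		}
		ap, err := netip.ParseAddrPort(field)
		if err != nil {
			return nil, fmt.Errorf("bad endpoint %q: %w", field, err)
		}
		eps = append(eps, ap)
	}
	return eps, nil
}

func main() {
	// Documentation-range addresses standing in for load balancer ingress IPs.
	eps, err := parseEndpointList("192.0.2.10:41641, 198.51.100.7:41641")
	if err != nil {
		panic(err)
	}
	fmt.Println(eps)
}
```

Rejecting the whole list on the first bad entry is a design choice of this sketch; the real code may behave differently.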
Brad Fitzpatrick 42dac7c5c2 wgengine/magicsock: add debug envknob for injecting an endpoint
For testing. Lee wants to play with 'AWS Global Accelerator Custom
Routing with Amazon Elastic Kubernetes Service'. If this works well
enough, we can promote it.

Updates #12578

Change-Id: I5018347ed46c15c9709910717d27305d0aedf8f4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Brad Fitzpatrick 9df107f4f0 wgengine/magicsock: use derp-region-as-magic-AddrPort hack in fewer places
And fix up a bogus comment and flesh out some other comments.

Updates #cleanup

Change-Id: Ia60a1c04b0f5e44e8d9587914af819df8e8f442a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 months ago
Andrew Dunham 8487fd2ec2 wgengine/magicsock: add more DERP home clientmetrics
Updates tailscale/corp#18095

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I423adca2de0730092394bb5fd5796cd35557d352
3 months ago
James Tucker 9351eec3e1 net/netcheck: remove hairpin probes
Palo Alto reported interpreting hairpin probes as LAND attacks, and the
firewalls may be responding to this by prematurely shutting down NAT
sessions that are otherwise in use. We don't currently make use of the
outcome of the hairpin probes, and they contribute to other user
confusion with e.g. the AirPort Extreme hairpin session workaround. In
response, we decided to remove the whole probe feature.

Updates #188
Updates tailscale/corp#19106
Updates tailscale/corp#19116

Signed-off-by: James Tucker <james@tailscale.com>
4 months ago
James Tucker 8d1249550a net/netcheck,wgengine/magicsock: add potential workaround for Palo Alto DIPP misbehavior
Palo Alto firewalls typically have a hard NAT, but also have a mode
called Persistent DIPP that is supposed to provide consistent port
mapping suitable for STUN resolution of public ports. Persistent DIPP
works initially on most Palo Alto firewalls, but some models/software
versions have a bug which this works around.

The bug symptom presents as follows:

- STUN sessions resolve a consistent public IP:port to start with
- Much later netchecks report the same IP:Port for a subset of
  sessions, most often the user's active DERP, and/or the port related
  to sustained traffic.
- The broader set of DERPs in a full netcheck will now consistently
  observe a new IP:Port.
- After this point of observation, new inbound connections will only
  succeed to the new IP:Port observed, and existing/old sessions will
  only work to the old binding.

In this patch we now advertise the lowest latency global endpoint
discovered as we always have, but in addition any global endpoints that
are observed more than once in a single netcheck report. This should
provide viable endpoints for potential connection establishment across
a NAT with this behavior.

Updates tailscale/corp#19106

Signed-off-by: James Tucker <james@tailscale.com>
4 months ago
Claire Wang e070af7414
ipnlocal, magicsock: add more description to storing last suggested exit (#11998)
node related functions
Updates tailscale/corp#19681

Signed-off-by: Claire Wang <claire@tailscale.com>
4 months ago
Claire Wang 35872e86d2
ipnlocal, magicsock: store last suggested exit node id in local backend (#11959)
Updates tailscale/corp#19681

Signed-off-by: Claire Wang <claire@tailscale.com>
5 months ago
Brad Fitzpatrick b9adbe2002 net/{interfaces,netmon}, all: merge net/interfaces package into net/netmon
In prep for most of the package funcs in net/interfaces to become
methods in a long-lived netmon.Monitor that can cache things.  (Many
of the funcs are very heavy to call regularly, whereas the long-lived
netmon.Monitor can subscribe to things from the OS and remember
answers to questions it's asked regularly later)

Updates tailscale/corp#10910
Updates tailscale/corp#18960
Updates #7967
Updates #3299

Change-Id: Ie4e8dedb70136af2d611b990b865a822cd1797e5
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
5 months ago
Brad Fitzpatrick 7a62dddeac net/netcheck, wgengine/magicsock: make netmon.Monitor required
This has been a TODO for ages. Time to do it.

The goal is to move more network state accessors to netmon.Monitor
where they can be cheaper/cached.

Updates tailscale/corp#10910
Updates tailscale/corp#18960
Updates #7967
Updates #3299

Change-Id: I60fc6508cd2d8d079260bda371fc08b6318bcaf1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
5 months ago
Brad Fitzpatrick 7f587d0321 health, wgengine/magicsock: remove last of health package globals
Fixes #11874
Updates #4136

Change-Id: Ib70e6831d4c19c32509fe3d7eee4aa0e9f233564
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
5 months ago
Brad Fitzpatrick 6d69fc137f ipn/{ipnlocal,localapi},wgengine{,/magicsock}: plumb health.Tracker
Down to 25 health.Global users. What remains after this is
controlclient, net/dns, and wgengine/router.

Updates #11874
Updates #4136

Change-Id: I6dd1856e3d9bf523bdd44b60fb3b8f7501d5dc0d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
5 months ago
Brad Fitzpatrick ebc552d2e0 health: add Tracker type, in prep for removing global variables
This moves most of the health package global variables to a new
`health.Tracker` type.

But then rather than plumbing the Tracker in tsd.System everywhere,
this only goes halfway and makes one new global Tracker
(`health.Global`) that all the existing callers now use.

A future change will eliminate that global.

Updates #11874
Updates #4136

Change-Id: I6ee27e0b2e35f68cb38fecdb3b2dc4c3f2e09d68
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
5 months ago
Brad Fitzpatrick 03d5d1f0f9 wgengine/magicsock: disable portmapper in tunchan-faked tests
Most of the magicsock tests fake the network, simulating packets going
out and coming in. There's no reason to actually hit your router to do
UPnP/NAT-PMP/PCP during tests. But while debugging thousands of
iterations of tests to deflake some things, I saw it slamming my
router. This stops that.

Updates #11762

Change-Id: I59b9f48f8f5aff1fa16b4935753d786342e87744
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
5 months ago
Claire Wang 9171b217ba
cmd/tailscale, ipn/ipnlocal: add suggest exit node CLI option (#11407)
Updates tailscale/corp#17516

Signed-off-by: Claire Wang <claire@tailscale.com>
5 months ago
Charlotte Brandhorst-Satzkorn 449f46c207
wgengine/magicsock: rebind/restun if a syscall.EPERM error is returned (#11711)
We have seen in macOS client logs that "operation not permitted", a
syscall.EPERM error, is being returned when traffic is attempted to be
sent. This may be caused by security software on the client.

This change will perform a rebind and restun if we receive a
syscall.EPERM error on clients running darwin. Rebinds will only be
called if we haven't performed one specifically for an EPERM error in
the past 5 seconds.

Updates #11710

Signed-off-by: Charlotte Brandhorst-Satzkorn <charlotte@tailscale.com>
5 months ago
James Tucker a2eb1c22b0 wgengine/magicsock: allow disco communication without known endpoints
Just because we don't have known endpoints for a peer does not mean that
the peer should become unreachable. If we know the peer's key, it should
be able to call us, then we can talk back via whatever path it called us
on. First step - don't drop the packet in this context.

Updates tailscale/corp#19106

Signed-off-by: James Tucker <james@tailscale.com>
5 months ago
Brad Fitzpatrick a36cfb4d3d tailcfg, ipn/ipnlocal, wgengine/magicsock: add only-tcp-443 node attr
Updates tailscale/corp#17879

Change-Id: I0dc305d147b76c409cf729b599a94fa723aef0e0
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
6 months ago
Brad Fitzpatrick 5d1c72f76b wgengine/magicsock: don't use endpoint debug ringbuffer on mobile.
Save some memory.

Updates tailscale/corp#18514

Change-Id: Ibcaf3c6d8e5cc275c81f04141d0f176e2249509b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
6 months ago
Andrew Dunham f072d017bd wgengine/magicsock: don't change DERP home when not connected to control
This pretty much always results in an outage because peers won't
discover our new home region and thus won't be able to establish
connectivity.

Updates tailscale/corp#18095

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ic0d09133f198b528dd40c6383b16d7663d9d37a7
7 months ago
Brad Fitzpatrick 69f4b4595a wgengine{,/wgint}: add wgint.Peer wrapper type, add to wgengine.Engine
This adds a method to wgengine.Engine and plumbed down into magicsock
to add a way to get a type-safe Tailscale-safe wrapper around a
wireguard-go device.Peer that only exposes methods that are safe for
Tailscale to use internally.

It also removes HandshakeAttempts from PeerStatusLite that was just
added as it wasn't needed yet and is now accessible à la carte as needed
from the Peer type accessor.

None of this is used yet.

Updates #7617

Change-Id: I07be0c4e6679883e6eeddf8dbed7394c9e79c5f4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
7 months ago
Brad Fitzpatrick e1bd7488d0 all: remove LenIter, use Go 1.22 range-over-int instead
Updates #11058
Updates golang/go#65685

Change-Id: Ibb216b346e511d486271ab3d84e4546c521e4e22
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
7 months ago
Jordan Whited 8b47322acc
wgengine/magicsock: implement probing of UDP path lifetime (#10844)
This commit implements probing of UDP path lifetime on the tail end of
an active direct connection. Probing configuration has two parts -
Cliffs, which are various timeout cliffs of interest, and
CycleCanStartEvery, which limits how often a probing cycle can start,
per-endpoint. Initially a statically defined default configuration will
be used. The default configuration has cliffs of 10s, 30s, and 60s,
with a CycleCanStartEvery of 24h. Probing results are communicated via
clientmetric counters. Probing is off by default, and can be enabled
via control knob. Probing is purely informational and does not yet
drive any magicsock behaviors.

Updates #540

Signed-off-by: Jordan Whited <jordan@tailscale.com>
8 months ago
Claire Wang 213d696db0
magicsock: mute noisy expected peer mtu related error (#10870)
8 months ago
Andrew Lytvynov 2716250ee8
all: cleanup unused code, part 2 (#10670)
And enable U1000 check in staticcheck.

Updates #cleanup

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
9 months ago
Andrew Dunham 727acf96a6 net/netcheck: use DERP frames as a signal for home region liveness
This uses the fact that we've received a frame from a given DERP region
within a certain time as a signal that the region is still present (and
thus can still be a node's PreferredDERP / home region) even if we don't
get a STUN response from that region during a netcheck.

This should help avoid DERP flaps that occur due to losing STUN probes
while still having a valid and active TCP connection to the DERP server.

RELNOTE=Reduce home DERP flapping when there's still an active connection

Updates #8603

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: If7da6312581e1d434d5c0811697319c621e187a0
9 months ago
Naman Sood d46a4eced5
util/linuxfw, wgengine: allow ingress to magicsock UDP port on Linux (#10370)
* util/linuxfw, wgengine: allow ingress to magicsock UDP port on Linux

Updates #9084.

Currently, we have to tell users to manually open UDP ports on Linux when
certain firewalls (like ufw) are enabled. This change automates the process of
adding and updating those firewall rules as magicsock changes what port it
listens on.

Signed-off-by: Naman Sood <mail@nsood.in>
10 months ago
Jordan Whited 1af7f5b549
wgengine/magicsock: fix typo in Conn.handlePingLocked() (#10365)
Updates #cleanup

Signed-off-by: Jordan Whited <jordan@tailscale.com>
10 months ago
Brad Fitzpatrick 3bd382f369 wgengine/magicsock: add DERP homeless debug mode for testing
In DERP homeless mode, a DERP home connection is not sought or
maintained and the local node is not reachable.

Updates #3363
Updates tailscale/corp#396

Change-Id: Ibc30488ac2e3cfe4810733b96c2c9f10a51b8331
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
10 months ago
Jordan Whited e848736927
control/controlknobs,wgengine/magicsock: implement SilentDisco toggle (#10195)
This change exposes SilentDisco as a control knob, and plumbs it down to
magicsock.endpoint. No changes are being made to magicsock.endpoint
disco behavior, yet.

Updates #540

Signed-off-by: Jordan Whited <jordan@tailscale.com>
Co-authored-by: Brad Fitzpatrick <bradfitz@tailscale.com>
10 months ago
Brad Fitzpatrick 514539b611 wgengine/magicsock: close disco listeners on Conn.Close, fix Linux root TestNewConn
TestNewConn now passes as root on Linux. It wasn't closing the BPF
listeners and their goroutines.

The code is still a mess of two overlapping Close code paths, but that
can be refactored later. For now, make the two close paths more similar.

Updates #9945

Change-Id: I8a3cf5fb04d22ba29094243b8e645de293d9ed85
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
11 months ago
Brad Fitzpatrick a6270826a3 wgengine/magicsock: fix data race regression in disco ping callbacks
Regression from c15997511d. The callback could be run multiple times
from different endpoints.

Fixes #9801

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
11 months ago
Val 249edaa349 wgengine/magicsock: add probed MTU metrics
Record the number of MTU probes sent, the total bytes sent, the number of times
we got a successful return from an MTU probe of a particular size, and the max
MTU recorded.

Updates #311

Signed-off-by: Val <valerie@tailscale.com>
12 months ago
Val 893bdd729c disco,net/tstun,wgengine/magicsock: probe peer MTU
Automatically probe the path MTU to a peer when peer MTU is enabled, but do not
use the MTU information for anything yet.

Updates #311

Signed-off-by: Val <valerie@tailscale.com>
12 months ago
Brad Fitzpatrick 6f36f8842c cmd/tailscale, magicsock: add debug command to flip DERP homes
For testing netmap patchification server-side.

Updates #1909

Change-Id: Ib1d784bd97b8d4a31e48374b4567404aae5280cc
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
12 months ago
James Tucker 80206b5323 wgengine/magicsock: add nodeid to panic condition on public key reuse
If the condition arises, it should be easy to track down.

Updates #9547
Signed-off-by: James Tucker <james@tailscale.com>
12 months ago
Val 95635857dc wgengine/magicsock: replace CanPMTUD() with ShouldPMTUD()
Replace CanPMTUD() with ShouldPMTUD() to check if peer path MTU discovery should
be enabled, in preparation for adding support for enabling/disabling peer MTU
dynamically.

Updates #311

Signed-off-by: Val <valerie@tailscale.com>
1 year ago
Val a5ae21a832 wgengine/magicsock: improve don't fragment bit set/get support
Add an enable/disable argument to setDontFragment() in preparation for dynamic
enable/disable of peer path MTU discovery. Add getDontFragment() to get the
status of the don't fragment bit from a socket.

Updates #311

Co-authored-by: James Tucker <james@tailscale.com>
Signed-off-by: Val <valerie@tailscale.com>
1 year ago
Brad Fitzpatrick 926c990a09 types/netmap: start phasing out Addresses, add GetAddresses method
NetworkMap.Addresses is redundant with the SelfNode.Addresses. This
works towards a TODO to delete NetworkMap.Addresses and replace it
with a method.

This is similar to #9389.

Updates #cleanup

Change-Id: Id000509ca5d16bb636401763d41bdb5f38513ba0
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 year ago
Brad Fitzpatrick 727b1432a8 wgengine: remove SetNetInfoCallback method from Engine
LocalBackend can talk to magicsock on its own to do this without
the "Engine" being involved.

(Continuing a little side quest of cleaning up the Engine
interface...)

Updates #cleanup

Change-Id: I8654acdca2b883b1bd557fdc0cfb90cd3a418a62
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 year ago
Brad Fitzpatrick 3af051ea27 control/controlclient, types/netmap: start plumbing delta netmap updates
Currently only the top four most popular changes: endpoints, DERP
home, online, and LastSeen.

Updates #1909

Change-Id: I03152da176b2b95232b56acabfb55dcdfaa16b79
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 year ago
Brad Fitzpatrick ff6fadddb6 wgengine/magicsock: stop retaining *netmap.NetworkMap
We're trying to start using that monster type less and eventually get
rid of it.

Updates #1909

Change-Id: I8e1e725bce5324fb820a9be6c7952767863e6542
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 year ago
Brad Fitzpatrick 42072683d6 control/controlknobs: move ForceBackgroundSTUN to controlknobs.Knobs
This is both more efficient (because the knobs' bool is only updated
whenever Node is changed, rarely) and also gets us one step closer to
removing a case of storing a netmap.NetworkMap in
magicsock. (eventually we want to phase out much of the use of that
type internally)

Updates #1909

Change-Id: I37e81789f94133175064fdc09984e4f3a431f1a1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 year ago
Brad Fitzpatrick 4e91cf20a8 control/controlknobs, all: add plumbed Knobs type, not global variables
Previously two tsnet nodes in the same process couldn't have disjoint
sets of controlknob settings from control as both would overwrite each
other's global variables.

This plumbs a new controlknobs.Knobs type around everywhere and hangs
the knobs sent by control on that instead.

Updates #9351

Change-Id: I75338646d36813ed971b4ffad6f9a8b41ec91560
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 year ago