Commit Graph

558 Commits (e1e22785b4cdef300ca89206f307f867ba262c6e)

Author SHA1 Message Date
Kristoffer Dalby e0d711c478 {net/connstats,wgengine/magicsock}: fix packet counting in connstats
connstats currently increments the packet counter whenever it is called
to store a length of data, however when udp batch sending was introduced
we pass the length for a series of packages, and it is only incremented
ones, making it count wrongly if we are on a platform supporting udp
batches.

Updates tailscale/corp#22075

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
3 weeks ago
Kristoffer Dalby 40c991f6b8 wgengine: instrument with usermetrics
Updates tailscale/corp#22075

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
3 weeks ago
Brad Fitzpatrick 6f694da912 wgengine/magicsock: avoid log spam from ReceiveFunc on shutdown
The new logging in 2dd71e64ac is spammy at shutdown:

    Receive func ReceiveIPv6 exiting with error: *net.OpError, read udp [::]:38869: raw-read udp6 [::]:38869: use of closed network connection
    Receive func ReceiveIPv4 exiting with error: *net.OpError, read udp 0.0.0.0:36123: raw-read udp4 0.0.0.0:36123: use of closed network connection

Skip it if we're in the process of shutting down.

Updates #10976

Change-Id: I4f6d1c68465557eb9ffe335d43d740e499ba9786
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
4 weeks ago
James Tucker 9eb59c72c1 wgengine/magicsock: fix check for EPERM on macOS
Like Linux, macOS will reply to sendto(2) with EPERM if the firewall is
currently blocking writes, though this behavior is like Linux
undocumented. This is often caused by a faulting network extension or
content filter from EDR software.

Updates #11710
Updates #12891
Updates #13511

Signed-off-by: James Tucker <james@tailscale.com>
1 month ago
Adrian Dewhurst 2fdbcbdf86 wgengine/magicsock: only used cached results for GetLastNetcheckReport
When querying for an exit node suggestion, occasionally it triggers a
new report concurrently with an existing report in progress. Generally,
there should always be a recent report or one in progress, so it is
redundant to start one there, and it causes concurrency issues.

Fixes #12643

Change-Id: I66ab9003972f673e5d4416f40eccd7c6676272a5
Signed-off-by: Adrian Dewhurst <adrian@tailscale.com>
1 month ago
Kristoffer Dalby 0e0e53d3b3 util/usermetrics: make usermetrics non-global
this commit changes usermetrics to be non-global, this is a building
block for correct metrics if a go process runs multiple tsnets or
in tests.

Updates #13420
Updates tailscale/corp#22075

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
1 month ago
Jordan Whited 951884b077
net/netcheck,wgengine/magicsock: plumb OnlyTCP443 controlknob through netcheck (#13491)
Updates tailscale/corp#17879

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2 months ago
Jordan Whited 5f4a4c6744
wgengine/magicsock: fix sendUDPStd docs (#13490)
Updates #cleanup

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2 months ago
Jordan Whited 4084c6186d
wgengine/magicsock: add side-effect-free function for netcheck UDP sends (#13487)
Updates #13484
Updates tailscale/corp#17879

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2 months ago
Andrew Dunham 40833a7524 wgengine/magicsock: disable raw disco by default; add envknob to enable
Updates #13140

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ica85b2ac8ac7eab4ec5413b212f004aecc453279
2 months ago
Jordan Whited afec2d41b4
wgengine/magicsock: remove redundant deadline from netcheck report call (#13395)
netcheck.Client.GetReport() applies its own deadlines. This 2s deadline
was causing GetReport() to never fall back to HTTPS/ICMP measurements
as it was shorter than netcheck.stunProbeTimeout, leaving no time
for fallbacks.

Updates #13394
Updates #6187

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2 months ago
Brad Fitzpatrick 65fe0ba7b5 wgengine/magicsock: fix panic regression from cryptokey routing change
Fixes #13332
Updates tailscale/corp#20732

Change-Id: I30f12746844bf77f5a664bf8e8d8ebf2511a2b27
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Andrew Dunham e107977f75 wgengine/magicsock: disable SIO_UDP_NETRESET on Windows
By default, Windows sets the SIO_UDP_CONNRESET and SIO_UDP_NETRESET
options on created UDP sockets. These behaviours make the UDP socket
ICMP-aware; when the system gets an ICMP message (e.g. an "ICMP Port
Unreachable" message, in the case of SIO_UDP_CONNRESET), it will cause
the underlying UDP socket to throw an error. Confusingly, this can occur
even on reads, if the same UDP socket is used to write a packet that
triggers this response.

The Go runtime disabled the SIO_UDP_CONNRESET behavior in 3114bd6, but
did not change SIO_UDP_NETRESET–probably because that socket option
isn't documented particularly well.

Various other networking code seem to disable this behaviour, such as
the Godot game engine (godotengine/godot#22332) and the Eclipse TCF
agent (link below). Others appear to work around this by ignoring the
error returned (anacrolix/dht#16, among others).

For now, until it's clear whether this ends up in the upstream Go
implementation or not, let's also disable the SIO_UDP_NETRESET in a
similar manner to SIO_UDP_CONNRESET.

Eclipse TCF agent: https://gitlab.eclipse.org/eclipse/tcf/tcf.agent/-/blob/master/agent/tcf/framework/mdep.c

Updates #10976
Updates golang/go#68614

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I70a2f19855f8dec1bfb82e63f6d14fc4a22ed5c3
3 months ago
Brad Fitzpatrick 2dd71e64ac wgengine/magicsock: log when a ReceiveFunc fails
Updates #10976

Change-Id: I86d30151a25c7d42ed36e273fb207873f4acfdb4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 months ago
Jordan Whited a93dc6cdb1
wgengine/magicsock: refactor batchingUDPConn to batchingConn interface (#13042)
This commit adds a batchingConn interface, and renames batchingUDPConn
to linuxBatchingConn. tryUpgradeToBatchingConn() may return a platform-
specific implementation of batchingConn. So far only a Linux
implementation of this interface exists, but this refactor is being
done in anticipation of a Windows implementation.

Updates tailscale/corp#21874

Signed-off-by: Jordan Whited <jordan@tailscale.com>
3 months ago
Andrew Dunham 9939374c48 wgengine/magicsock: use cloud metadata to get public IPs
Updates #12774

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I1661b6a2da7966ab667b075894837afd96f4742f
3 months ago
Irbe Krumina 57856fc0d5
ipn,wgengine/magicsock: allow setting static node endpoints via tailscaled configfile (#12882)
wgengine/magicsock,ipn: allow setting static node endpoints via tailscaled config file.

Adds a new StaticEndpoints field to tailscaled config
that can be used to statically configure the endpoints
that the node advertizes. This field will replace
TS_DEBUG_PRETENDPOINTS env var that can be used to achieve the same.

Additionally adds some functionality that ensures that endpoints
are updated when configfile is reloaded.

Also, refactor configuring/reconfiguring components to use the
same functionality when configfile is parsed the first time or
subsequent times (after reload). Previously a configfile reload
did not result in resetting of prefs. Now it does- but does not yet
tell the relevant components to consume the new prefs. This is to
be done in a follow-up.

Updates tailscale/tailscale#12578


Signed-off-by: Irbe Krumina <irbe@tailscale.com>
3 months ago
Brad Fitzpatrick 808b4139ee wgengine/magicsock: use wireguard-go/conn.PeerAwareEndpoint
If we get an non-disco presumably-wireguard-encrypted UDP packet from
an IP:port we don't recognize, rather than drop the packet, give it to
WireGuard anyway and let WireGuard try to figure out who it's from and
tell us.

This uses the new hook added in https://github.com/tailscale/wireguard-go/pull/27

Updates tailscale/corp#20732

Change-Id: I5c61a40143810592f9efac6c12808a87f924ecf2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
4 months ago
Lee Briggs b546a6e758 wgengine/magicsock: allow a CSV list for pretendpoint
Load Balancers often have more than one ingress IP, so allowing us to
add multiple means we can offer multiple options.

Updates #12578

Change-Id: I4aa49a698d457627d2f7011796d665c67d4c7952
Signed-off-by: Lee Briggs <lee@leebriggs.co.uk>
4 months ago
Brad Fitzpatrick 42dac7c5c2 wgengine/magicsock: add debug envknob for injecting an endpoint
For testing. Lee wants to play with 'AWS Global Accelerator Custom
Routing with Amazon Elastic Kubernetes Service'. If this works well
enough, we can promote it.

Updates #12578

Change-Id: I5018347ed46c15c9709910717d27305d0aedf8f4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
4 months ago
Brad Fitzpatrick 9df107f4f0 wgengine/magicsock: use derp-region-as-magic-AddrPort hack in fewer places
And fix up a bogus comment and flesh out some other comments.

Updates #cleanup

Change-Id: Ia60a1c04b0f5e44e8d9587914af819df8e8f442a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
4 months ago
Andrew Dunham 8487fd2ec2 wgengine/magicsock: add more DERP home clientmetrics
Updates tailscale/corp#18095

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I423adca2de0730092394bb5fd5796cd35557d352
4 months ago
James Tucker 9351eec3e1 net/netcheck: remove hairpin probes
Palo Alto reported interpreting hairpin probes as LAND attacks, and the
firewalls may be responding to this by shutting down otherwise in use NAT sessions
prematurely. We don't currently make use of the outcome of the hairpin
probes, and they contribute to other user confusion with e.g. the
AirPort Extreme hairpin session workaround. We decided in response to
remove the whole probe feature as a result.

Updates #188
Updates tailscale/corp#19106
Updates tailscale/corp#19116

Signed-off-by: James Tucker <james@tailscale.com>
5 months ago
James Tucker 8d1249550a net/netcheck,wgengine/magicsock: add potential workaround for Palo Alto DIPP misbehavior
Palo Alto firewalls have a typically hard NAT, but also have a mode
called Persistent DIPP that is supposed to provide consistent port
mapping suitable for STUN resolution of public ports. Persistent DIPP
works initially on most Palo Alto firewalls, but some models/software
versions have a bug which this works around.

The bug symptom presents as follows:

- STUN sessions resolve a consistent public IP:port to start with
- Much later netchecks report the same IP:Port for a subset of
  sessions, most often the users active DERP, and/or the port related
  to sustained traffic.
- The broader set of DERPs in a full netcheck will now consistently
  observe a new IP:Port.
- After this point of observation, new inbound connections will only
  succeed to the new IP:Port observed, and existing/old sessions will
  only work to the old binding.

In this patch we now advertise the lowest latency global endpoint
discovered as we always have, but in addition any global endpoints that
are observed more than once in a single netcheck report. This should
provide viable endpoints for potential connection establishment across
a NAT with this behavior.

Updates tailscale/corp#19106

Signed-off-by: James Tucker <james@tailscale.com>
6 months ago
Claire Wang e070af7414
ipnlocal, magicsock: add more description to storing last suggested exit (#11998)
node related functions
Updates tailscale/corp#19681

Signed-off-by: Claire Wang <claire@tailscale.com>
6 months ago
Claire Wang 35872e86d2
ipnlocal, magicsock: store last suggested exit node id in local backend (#11959)
Updates tailscale/corp#19681

Signed-off-by: Claire Wang <claire@tailscale.com>
6 months ago
Brad Fitzpatrick b9adbe2002 net/{interfaces,netmon}, all: merge net/interfaces package into net/netmon
In prep for most of the package funcs in net/interfaces to become
methods in a long-lived netmon.Monitor that can cache things.  (Many
of the funcs are very heavy to call regularly, whereas the long-lived
netmon.Monitor can subscribe to things from the OS and remember
answers to questions it's asked regularly later)

Updates tailscale/corp#10910
Updates tailscale/corp#18960
Updates #7967
Updates #3299

Change-Id: Ie4e8dedb70136af2d611b990b865a822cd1797e5
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
6 months ago
Brad Fitzpatrick 7a62dddeac net/netcheck, wgengine/magicsock: make netmon.Monitor required
This has been a TODO for ages. Time to do it.

The goal is to move more network state accessors to netmon.Monitor
where they can be cheaper/cached.

Updates tailscale/corp#10910
Updates tailscale/corp#18960
Updates #7967
Updates #3299

Change-Id: I60fc6508cd2d8d079260bda371fc08b6318bcaf1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
6 months ago
Brad Fitzpatrick 7f587d0321 health, wgengine/magicsock: remove last of health package globals
Fixes #11874
Updates #4136

Change-Id: Ib70e6831d4c19c32509fe3d7eee4aa0e9f233564
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
6 months ago
Brad Fitzpatrick 6d69fc137f ipn/{ipnlocal,localapi},wgengine{,/magicsock}: plumb health.Tracker
Down to 25 health.Global users. After this remains controlclient &
net/dns & wgengine/router.

Updates #11874
Updates #4136

Change-Id: I6dd1856e3d9bf523bdd44b60fb3b8f7501d5dc0d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
6 months ago
Brad Fitzpatrick ebc552d2e0 health: add Tracker type, in prep for removing global variables
This moves most of the health package global variables to a new
`health.Tracker` type.

But then rather than plumbing the Tracker in tsd.System everywhere,
this only goes halfway and makes one new global Tracker
(`health.Global`) that all the existing callers now use.

A future change will eliminate that global.

Updates #11874
Updates #4136

Change-Id: I6ee27e0b2e35f68cb38fecdb3b2dc4c3f2e09d68
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
6 months ago
Brad Fitzpatrick 03d5d1f0f9 wgengine/magicsock: disable portmapper in tunchan-faked tests
Most of the magicsock tests fake the network, simulating packets going
out and coming in. There's no reason to actually hit your router to do
UPnP/NAT-PMP/PCP during in tests. But while debugging thousands of
iterations of tests to deflake some things, I saw it slamming my
router. This stops that.

Updates #11762

Change-Id: I59b9f48f8f5aff1fa16b4935753d786342e87744
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
7 months ago
Claire Wang 9171b217ba
cmd/tailscale, ipn/ipnlocal: add suggest exit node CLI option (#11407)
Updates tailscale/corp#17516

Signed-off-by: Claire Wang <claire@tailscale.com>
7 months ago
Charlotte Brandhorst-Satzkorn 449f46c207
wgengine/magicsock: rebind/restun if a syscall.EPERM error is returned (#11711)
We have seen in macOS client logs that the "operation not permitted", a
syscall.EPERM error, is being returned when traffic is attempted to be
sent. This may be caused by security software on the client.

This change will perform a rebind and restun if we receive a
syscall.EPERM error on clients running darwin. Rebinds will only be
called if we haven't performed one specifically for an EPERM error in
the past 5 seconds.

Updates #11710

Signed-off-by: Charlotte Brandhorst-Satzkorn <charlotte@tailscale.com>
7 months ago
James Tucker a2eb1c22b0 wgengine/magicsock: allow disco communication without known endpoints
Just because we don't have known endpoints for a peer does not mean that
the peer should become unreachable. If we know the peers key, it should
be able to call us, then we can talk back via whatever path it called us
on. First step - don't drop the packet in this context.

Updates tailscale/corp#19106

Signed-off-by: James Tucker <james@tailscale.com>
7 months ago
Brad Fitzpatrick a36cfb4d3d tailcfg, ipn/ipnlocal, wgengine/magicsock: add only-tcp-443 node attr
Updates tailscale/corp#17879

Change-Id: I0dc305d147b76c409cf729b599a94fa723aef0e0
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
7 months ago
Brad Fitzpatrick 5d1c72f76b wgengine/magicsock: don't use endpoint debug ringbuffer on mobile.
Save some memory.

Updates tailscale/corp#18514

Change-Id: Ibcaf3c6d8e5cc275c81f04141d0f176e2249509b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
8 months ago
Andrew Dunham f072d017bd wgengine/magicsock: don't change DERP home when not connected to control
This pretty much always results in an outage because peers won't
discover our new home region and thus won't be able to establish
connectivity.

Updates tailscale/corp#18095

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ic0d09133f198b528dd40c6383b16d7663d9d37a7
8 months ago
Brad Fitzpatrick 69f4b4595a wgengine{,/wgint}: add wgint.Peer wrapper type, add to wgengine.Engine
This adds a method to wgengine.Engine and plumbed down into magicsock
to add a way to get a type-safe Tailscale-safe wrapper around a
wireguard-go device.Peer that only exposes methods that are safe for
Tailscale to use internally.

It also removes HandshakeAttempts from PeerStatusLite that was just
added as it wasn't needed yet and is now accessible ala cart as needed
from the Peer type accessor.

None of this is used yet.

Updates #7617

Change-Id: I07be0c4e6679883e6eeddf8dbed7394c9e79c5f4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
8 months ago
Brad Fitzpatrick e1bd7488d0 all: remove LenIter, use Go 1.22 range-over-int instead
Updates #11058
Updates golang/go#65685

Change-Id: Ibb216b346e511d486271ab3d84e4546c521e4e22
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
8 months ago
Jordan Whited 8b47322acc
wgengine/magicsock: implement probing of UDP path lifetime (#10844)
This commit implements probing of UDP path lifetime on the tail end of
an active direct connection. Probing configuration has two parts -
Cliffs, which are various timeout cliffs of interest, and
CycleCanStartEvery, which limits how often a probing cycle can start,
per-endpoint. Initially a statically defined default configuration will
be used. The default configuration has cliffs of 10s, 30s, and 60s,
with a CycleCanStartEvery of 24h. Probing results are communicated via
clientmetric counters. Probing is off by default, and can be enabled
via control knob. Probing is purely informational and does not yet
drive any magicsock behaviors.

Updates #540

Signed-off-by: Jordan Whited <jordan@tailscale.com>
9 months ago
Claire Wang 213d696db0
magicsock: mute noisy expected peer mtu related error (#10870) 10 months ago
Andrew Lytvynov 2716250ee8
all: cleanup unused code, part 2 (#10670)
And enable U1000 check in staticcheck.

Updates #cleanup

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
11 months ago
Andrew Dunham 727acf96a6 net/netcheck: use DERP frames as a signal for home region liveness
This uses the fact that we've received a frame from a given DERP region
within a certain time as a signal that the region is stil present (and
thus can still be a node's PreferredDERP / home region) even if we don't
get a STUN response from that region during a netcheck.

This should help avoid DERP flaps that occur due to losing STUN probes
while still having a valid and active TCP connection to the DERP server.

RELNOTE=Reduce home DERP flapping when there's still an active connection

Updates #8603

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: If7da6312581e1d434d5c0811697319c621e187a0
11 months ago
Naman Sood d46a4eced5
util/linuxfw, wgengine: allow ingress to magicsock UDP port on Linux (#10370)
* util/linuxfw, wgengine: allow ingress to magicsock UDP port on Linux

Updates #9084.

Currently, we have to tell users to manually open UDP ports on Linux when
certain firewalls (like ufw) are enabled. This change automates the process of
adding and updating those firewall rules as magicsock changes what port it
listens on.

Signed-off-by: Naman Sood <mail@nsood.in>
11 months ago
Jordan Whited 1af7f5b549
wgengine/magicsock: fix typo in Conn.handlePingLocked() (#10365)
Updates #cleanup

Signed-off-by: Jordan Whited <jordan@tailscale.com>
12 months ago
Brad Fitzpatrick 3bd382f369 wgengine/magicsock: add DERP homeless debug mode for testing
In DERP homeless mode, a DERP home connection is not sought or
maintained and the local node is not reachable.

Updates #3363
Updates tailscale/corp#396

Change-Id: Ibc30488ac2e3cfe4810733b96c2c9f10a51b8331
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
12 months ago
Jordan Whited e848736927
control/controlknobs,wgengine/magicsock: implement SilentDisco toggle (#10195)
This change exposes SilentDisco as a control knob, and plumbs it down to
magicsock.endpoint. No changes are being made to magicsock.endpoint
disco behavior, yet.

Updates #540

Signed-off-by: Jordan Whited <jordan@tailscale.com>
Co-authored-by: Brad Fitzpatrick <bradfitz@tailscale.com>
12 months ago
Brad Fitzpatrick 514539b611 wgengine/magicsock: close disco listeners on Conn.Close, fix Linux root TestNewConn
TestNewConn now passes as root on Linux. It wasn't closing the BPF
listeners and their goroutines.

The code is still a mess of two Close overlapping code paths, but that
can be refactored later. For now, make the two close paths more similar.

Updates #9945

Change-Id: I8a3cf5fb04d22ba29094243b8e645de293d9ed85
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 year ago
Brad Fitzpatrick a6270826a3 wgengine/magicsock: fix data race regression in disco ping callbacks
Regression from c15997511d. The callback could be run multiple times
from different endpoints.

Fixes #9801

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 year ago