729 Commits (23880eb5b05368d30023f91c314c9cc2e19f4a90)

Author SHA1 Message Date
Anton Tolchanov b4f46c31bb wgengine/magicsock: export packet drop metric for outbound errors
This required sharing the dropped packet metric between two packages
(tstun and magicsock), so I've moved its definition to util/usermetric.

Updates tailscale/corp#22075

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
3 weeks ago
Anton Tolchanov 532b26145a wgengine/magicsock: exclude disco from throughput metrics
The user-facing metrics are intended to track data transmitted at
the overlay network level.

Updates tailscale/corp#22075

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
3 weeks ago
Tim Walters 856ea2376b wgengine/magicsock: log home DERP changes with latency
This adds additional logging on DERP home changes to allow
better troubleshooting.

Updates tailscale/corp#18095

Signed-off-by: Tim Walters <tim@tailscale.com>
3 weeks ago
Anton Tolchanov 11e96760ff wgengine/magicsock: fix stats packet counter on derp egress
Updates tailscale/corp#22075

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
3 weeks ago
Brad Fitzpatrick 6a885dbc36 wgengine/magicsock: fix CI-only test warning of missing health tracker
While looking at deflaking TestTwoDevicePing/ping_1.0.0.2_via_SendPacket,
there were a bunch of distracting:

    WARNING: (non-fatal) nil health.Tracker (being strict in CI): ...

This pacifies those so it's easier to work on actually deflaking the test.

Updates #11762
Updates #11874

Change-Id: I08dcb44511d4996b68d5f1ce5a2619b555a2a773
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 month ago
Kristoffer Dalby e0d711c478 {net/connstats,wgengine/magicsock}: fix packet counting in connstats
connstats currently increments the packet counter whenever it is called
to store a length of data. However, when UDP batch sending was introduced,
we started passing the length for a series of packets, and the counter is
only incremented once, making it count wrongly on platforms that support
UDP batches.

Updates tailscale/corp#22075

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
1 month ago
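
The miscount described in e0d711c478 can be illustrated with a minimal sketch. The `counts` type and both methods are hypothetical names made up for this example and are not the actual connstats API; the point is only that a batched send needs the packet count passed in explicitly.

```go
package main

import "fmt"

// counts is a hypothetical per-connection tally; it is not the real
// connstats type.
type counts struct {
	packets uint64
	bytes   uint64
}

// addBuggy mirrors the behavior described in the commit message: every
// call counts as one packet, even when nBytes covers a whole batch.
func (c *counts) addBuggy(nBytes int) {
	c.packets++
	c.bytes += uint64(nBytes)
}

// add takes the packet count explicitly, so a batched send is tallied
// correctly.
func (c *counts) add(nPackets, nBytes int) {
	c.packets += uint64(nPackets)
	c.bytes += uint64(nBytes)
}

func main() {
	// A batch of three packets sent in one call.
	batch := [][]byte{make([]byte, 100), make([]byte, 200), make([]byte, 300)}
	total := 0
	for _, p := range batch {
		total += len(p)
	}

	var buggy, fixed counts
	buggy.addBuggy(total)
	fixed.add(len(batch), total)

	fmt.Println("buggy:", buggy.packets, "packets,", buggy.bytes, "bytes")
	fmt.Println("fixed:", fixed.packets, "packets,", fixed.bytes, "bytes")
}
```

Both variants agree on the byte total; only the packet count differs when more than one packet is sent per call.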
Kristoffer Dalby 40c991f6b8 wgengine: instrument with usermetrics
Updates tailscale/corp#22075

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
1 month ago
Brad Fitzpatrick 6f694da912 wgengine/magicsock: avoid log spam from ReceiveFunc on shutdown
The new logging in 2dd71e64ac is spammy at shutdown:

    Receive func ReceiveIPv6 exiting with error: *net.OpError, read udp [::]:38869: raw-read udp6 [::]:38869: use of closed network connection
    Receive func ReceiveIPv4 exiting with error: *net.OpError, read udp 0.0.0.0:36123: raw-read udp4 0.0.0.0:36123: use of closed network connection

Skip it if we're in the process of shutting down.

Updates #10976

Change-Id: I4f6d1c68465557eb9ffe335d43d740e499ba9786
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
James Tucker 9eb59c72c1 wgengine/magicsock: fix check for EPERM on macOS
Like Linux, macOS will reply to sendto(2) with EPERM if the firewall is
currently blocking writes, though, as on Linux, this behavior is
undocumented. This is often caused by a faulting network extension or
content filter from EDR software.

Updates #11710
Updates #12891
Updates #13511

Signed-off-by: James Tucker <james@tailscale.com>
2 months ago
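
As a rough illustration of the check discussed in 9eb59c72c1, the sketch below classifies a send error as a likely firewall block when it wraps EPERM on Linux or macOS. The function names are hypothetical and this is not the magicsock implementation; it only relies on the standard `errors.Is` unwrapping of `*net.OpError` and `*os.SyscallError`.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"runtime"
	"syscall"
)

// isFirewallBlock reports whether err looks like the EPERM that, per the
// commit message, Linux and macOS return from sendto(2) when a firewall
// blocks the write. The name and the GOOS gating are assumptions made
// for this sketch.
func isFirewallBlock(goos string, err error) bool {
	if goos != "linux" && goos != "darwin" {
		return false
	}
	return errors.Is(err, syscall.EPERM)
}

// exampleSendError builds an error shaped like what a failed UDP write
// typically returns: an *net.OpError wrapping an *os.SyscallError.
func exampleSendError() error {
	return &net.OpError{
		Op:  "write",
		Net: "udp",
		Err: os.NewSyscallError("sendto", syscall.EPERM),
	}
}

func main() {
	err := exampleSendError()
	fmt.Println("this platform:", isFirewallBlock(runtime.GOOS, err))
	fmt.Println("darwin:", isFirewallBlock("darwin", err))
	fmt.Println("windows:", isFirewallBlock("windows", err))
}
```

Passing the OS name as a parameter keeps the sketch testable on any platform; real code would more likely consult `runtime.GOOS` directly.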
Adrian Dewhurst 2fdbcbdf86 wgengine/magicsock: only used cached results for GetLastNetcheckReport
When querying for an exit node suggestion, occasionally it triggers a
new report concurrently with an existing report in progress. Generally,
there should always be a recent report or one in progress, so it is
redundant to start one there, and it causes concurrency issues.

Fixes #12643

Change-Id: I66ab9003972f673e5d4416f40eccd7c6676272a5
Signed-off-by: Adrian Dewhurst <adrian@tailscale.com>
2 months ago
Kristoffer Dalby 0e0e53d3b3 util/usermetrics: make usermetrics non-global
This commit changes usermetrics to be non-global. This is a building
block for correct metrics if a Go process runs multiple tsnets or
in tests.

Updates #13420
Updates tailscale/corp#22075

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2 months ago
Jordan Whited 951884b077
net/netcheck,wgengine/magicsock: plumb OnlyTCP443 controlknob through netcheck (#13491)
Updates tailscale/corp#17879

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2 months ago
Jordan Whited 5f4a4c6744
wgengine/magicsock: fix sendUDPStd docs (#13490)
Updates #cleanup

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2 months ago
Jordan Whited 4084c6186d
wgengine/magicsock: add side-effect-free function for netcheck UDP sends (#13487)
Updates #13484
Updates tailscale/corp#17879

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2 months ago
Andrew Dunham 40833a7524 wgengine/magicsock: disable raw disco by default; add envknob to enable
Updates #13140

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ica85b2ac8ac7eab4ec5413b212f004aecc453279
2 months ago
Jordan Whited afec2d41b4
wgengine/magicsock: remove redundant deadline from netcheck report call (#13395)
netcheck.Client.GetReport() applies its own deadlines. This 2s deadline
was causing GetReport() to never fall back to HTTPS/ICMP measurements
as it was shorter than netcheck.stunProbeTimeout, leaving no time
for fallbacks.

Updates #13394
Updates #6187

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2 months ago
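
The interaction described in afec2d41b4 can be sketched with plain `context` deadlines. Everything here is hypothetical and scaled down to milliseconds: `probeTimeout` stands in for the STUN probe timeout, and `getReport` is a toy stand-in, not `netcheck.Client.GetReport()`. The sketch only shows why a caller deadline shorter than the probe timeout leaves no time for a fallback.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// probeTimeout is a scaled-down, hypothetical stand-in for the STUN
// probe timeout mentioned in the commit message.
const probeTimeout = 30 * time.Millisecond

const (
	noReport       = "no report: caller deadline expired before fallback"
	fallbackReport = "report from fallback measurement"
)

// getReport simulates a report whose first-stage probe never gets a
// reply. The fallback only runs if the caller's context is still alive
// once the probe's own timeout has elapsed.
func getReport(ctx context.Context) string {
	probeCtx, cancel := context.WithTimeout(ctx, probeTimeout)
	defer cancel()
	<-probeCtx.Done() // the probe times out or the caller gives up

	if ctx.Err() != nil {
		return noReport
	}
	return fallbackReport
}

// reportWithCallerDeadlineMs wraps getReport in an extra caller-side
// deadline, like the redundant deadline the commit removes.
func reportWithCallerDeadlineMs(ms int) string {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(ms)*time.Millisecond)
	defer cancel()
	return getReport(ctx)
}

// reportWithoutCallerDeadline lets getReport apply only its own timeout.
func reportWithoutCallerDeadline() string {
	return getReport(context.Background())
}

func main() {
	fmt.Println("caller deadline shorter than probe:", reportWithCallerDeadlineMs(10))
	fmt.Println("no caller deadline:", reportWithoutCallerDeadline())
}
```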
Brad Fitzpatrick 3d401c11fa all: use new Go 1.23 slices.Sorted more
Updates #12912

Change-Id: If1294e5bc7b5d3cf0067535ae10db75e8b988d8b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 months ago
Andrew Dunham 1c972bc7cb wgengine/magicsock: actually use AF_PACKET socket for raw disco
Previously, despite what the commit said, we were using a raw IP socket
that was *not* an AF_PACKET socket, and thus was subject to the host
firewall rules. Switch to using a real AF_PACKET socket to actually get
the functionality we want.

Updates #13140

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: If657daeeda9ab8d967e75a4f049c66e2bca54b78
3 months ago
Brad Fitzpatrick 65fe0ba7b5 wgengine/magicsock: fix panic regression from cryptokey routing change
Fixes #13332
Updates tailscale/corp#20732

Change-Id: I30f12746844bf77f5a664bf8e8d8ebf2511a2b27
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 months ago
tomholford 16bb541adb wgengine/magicsock: replace deprecated poly1305 (#13184)
Signed-off-by: tomholford <tomholford@users.noreply.github.com>
3 months ago
Jordan Whited ccf091e4a6
wgengine/magicsock: don't upgrade to linuxBatchingConn on Android (#13161)
In a93dc6cdb1 tryUpgradeToBatchingConn()
moved to build tag gated files, but the runtime.GOOS condition excluding
Android was removed unintentionally from batching_conn_linux.go. Add it
back.

Updates tailscale/corp#22348

Signed-off-by: Jordan Whited <jordan@tailscale.com>
3 months ago
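
A runtime check is needed in ccf091e4a6 because Go's `linux` build constraint is also satisfied when building for Android, so a Linux-tagged file such as batching_conn_linux.go is compiled there too. A minimal sketch of such a gate follows; `canUseLinuxBatching` is a hypothetical name and not the function in magicsock.

```go
package main

import (
	"fmt"
	"runtime"
)

// canUseLinuxBatching reports whether the Linux batching path should be
// attempted. In a file built with the linux tag, the OS name at runtime
// is either "linux" or "android", so Android must be excluded
// explicitly. Hypothetical helper for illustration only.
func canUseLinuxBatching(goos string) bool {
	return goos == "linux"
}

func main() {
	fmt.Println("this platform:", canUseLinuxBatching(runtime.GOOS))
	fmt.Println("android:", canUseLinuxBatching("android"))
}
```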
Andrew Dunham e107977f75 wgengine/magicsock: disable SIO_UDP_NETRESET on Windows
By default, Windows sets the SIO_UDP_CONNRESET and SIO_UDP_NETRESET
options on created UDP sockets. These behaviours make the UDP socket
ICMP-aware; when the system gets an ICMP message (e.g. an "ICMP Port
Unreachable" message, in the case of SIO_UDP_CONNRESET), it will cause
the underlying UDP socket to throw an error. Confusingly, this can occur
even on reads, if the same UDP socket is used to write a packet that
triggers this response.

The Go runtime disabled the SIO_UDP_CONNRESET behavior in 3114bd6, but
did not change SIO_UDP_NETRESET, probably because that socket option
isn't documented particularly well.

Various other networking projects seem to disable this behaviour, such as
the Godot game engine (godotengine/godot#22332) and the Eclipse TCF
agent (link below). Others appear to work around this by ignoring the
error returned (anacrolix/dht#16, among others).

For now, until it's clear whether this ends up in the upstream Go
implementation or not, let's also disable the SIO_UDP_NETRESET in a
similar manner to SIO_UDP_CONNRESET.

Eclipse TCF agent: https://gitlab.eclipse.org/eclipse/tcf/tcf.agent/-/blob/master/agent/tcf/framework/mdep.c

Updates #10976
Updates golang/go#68614

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I70a2f19855f8dec1bfb82e63f6d14fc4a22ed5c3
3 months ago
Brad Fitzpatrick 2dd71e64ac wgengine/magicsock: log when a ReceiveFunc fails
Updates #10976

Change-Id: I86d30151a25c7d42ed36e273fb207873f4acfdb4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 months ago
Brad Fitzpatrick a61825c7b8 cmd/tta, vnet: add host firewall, env var support, more tests
In particular, tests showing that #3824 works. But that test doesn't
actually work yet; it only gets a DERP connection. (why?)

Updates #13038

Change-Id: Ie1fd1b6a38d4e90fae7e72a0b9a142a95f0b2e8f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 months ago
Jordan Whited a93dc6cdb1
wgengine/magicsock: refactor batchingUDPConn to batchingConn interface (#13042)
This commit adds a batchingConn interface, and renames batchingUDPConn
to linuxBatchingConn. tryUpgradeToBatchingConn() may return a platform-
specific implementation of batchingConn. So far only a Linux
implementation of this interface exists, but this refactor is being
done in anticipation of a Windows implementation.

Updates tailscale/corp#21874

Signed-off-by: Jordan Whited <jordan@tailscale.com>
4 months ago
Andrew Dunham 9939374c48 wgengine/magicsock: use cloud metadata to get public IPs
Updates #12774

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I1661b6a2da7966ab667b075894837afd96f4742f
4 months ago
Irbe Krumina 57856fc0d5
ipn,wgengine/magicsock: allow setting static node endpoints via tailscaled configfile (#12882)
wgengine/magicsock,ipn: allow setting static node endpoints via tailscaled config file.

Adds a new StaticEndpoints field to tailscaled config
that can be used to statically configure the endpoints
that the node advertises. This field will replace the
TS_DEBUG_PRETENDPOINTS env var, which can be used to achieve the same.

Additionally adds some functionality that ensures that endpoints
are updated when configfile is reloaded.

Also, refactor configuring/reconfiguring components to use the
same functionality when configfile is parsed the first time or
subsequent times (after reload). Previously a configfile reload
did not result in resetting of prefs. Now it does, but it does not yet
tell the relevant components to consume the new prefs. This is to
be done in a follow-up.

Updates tailscale/tailscale#12578


Signed-off-by: Irbe Krumina <irbe@tailscale.com>
4 months ago
Brad Fitzpatrick 808b4139ee wgengine/magicsock: use wireguard-go/conn.PeerAwareEndpoint
If we get a non-disco presumably-wireguard-encrypted UDP packet from
an IP:port we don't recognize, rather than drop the packet, give it to
WireGuard anyway and let WireGuard try to figure out who it's from and
tell us.

This uses the new hook added in https://github.com/tailscale/wireguard-go/pull/27

Updates tailscale/corp#20732

Change-Id: I5c61a40143810592f9efac6c12808a87f924ecf2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
4 months ago
Lee Briggs b546a6e758 wgengine/magicsock: allow a CSV list for pretendpoint
Load Balancers often have more than one ingress IP, so allowing us to
add multiple means we can offer multiple options.

Updates #12578

Change-Id: I4aa49a698d457627d2f7011796d665c67d4c7952
Signed-off-by: Lee Briggs <lee@leebriggs.co.uk>
4 months ago
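
Parsing a comma-separated endpoint list, as b546a6e758 describes, might look roughly like the sketch below. It assumes each entry is an `ip:port` pair; the function name and the tolerance for spaces and empty entries are choices made for this example, not details confirmed by the commit.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// parseEndpointList splits a comma-separated string into ip:port
// endpoints. Surrounding whitespace and empty entries are skipped.
// Hypothetical helper; the accepted format is an assumption.
func parseEndpointList(s string) ([]netip.AddrPort, error) {
	var eps []netip.AddrPort
	for _, field := range strings.Split(s, ",") {
		field = strings.TrimSpace(field)
		if field == "" {
			continue
		}
		ep, err := netip.ParseAddrPort(field)
		if err != nil {
			return nil, fmt.Errorf("invalid endpoint %q: %w", field, err)
		}
		eps = append(eps, ep)
	}
	return eps, nil
}

func main() {
	// Two ingress IPs of a load balancer, using documentation addresses.
	eps, err := parseEndpointList("203.0.113.10:41641, 203.0.113.11:41641")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, ep := range eps {
		fmt.Println(ep)
	}
}
```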
Brad Fitzpatrick 42dac7c5c2 wgengine/magicsock: add debug envknob for injecting an endpoint
For testing. Lee wants to play with 'AWS Global Accelerator Custom
Routing with Amazon Elastic Kubernetes Service'. If this works well
enough, we can promote it.

Updates #12578

Change-Id: I5018347ed46c15c9709910717d27305d0aedf8f4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
5 months ago
Brad Fitzpatrick d2fef01206 control/controlknobs,tailcfg,wgengine/magicsock: remove DRPO shutoff switch
The DERP Return Path Optimization (DRPO) is over four years old (and
on by default for over two) and we haven't had problems, so time to
remove the emergency shutoff code (controlknob) which we've never
used. The controlknobs are only meant for new features, to mitigate
risk. But we don't want to keep them forever, as they kinda pollute
the code.

Updates #150

Change-Id: If021bc8fd1b51006d8bddd1ffab639bb1abb0ad1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
5 months ago
Brad Fitzpatrick 9df107f4f0 wgengine/magicsock: use derp-region-as-magic-AddrPort hack in fewer places
And fix up a bogus comment and flesh out some other comments.

Updates #cleanup

Change-Id: Ia60a1c04b0f5e44e8d9587914af819df8e8f442a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
5 months ago
Andrew Dunham 8487fd2ec2 wgengine/magicsock: add more DERP home clientmetrics
Updates tailscale/corp#18095

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I423adca2de0730092394bb5fd5796cd35557d352
5 months ago
Andrew Dunham 8161024176 wgengine/magicsock: always set home DERP if no control conn
The logic we added in #11378 would prevent selecting a home DERP if we
have no control connection.

Updates tailscale/corp#18095

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I44bb6ac4393989444e4961b8cfa27dc149a33c6e
5 months ago
Maisem Ali 36e8e8cd64 wgengine/magicsock: use math/rands/v2
Updates #11058

Co-authored-by: James Tucker <james@tailscale.com>
Signed-off-by: Maisem Ali <maisem@tailscale.com>
6 months ago
Maisem Ali 4a8cb1d9f3 all: use math/rand/v2 more
Updates #11058

Signed-off-by: Maisem Ali <maisem@tailscale.com>
6 months ago
James Tucker 9351eec3e1 net/netcheck: remove hairpin probes
Palo Alto reported interpreting hairpin probes as LAND attacks, and the
firewalls may be responding to this by prematurely shutting down NAT
sessions that are otherwise in use. We don't currently make use of the
outcome of the hairpin probes, and they contribute to other user
confusion with e.g. the AirPort Extreme hairpin session workaround. In
response, we decided to remove the whole probe feature.

Updates #188
Updates tailscale/corp#19106
Updates tailscale/corp#19116

Signed-off-by: James Tucker <james@tailscale.com>
6 months ago
Brad Fitzpatrick 964282d34f ipn,wgengine: remove vestigial Prefs.AllowSingleHosts
It was requested by the first customer 4-5 years ago and only used
for a brief moment of time. We later added netmap visibility trimming
which removes the need for this.

It's been hidden by the CLI for quite some time and never documented
anywhere else.

This keeps the CLI flag, though, out of caution. It just returns an
error if it's set to anything but true (its default).

Fixes #12058

Change-Id: I7514ba572e7b82519b04ed603ff9f3bdbaecfda7
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
6 months ago
James Tucker 8d1249550a net/netcheck,wgengine/magicsock: add potential workaround for Palo Alto DIPP misbehavior
Palo Alto firewalls typically have a hard NAT, but also have a mode
called Persistent DIPP that is supposed to provide consistent port
mapping suitable for STUN resolution of public ports. Persistent DIPP
works initially on most Palo Alto firewalls, but some models/software
versions have a bug which this works around.

The bug symptom presents as follows:

- STUN sessions resolve a consistent public IP:port to start with
- Much later netchecks report the same IP:Port for a subset of
  sessions, most often the user's active DERP, and/or the port related
  to sustained traffic.
- The broader set of DERPs in a full netcheck will now consistently
  observe a new IP:Port.
- After this point of observation, new inbound connections will only
  succeed to the new IP:Port observed, and existing/old sessions will
  only work to the old binding.

In this patch we now advertise the lowest latency global endpoint
discovered as we always have, but in addition any global endpoints that
are observed more than once in a single netcheck report. This should
provide viable endpoints for potential connection establishment across
a NAT with this behavior.

Updates tailscale/corp#19106

Signed-off-by: James Tucker <james@tailscale.com>
6 months ago
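
The selection rule stated in 8d1249550a (advertise the lowest-latency global endpoint, plus any global endpoint seen more than once in a single report) can be sketched as follows. The `observation` type, the function names, and the output ordering are assumptions made for this example; the real netcheck report and magicsock code are structured differently.

```go
package main

import (
	"fmt"
	"net/netip"
	"sort"
	"time"
)

// observation is a hypothetical record of one STUN result within a
// single report: the public ip:port seen and the measured latency.
type observation struct {
	Endpoint netip.AddrPort
	Latency  time.Duration
}

// endpointsToAdvertise returns the lowest-latency endpoint first,
// followed by any other endpoint observed more than once, sorted by
// string form for a stable order.
func endpointsToAdvertise(obs []observation) []netip.AddrPort {
	if len(obs) == 0 {
		return nil
	}
	counts := make(map[netip.AddrPort]int)
	best := obs[0]
	for _, o := range obs {
		counts[o.Endpoint]++
		if o.Latency < best.Latency {
			best = o
		}
	}
	var repeated []netip.AddrPort
	for ep, n := range counts {
		if n > 1 && ep != best.Endpoint {
			repeated = append(repeated, ep)
		}
	}
	sort.Slice(repeated, func(i, j int) bool {
		return repeated[i].String() < repeated[j].String()
	})
	return append([]netip.AddrPort{best.Endpoint}, repeated...)
}

// mustObs is a small constructor for the examples below.
func mustObs(ep string, latencyMs int) observation {
	return observation{
		Endpoint: netip.MustParseAddrPort(ep),
		Latency:  time.Duration(latencyMs) * time.Millisecond,
	}
}

func main() {
	obs := []observation{
		mustObs("203.0.113.1:1000", 30), // old binding, seen twice
		mustObs("203.0.113.2:2000", 20), // lowest latency
		mustObs("203.0.113.1:1000", 40),
		mustObs("203.0.113.3:3000", 50), // seen once, not the fastest
	}
	for _, ep := range endpointsToAdvertise(obs) {
		fmt.Println(ep)
	}
}
```

In this toy input, the endpoint seen once at higher latency is dropped, while the repeated binding is kept alongside the fastest one.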
Claire Wang e070af7414
ipnlocal, magicsock: add more description to storing last suggested exit (#11998)
node related functions
Updates tailscale/corp#19681

Signed-off-by: Claire Wang <claire@tailscale.com>
7 months ago
Claire Wang 35872e86d2
ipnlocal, magicsock: store last suggested exit node id in local backend (#11959)
Updates tailscale/corp#19681

Signed-off-by: Claire Wang <claire@tailscale.com>
7 months ago
Brad Fitzpatrick b9adbe2002 net/{interfaces,netmon}, all: merge net/interfaces package into net/netmon
In prep for most of the package funcs in net/interfaces to become
methods in a long-lived netmon.Monitor that can cache things.  (Many
of the funcs are very heavy to call regularly, whereas the long-lived
netmon.Monitor can subscribe to things from the OS and remember
answers to questions it's asked regularly later)

Updates tailscale/corp#10910
Updates tailscale/corp#18960
Updates #7967
Updates #3299

Change-Id: Ie4e8dedb70136af2d611b990b865a822cd1797e5
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
7 months ago
Brad Fitzpatrick 7a62dddeac net/netcheck, wgengine/magicsock: make netmon.Monitor required
This has been a TODO for ages. Time to do it.

The goal is to move more network state accessors to netmon.Monitor
where they can be cheaper/cached.

Updates tailscale/corp#10910
Updates tailscale/corp#18960
Updates #7967
Updates #3299

Change-Id: I60fc6508cd2d8d079260bda371fc08b6318bcaf1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
7 months ago
Brad Fitzpatrick 7f587d0321 health, wgengine/magicsock: remove last of health package globals
Fixes #11874
Updates #4136

Change-Id: Ib70e6831d4c19c32509fe3d7eee4aa0e9f233564
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
7 months ago
Brad Fitzpatrick 6d69fc137f ipn/{ipnlocal,localapi},wgengine{,/magicsock}: plumb health.Tracker
Down to 25 health.Global users. After this remains controlclient &
net/dns & wgengine/router.

Updates #11874
Updates #4136

Change-Id: I6dd1856e3d9bf523bdd44b60fb3b8f7501d5dc0d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
7 months ago
Brad Fitzpatrick 723c775dbb tsd, ipnlocal, etc: add tsd.System.HealthTracker, start some plumbing
This adds a health.Tracker to tsd.System, accessible via
a new tsd.System.HealthTracker method.

In the future, that new method will return a tsd.System-specific
HealthTracker, so multiple tsnet.Servers in the same process are
isolated. For now, though, it just always returns the temporary
health.Global value. That permits incremental plumbing over a number
of changes. When the second to last health.Global reference is gone,
then the tsd.System.HealthTracker implementation can return a private
Tracker.

The primary plumbing this does is adding it to LocalBackend and its
dozen and change health calls. A few misc other callers are also
plumbed. Subsequent changes will flesh out other parts of the tree
(magicsock, controlclient, etc).

Updates #11874
Updates #4136

Change-Id: Id51e73cfc8a39110425b6dc19d18b3975eac75ce
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
7 months ago
Brad Fitzpatrick ebc552d2e0 health: add Tracker type, in prep for removing global variables
This moves most of the health package global variables to a new
`health.Tracker` type.

But then rather than plumbing the Tracker in tsd.System everywhere,
this only goes halfway and makes one new global Tracker
(`health.Global`) that all the existing callers now use.

A future change will eliminate that global.

Updates #11874
Updates #4136

Change-Id: I6ee27e0b2e35f68cb38fecdb3b2dc4c3f2e09d68
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
7 months ago
Brad Fitzpatrick 03d5d1f0f9 wgengine/magicsock: disable portmapper in tunchan-faked tests
Most of the magicsock tests fake the network, simulating packets going
out and coming in. There's no reason to actually hit your router to do
UPnP/NAT-PMP/PCP during tests. But while debugging thousands of
iterations of tests to deflake some things, I saw it slamming my
router. This stops that.

Updates #11762

Change-Id: I59b9f48f8f5aff1fa16b4935753d786342e87744
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
7 months ago
Brad Fitzpatrick 7c1d6e35a5 all: use Go 1.22 range-over-int
Updates #11058

Change-Id: I35e7ef9b90e83cac04ca93fd964ad00ed5b48430
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
7 months ago
Brad Fitzpatrick 0fba9e7570 cmd/tailscale/cli: prevent concurrent Start calls in 'up'
Seems to deflake tstest/integration tests. I can't reproduce it
anymore on one of my VMs that was consistently flaking after a dozen
runs before. Now I can run hundreds of times.

Updates #11649
Fixes #7036

Change-Id: I2f7d4ae97500d507bdd78af9e92cd1242e8e44b8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
7 months ago