Commit Graph

505 Commits (67926ede39f0cb740e6fa55ae75bce07cfeb1f1f)

Author SHA1 Message Date
James Tucker 80206b5323 wgengine/magicsock: add nodeid to panic condition on public key reuse
If the condition arises, it should be easy to track down.

Updates #9547
Signed-off-by: James Tucker <james@tailscale.com>
1 year ago
Val 95635857dc wgengine/magicsock: replace CanPMTUD() with ShouldPMTUD()
Replace CanPMTUD() with ShouldPMTUD() to check if peer path MTU discovery should
be enabled, in preparation for adding support for enabling/disabling peer MTU
dynamically.

Updated #311

Signed-off-by: Val <valerie@tailscale.com>
1 year ago
Val a5ae21a832 wgengine/magicsock: improve don't fragment bit set/get support
Add an enable/disable argument to setDontFragment() in preparation for dynamic
enable/disable of peer path MTU discovery. Add getDontFragment() to get the
status of the don't fragment bit from a socket.

Updates #311

Co-authored-by: James Tucker <james@tailscale.com>
Signed-off-by: Val <valerie@tailscale.com>
1 year ago
Brad Fitzpatrick 926c990a09 types/netmap: start phasing out Addresses, add GetAddresses method
NetworkMap.Addresses is redundant with the SelfNode.Addresses. This
works towards a TODO to delete NetworkMap.Addresses and replace it
with a method.

This is similar to #9389.

Updates #cleanup

Change-Id: Id000509ca5d16bb636401763d41bdb5f38513ba0
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 year ago
Brad Fitzpatrick 727b1432a8 wgengine: remove SetNetInfoCallback method from Engine
LocalBackend can talk to magicsock on its own to do this without
the "Engine" being involved.

(Continuing a little side quest of cleaning up the Engine
interface...)

Updates #cleanup

Change-Id: I8654acdca2b883b1bd557fdc0cfb90cd3a418a62
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 year ago
Brad Fitzpatrick 3af051ea27 control/controlclient, types/netmap: start plumbing delta netmap updates
Currently only the top four most popular changes: endpoints, DERP
home, online, and LastSeen.

Updates #1909

Change-Id: I03152da176b2b95232b56acabfb55dcdfaa16b79
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 year ago
Brad Fitzpatrick ff6fadddb6 wgengine/magicsock: stop retaining *netmap.NetworkMap
We're trying to start using that monster type less and eventually get
rid of it.

Updates #1909

Change-Id: I8e1e725bce5324fb820a9be6c7952767863e6542
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 year ago
Brad Fitzpatrick 42072683d6 control/controlknobs: move ForceBackgroundSTUN to controlknobs.Knobs
This is both more efficient (because the knobs' bool is only updated
whenever Node is changed, rarely) and also gets us one step closer to
removing a case of storing a netmap.NetworkMap in
magicsock. (eventually we want to phase out much of the use of that
type internally)

Updates #1909

Change-Id: I37e81789f94133175064fdc09984e4f3a431f1a1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 year ago
Brad Fitzpatrick 4e91cf20a8 control/controlknobs, all: add plumbed Knobs type, not global variables
Previously two tsnet nodes in the same process couldn't have disjoint
sets of controlknob settings from control as both would overwrite each
other's global variables.

This plumbs a new controlknobs.Knobs type around everywhere and hangs
the knobs sent by control on that instead.

Updates #9351

Change-Id: I75338646d36813ed971b4ffad6f9a8b41ec91560
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 year ago
Brad Fitzpatrick d050700a3b wgengine/magicsock: make peerMap also keyed by NodeID
In prep for incremental netmap update plumbing (#1909), make peerMap
also keyed by NodeID, as all the netmap node mutations passed around
later will be keyed by NodeID.

In the process, also:

* add envknob.InDevMode, as a signal that we can panic more aggressively
  in unexpected cases.
* pull two moderately large blocks of code in Conn.SetNetworkMap out
  into their own methods
* convert a few more sets from maps to set.Set

Updates #1909

Change-Id: I7acdd64452ba58e9d554140ee7a8760f9043f961
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 year ago
Brad Fitzpatrick dc7aa98b76 all: use set.Set consistently instead of map[T]struct{}
I didn't clean up the more idiomatic map[T]bool with true values, at
least yet.  I just converted the relatively awkward struct{}-valued
maps.

Updates #cleanup

Change-Id: I758abebd2bb1f64bc7a9d0f25c32298f4679c14f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 year ago
Craig Rodrigues 8683ce78c2 client/web, clientupdate, util/linuxfw, wgengine/magicsock: Use %v verb for errors
Replace %w verb with %v verb when logging errors.
Use %w only for wrapping errors with fmt.Errorf()

Fixes: #9213

Signed-off-by: Craig Rodrigues <rodrigc@crodrigues.org>
1 year ago
Brad Fitzpatrick ea4425d8a9 ipn/ipnlocal, wgengine/magicsock: move UpdateStatus stuff around
Upcoming work on incremental netmap change handling will require some
replumbing of which subsystems get notified about what. Done naively,
it could break "tailscale status --json" visibility later. To make sure
I understood the flow of all the updates I was rereading the status code
and realized parts of ipnstate.Status were being populated by the wrong
subsystems.

The engine (wireguard) and magicsock (data plane, NAT traveral) should
only populate the stuff that they uniquely know. The WireGuard bits
were fine but magicsock was populating stuff stuff that LocalBackend
could've better handled, so move it there.

Updates #1909

Change-Id: I6d1b95d19a2d1b70fbb3c875fac8ea1e169e8cb0
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 year ago
James Tucker e1c7e9b736 wgengine/magicsock: improve endpoint selection for WireGuard peers with rx time
If we don't have the ICMP hint available, such as on Android, we can use
the signal of rx traffic to bias toward a particular endpoint.

We don't want to stick to a particular endpoint for a very long time
without any signals, so the sticky time is reduced to 1 second, which is
large enough to avoid excessive packet reordering in the common case,
but should be small enough that either rx provides a strong signal, or
we rotate in a user-interactive schedule to another endpoint, improving
the feel of failover to other endpoints.

Updates #8999

Co-authored-by: Charlotte Brandhorst-Satzkorn <charlotte@tailscale.com>

Signed-off-by: James Tucker <james@tailscale.com>
Signed-off-by: Charlotte Brandhorst-Satzkorn <charlotte@tailscale.com>
1 year ago
Brad Fitzpatrick 84b94b3146 types/netmap, all: make NetworkMap.SelfNode a tailcfg.NodeView
Updates #1909

Change-Id: I8c470cbc147129a652c1d58eac9b790691b87606
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 year ago
Val c15997511d wgengine/magicsock: only accept pong sent by CLI ping
When sending a ping from the CLI, only accept a pong that is in reply
to the specific CLI ping we sent.

Updates #311

Signed-off-by: Val <valerie@tailscale.com>
1 year ago
Brad Fitzpatrick 58a4fd43d8 types/netmap, all: use read-only tailcfg.NodeView in NetworkMap
Updates #8948

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 year ago
Brad Fitzpatrick af2e4909b6 all: remove some Debug fields, NetworkMap.Debug, Reconfig Debug arg
Updates #8923

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 year ago
Brad Fitzpatrick 25663b1307 tailcfg: remove most Debug fields, move bulk to nodeAttrs [capver 70]
Now a nodeAttr: ForceBackgroundSTUN, DERPRoute, TrimWGConfig,
DisableSubnetsIfPAC, DisableUPnP.

Kept support for, but also now a NodeAttr: RandomizeClientPort.

Removed: SetForceBackgroundSTUN, SetRandomizeClientPort (both never
used, sadly... never got around to them. But nodeAttrs are better
anyway), EnableSilentDisco (will be a nodeAttr later when that effort
resumes).

Updates #8923

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 year ago
KevinLiang10 7ed3681cbe tailcfg: Add FirewallMode to NetInfo to record wether host using iptables or nftables
To record wether user is using iptables or nftables after we add support to nftables on linux, we
are adding a field FirewallMode to NetInfo in HostInfo to reflect what firewall mode the host is
running, and form metrics. The information is gained from a global constant in hostinfo.go. We
set it when selection heuristic made the decision, and magicsock reports this to control.

Updates: tailscale/corp#13943
Signed-off-by: KevinLiang10 <kevinliang@tailscale.com>
1 year ago
Andrew Dunham 95d776bd8c wgengine/magicsock: only cache N most recent endpoints per-Addr
If a node is flapping or otherwise generating lots of STUN endpoints, we
can end up caching a ton of useless values and sending them to peers.
Instead, let's apply a fixed per-Addr limit of endpoints that we cache,
so that we're only sending peers up to the N most recent.

Updates tailscale/corp#13890

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I8079a05b44220c46da55016c0e5fc96dd2135ef8
1 year ago
James Tucker de8e55fda6 net/netcheck,wgengine/magicsock: reduce coupling between netcheck and magicsock
Netcheck no longer performs I/O itself, instead it makes requests via
SendPacket and expects users to route reply traffic to
ReceiveSTUNPacket.

Netcheck gains a Standalone function that stands up sockets and
goroutines to implement I/O when used in a standalone fashion.

Magicsock now unconditionally routes STUN traffic to the netcheck.Client
that it hosts, and plumbs the send packet sink.

The CLI is updated to make use of the Standalone mode.

Fixes #8723

Signed-off-by: James Tucker <james@tailscale.com>
1 year ago
salman aljammaz 99e06d3544
magicsock: set the don't fragment sockopt (#8715)
This sets the Don't Fragment flag, for now behind the
TS_DEBUG_ENABLE_PMTUD envknob.

Updates #311.

Signed-off-by: Val <valerie@tailscale.com>
Signed-off-by: salman <salman@tailscale.com>
1 year ago
salman aljammaz 25a7204bb4
wgengine,ipn,cmd/tailscale: add size option to ping (#8739)
This adds the capability to pad disco ping message payloads to reach a
specified size. It also plumbs it through to the tailscale ping -size
flag.

Disco pings used for actual endpoint discovery do not use this yet.

Updates #311.

Signed-off-by: salman <salman@tailscale.com>
Co-authored-by: Val <valerie@tailscale.com>
1 year ago
salman aljammaz 68f8e5678e
wgengine/magicsock: remove dead code (#8745)
The nonce value is not read by anything, and di.sharedKey.Seal()
a few lines below generates its own. #cleanup

Signed-off-by: salman <salman@tailscale.com>
1 year ago
David Anderson 9d89e85db7 wgengine/magicsock: document mysterious-looking assignment
Updates #cleanup

Signed-off-by: David Anderson <danderson@tailscale.com>
1 year ago
David Anderson 84777354a0 wgengine/magicsock: factor out more separable parts
Updates #8720

Signed-off-by: David Anderson <danderson@tailscale.com>
1 year ago
David Anderson 9a76deb4b0 disco: move disco pcap helper to disco package
Updates tailscale/corp#13464

Signed-off-by: David Anderson <danderson@tailscale.com>
1 year ago
David Anderson cde37f5307 wgengine/magicsock: factor out peerMap into separate file
Updates tailscale/corp#13464

Signed-off-by: David Anderson <danderson@tailscale.com>
1 year ago
David Anderson f7016d8c00 wgengine/magicsock: factor out endpoint into its own file
Updates tailscale/corp#13464

Signed-off-by: David Anderson <danderson@tailscale.com>
1 year ago
David Anderson c2831f6614 wgengine/magicsock: delete unused stuff
Updates tailscale/corp#13464

Signed-off-by: David Anderson <danderson@tailscale.com>
1 year ago
Charlotte Brandhorst-Satzkorn 339397ab74 wgengine/magicsock: remove noV4/noV6 check in addrForSendWireGuardLocked
This change removes the noV4/noV6 check from addrForSendWireGuardLocked.

On Android, the client panics when reaching	`rand.Intn()`, likely due to
the candidates list being containing no candidates. The suspicion is
that the `noV4` and the `noV6` are both being triggered causing the
loop to continue.

Updates tailscale/corp#12938
Updates #7826

Signed-off-by: Charlotte Brandhorst-Satzkorn <charlotte@tailscale.com>
1 year ago
Andrew Dunham 2a9d46c38f wgengine/magicsock: prefer private endpoints to public ones
Switch our best address selection to use a scoring-based approach, where
we boost each address based on whether it's a private IP or IPv6.

For users in cloud environments, this biases endpoint selection towards
using an endpoint that is less likely to cost the user money, and should
be less surprising to users.

This also involves updating the tests to not use private IPv4 addresses;
other than that change, the behaviour should be identical for existing
endpoints.

Updates #8097

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I069e3b399daea28be66b81f7e44fc27b2943d8af
1 year ago
Charlotte Brandhorst-Satzkorn ddb4040aa0
wgengine/magicsock: add address selection for wireguard only endpoints (#7979)
This change introduces address selection for wireguard only endpoints.
If a endpoint has not been used before, an address is randomly selected
to be used based on information we know about, such as if they are able
to use IPv4 or IPv6. When an address is initially selected, we also
initiate a new ICMP ping to the endpoints addresses to determine which
endpoint offers the best latency. This information is then used to
update which endpoint we should be using based on the best possible
route. If the latency is the same for a IPv4 and an IPv6 address, IPv6
will be used.

Updates #7826

Signed-off-by: Charlotte Brandhorst-Satzkorn <charlotte@tailscale.com>
2 years ago
Andrew Dunham bcf7b63d7e wgengine/magicsock: add hysteresis to endpoint selection
Avoid selecting an endpoint as "better" than the current endpoint if the
total latency improvement is less than 1%. This adds some hysteresis to
avoid flapping between endpoints for a minimal improvement in latency.

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: If8312e1768ea65c4b4d4e13d8de284b3825d7a73
2 years ago
Mihai Parparita 7330aa593e all: avoid repeated default interface lookups
On some platforms (notably macOS and iOS) we look up the default
interface to bind outgoing connections to. This is both duplicated
work and results in logspam when the default interface is not available
(i.e. when a phone has no connectivity, we log an error and thus cause
more things that we will try to upload and fail).

Fixed by passing around a netmon.Monitor to more places, so that we can
use its cached interface state.

Fixes #7850
Updates #7621

Signed-off-by: Mihai Parparita <mihai@tailscale.com>
2 years ago
Mihai Parparita 4722f7e322 all: move network monitoring from wgengine/monitor to net/netmon
We're using it in more and more places, and it's not really specific to
our use of Wireguard (and does more just link/interface monitoring).

Also removes the separate interface we had for it in sockstats -- it's
a small enough package (we already pull in all of its dependencies
via other paths) that it's not worth the extra complexity.

Updates #7621
Updates #7850

Signed-off-by: Mihai Parparita <mihai@tailscale.com>
2 years ago
Andrew Dunham f85dc6f97c
ci: add more lints (#7909)
This is a follow-up to #7905 that adds two more linters and fixes the corresponding findings. As per the previous PR, this only flags things that are "obviously" wrong, and fixes the issues found.

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I8739bdb7bc4f75666a7385a7a26d56ec13741b7c
2 years ago
Andrew Dunham 80b138f0df wgengine/magicsock: keep advertising endpoints after we stop discovering them
Previously, when updating endpoints we would immediately stop
advertising any endpoint that wasn't discovered during
determineEndpoints. This could result in, for example, a case where we
performed an incremental netcheck, didn't get any of our three STUN
packets back, and then dropped our STUN endpoint from the set of
advertised endpoints... which would result in clients falling back to a
DERP connection until the next call to determineEndpoints.

Instead, let's cache endpoints that we've discovered and continue
reporting them to clients until a timeout expires. In the above case
where we temporarily don't have a discovered STUN endpoint, we would
continue reporting the old value, then re-discover the STUN endpoint
again and continue reporting it as normal, so clients never see a
withdrawal.

Updates tailscale/coral#108

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I42de72e7418ab328a6c732bdefc74549708cf8b9
2 years ago
Brad Fitzpatrick 4b49ca4a12 wgengine/magicsock: update comments on what implements conn.Bind
The comment still said *magicsock.Conn implemented wireguard-go conn.Bind.
That wasn't accurate anymore.

A doc #cleanup.

Change-Id: I7fd003b939497889cc81147bfb937b93e4f6865c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 years ago
Brad Fitzpatrick 10f1c90f4d wgengine/magicsock, types/nettype, etc: finish ReadFromUDPAddrPort netip migration
So we're staying within the netip.Addr/AddrPort consistently and
avoiding allocs/conversions to the legacy net addr types.

Updates #5162

Change-Id: I59feba60d3de39f773e68292d759766bac98c917
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 years ago
Brad Fitzpatrick 29f7df9d8f wgengine/magicsock, etc: remove mostly unused WriteTo methods
Updates #2331
Updates #5162

Change-Id: I8291884425481eeaedde38a54adfd8ed7292a497
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 years ago
Brad Fitzpatrick 6866aaeab3 wgengine/magicsock: factor out receiveIPv4 & receiveIPv6 common code
Updates #2331

Change-Id: I801df38b217f5d17203e8dc3b8654f44747e0f4b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 years ago
Andrew Dunham 228d0c6aea net/netcheck: use dnscache.Resolver when resolving DERP IPs
This also adds a bunch of tests for this function to ensure that we're
returning the proper IP(s) in all cases.

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I0d9d57170dbab5f2bf07abdf78ecd17e0e635399
2 years ago
Mihai Parparita edb02b63f8 net/sockstats: pass in logger to sockstats.WithSockStats
Using log.Printf may end up being printed out to the console, which
is not desirable. I noticed this when I was investigating some client
logs with `sockstats: trace "NetcheckClient" was overwritten by another`.
That turns to be harmless/expected (the netcheck client will fall back
to the DERP client in some cases, which does its own sockstats trace).

However, the log output could be visible to users if running the
`tailscale netcheck` CLI command, which would be needlessly confusing.

Updates tailscale/corp#9230

Signed-off-by: Mihai Parparita <mihai@tailscale.com>
2 years ago
James Tucker e09c434e5d wgengine/magicsock: remove locking sync requirements on conn disco keys
The lazy initialization of the disco key is not necessary, and
contributes to unnecessary locking and state checking.

Updates #cleanup

Signed-off-by: James Tucker <james@tailscale.com>
2 years ago
James Tucker e1b71c83ac wgengine/magicsock: remove unused fields on discoInfo
Updates #cleanup

Signed-off-by: James Tucker <james@tailscale.com>
2 years ago
James Tucker a257b2f88b wgengine/magicsock: add immutability documentation to endpointDisco
Updates #7825

Signed-off-by: James Tucker <james@tailscale.com>
2 years ago
Charlotte Brandhorst-Satzkorn c573bef0aa tailcfg,wgengine: add initial support for WireGuard only peers
A peer can have IsWireGuardOnly, which means it will not support DERP or
Disco, and it must have Endpoints filled in order to be usable.

In the present implementation only the first Endpoint will be used as
the bestAddr.

Updates tailscale/corp#10351

Co-authored-by: Charlotte Brandhorst-Satzkorn <charlotte@tailscale.com>
Co-authored-by: James Tucker <james@tailscale.com>
Signed-off-by: James Tucker <james@tailscale.com>
2 years ago
James Tucker 6cfcb3cae4 wgengine/magicsock: fix synchronization of endpoint disco fields
Identified in review in #7821 endpoint.discoKey and endpoint.discoShort
are often accessed without first taking endpoint.mu. The arrangement
with endpoint.mu is inconvenient for a good number of those call-sites,
so it is instead replaced with an atomic pointer to carry both pieces of
disco info. This will also help with #7821 that wants to add explicit
checks/guards to disable disco behaviors when disco keys are missing
which is necessarily implicitly mostly covered by this change.

Updates #7821

Signed-off-by: James Tucker <james@tailscale.com>
2 years ago