Commit Graph

1471 Commits (da3cf121947725342ac671d5afd4d8f2d526ff39)

Author SHA1 Message Date
Andrew Dunham 7429e8912a wgengine/netstack: fix bug with duplicate SYN packets in client limit
This fixes a bug that was introduced in #11258 where the handling of the
per-client limit didn't properly account for the fact that the gVisor
TCP forwarder will return 'true' to indicate that it's handled a
duplicate SYN packet, but not launch the handler goroutine.

In such a case, we neither decremented our per-client limit in the
wrapper function, nor did we do so in the handler function, leading to
our per-client limit table slowly filling up without bound.

Fix this by doing the same duplicate-tracking logic that the TCP
forwarder does so we can detect such cases and appropriately decrement
our in-flight counter.

Updates tailscale/corp#12184

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ib6011a71d382a10d68c0802593f34b8153d06892
3 months ago
Andrew Dunham f072d017bd wgengine/magicsock: don't change DERP home when not connected to control
This pretty much always results in an outage because peers won't
discover our new home region and thus won't be able to establish
connectivity.

Updates tailscale/corp#18095

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ic0d09133f198b528dd40c6383b16d7663d9d37a7
3 months ago
Andrew Dunham 62cf83eb92 go.mod: bump gvisor
The `stack.PacketBufferPtr` type no longer exists; replace it with
`*stack.PacketBuffer` instead.

Updates #8043

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ib56ceff09166a042aa3d9b80f50b2aa2d34b3683
3 months ago
Andrew Dunham 4338db28f7 wgengine/magicsock: prefer link-local addresses to private ones
Since link-local addresses are definitionally more likely to be a direct
(lower-latency, more reliable) connection than a non-link-local private
address, give those a bit of a boost when selecting endpoints.

Updates #8097

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I93fdeb07de55ba39ba5fcee0834b579ca05c2a4e
3 months ago
Brad Fitzpatrick f18f591bc6 wgengine: plumb the PeerByKey from wgengine to magicsock
This was just added in 69f4b459 which doesn't yet use it. This still
doesn't yet use it. It just pushes it down deeper into magicsock where
it'll used later.

Updates #7617

Change-Id: If2f8fd380af150ffc763489e1ff4f8ca2899fac6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 months ago
Percy Wegmann 2d5d6f5403 ipn,wgengine: only intercept TailFS traffic on quad 100
This fixes a regression introduced with 993acf4 and released in
v1.60.0.

The regression caused us to intercept all userspace traffic to port
8080 which prevented users from exposing their own services to their
tailnet at port 8080.

Now, we only intercept traffic to port 8080 if it's bound for
100.100.100.100 or fd7a:115c:a1e0::53.

Fixes #11283

Signed-off-by: Percy Wegmann <percy@tailscale.com>
(cherry picked from commit 17cd0626f3)
3 months ago
Brad Fitzpatrick 69f4b4595a wgengine{,/wgint}: add wgint.Peer wrapper type, add to wgengine.Engine
This adds a method to wgengine.Engine and plumbed down into magicsock
to add a way to get a type-safe Tailscale-safe wrapper around a
wireguard-go device.Peer that only exposes methods that are safe for
Tailscale to use internally.

It also removes HandshakeAttempts from PeerStatusLite that was just
added as it wasn't needed yet and is now accessible ala cart as needed
from the Peer type accessor.

None of this is used yet.

Updates #7617

Change-Id: I07be0c4e6679883e6eeddf8dbed7394c9e79c5f4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 months ago
Brad Fitzpatrick b4ff9a578f wgengine: rename local variable from 'found' to conventional 'ok'
Updates #cleanup

Change-Id: I799dc86ea9e4a3a949592abdd8e74282e7e5d086
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 months ago
Brad Fitzpatrick a8a525282c wgengine: use slices.Clone in two places
Updates #cleanup

Change-Id: I1cb30efb6d09180e82b807d6146f37897ef99307
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 months ago
Brad Fitzpatrick 74b8985e19 ipn/ipnstate, wgengine: make PeerStatusLite.LastHandshake zero Time means none
... rather than 1970. Code was using IsZero against the 1970 team
(which isn't a zero value), but fortunately not anywhere that seems to
have mattered.

Updates #cleanup

Change-Id: I708a3f2a9398aaaedc9503678b4a8a311e0e019e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 months ago
Andrew Dunham 3dd8ae2f26 net/tstun: fix spelling of "WireGuard"
Updates #cleanup

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ida7e30f4689bc18f5f7502f53a0adb5ac3c7981a
3 months ago
Andrew Dunham c5abbcd4b4 wgengine/netstack: add a per-client limit for in-flight TCP forwards
This is a fun one. Right now, when a client is connecting through a
subnet router, here's roughly what happens:

1. The client initiates a connection to an IP address behind a subnet
   router, and sends a TCP SYN
2. The subnet router gets the SYN packet from netstack, and after
   running through acceptTCP, starts DialContext-ing the destination IP,
   without accepting the connection¹
3. The client retransmits the SYN packet a few times while the dial is
   in progress, until either...
4. The subnet router successfully establishes a connection to the
   destination IP and sends the SYN-ACK back to the client, or...
5. The subnet router times out and sends a RST to the client.
6. If the connection was successful, the client ACKs the SYN-ACK it
   received, and traffic starts flowing

As a result, the notification code in forwardTCP never notices when a
new connection attempt is aborted, and it will wait until either the
connection is established, or until the OS-level connection timeout is
reached and it aborts.

To mitigate this, add a per-client limit on how many in-flight TCP
forwarding connections can be in-progress; after this, clients will see
a similar behaviour to the global limit, where new connection attempts
are aborted instead of waiting. This prevents a single misbehaving
client from blocking all other clients of a subnet router by ensuring
that it doesn't starve the global limiter.

Also, bump the global limit again to a higher value.

¹ We can't accept the connection before establishing a connection to the
remote server since otherwise we'd be opening the connection and then
immediately closing it, which breaks a bunch of stuff; see #5503 for
more details.

Updates tailscale/corp#12184

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I76e7008ddd497303d75d473f534e32309c8a5144
3 months ago
Brad Fitzpatrick 1cf85822d0 ipn/ipnstate, wgengine/wgint: add handshake attempts accessors
Not yet used. This is being made available so magicsock/wgengine can
use it to ignore certain sends (UDP + DERP) later on at least mobile,
letting wireguard-go think it's doing its full attempt schedule, but
we can cut it short conditionally based on what we know from the
control plane.

Updates #7617

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Change-Id: Ia367cf6bd87b2aeedd3c6f4989528acdb6773ca7
3 months ago
Brad Fitzpatrick eb28818403 wgengine: make pendOpen time later, after dup check
Otherwise on OS retransmits, we'd make redundant timers in Go's timer
heap that upon firing just do nothing (well, grab a mutex and check a
map and see that there's nothing to do).

Updates #cleanup

Change-Id: Id30b8b2d629cf9c7f8133a3f7eca5dc79e81facb
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 months ago
Brad Fitzpatrick 219efebad4 wgengine: reduce critical section
No need to hold wgLock while using the device to LookupPeer;
that has its own mutex already.

Updates #cleanup

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Change-Id: Ib56049fcc7163cf5a2c2e7e12916f07b4f9d67cb
3 months ago
Nick Khyl 7ef1fb113d cmd/tailscaled, ipn/ipnlocal, wgengine: shutdown tailscaled if wgdevice is closed
Tailscaled becomes inoperative if the Tailscale Tunnel wintun adapter is abruptly removed.
wireguard-go closes the device in case of a read error, but tailscaled keeps running.
This adds detection of a closed WireGuard device, triggering a graceful shutdown of tailscaled.
It is then restarted by the tailscaled watchdog service process.

Fixes #11222

Signed-off-by: Nick Khyl <nickk@tailscale.com>
3 months ago
Anton Tolchanov cd9cf93de6 wgengine/netstack: expose TCP forwarder drops via clientmetrics
- add a clientmetric with a counter of TCP forwarder drops due to the
  max attempts;
- fix varz metric types, as they are all counters.

Updates #8210

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
3 months ago
Brad Fitzpatrick e1bd7488d0 all: remove LenIter, use Go 1.22 range-over-int instead
Updates #11058
Updates golang/go#65685

Change-Id: Ibb216b346e511d486271ab3d84e4546c521e4e22
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 months ago
Brad Fitzpatrick 6ad6d6b252 wgengine/wglog: add TS_DEBUG_RAW_WGLOG envknob for raw wg logs
Updates #7617 (part of debugging it)

Change-Id: I1bcbdcf0f929e3bcf83f244b1033fd438aa6dac1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 months ago
Brad Fitzpatrick 8b9474b06a wgengine/wgcfg: don't send UAPI to disable keep-alives on new peers
That's already the default. Avoid the overhead of writing it on one
side and reading it on the other to do nothing.

Updates #cleanup (noticed while researching something else)

Change-Id: I449c88a022271afb9be5da876bfaf438fe5d3f58
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 months ago
James Tucker 131f9094fd wgengine/wglog: quieten WireGuard logs for allowedips
An increasing number of users have very large subnet route
configurations, which can produce very large amounts of log data when
WireGuard is reconfigured. The logs don't contain the actual routes, so
they're largely useless for diagnostics, so we'll just suppress them.

Fixes tailscale/corp#17532

Signed-off-by: James Tucker <james@tailscale.com>
3 months ago
Jason Barnett 4d668416b8 wgengine/router: fix ip rule restoration
Fixes #10857

Signed-off-by: Jason Barnett <J@sonBarnett.com>
4 months ago
Aaron Klotz f7acbefbbb wgengine/router: make the Windows ifconfig implementation reuse existing MibIPforwardRow2 when possible
Looking at profiles, we spend a lot of time in winipcfg.LUID.DeleteRoute
looking up the routing table entry for the provided RouteData.

But we already have the row! We previously obtained that data via the full
table dump we did in getInterfaceRoutes. We can make this a lot faster by
hanging onto a reference to the wipipcfg.MibIPforwardRow2 and executing
the delete operation directly on that.

Fixes #11123

Signed-off-by: Aaron Klotz <aaron@tailscale.com>
4 months ago
Percy Wegmann c42a4e407a tailfs: listen for local clients only on 100.100.100.100
FileSystemForLocal was listening on the node's Tailscale address,
which potentially exposes the user's view of TailFS shares to other
Tailnet users. Remote nodes should connect to exported shares via
the peerapi.

This removes that code so that FileSystemForLocal is only avaialable
on 100.100.100.100:8080.

Updates tailscale/corp#16827

Signed-off-by: Percy Wegmann <percy@tailscale.com>
4 months ago
Percy Wegmann ddcffaef7a tailfs: disable TailFSForLocal via policy
Adds support for node attribute tailfs:access. If this attribute is
not present, Tailscale will not accept connections to the local TailFS
server at 100.100.100.100:8080.

Updates tailscale/corp#16827

Signed-off-by: Percy Wegmann <percy@tailscale.com>
4 months ago
Percy Wegmann abab0d4197 tailfs: clean up naming and package structure
- Restyles tailfs -> tailFS
- Defines interfaces for main TailFS types
- Moves implemenatation of TailFS into tailfsimpl package

Updates tailscale/corp#16827

Signed-off-by: Percy Wegmann <percy@tailscale.com>
4 months ago
Percy Wegmann 993acf4475 tailfs: initial implementation
Add a WebDAV-based folder sharing mechanism that is exposed to local clients at
100.100.100.100:8080 and to remote peers via a new peerapi endpoint at
/v0/tailfs.

Add the ability to manage folder sharing via the new 'share' CLI sub-command.

Updates tailscale/corp#16827

Signed-off-by: Percy Wegmann <percy@tailscale.com>
4 months ago
Joe Tsai 94a4f701c2
all: use reflect.TypeFor now available in Go 1.22 (#11078)
Updates #cleanup

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
4 months ago
Jordan Whited 8b47322acc
wgengine/magicsock: implement probing of UDP path lifetime (#10844)
This commit implements probing of UDP path lifetime on the tail end of
an active direct connection. Probing configuration has two parts -
Cliffs, which are various timeout cliffs of interest, and
CycleCanStartEvery, which limits how often a probing cycle can start,
per-endpoint. Initially a statically defined default configuration will
be used. The default configuration has cliffs of 10s, 30s, and 60s,
with a CycleCanStartEvery of 24h. Probing results are communicated via
clientmetric counters. Probing is off by default, and can be enabled
via control knob. Probing is purely informational and does not yet
drive any magicsock behaviors.

Updates #540

Signed-off-by: Jordan Whited <jordan@tailscale.com>
4 months ago
James Tucker 7e3bcd297e go.mod,wgengine/netstack: bump gvisor
Updates #8043

Signed-off-by: James Tucker <james@tailscale.com>
5 months ago
Claire Wang 213d696db0
magicsock: mute noisy expected peer mtu related error (#10870) 5 months ago
Andrew Dunham 7a0392a8a3 wgengine/netstack: expose gVisor metrics through expvar
When tailscaled is run with "-debug 127.0.0.1:12345", these metrics are
available at:
    http://localhost:12345/debug/metrics

Updates #8210

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I19db6c445ac1f8344df2bc1066a3d9c9030606f8
5 months ago
Andrew Dunham 6540d1f018 wgengine/router: look up absolute path to netsh.exe on Windows
This is in response to logs from a customer that show that we're unable
to run netsh due to the following error:

    router: firewall: adding Tailscale-Process rule to allow UDP for "C:\\Program Files\\Tailscale\\tailscaled.exe" ...
    router: firewall: error adding Tailscale-Process rule: exec: "netsh": cannot run executable found relative to current directory:

There's approximately no reason to ever dynamically look up the path of
a system utility like netsh.exe, so instead let's first look for it
in the System32 directory and only if that fails fall back to the
previous behaviour.

Updates #10804

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I68cfeb4cab091c79ccff3187d35f50359a690573
5 months ago
Jordan Whited b084888e4d
wgengine/magicsock: fix typos in docs (#10729)
Updates #cleanup

Signed-off-by: Jordan Whited <jordan@tailscale.com>
5 months ago
Andrew Lytvynov 2716250ee8
all: cleanup unused code, part 2 (#10670)
And enable U1000 check in staticcheck.

Updates #cleanup

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
5 months ago
Andrew Lytvynov 1302bd1181
all: cleanup unused code, part 1 (#10661)
Run `staticcheck` with `U1000` to find unused code. This cleans up about
a half of it. I'll do the other half separately to keep PRs manageable.

Updates #cleanup

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
6 months ago
Jordan Whited 685b853763
wgengine/magicsock: fix handling of derp.PeerGoneMessage (#10589)
The switch in Conn.runDerpReader() on the derp.ReceivedMessage type
contained cases other than derp.ReceivedPacket that fell through to
writing to c.derpRecvCh, which should only be reached for
derp.ReceivedPacket. This can result in the last/previous
derp.ReceivedPacket to be re-handled, effectively creating a duplicate
packet. If the last derp.ReceivedPacket happens to be a
disco.CallMeMaybe it may result in a disco ping scan towards the
originating peer on the endpoints contained.

The change in this commit moves the channel write on c.derpRecvCh and
subsequent select awaiting the result into the derp.ReceivedMessage
case, preventing it from being reached from any other case. Explicit
continue statements are also added to non-derp.ReceivedPacket cases
where they were missing, in order to signal intent to the reader.

Fixes #10586

Signed-off-by: Jordan Whited <jordan@tailscale.com>
6 months ago
Andrew Dunham 727acf96a6 net/netcheck: use DERP frames as a signal for home region liveness
This uses the fact that we've received a frame from a given DERP region
within a certain time as a signal that the region is stil present (and
thus can still be a node's PreferredDERP / home region) even if we don't
get a STUN response from that region during a netcheck.

This should help avoid DERP flaps that occur due to losing STUN probes
while still having a valid and active TCP connection to the DERP server.

RELNOTE=Reduce home DERP flapping when there's still an active connection

Updates #8603

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: If7da6312581e1d434d5c0811697319c621e187a0
6 months ago
Naman Sood 97f84200ac
wgengine/router: implement UpdateMagicsockPort for CallbackRouter (#10494)
Updates #9084.

Signed-off-by: Naman Sood <mail@nsood.in>
6 months ago
Naman Sood d46a4eced5
util/linuxfw, wgengine: allow ingress to magicsock UDP port on Linux (#10370)
* util/linuxfw, wgengine: allow ingress to magicsock UDP port on Linux

Updates #9084.

Currently, we have to tell users to manually open UDP ports on Linux when
certain firewalls (like ufw) are enabled. This change automates the process of
adding and updating those firewall rules as magicsock changes what port it
listens on.

Signed-off-by: Naman Sood <mail@nsood.in>
6 months ago
Naman Sood 0a59754eda linuxfw,wgengine/route,ipn: add c2n and nodeattrs to control linux netfilter
Updates tailscale/corp#14029.

Signed-off-by: Naman Sood <mail@nsood.in>
6 months ago
James Tucker 215f657a5e wgengine/router: create netfilter runner in setNetfilterMode
This will enable the runner to be replaced as a configuration side
effect in a later change.

Updates tailscale/corp#14029

Signed-off-by: James Tucker <james@tailscale.com>
6 months ago
Andrew Lytvynov 263e01c47b
wgengine/filter: add protocol-agnostic packet checker (#10446)
For use in ACL tests, we need a way to check whether a packet is allowed
not just with TCP, but any protocol.

Updates #3561

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
6 months ago
Jordan Whited 5e861c3871
wgengine/netstack: disable RACK on Windows (#10402)
Updates #9707

Signed-off-by: Jordan Whited <jordan@tailscale.com>
6 months ago
Jordan Whited 1af7f5b549
wgengine/magicsock: fix typo in Conn.handlePingLocked() (#10365)
Updates #cleanup

Signed-off-by: Jordan Whited <jordan@tailscale.com>
6 months ago
Brad Fitzpatrick 4d196c12d9 health: don't report a warning in DERP homeless mode
Updates #3363
Updates tailscale/corp#396

Change-Id: Ibfb0496821cb58a78399feb88d4206d81e95ca0f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
7 months ago
Brad Fitzpatrick 3bd382f369 wgengine/magicsock: add DERP homeless debug mode for testing
In DERP homeless mode, a DERP home connection is not sought or
maintained and the local node is not reachable.

Updates #3363
Updates tailscale/corp#396

Change-Id: Ibc30488ac2e3cfe4810733b96c2c9f10a51b8331
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
7 months ago
Jordan Whited 2ff54f9d12
wgengine/magicsock: move trustBestAddrUntil forward on non-disco rx (#10274)
This is gated behind the silent disco control knob, which is still in
its infancy. Prior to this change disco pong reception was the only
event that could move trustBestAddrUntil forward, so even though we
weren't heartbeating, we would kick off discovery pings every
trustUDPAddrDuration and mirror to DERP.

Updates #540

Signed-off-by: Jordan Whited <jordan@tailscale.com>
7 months ago
Jordan Whited c99488ea19
wgengine/magicsock: fix typo in endpoint.sendDiscoPing() docs (#10232)
Updates #cleanup

Signed-off-by: Jordan Whited <jordan@tailscale.com>
7 months ago
Jordan Whited e848736927
control/controlknobs,wgengine/magicsock: implement SilentDisco toggle (#10195)
This change exposes SilentDisco as a control knob, and plumbs it down to
magicsock.endpoint. No changes are being made to magicsock.endpoint
disco behavior, yet.

Updates #540

Signed-off-by: Jordan Whited <jordan@tailscale.com>
Co-authored-by: Brad Fitzpatrick <bradfitz@tailscale.com>
7 months ago