Commit Graph

328 Commits (151b4415ca88a24fb883e95edad0cf4b6c450543)

Author SHA1 Message Date
Brad Fitzpatrick 151b4415ca wgengine/magicsock: remove endpoint parameter from handlePingLocked
We can reply to a ping without knowing which exact node it's from.  As
long as it's in our netmap, it's safe to reply. If there's more than
one node with that discokey, it doesn't matter who we're relpying to.

Updates #3088

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Brad Fitzpatrick d86081f353 wgengine/magicsock: add new discoInfo type for DiscoKey state, move some fields
As more prep for removing the false assumption that you're able to
map from DiscoKey to a single peer, move the lastPingFrom and lastPingTime
fields from the endpoint type to a new discoInfo type, effectively upgrading
the old sharedDiscoKey map (which only held a *[32]byte nacl precomputed key
as its value) to discoInfo which then includes that naclbox key.

Then start plumbing it into handlePing in prep for removing the need
for handlePing to take an endpoint parameter.

Updates #3088

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Brad Fitzpatrick e5779f019e wgengine/magicsock: move temporary endpoint lookup later, add TODO to remove
Updates #3088

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Brad Fitzpatrick 36a07089ee wgengine/magicsock: remove redundant/wrong sharedDiscoKey delete
The pass just after in this method handles cleaning up sharedDiscoKey.
No need to do it wrong (assuming DiscoKey => 1 node) earlier.

Updates #3088

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Brad Fitzpatrick 3e80806804 wgengine/magicsock: pass src NodeKey to handleDiscoMessage for DERP disco msgs
And then use it to avoid another lookup-by-DiscoKey.

Updates #3088
3 years ago
Brad Fitzpatrick 82fa15fa3b wgengine/magicsock: start removing endpointForDiscoKey
It's not valid to assume that a discokey is globally unique.

This removes the first two of the four callers.

Updates #3088

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Avery Pennarun 0d4a0bf60e magicsock: if STUN failed to send before, rebind before STUNning again.
On iOS (and possibly other platforms), sometimes our UDP socket would
get stuck in a state where it was bound to an invalid interface (or no
interface) after a network reconfiguration. We can detect this by
actually checking the error codes from sending our STUN packets.

If we completely fail to send any STUN packets, we know something is
very broken. So on the next STUN attempt, let's rebind the UDP socket
to try to correct any problems.

This fixes a problem where iOS would sometimes get stuck using DERP
instead of direct connections until the backend was restarted.

Fixes #2994

Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
3 years ago
David Anderson 830f641c6b wgengine/magicsock: update discokeys on netmap change.
Fixes #3008.

Signed-off-by: David Anderson <danderson@tailscale.com>
3 years ago
Brad Fitzpatrick 2238814b99 wgengine/magicsock: fix crash introduced in recent cleanups
Fixes #2801

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Brad Fitzpatrick 5bacbf3744 wgengine/magicsock, health, ipn/ipnstate: track DERP-advertised health
And add health check errors to ipnstate.Status (tailscale status --json).

Updates #2746
Updates #2775

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
David Anderson bb10443edf wgengine/wgcfg: use just the hexlified node key as the WireGuard endpoint.
The node key is all magicsock needs to find the endpoint that WireGuard
needs.

Updates #2752

Signed-off-by: David Anderson <danderson@tailscale.com>
3 years ago
David Anderson dfd978f0f2 wgengine/magicsock: use NodeKey, not DiscoKey, as the trigger for lazy reconfig.
Updates #2752

Signed-off-by: David Anderson <danderson@tailscale.com>
3 years ago
David Anderson 4c27e2fa22 wgengine/magicsock: remove Start method from Conn.
Over time, other magicsock refactors have made Start effectively a
no-op, except that some other functions choose to panic if called
before Start.

Signed-off-by: David Anderson <danderson@tailscale.com>
3 years ago
David Anderson 1a899344bd wgengine/magicsock: don't store tailcfg.Nodes alongside endpoints.
Updates #2752

Signed-off-by: David Anderson <danderson@tailscale.com>
3 years ago
David Anderson b2181608b5 wgengine/magicsock: eagerly create endpoints in SetNetworkMap.
Updates #2752

Signed-off-by: David Anderson <danderson@tailscale.com>
3 years ago
David Anderson 44d71d1e42 wgengine/magicsock: fix race in test shutdown, again.
We were returning an error almost, but not quite like errConnClosed in
a single codepath, which could still trip the panic on reconfig in the
test logic.

Signed-off-by: David Anderson <danderson@tailscale.com>
3 years ago
David Anderson 8bacfe6a37 wgengine/magicsock: remove unused sendLogLimit limiter.
Magicsock these days gets its logs limited by the global log limiter.

Signed-off-by: David Anderson <danderson@tailscale.com>
3 years ago
David Anderson e151b74f93 wgengine/magicsock: remove opts.SimulatedNetwork.
It only existed to override one test-only behavior with a
different test-only behavior, in both cases working around
an annoying feature of our CI environments. Instead, handle
that weirdness entirely in the test code, with a tweaked
TestOnlyPacketListener that gets injected.

Signed-off-by: David Anderson <danderson@tailscale.com>
3 years ago
David Anderson 58c1f7d51a wgengine/magicsock: rename opts.PacketListener to TestOnlyPacketListener.
The docstring said it was meant for use in tests, but it's specifically a
special codepath that is _only_ used in tests, so make the claim stronger.

Signed-off-by: David Anderson <danderson@tailscale.com>
3 years ago
David Anderson 8049063d35 wgengine/magicsock: rename discoEndpoint to just endpoint.
Signed-off-by: David Anderson <danderson@tailscale.com>
3 years ago
David Anderson f2d949e2db wgengine/magicsock: fold findEndpoint into its only remaining caller.
Signed-off-by: David Anderson <danderson@tailscale.com>
3 years ago
David Anderson fe2f89deab wgengine/magicsock: fix rare shutdown race in test.
Signed-off-by: David Anderson <danderson@tailscale.com>
3 years ago
David Anderson 97693f2e42 wgengine/magicsock: delete legacy AddrSet endpoints.
Instead of using the legacy codepath, teach discoEndpoint to handle
peers that have a home DERP, but no disco key. We can still communicate
with them, but only over DERP.

Signed-off-by: David Anderson <danderson@tailscale.com>
3 years ago
slowy07 ac0353e982 fix: typo spelling grammar
Signed-off-by: slowy07 <slowy.arfy@gmail.com>
3 years ago
Brad Fitzpatrick 37053801bb wgengine/magicsock: restore a bit of logging on node becoming active
Fixes #2695

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Brad Fitzpatrick 39610aeb09 wgengine/magicsock: move debug knobs to their own file, compile out on iOS
No need for these knobs on iOS where you can set the environment
variables anyway.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Brad Fitzpatrick f3c96df162 ipn/ipnstate: move tailscale status "active" determination to tailscaled
Fixes #2579

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Brad Fitzpatrick b622c60ed0 derp,wgengine/magicsock: don't assume stringer is in $PATH for go:generate
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Josh Bleecher Snyder 8a3d52e882 wgengine/magicsock: use mono.Time
magicsock makes multiple calls to Now per packet.
Move to mono.Now. Changing some of the calls to
use package mono has a cascading effect,
causing non-per-packet call sites to also switch.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Josh Bleecher Snyder 4dbbd0aa4a cmd/addlicense: add command to add licenseheaders to generated code
And use it to make our stringer invocations match the existing code.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Josh Bleecher Snyder c179580599 wgengine/magicsock: add debug envvar to force all traffic over DERP
This would have been useful during debugging DERP issues recently.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Brad Fitzpatrick 92077ae78c wgengine/magicsock: make portmapping async
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
David Crawshaw 5f8ffbe166 magicsock: add SetPreferredPort method
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
3 years ago
Josh Bleecher Snyder ddf6c8c729 wgengine/magicsock: delete dead code
Co-authored-by: Adrian Dewhurst <adrian@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Josh Bleecher Snyder 1ece91cede go.mod: upgrade wireguard-windows, de-fork wireguard-go
Pull in the latest version of wireguard-windows.

Switch to upstream wireguard-go.
This requires reverting all of our import paths.

Unfortunately, this has to happen at the same time.
The wireguard-go change is very low risk,
as that commit matches our fork almost exactly.
(The only changes are import paths, CI files, and a go.mod entry.)
So if there are issues as a result of this commit,
the first place to look is wireguard-windows changes.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Josh Bleecher Snyder 25df067dd0 all: adapt to opaque netaddr types
This commit is a mishmash of automated edits using gofmt:

gofmt -r 'netaddr.IPPort{IP: a, Port: b} -> netaddr.IPPortFrom(a, b)' -w .
gofmt -r 'netaddr.IPPrefix{IP: a, Port: b} -> netaddr.IPPrefixFrom(a, b)' -w .

gofmt -r 'a.IP.Is4 -> a.IP().Is4' -w .
gofmt -r 'a.IP.As16 -> a.IP().As16' -w .
gofmt -r 'a.IP.Is6 -> a.IP().Is6' -w .
gofmt -r 'a.IP.As4 -> a.IP().As4' -w .
gofmt -r 'a.IP.String -> a.IP().String' -w .

And regexps:

\w*(.*)\.Port = (.*)  ->  $1 = $1.WithPort($2)
\w*(.*)\.IP = (.*)  ->  $1 = $1.WithIP($2)

And lots of manual fixups.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Josh Bleecher Snyder ebcd7ab890 wgengine: remove wireguard-go DeviceOptions
We no longer need them.
This also removes the 32 bytes of prefix junk before endpoints.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Josh Bleecher Snyder aacb2107ae all: add extra information to serialized endpoints
magicsock.Conn.ParseEndpoint requires a peer's public key,
disco key, and legacy ip/ports in order to do its job.
We currently accomplish that by:

* adding the public key in our wireguard-go fork
* encoding the disco key as magic hostname
* using a bespoke comma-separated encoding

It's a bit messy.

Instead, switch to something simpler: use a json-encoded struct
containing exactly the information we need, in the form we use it.

Our wireguard-go fork still adds the public key to the
address when it passes it to ParseEndpoint, but now the code
compensating for that is just a couple of simple, well-commented lines.
Once this commit is in, we can remove that part of the fork
and remove the compensating code.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
3 years ago
Josh Bleecher Snyder ddd85b9d91 wgengine/magicsock: rename discoEndpoint.wgEndpointHostPort to wgEndpoint
Fields rename only.

Part of the general effort to make our code agnostic about endpoint formatting.
It's just a name, but it will soon be a misleading one; be more generic.
Do this as a separate commit because it generates a lot of whitespace changes.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Josh Bleecher Snyder bc68e22c5b all: s/CreateEndpoint/ParseEndpoint/ in docs
Upstream wireguard-go renamed the interface method
from CreateEndpoint to ParseEndpoint.
I missed some comments. Fix them.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Josh Bleecher Snyder 9d542e08e2 wgengine/magicsock: always run ReceiveIPv6
One of the consequences of the 	bind refactoring in 6f23087175
is that attempting to bind an IPv6 socket will always
result in c.pconn6.pconn being non-nil.
If the bind fails, it'll be set to a placeholder packet conn
that blocks forever.

As a result, we can always run ReceiveIPv6 and health check it.
This removes IPv4/IPv6 asymmetry and also will allow health checks
to detect any IPv6 receive func failures.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
3 years ago
Josh Bleecher Snyder fe50ded95c health: track whether we have a functional udp4 bind
Suggested-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
3 years ago
Josh Bleecher Snyder 7dc7078d96 wgengine/magicsock: use netaddr.IP in listenPacket
It must be an IP address; enforce that at the type level.

Suggested-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
3 years ago
Josh Bleecher Snyder 3c543c103a wgengine/magicsock: unify initial bind and rebind
We had two separate code paths for the initial UDP listener bind
and any subsequent rebinds.

IPv6 got left out of the rebind code.
Rather than duplicate it there, unify the two code paths.
Then improve the resulting code:

* Rebind had nested listen attempts to try the user-specified port first,
  and then fall back to :0 if that failed. Convert that into a loop.
* Initial bind tried only the user-specified port.
  Rebind tried the user-specified port and 0.
  But there are actually three ports of interest:
  The one the user specified, the most recent port in use, and 0.
  We now try all three in order, as appropriate.
* In the extremely rare case in which binding to port 0 fails,
  use a dummy net.PacketConn whose reads block until close.
  This will keep the wireguard-go receive func goroutine alive.

As a pleasant side-effect of this, if we decide that
we need to resuscitate #1796, it will now be much easier.

Fixes #1799

Co-authored-by: David Anderson <danderson@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
3 years ago
Josh Bleecher Snyder 8fb66e20a4 wgengine/magicsock: remove DefaultPort const
Assume it'll stay at 0 forever, so hard-code it
and delete code conditional on it being non-0.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
3 years ago
Josh Bleecher Snyder a8f61969b9 wgengine/magicsock: remove context arg from listenPacket
It was set to context.Background by all callers, for the same reasons.
Set it locally instead, to simplify call sites.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
3 years ago
Josh Bleecher Snyder 744de615f1 health, wgenegine: fix receive func health checks for the fourth time
The old implementation knew too much about how wireguard-go worked.
As a result, it missed genuine problems that occurred due to unrelated bugs.

This fourth attempt to fix the health checks takes a black box approach.
A receive func is healthy if one (or both) of these conditions holds:

* It is currently running and blocked.
* It has been executed recently.

The second condition is required because receive functions
are not continuously executing. wireguard-go calls them and then
processes their results before calling them again.

There is a theoretical false positive if wireguard-go go takes
longer than one minute to process the results of a receive func execution.
If that happens, we have other problems.

Updates #1790

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
3 years ago
Josh Bleecher Snyder 0d4c8cb2e1 health: delete ReceiveFunc health checks
They were not doing their job.
They need yet another conceptual re-think.
Start by clearing the decks.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
3 years ago
Josh Bleecher Snyder 8d7f7fc7ce health, wgenegine: fix receive func health checks yet again
The existing implementation was completely, embarrassingly conceptually broken.

We aren't able to see whether wireguard-go's receive function goroutines
are running or not. All we can do is model that based on what we have done.
This commit fixes that model.

Fixes #1781

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
3 years ago
Josh Bleecher Snyder 5835a3f553 health, wgengine/magicsock: avoid receive function false positives
Avery reported a sub-ms health transition from "receiveIPv4 not running" to "ok".

To avoid these transient false-positives, be more precise about
the expected lifetime of receive funcs. The problematic case is one in which
they were started but exited prior to a call to connBind.Close.
Explicitly represent started vs running state, taking care with the order of updates.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
3 years ago