Commit Graph

138 Commits (aacb2107aeb19a9e91627d6a9da267ad32db3759)

Author SHA1 Message Date
Josh Bleecher Snyder aacb2107ae all: add extra information to serialized endpoints
magicsock.Conn.ParseEndpoint requires a peer's public key,
disco key, and legacy ip/ports in order to do its job.
We currently accomplish that by:

* adding the public key in our wireguard-go fork
* encoding the disco key as magic hostname
* using a bespoke comma-separated encoding

It's a bit messy.

Instead, switch to something simpler: use a json-encoded struct
containing exactly the information we need, in the form we use it.

Our wireguard-go fork still adds the public key to the
address when it passes it to ParseEndpoint, but now the code
compensating for that is just a couple of simple, well-commented lines.
Once this commit is in, we can remove that part of the fork
and remove the compensating code.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
3 years ago
Josh Bleecher Snyder e0bd3cc70c wgengine/magicsock: use netaddr.MustParseIPPrefix
Delete our bespoke helper.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Josh Bleecher Snyder 64047815b0 wgenengine/magicsock: delete cursed tests
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
3 years ago
Josh Bleecher Snyder 7ee891f5fd all: delete wgcfg.Key and wgcfg.PrivateKey
For historical reasons, we ended up with two near-duplicate
copies of curve25519 key types, one in the wireguard-go module
(wgcfg) and one in the tailscale module (types/wgkey).
Then we moved wgcfg to the tailscale module.
We can now remove the wgcfg key type in favor of wgkey.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
3 years ago
Brad Fitzpatrick 34d2f5a3d9 tailcfg: add Endpoint, EndpointType, MapRequest.EndpointType
Track endpoints internally with a new tailcfg.Endpoint type that
includes a typed netaddr.IPPort (instead of just a string) and
includes a type for how that endpoint was discovered (STUN, local,
etc).

Use []tailcfg.Endpoint instead of []string internally.

At the last second, send it to the control server as the existing
[]string for endpoints, but also include a new parallel
MapRequest.EndpointType []tailcfg.EndpointType, so the control server
can start filtering out less-important endpoint changes from
new-enough clients. Notably, STUN-discovered endpoints can be filtered
out from 1.6+ clients, as they can discover them amongst each other
via CallMeMaybe disco exchanges started over DERP. And STUN endpoints
change a lot, causing a lot of MapResposne updates. But portmapped
endpoints are worth keeping for now, as they they work right away
without requiring the firewall traversal extra RTT dance.

End result will be less control->client bandwidth. (despite negligible
increase in client->control bandwidth)

Updates tailscale/corp#1543

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Josh Bleecher Snyder 69cdc30c6d wgengine/wgcfg: remove Config.ListenPort
We don't use the port that wireguard-go passes to us (via magicsock.connBind.Open).
We ignore it entirely and use the port we selected.

When we tell wireguard-go that we're changing the listen_port,
it calls connBind.Close and then connBind.Open.
And in the meantime, it stops calling the receive functions,
which means that we stop receiving and processing UDP and DERP packets.
And that is Very Bad.

That was never a problem prior to b3ceca1dd7,
because we passed the SkipBindUpdate flag to our wireguard-go fork,
which told wireguard-go not to re-bind on listen_port changes.
That commit eliminated the SkipBindUpdate flag.

We could write a bunch of code to work around the gap.
We could add background readers that process UDP and DERP packets when wireguard-go isn't.
But it's simpler to never create the conditions in which wireguard-go rebinds.

The other scenario in which wireguard-go re-binds is device.Down.
Conveniently, we never call device.Down. We go from device.Up to device.Close,
and the latter only when we're shutting down a magicsock.Conn completely.

Rubber-ducked-by: Avery Pennarun <apenwarr@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Josh Bleecher Snyder b3ceca1dd7 wgengine/...: split into multiple receive functions
Upstream wireguard-go has changed its receive model.
NewDevice now accepts a conn.Bind interface.

The conn.Bind is stateless; magicsock.Conns are stateful.
To work around this, we add a connBind type that supports
cheap teardown and bring-up, backed by a Conn.

The new conn.Bind allows us to specify a set of receive functions,
rather than having to shoehorn everything into ReceiveIPv4 and ReceiveIPv6.
This lets us plumbing DERP messages directly into wireguard-go,
instead of having to mux them via ReceiveIPv4.

One consequence of the new conn.Bind layer is that
closing the wireguard-go device is now indistinguishable
from the routine bring-up and tear-down normally experienced
by a conn.Bind. We thus have to explicitly close the magicsock.Conn
when the close the wireguard-go device.

One downside of this change is that we are reliant on wireguard-go
to call receiveDERP to process DERP messages. This is fine for now,
but is perhaps something we should fix in the future.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Josh Bleecher Snyder 34d4943357 all: gofmt -s
The code is not obviously better or worse, but this makes the little warning
triangle in my editor go away, and the distraction removal is worth it.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Josh Bleecher Snyder 1df162b05b wgengine/magicsock: adapt CreateEndpoint signature to match wireguard-go
Part of a temporary change to make merging wireguard-go easier.
See https://github.com/tailscale/wireguard-go/pull/45.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Josh Bleecher Snyder 36a85e1760 wgengine/magicsock: don't call t.Fatal in magicStack.IP
It can end up executing an a new goroutine,
at which point instead of immediately stopping test execution, it hangs.
Since this is unexpected anyway, panic instead.
As a bonus, it makes call sites nicer and removes a kludge comment.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
David Anderson 016de16b2e net/tstun: rename TUN to Wrapper.
The tstun packagen contains both constructors for generic tun
Devices, and a wrapper that provides additional functionality.

Signed-off-by: David Anderson <danderson@tailscale.com>
3 years ago
David Anderson 588b70f468 net/tstun: merge in wgengine/tstun.
Signed-off-by: David Anderson <danderson@tailscale.com>
3 years ago
Josh Bleecher Snyder 28af46fb3b wgengine: pass logger as a separate arg to device.NewDevice
Adapt to minor API changes in wireguard-go.
And factor out device.DeviceOptions variables.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Josh Bleecher Snyder 4b77eca2de wgengine/magicsock: check returned error in addTestEndpoint
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Brad Fitzpatrick c99f260e40 wgengine/magicsock: prefer IPv6 transport if roughly equivalent latency
Fixes #1566

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Brad Fitzpatrick 9643d8b34d wgengine/magicsock: add an addrLatency type to combine an IPPort+time.Duration
Updates #1566 (but no behavior changes as of this change)

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Brad Fitzpatrick 7e0d12e7cc wgengine/magicsock: don't update control if only endpoint order changes
Updates #1559

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Brad Fitzpatrick 387e83c8fe wgengine/magicsock: fix Conn.Rebind race that let ErrClosed errors be read
There was a logical race where Conn.Rebind could acquire the
RebindingUDPConn mutex, close the connection, fail to rebind, release
the mutex, and then because the mutex was no longer held, ReceiveIPv4
wouldn't retry reads that failed with net.ErrClosed, letting that
error back to wireguard-go, which would then stop running that receive
IP goroutine.

Instead, keep the RebindingUDPConn mutex held for the entirety of the
replacement in all cases.

Updates tailscale/corp#1289

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
David Anderson 2404c0ffad ipn/ipnlocal: only filter out default routes when computing the local wg config.
UIs need to see the full unedited netmap in order to know what exit nodes they
can offer to the user.

Signed-off-by: David Anderson <danderson@tailscale.com>
3 years ago
Brad Fitzpatrick e9e4f1063d wgengine/magicsock: fix discoEndpoint caching bug when a node key changes
Fixes #1391

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Brad Fitzpatrick c64bd587ae net/portmapper: add NAT-PMP client, move port mapping service probing
* move probing out of netcheck into new net/portmapper package
* use PCP ANNOUNCE op codes for PCP discovery, rather than causing
  short-lived (sub-second) side effects with a 1-second-expiring map +
  delete.
* track when we heard things from the router so we can be less wasteful
  in querying the router's port mapping services in the future
* use portmapper from magicsock to map a public port

Fixes #1298
Fixes #1080
Fixes #1001
Updates #864

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Josh Bleecher Snyder c7e5ab8094 wgengine/magicsock: retry and re-send packets in TestTwoDevicePing
When a handshake race occurs, a queued data packet can get lost.
TestTwoDevicePing expected that the very first data packet would arrive.
This caused occasional flakes.

Change TestTwoDevicePing to repeatedly re-send packets
and succeed when one of them makes it through.

This is acceptable (vs making WireGuard not drop the packets)
because this only affects communication with extremely old clients.
And those extremely old clients will eventually connect,
because the kernel will retry sends on timeout.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Josh Bleecher Snyder 88586ec4a4 wgengine/magicsock: remove an alloc from ReceiveIPvN
We modified the standard net package to not allocate a *net.UDPAddr
during a call to (*net.UDPConn).ReadFromUDP if the caller's use
of the *net.UDPAddr does not cause it to escape.
That is https://golang.org/cl/291390.

This is the companion change to magicsock.
There are two changes required.
First, call ReadFromUDP instead of ReadFrom, if possible.
ReadFrom returns a net.Addr, which is an interface, which always allocates.
Second, reduce the lifetime of the returned *net.UDPAddr.
We do this by immediately converting it into a netaddr.IPPort.

We left the existing RebindingUDPConn.ReadFrom method in place,
as it is required to satisfy the net.PacketConn interface.

With the upstream change and both of these fixes in place,
we have removed one large allocation per packet received.

name           old time/op    new time/op    delta
ReceiveFrom-8    16.7µs ± 5%    16.4µs ± 8%     ~     (p=0.310 n=5+5)

name           old alloc/op   new alloc/op   delta
ReceiveFrom-8      112B ± 0%       64B ± 0%  -42.86%  (p=0.008 n=5+5)

name           old allocs/op  new allocs/op  delta
ReceiveFrom-8      3.00 ± 0%      2.00 ± 0%  -33.33%  (p=0.008 n=5+5)

Co-authored-by: Sonia Appasamy <sonia@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Josh Bleecher Snyder 0c673c1344 wgengine/magicsock: unify on netaddr types in addrSet
addrSet maintained duplicate lists of netaddr.IPPorts and net.UDPAddrs.
Unify to use the netaddr type only.

This makes (*Conn).ReceiveIPvN a bit uglier,
but that'll be cleaned up in a subsequent commit.

This is preparatory work to remove an allocation from ReceiveIPv4.

Co-authored-by: Sonia Appasamy <sonia@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Josh Bleecher Snyder 4cd9218351 wgengine/magicsock: prevent logging while running benchmarks
Co-authored-by: Sonia Appasamy <sonia@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Josh Bleecher Snyder 635e4c7435 wgengine/magicsock: increase legacy ping timeout again
I based my estimation of the required timeout based on locally
observed behavior. But CI machines are worse than my local machine.
16s was enough to reduce flakiness but not eliminate it. Bump it up again.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Brad Fitzpatrick 6b365b0239 wgengine/magicsock: fix DERP reader hang regression during concurrent reads
Fixes #1282

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Josh Bleecher Snyder e1f773ebba wgengine/magicsock: allow more time for pings to transit
We removed the "fast retry" code from our wireguard-go fork.
As a result, pings can take longer to transit when retries are required. 
Allow that.

Fixes #1277

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Brad Fitzpatrick 6d2b8df06d wgengine/magicsock: add disabled failing (deadlocking) test for #1282
The fix can make this test run unconditionally.

This moves code from 5c619882bc for
testability but doesn't fix it yet. The #1282 problem remains (when I
wrote its wake-up mechanism, I forgot there were N DERP readers
funneling into 1 UDP reader, and the code just isn't correct at all
for that case).

Also factor out some test helper code from BenchmarkReceiveFrom.

The refactoring in magicsock.go for testability should have no
behavior change.
3 years ago
Brad Fitzpatrick 1e7a35b225 types/netmap: split controlclient.NetworkMap off into its own leaf package
Updates #1278

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Brad Fitzpatrick 6064b6ff47 wgengine/wgcfg/nmcfg: split control/controlclient/netmap.go into own package
It couldn't move to ipnlocal due to test dependency cycles.

Updates #1278

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
David Anderson ace57d7627 wgengine/magicsock: set a dummy private key in benchmark.
Magicsock started dropping all traffic internally when Tailscale is
shut down, to avoid spurious wireguard logspam. This made the benchmark
not receive anything. Setting a dummy private key is sufficient to get
magicsock to pass traffic for benchmarking purposes.

Fixes #1270.

Signed-off-by: David Anderson <danderson@tailscale.com>
3 years ago
Josh Bleecher Snyder e8cd7bb66f tstest: simplify goroutine leak tests
Use tb.Cleanup to simplify both the API and the implementation.

One behavior change: When the number of goroutines shrinks, don't log.
I've never found these logs to be useful, and they frequently add noise.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Josh Bleecher Snyder fe7c3e9c17 all: move wgcfg from wireguard-go
This is mostly code movement from the wireguard-go repo.

Most of the new wgcfg package corresponds to the wireguard-go wgcfg package.

wgengine/wgcfg/device{_test}.go was device/config{_test}.go.
There were substantive but simple changes to device_test.go to remove
internal package device references.

The API of device.Config (now wgcfg.DeviceConfig) grew an error return;
we previously logged the error and threw it away.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Josh Bleecher Snyder d5baeeed5c wgengine: use Tailscale-style peer identifiers in logs
Rewrite log lines on the fly, based on the set of known peers.

This enables us to use upstream wireguard-go logging,
but maintain the Tailscale-style peer public key identifiers
that the rest of our systems (and people) expect.

Fixes #1183

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Josh Bleecher Snyder e4c075cd95 wgengine/magicsock: prevent log-after-test in TestTwoDevicePing 3 years ago
Brad Fitzpatrick 5c619882bc wgengine/magicsock: simplify ReceiveIPv4+DERP path
name           old time/op  new time/op  delta
ReceiveFrom-4  35.8µs ± 3%  21.9µs ± 5%  -38.92%  (p=0.008 n=5+5)

Fixes #1145
Updates #414

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Josh Bleecher Snyder ed2169ae99 wgengine/magicsock: prevent logging after TestActiveDiscovery completes 3 years ago
Josh Bleecher Snyder 63af950d8c wgengine/magicsock: adapt to wireguard-go without UpdateDst
22507adf54 stopped relying on
our fork of wireguard-go's UpdateDst callback.
As a result, we can unwind that code,
and the extra return value of ReceiveIPv{4,6}.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Denton Gentry 23c2dc2165
magicksock: remove TestConnClosing. (#1140)
Test is flakey, remove it and figure out what to do differently later.

Signed-off-by: Denton Gentry <dgentry@tailscale.com>
3 years ago
David Anderson e23b4191c4 wgengine/magicsock: disable legacy networking everywhere except TwoDevicePing.
TwoDevicePing is explicitly testing the behavior of the legacy codepath, everything
else is happy to assume that code no longer exists.

Signed-off-by: David Anderson <danderson@tailscale.com>
3 years ago
David Anderson 0733c5d2e0 wgengine/magicsock: disable legacy behavior in a few more tests.
Signed-off-by: David Anderson <danderson@tailscale.com>
3 years ago
David Anderson a2463e8948 wgengine/magicsock: add an option to disable legacy peer handling.
Used in tests to ensure we're not relying on behavior we're going
to remove eventually.

Signed-off-by: David Anderson <danderson@tailscale.com>
3 years ago
David Anderson d456bfdc6d wgengine/magicsock: fix BenchmarkReceiveFrom.
Previously, this benchmark relied on behavior of the legacy
receive codepath, which I changed in 22507adf. With this
change, the benchmark instead relies on the new active discovery
path.

Signed-off-by: David Anderson <danderson@tailscale.com>
3 years ago
Josh Bleecher Snyder 2d837f79dc wgengine/magicsock: close test loggers once we're done with them
This is a big hammer approach to helping with #1132.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Josh Bleecher Snyder d2529affa2 wgengine/magicsock: quiet wireguard-go logging in tests
We already do this in newUserspaceEngineAdvanced.
Apply it to newMagicStack as well.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Josh Bleecher Snyder 654b5f1570 all: convert from []wgcfg.Endpoint to string
This eliminates a dependency on wgcfg.Endpoint,
as part of the effort to eliminate our wireguard-go fork.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
David Anderson 22507adf54 wgengine/magicsock: stop depending on UpdateDst in legacy codepaths.
This makes connectivity between ancient and new tailscale nodes slightly
worse in some cases, but only in cases where the ancient version would
likely have failed to get connectivity anyway.

Signed-off-by: David Anderson <danderson@tailscale.com>
3 years ago
Denton Gentry 8349e10907 magicsock: add description of testClosingContext
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
3 years ago
Denton Gentry 2e9728023b magicsock: test error case in sendDiscoMessage
In sendDiscoMessage there is a check of whether the connection is
closed, which is not being reliably exercised by other tests.
This shows up in code coverage reports, the lines of code in
sendDiscoMessage are alternately added and subtracted from
code coverage.

Add a test to specifically exercise and verify this code path.

Signed-off-by: Denton Gentry <dgentry@tailscale.com>
3 years ago