Commit Graph

842 Commits (ba59c0391baf099aef6e7c4e5611db4c7f040d76)

Author SHA1 Message Date
Josh Bleecher Snyder 1ece91cede go.mod: upgrade wireguard-windows, de-fork wireguard-go
Pull in the latest version of wireguard-windows.

Switch to upstream wireguard-go.
This requires reverting all of our import paths.

Unfortunately, this has to happen at the same time.
The wireguard-go change is very low risk,
as that commit matches our fork almost exactly.
(The only changes are import paths, CI files, and a go.mod entry.)
So if there are issues as a result of this commit,
the first place to look is wireguard-windows changes.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years ago
Josh Bleecher Snyder ceaaa23962 wgengine/wglog: cache strings
We repeat many peers each time we call SetPeers.
Instead of constructing strings for them from scratch every time,
keep strings alive across iterations.

name        old time/op    new time/op    delta
SetPeers-8    3.58µs ± 1%    2.41µs ± 1%  -32.60%  (p=0.000 n=9+10)

name        old alloc/op   new alloc/op   delta
SetPeers-8    2.53kB ± 0%    1.30kB ± 0%  -48.73%  (p=0.000 n=10+10)

name        old allocs/op  new allocs/op  delta
SetPeers-8      99.0 ± 0%      16.0 ± 0%  -83.84%  (p=0.000 n=10+10)

We could reduce alloc/op 12% and allocs/op 23% if strs had
type map[string]strCache instead of map[string]*strCache,
but that wipes out the execution time impact.
Given that re-use is the most common scenario, let's optimize for it.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years ago
Josh Bleecher Snyder 73adbb7a78 wgengine: pass an addressable value to deephash.UpdateHash
This makes deephash more efficient.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years ago
Josh Bleecher Snyder 8bf2a38f29 go.mod: update wireguard-go, taking control over iOS memory usage from our fork
Our wireguard-go fork used different values from upstream for
package device's memory limits on iOS.

This was the last blocker to removing our fork.

These values are now vars rather than consts for iOS.

c27ff9b9f6

Adjust them on startup to our preferred values.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years ago
Josh Bleecher Snyder 25df067dd0 all: adapt to opaque netaddr types
This commit is a mishmash of automated edits using gofmt:

gofmt -r 'netaddr.IPPort{IP: a, Port: b} -> netaddr.IPPortFrom(a, b)' -w .
gofmt -r 'netaddr.IPPrefix{IP: a, Port: b} -> netaddr.IPPrefixFrom(a, b)' -w .

gofmt -r 'a.IP.Is4 -> a.IP().Is4' -w .
gofmt -r 'a.IP.As16 -> a.IP().As16' -w .
gofmt -r 'a.IP.Is6 -> a.IP().Is6' -w .
gofmt -r 'a.IP.As4 -> a.IP().As4' -w .
gofmt -r 'a.IP.String -> a.IP().String' -w .

And regexps:

\w*(.*)\.Port = (.*)  ->  $1 = $1.WithPort($2)
\w*(.*)\.IP = (.*)  ->  $1 = $1.WithIP($2)

And lots of manual fixups.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years ago
Brad Fitzpatrick 5b52b64094 tsnet: add Tailscale-as-a-library package
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
4 years ago
Josh Bleecher Snyder ebcd7ab890 wgengine: remove wireguard-go DeviceOptions
We no longer need them.
This also removes the 32 bytes of prefix junk before endpoints.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years ago
Josh Bleecher Snyder aacb2107ae all: add extra information to serialized endpoints
magicsock.Conn.ParseEndpoint requires a peer's public key,
disco key, and legacy ip/ports in order to do its job.
We currently accomplish that by:

* adding the public key in our wireguard-go fork
* encoding the disco key as magic hostname
* using a bespoke comma-separated encoding

It's a bit messy.

Instead, switch to something simpler: use a json-encoded struct
containing exactly the information we need, in the form we use it.

Our wireguard-go fork still adds the public key to the
address when it passes it to ParseEndpoint, but now the code
compensating for that is just a couple of simple, well-commented lines.
Once this commit is in, we can remove that part of the fork
and remove the compensating code.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
4 years ago
Josh Bleecher Snyder 98cae48e70 wgengine/wglog: optimize wireguardGoString
The new code is ugly, but much faster and leaner.

name        old time/op    new time/op    delta
SetPeers-8    7.81µs ± 1%    3.59µs ± 1%  -54.04%  (p=0.000 n=9+10)

name        old alloc/op   new alloc/op   delta
SetPeers-8    7.68kB ± 0%    2.53kB ± 0%  -67.08%  (p=0.000 n=10+10)

name        old allocs/op  new allocs/op  delta
SetPeers-8       237 ± 0%        99 ± 0%  -58.23%  (p=0.000 n=10+10)

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years ago
Josh Bleecher Snyder 9356912053 wgengine/wglog: add BenchmarkSetPeer
Because it showed up on hello profiles.

Cycle through some moderate-sized sets of peers.
This should cover the "small tweaks to netmap"
and the "up/down cycle" cases.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years ago
Brad Fitzpatrick 36a26e6a71 internal/deephash: rename from deepprint
Yes, it printed, but that was an implementation detail for hashing.

And coming optimization will make it print even less.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
4 years ago
Josh Bleecher Snyder 773fcfd007 Revert "wgengine/bench: skip flaky test"
This reverts commit d707e2f7e5.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years ago
Josh Bleecher Snyder 68911f6778 wgengine/bench: ignore "engine closing" errors
On benchmark completion, we shut down the wgengine.
If we happen to poll for status during shutdown,
we get an "engine closing" error.
It doesn't hurt anything; ignore it.

Fixes tailscale/corp#1776

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years ago
Brad Fitzpatrick d707e2f7e5 wgengine/bench: skip flaky test
Updates tailscale/corp#1776

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
4 years ago
Josh Bleecher Snyder 8d2a90529e wgengine/bench: hold lock in TrafficGen.GotPacket while calling first packet callback
Without any synchronization here, the "first packet" callback can
be delayed indefinitely, while other work continues.
Since the callback starts the benchmark timer, this could skew results.
Worse, if the benchmark manages to complete before the benchmark timer begins,
it'll cause a data race with the benchmark shutdown performed by package testing.
That is what is reported in #1881.

This is a bit unfortunate, in that it means that users of TrafficGen have
to be careful to keep this callback speedy and lightweight and to avoid deadlocks.

Fixes #1881

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years ago
Josh Bleecher Snyder a72fb7ac0b wgengine/bench: handle multiple Engine status callbacks
It is possible to get multiple status callbacks from an Engine.
We need to wait for at least one from each Engine.
Without limiting to one per Engine,
wait.Wait can exit early or can panic due to a negative counter.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years ago
Josh Bleecher Snyder 6618e82ba2 wgengine/bench: close Engines on benchmark completion
This reduces the speed with which these benchmarks exhaust their supply fds.
Not to zero unfortunately, but it's still helpful when doing long runs.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years ago
Josh Bleecher Snyder ddd85b9d91 wgengine/magicsock: rename discoEndpoint.wgEndpointHostPort to wgEndpoint
Fields rename only.

Part of the general effort to make our code agnostic about endpoint formatting.
It's just a name, but it will soon be a misleading one; be more generic.
Do this as a separate commit because it generates a lot of whitespace changes.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years ago
Josh Bleecher Snyder e0bd3cc70c wgengine/magicsock: use netaddr.MustParseIPPrefix
Delete our bespoke helper.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years ago
Josh Bleecher Snyder bc68e22c5b all: s/CreateEndpoint/ParseEndpoint/ in docs
Upstream wireguard-go renamed the interface method
from CreateEndpoint to ParseEndpoint.
I missed some comments. Fix them.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years ago
Josh Bleecher Snyder 9bce1b7fc1 wgengine/wgcfg: make device test endpoint-format-agnostic
By using conn.NewDefaultBind, this test requires that our endpoints
be comprehensible to wireguard-go. Instead, use a no-op bind that
treats endpoints as opaque strings.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years ago
Josh Bleecher Snyder 73ad1f804b wgengine/wgcfg: use autogenerated Clone methods
Delete the manually written ones named Copy.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years ago
Josh Bleecher Snyder a0dacba877 wgengine/magicsock: simplify legacy endpoint DstToString
Legacy endpoints (addrSet) currently reconstruct their dst string when requested.

Instead, store the dst string we were given to begin with.
In addition to being simpler and cheaper, this makes less code
aware of how to interpret endpoint strings.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years ago
Josh Bleecher Snyder 777c816b34 wgengine/wgcfg: return better errors from DeviceConfig, ReconfigDevice
Prefer the error from the actual wireguard-go device method call,
not {To,From}UAPI, as those tend to be less interesting I/O errors.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years ago
Josh Bleecher Snyder 1f6c4ba7c3 wgengine/wgcfg: prevent ReconfigDevice from hanging on error
When wireguard-go's UAPI interface fails with an error, ReconfigDevice hangs.
Fix that by buffering the channel and closing the writer after the call.
The code now matches the corresponding code in DeviceConfig, where I got it right.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years ago
Josh Bleecher Snyder ed63a041bf wgengine/userspace: delete HandshakeDone
It is unused, and has been since early Feb 2021 (Tailscale 1.6).
We can't get delete the DeviceOptions entirely yet;
first #1831 and #1839 need to go in, along with some wireguard-go changes.
Deleting this chunk of code now will make the later commits more clearly correct.

Pingers can now go too.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years ago
Brad Fitzpatrick b8fb8264a5 wgengine/netstack: avoid delivering incoming packets to both netstack + host
The earlier eb06ec172f fixed
the flaky SSH issue (tailscale/corp#1725) by making sure that packets
addressed to Tailscale IPs in hybrid netstack mode weren't delivered
to netstack, but another issue remained:

All traffic handled by netstack was also potentially being handled by
the host networking stack, as the filter hook returned "Accept", which
made it keep processing. This could lead to various random racey chaos
as a function of OS/firewalls/routes/etc.

Instead, once we inject into netstack, stop our caller's packet
processing.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
4 years ago
Brad Fitzpatrick 1a1123d461 wgengine: fix pendopen debug to not track SYN+ACKs, show Node.Online state
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
4 years ago
Brad Fitzpatrick eb06ec172f wgengine/netstack: don't pass non-subnet traffic to netstack in hybrid mode
Fixes tailscale/corp#1725

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
4 years ago
Brad Fitzpatrick 7629cd6120 net/tsaddr: add NewContainsIPFunc (move from wgengine)
I want to use this from netstack but it's not exported.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
4 years ago
Josh Bleecher Snyder 47ebd1e9a2 wgengine/router: use net.IP.Equal instead of bytes.Equal to compare IPs
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
4 years ago
Josh Bleecher Snyder f91c2dfaca wgengine/router: remove unused field
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
4 years ago
Josh Bleecher Snyder 9360f36ebd all: use lower-case letters at the start of error message
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
4 years ago
Josh Bleecher Snyder 64047815b0 wgenengine/magicsock: delete cursed tests
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
4 years ago
Josh Bleecher Snyder 59026a291d wgengine/wglog: improve wireguard-go logging rate limiting
Prior to wireguard-go using printf-style logging,
all wireguard-go logging occurred using format string "%s".
We fixed that but continued to use %s when we rewrote
peer identifiers into Tailscale style.

This commit removes that %sl, which makes rate limiting work correctly.
As a happy side-benefit, it should generate less garbage.

Instead of replacing all wireguard-go peer identifiers
that might occur anywhere in a fully formatted log string,
assume that they only come from args.
Check all args for things that look like *device.Peers
and replace them with appropriately reformatted strings.

There is a variety of ways that this could go wrong
(unusual format verbs or modifiers, peer identifiers
occurring as part of a larger printed object, future API changes),
but none of them occur now, are likely to be added,
or would be hard to work around if they did.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
4 years ago
Josh Bleecher Snyder 1f94d43b50 wgengine/wglog: delay formatting
The "stop phrases" we use all occur in wireguard-go in the format string.
We can avoid doing a bunch of fmt.Sprintf work when they appear.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
4 years ago
Josh Bleecher Snyder 20e04418ff net/dns: add GOOS build tags
Fixes #1786

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
4 years ago
Josh Bleecher Snyder 7ee891f5fd all: delete wgcfg.Key and wgcfg.PrivateKey
For historical reasons, we ended up with two near-duplicate
copies of curve25519 key types, one in the wireguard-go module
(wgcfg) and one in the tailscale module (types/wgkey).
Then we moved wgcfg to the tailscale module.
We can now remove the wgcfg key type in favor of wgkey.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
4 years ago
Josh Bleecher Snyder 9d542e08e2 wgengine/magicsock: always run ReceiveIPv6
One of the consequences of the 	bind refactoring in 6f23087175
is that attempting to bind an IPv6 socket will always
result in c.pconn6.pconn being non-nil.
If the bind fails, it'll be set to a placeholder packet conn
that blocks forever.

As a result, we can always run ReceiveIPv6 and health check it.
This removes IPv4/IPv6 asymmetry and also will allow health checks
to detect any IPv6 receive func failures.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
4 years ago
Josh Bleecher Snyder fe50ded95c health: track whether we have a functional udp4 bind
Suggested-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
4 years ago
Josh Bleecher Snyder 7dc7078d96 wgengine/magicsock: use netaddr.IP in listenPacket
It must be an IP address; enforce that at the type level.

Suggested-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
4 years ago
Josh Bleecher Snyder 3c543c103a wgengine/magicsock: unify initial bind and rebind
We had two separate code paths for the initial UDP listener bind
and any subsequent rebinds.

IPv6 got left out of the rebind code.
Rather than duplicate it there, unify the two code paths.
Then improve the resulting code:

* Rebind had nested listen attempts to try the user-specified port first,
  and then fall back to :0 if that failed. Convert that into a loop.
* Initial bind tried only the user-specified port.
  Rebind tried the user-specified port and 0.
  But there are actually three ports of interest:
  The one the user specified, the most recent port in use, and 0.
  We now try all three in order, as appropriate.
* In the extremely rare case in which binding to port 0 fails,
  use a dummy net.PacketConn whose reads block until close.
  This will keep the wireguard-go receive func goroutine alive.

As a pleasant side-effect of this, if we decide that
we need to resuscitate #1796, it will now be much easier.

Fixes #1799

Co-authored-by: David Anderson <danderson@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
4 years ago
Josh Bleecher Snyder 8fb66e20a4 wgengine/magicsock: remove DefaultPort const
Assume it'll stay at 0 forever, so hard-code it
and delete code conditional on it being non-0.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
4 years ago
Josh Bleecher Snyder a8f61969b9 wgengine/magicsock: remove context arg from listenPacket
It was set to context.Background by all callers, for the same reasons.
Set it locally instead, to simplify call sites.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
4 years ago
Brad Fitzpatrick bb2141e0cf wgengine: periodically poll engine status for logging side effect
Fixes tailscale/corp#1560

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
4 years ago
Brad Fitzpatrick 3c9dea85e6 wgengine: update a log line from 'weird' to conventional 'unexpected'
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
4 years ago
Josh Bleecher Snyder 744de615f1 health, wgenegine: fix receive func health checks for the fourth time
The old implementation knew too much about how wireguard-go worked.
As a result, it missed genuine problems that occurred due to unrelated bugs.

This fourth attempt to fix the health checks takes a black box approach.
A receive func is healthy if one (or both) of these conditions holds:

* It is currently running and blocked.
* It has been executed recently.

The second condition is required because receive functions
are not continuously executing. wireguard-go calls them and then
processes their results before calling them again.

There is a theoretical false positive if wireguard-go go takes
longer than one minute to process the results of a receive func execution.
If that happens, we have other problems.

Updates #1790

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
4 years ago
Josh Bleecher Snyder 0d4c8cb2e1 health: delete ReceiveFunc health checks
They were not doing their job.
They need yet another conceptual re-think.
Start by clearing the decks.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
4 years ago
Josh Bleecher Snyder 99705aa6b7 net/tstun: split TUN events channel into up/down and MTU
We had a long-standing bug in which our TUN events channel
was being received from simultaneously in two places.

The first is wireguard-go.

At wgengine/userspace.go:366, we pass e.tundev to wireguard-go,
which starts a goroutine (RoutineTUNEventReader)
that receives from that channel and uses events to adjust the MTU
and bring the device up/down.

At wgengine/userspace.go:374, we launch a goroutine that
receives from e.tundev, logs MTU changes, and triggers
state updates when up/down changes occur.

Events were getting delivered haphazardly between the two of them.

We don't really want wireguard-go to receive the up/down events;
we control the state of the device explicitly by calling device.Up.
And the userspace.go loop MTU logging duplicates logging that
wireguard-go does when it received MTU updates.

So this change splits the single TUN events channel into up/down
and other (aka MTU), and sends them to the parties that ought
to receive them.

I'm actually a bit surprised that this hasn't caused more visible trouble.
If a down event went to wireguard-go but the subsequent up event
went to userspace.go, we could end up with the wireguard-go device disappearing.

I believe that this may also (somewhat accidentally) be a fix for #1790.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
4 years ago
Avery Pennarun a7fe1d7c46 wgengine/bench: improved rate selection.
The old decay-based one took a while to converge. This new one (based
very loosely on TCP BBR) seems to converge quickly on what seems to be
the best speed.

Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
4 years ago