Commit Graph

9927 Commits (16587746ed5446247d44dd0c50cec36cf61a0c80)
 

Author SHA1 Message Date
James 'zofrex' Sanderson 2d1014ead1
ipn/ipnlocal: fix data race on captiveCtx in enterStateLockedOnEntry (#17495)
Updates #17491

Signed-off-by: James Sanderson <jsanderson@tailscale.com>
2 months ago
Tom Meadows 0586d5d40d
k8s-operator/sessionrecording: gives the connection to the recorder from the hijacker a dedicated context (#17403)
The hijacker on k8s-proxy's reverse proxy is used to stream recordings
to tsrecorder as they pass through the proxy to the kubernetes api
server. The connection to the recorder was using the client's
(e.g., kubectl) context, rather than a dedicated one. This was causing
the recording stream to get cut off in scenarios where the client
cancelled the context before streaming could be completed.

By using a dedicated context, we can continue streaming even if the
client cancels the context (for example if the client request
completes).

Fixes #17404

Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
2 months ago
Tom Meadows cd2a3425cb
cmd/tsrecorder: adds sending api level logging to tsrecorder (#16960)
Updates #17141

Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
2 months ago
Mike O'Driscoll f25e47cdeb
flake.nix: use tailscale go fork (#17486)
Move our nix flake to use Tailscale's go toolchain instead
of upstream go.

Fixes #17494

Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
2 months ago
M. J. Fromberger ad6cf2f8f3
util/eventbus: add a function-based subscriber type (#17432)
Originally proposed by @bradfitz in #17413.

In practice, a lot of subscribers have only one event type of interest, or a
small number of mostly independent ones. In that case, the overhead of running
and maintaining a goroutine to select on multiple channels winds up being more
noisy than we'd like for the user of the API.

For this common case, add a new SubscriberFunc[T] type that delivers events to
a callback owned by the subscriber, directly on the goroutine belonging to the
client itself. This frees the consumer from the need to maintain their own
goroutine to pull events from the channel, and to watch for closure of the
subscriber.

Before:

     s := eventbus.Subscribe[T](eventClient)
     go func() {
       for {
          select {
          case <-s.Done():
            return
          case e := <-s.Events():
            doSomethingWith(e)
          }
       }
     }()
     // ...
     s.Close()

After:

     func doSomethingWithT(e T) { ... }
     s := eventbus.SubscribeFunc(eventClient, doSomethingWithT)
     // ...
     s.Close()

Moreover, unless the caller wants to explicitly stop the subscriber separately
from its governing client, it need not capture the SubscriberFunc value at all.

One downside of this approach is that a slow or deadlocked callback could block
client's service routine and thus stall all other subscriptions on that client,
However, this can already happen more broadly if a subscriber fails to service
its delivery channel in a timely manner, it just feeds back more immediately.

Updates #17487

Change-Id: I64592d786005177aa9fd445c263178ed415784d5
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2 months ago
Tom Proctor 98a0ccc18a
cmd/tailscaled: default state encryption off for incompatible args (#17480)
Since #17376, containerboot crashes on startup in k8s because state
encryption is enabled by default without first checking that it's
compatible with the selected state store. Make sure we only default
state encryption to enabled if it's not going to immediately clash with
other bits of tailscaled config.

Updates tailscale/corp#32909

Change-Id: I76c586772750d6da188cc97b647c6e0c1a8734f0

Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2 months ago
Brad Fitzpatrick 5c1e26b42f ipn/localapi: dead code eliminate unreachable/useless LocalAPI handlers when disabled
Saves ~94 KB from the min build.

Updates #12614

Change-Id: I3b0b8a47f80b9fd3b1038c2834b60afa55bf02c2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Alex Chan a9334576ea ipn/ipnlocal: use named arguments for `mockControl.send()`
Updates #cleanup

Signed-off-by: Alex Chan <alexc@tailscale.com>
2 months ago
Brad Fitzpatrick 232b928974 feature/linkspeed: move cosmetic tstun netlink code out to modular feature
Part of making all netlink monitoring code optional.

Updates #17311 (how I got started down this path)
Updates #12614

Change-Id: Ic80d8a7a44dc261c4b8678b3c2241c3b3778370d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Claus Lensbøl 63f7a400a8
wgengine/{magicsock,userspace,router}: move portupdates to the eventbus (#17423)
Also pull out interface method only needed in Linux.

Instead of having userspace do the call into the router, just let the
router pick up the change itself.

Updates #15160

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2 months ago
James 'zofrex' Sanderson eabc62a9dd
ipn/ipnlocal: don't send LoginFinished unless auth was in progress (#17266)
Before we introduced seamless, the "blocked" state was used to track:

* Whether a login was required for connectivity, and therefore we should
  keep the engine deconfigured until that happened
* Whether authentication was in progress

"blocked" would stop authReconfig from running. We want this when a login is
required: if your key has expired we want to deconfigure the engine and keep
it down, so that you don't keep using exit nodes (which won't work because
your key has expired).

Taking the engine down while auth was in progress was undesirable, so we
don't do that with seamless renewal. However, not entering the "blocked"
state meant that we needed to change the logic for when to send
LoginFinished on the IPN bus after seeing StateAuthenticated from the
controlclient. Initially we changed the "if blocked" check to "if blocked or
seamless is enabled" which was correct in other places.

In this place however, it introduced a bug: we are sending LoginFinished
every time we see StateAuthenticated, which happens even on a down & up, or
a profile switch. This in turn made it harder for UI clients to track when
authentication is complete.

Instead we should only send it out if we were blocked (i.e. seamless is
disabled, or our key expired) or an auth was in progress.

Updates tailscale/corp#31476

Updates tailscale/corp#32645

Fixes #17363

Signed-off-by: James Sanderson <jsanderson@tailscale.com>
2 months ago
Brad Fitzpatrick 316afe7d02 util/checkchange: stop using deephash everywhere
Saves 45 KB from the min build, no longer pulling in deephash or
util/hashx, both with unsafe code.

It can actually be more efficient to not use deephash, as you don't
have to walk all bytes of all fields recursively to answer that two
things are not equal. Instead, you can just return false at the first
difference you see. And then with views (as we use ~everywhere
nowadays), the cloning the old value isn't expensive, as it's just a
pointer under the hood.

Updates #12614

Change-Id: I7b08616b8a09b3ade454bb5e0ac5672086fe8aec
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Brad Fitzpatrick 28b1b4c3c1 cmd/tailscaled: guard some flag work with buildfeatures checks
Updates #12614

Change-Id: Iec6f15d33a6500e7b0b7e8f5c098f7c00334460f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Brad Fitzpatrick 10cb59fa87 build_dist.sh: keep --extra-small making a usable build, add --min
Historically, and until recently, --extra-small produced a usable build.

When I recently made osrouter be modular in 39e35379d4 (which is
useful in, say, tsnet builds) after also making netstack modular, that
meant --min now lacked both netstack support for routing and system
support for routing, making no way to get packets into
wireguard. That's not a nice default to users.  (we've documented
build_dist.sh in our KB)

Restore --extra-small to making a usable build, and add --min for
benchmarking purposes.

Updates #12614

Change-Id: I649e41e324a36a0ca94953229c9914046b5dc497
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
M. J. Fromberger 0415a56b6c
ipn/ipnlocal: fix another racy test (#17472)
Some of the test cases access fields of the backend that are supposed to be
locked while the test is running, which can trigger the race detector.  I fixed
a few of these in #17411, but I missed these two cases.

Updates #15160
Updates #17192

Change-Id: I45664d5e34320ecdccd2844e0f8b228145aaf603
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2 months ago
Brad Fitzpatrick 059f53e67a feature/condlite/expvar: add expvar stub package when metrics not needed
Saves ~53 KB from the min build.

Updates #12614

Change-Id: I73f9544a9feea06027c6ebdd222d712ada851299
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Jordan Whited 192f8d2804
wgengine/magicsock: add more handleNewServerEndpointRunLoop tests (#17469)
Updates tailscale/corp#32978

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2 months ago
M. J. Fromberger e0f222b686
appc,ipn/ipnlocal: receive AppConnector updates via the event bus (#17411)
Add subscribers for AppConnector events

Make the RouteAdvertiser interface optional We cannot yet remove it because
the tests still depend on it to verify correctness. We will need to separately
update the test fixtures to remove that dependency.

Publish RouteInfo via the event bus, so we do not need a callback to do that. 
Replace it with a flag that indicates whether to treat the route info the connector 
has as "definitive" for filtering purposes.

Update the tests to simplify the construction of AppConnector values now that a
store callback is no longer required. Also fix a couple of pre-existing racy tests that 
were hidden by not being concurrent in the same way production is.

Updates #15160
Updates #17192

Change-Id: Id39525c0f02184e88feaf0d8a3c05504850e47ee
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2 months ago
James 'zofrex' Sanderson 7407f404d9
ipn/ipnlocal: fix setAuthURL / setWgengineStatus race condition (#17408)
If we received a wg engine status while processing an auth URL, there was a
race condition where the authURL could be reset to "" immediately after we
set it.

To fix this we need to check that we are moving from a non-Running state to
a Running state rather than always resetting the URL when we "move" into a
Running state even if that is the current state.

We also need to make sure that we do not return from stopEngineAndWait until
the engine is stopped: before, we would return as soon as we received any
engine status update, but that might have been an update already in-flight
before we asked the engine to stop. Now we wait until we see an update that
is indicative of a stopped engine, or we see that the engine is unblocked
again, which indicates that the engine stopped and then started again while
we were waiting before we checked the state.

Updates #17388

Signed-off-by: James Sanderson <jsanderson@tailscale.com>
Co-authored-by: Nick Khyl <nickk@tailscale.com>
2 months ago
Brad Fitzpatrick d816454a88 feature/featuretags: make usermetrics modular
Saves ~102 KB from the min build.

Updates #12614

Change-Id: Ie1d4f439321267b9f98046593cb289ee3c4d6249
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
License Updater ea8e991d69 licenses: update license notices
Signed-off-by: License Updater <noreply+license-updater@tailscale.com>
2 months ago
Brad Fitzpatrick 525f9921fe cmd/testwrapper/flakytest: use t.Attr annotation on flaky tests
Updates #17460

Change-Id: I7381e9a6dd73514c73deb6b863749eef1a87efdc
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Brad Fitzpatrick 541a4ed5b4 all: use buildfeatures consts in a few more places
Saves ~25 KB.

Updates #12614

Change-Id: I7b976e57819a0d2692824d779c8cc98033df0d30
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Jordan Whited 44e1d735c3
tailcfg: bump CapVer for magicsock deadlock fix (#17450)
The fix that was applied in e44e28efcd.

Updates tailscale/corp#32978

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2 months ago
Alex Chan 6db8957744 tstest/integration: mark TestPeerRelayPing as flaky
Updates #17251

Signed-off-by: Alex Chan <alexc@tailscale.com>
2 months ago
Brad Fitzpatrick f208bf8cb1 types/lazy: document difference from sync.OnceValue
Updates #8419
Updates github.com/golang#62202

Change-Id: I0c082c4258fb7a95a17054f270dc32019bcc7581
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Brad Fitzpatrick cf520a3371 feature/featuretags: add LazyWG modular feature
Due to iOS memory limitations in 2020 (see
https://tailscale.com/blog/go-linker, etc) and wireguard-go using
multiple goroutines per peer, commit 16a9cfe2f4 introduced some
convoluted pathsways through Tailscale to look at packets before
they're delivered to wireguard-go and lazily reconfigure wireguard on
the fly before delivering a packet, only telling wireguard about peers
that are active.

We eventually want to remove that code and integrate wireguard-go's
configuration with Tailscale's existing netmap tracking.

To make it easier to find that code later, this makes it modular. It
saves 12 KB (of disk) to turn it off (at the expense of lots of RAM),
but that's not really the point. The point is rather making it obvious
(via the new constants) where this code even is.

Updates #12614

Change-Id: I113b040f3e35f7d861c457eaa710d35f47cee1cb
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
kscooo f80c7e7c23 net/wsconn: clarify package comment
Explain that this file stays forked from coder/websocket until we can
depend on an upstream release for the helper.

Updates #cleanup

Signed-off-by: kscooo <kscowork@gmail.com>
2 months ago
Brad Fitzpatrick 6820ec5bbb wgengine: stop importing flowtrack when unused
Updates #12614

Change-Id: I42b5c4d623d356af4bee5bbdabaaf0f6822f2bf4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Jordan Whited e44e28efcd
wgengine/magicsock: fix relayManager deadlock (#17449)
Updates tailscale/corp#32978

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2 months ago
Jordan Whited 3aa8b6d683
wgengine/magicsock: remove misleading unexpected log message (#17445)
Switching to a Geneve-encapsulated (peer relay) path in
endpoint.handlePongConnLocked is expected around port rebinds, which end
up clearing endpoint.bestAddr.

Fixes tailscale/corp#33036

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2 months ago
Brad Fitzpatrick 3c7e351671 net/connstats: make it modular (omittable)
Saves only 12 KB, but notably removes some deps on packages that future
changes can then eliminate entirely.

Updates #12614

Change-Id: Ibf830d3ee08f621d0a2011b1d4cd175427ef50df
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Brad Fitzpatrick 2e381557b8 feature/c2n: move answerC2N code + deps out of control/controlclient
c2n was already a conditional feature, but it didn't have a
feature/c2n directory before (rather, it was using consts + DCE). This
adds it, and moves some code, which removes the httprec dependency.

Also, remove some unnecessary code from our httprec fork.

Updates #12614

Change-Id: I2fbe538e09794c517038e35a694a363312c426a2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Brad Fitzpatrick db65f3fcf8 ipn/ipnlocal: use buildfeature consts in a few more places
Updates #12614

Change-Id: I561d434d9829172a3d7f6933399237924ff80490
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Brad Fitzpatrick 223ced84b5 feature/ace: make ACE modular
Updates #12614

Change-Id: Iaee75d8831c4ba5c9705d7877bb78044424c6da1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Brad Fitzpatrick 141eb64d3f wgengine/router/osrouter: fix data race in magicsock port update callback
As found by @cmol in #17423.

Updates #17423

Change-Id: I1492501f74ca7b57a8c5278ea6cb87a56a4086b9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Brad Fitzpatrick 447cbdd1d0 health: make it omittable
Saves 86 KB.

And stop depending on expvar and usermetrics when disabled,
in prep to removing all the expvar/metrics/tsweb stuff.

Updates #12614

Change-Id: I35d2479ddd1d39b615bab32b1fa940ae8cbf9b11
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Simon Law 9c3aec58ba
ipn/ipnlocal: remove junk from suggestExitNodeUsingTrafficSteering (#17436)
This patch removes some code that didn’t get removed before merging
the changes in #16580.

Updates #cleanup
Updates #16551

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2 months ago
Brad Fitzpatrick f42be719de all: use buildfeature constants in a few more places
Saves 21 KB.

Updates #12614

Change-Id: I0cd3e735937b0f5c0fcc9f09a24476b1c4ac9a15
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Alex Chan 59a39841c3 tstest/integration: mark TestClientSideJailing as flaky
Updates #17419

Signed-off-by: Alex Chan <alexc@tailscale.com>
2 months ago
Tom Meadows 8d4ea55cc1
cmd/k8s-proxy: switching to using ipn/store/kubestore (#17402)
kubestore init function has now been moved to a more explicit path of
ipn/store/kubestore meaning we can now avoid the generic import of
feature/condregister.

Updates #12614

Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
2 months ago
Alex Chan 304dabce17 ipn/ipnauth: fix a null pointer panic in GetConnIdentity
When running integration tests on macOS, we get a panic from a nil
pointer dereference when calling `ci.creds.PID()`.

This panic occurs because the `ci.creds != nil` check is insufficient
after a recent refactoring (c45f881) that changed `ci.creds` from a
pointer to the `PeerCreds` interface. Now `ci.creds` always compares as
non-nil, so we enter this block even when the underlying value is nil.

The integration tests fail on macOS when `peercred.Get()` returns the
error `unix.GetsockoptInt: socket is not connected`. This error isn't
new, and the previous code was ignoring it correctly.

Since we trust that `peercred` returns either a usable value or an error,
checking for a nil error is a sufficient and correct gate to prevent the
method call and avoid the panic.

Fixes #17421

Signed-off-by: Alex Chan <alexc@tailscale.com>
2 months ago
Brad Fitzpatrick 206d98e84b control/controlclient: restore aggressive Direct.Close teardown
In the earlier http2 package migration (1d93bdce20, #17394) I had
removed Direct.Close's tracking of the connPool, thinking it wasn't
necessary.

Some tests (in another repo) are strict and like it to tear down the
world and wait, to check for leaked goroutines. And they caught this
letting some goroutines idle past Close, even if they'd eventually
close down on their own.

This restores the connPool accounting and the aggressife close.

Updates #17305
Updates #17394

Change-Id: I5fed283a179ff7c3e2be104836bbe58b05130cc7
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Simon Law cd523eae52
ipn/ipnlocal: introduce the concept of client-side-reachability (#17367)
The control plane will sometimes determine that a node is not online,
while the node is still able to connect to its peers. This patch
doesn’t solve this problem, but it does mitigate it.

This PR introduces the `client-side-reachability` node attribute that
switches the node to completely ignore the online signal from control.

In the future, the client itself should collect reachability data from
active Wireguard flows and Tailscale pings.

Updates #17366
Updates tailscale/corp#30379
Updates tailscale/corp#32686

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2 months ago
Brad Fitzpatrick 24e38eb729 control/controlclient,health,ipn/ipnlocal,health: fix deadlock by deleting health reporting
A recent change (009d702adf) introduced a deadlock where the
/machine/update-health network request to report the client's health
status update to the control plane was moved to being synchronous
within the eventbus's pump machinery.

I started to instead make the health reporting be async, but then we
realized in the three years since we added that, it's barely been used
and doesn't pay for itself, for how many HTTP requests it makes.

Instead, delete it all and replace it with a c2n handler, which
provides much more helpful information.

Fixes tailscale/corp#32952

Change-Id: I9e8a5458269ebfdda1c752d7bbb8af2780d71b04
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Brad Fitzpatrick a208cb9fd5 feature/featuretags: add features for c2n, peerapi, advertise/use routes/exit nodes
Saves 262 KB so far. I'm sure I missed some places, but shotizam says
these were the low hanging fruit.

Updates #12614

Change-Id: Ia31c01b454f627e6d0470229aae4e19d615e45e3
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Brad Fitzpatrick 2cd518a8b6 control/controlclient: optimize zstd decode of KeepAlive messages
Maybe it matters? At least globally across all nodes?

Fixes #17343

Change-Id: I3f61758ea37de527e16602ec1a6e453d913b3195
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Brad Fitzpatrick 3ae7a351b4 feature/featuretags: make clientmetrics optional
Saves 57 KB

Updates #12614

Change-Id: If7eebec12b3cb30ae6264171d36a258c04b05a70
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
M. J. Fromberger 127a967207
appc,*: publish events for route updates and storage (#17392)
Add and wire up event publishers for these two event types in the AppConnector.
Nothing currently subscribes to them, so this is harmless. Subscribers for
these events will be added in a near-future commit.

As part of this, move the appc.RouteInfo type to the types/appctype package.
It does not contain any package-specific details from appc. Beside it, add
appctype.RouteUpdate to carry route update event state, likewise not specific
to appc.  Update all usage of the appc.* types throughout to use appctype.*
instead, and update depaware files to reflect these changes.

Add a Close method to the AppConnector to make sure the client gets cleaned up
when the connector is dropped (we re-create connectors).

Update the unit tests in the appc package to also check the events published
alongside calls to the RouteAdvertiser.

For now the tests still rely on the RouteAdvertiser for correctness; this is OK
for now as the two methods are always performed together.  In the near future,
we need to rework the tests so not require that, but that will require building
some more test fixtures that we can handle separately.

Updates #15160
Updates #17192

Change-Id: I184670ba2fb920e0d2cb2be7c6816259bca77afe
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2 months ago
M. J. Fromberger 3c32f87624
feature/relayserver: use eventbus.Monitor to simplify lifecycle management (#17234)
Instead of using separate channels to manage the lifecycle of the eventbus
client, use the recently-added eventbus.Monitor, which handles signaling the
processing loop to stop and waiting for it to complete.  This allows us to
simplify some of the setup and cleanup code in the relay server.

Updates #15160

Change-Id: Ia1a47ce2e5a31bc8f546dca4c56c3141a40d67af
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2 months ago