Commit Graph

1341 Commits (alexc/better-localbackend-logging)

Author SHA1 Message Date
Alex Chan 24d56934a0 log if/when we paused
Change-Id: Ife78dcb996cc0dde3b3303fa0b816d0893a3a7e1
2 weeks ago
Andrew Lytvynov c679aaba32
cmd/tailscaled,ipn: show a health warning when state store fails to open (#17883)
With the introduction of node sealing, store.New fails in some cases due
to the TPM device being reset or unavailable. Currently it results in
tailscaled crashing at startup, which is not obvious to the user until
they check the logs.

Instead of crashing tailscaled at startup, start with an in-memory store
with a health warning about state initialization and a link to (future)
docs on what to do. When this health message is set, also block any
login attempts to avoid masking the problem with an ephemeral node
registration.

Updates #15830
Updates #17654

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
2 weeks ago
Harry Harpham ac74d28190
ipn/ipnlocal: add validations when setting serve config (#17950)
These validations were previously performed in the CLI frontend. There
are two motivations for moving these to the local backend:
1. The backend controls synchronization around the relevant state, so
   only the backend can guarantee many of these validations.
2. Doing these validations in the back-end avoids the need to repeat
   them across every frontend (e.g. the CLI and tsnet).

Updates tailscale/corp#27200

Signed-off-by: Harry Harpham <harry@tailscale.com>
2 weeks ago
Alex Chan 976bf24f5e ipn/ipnlocal: remove the always-true CanSupportNetworkLock()
Now that we support using an in-memory backend for TKA state (#17946),
this function always returns `nil` – we can always support Network Lock.
We don't need it any more.

Plus, clean up a couple of errant TODOs from that PR.

Updates tailscale/corp#33599

Change-Id: Ief93bb9adebb82b9ad1b3e406d1ae9d2fa234877
Signed-off-by: Alex Chan <alexc@tailscale.com>
2 weeks ago
Alex Chan aeda3e8183 ipn/ipnlocal: reduce profileManager boilerplate in network-lock tests
Updates tailscale/corp#33537

Signed-off-by: Alex Chan <alexc@tailscale.com>
3 weeks ago
Alex Chan e1dd9222d4 ipn/ipnlocal, tka: compact TKA state after every sync
Previously a TKA compaction would only run when a node starts, which means a long-running node could use unbounded storage as it accumulates ever-increasing amounts of TKA state. This patch changes TKA so it runs a compaction after every sync.

Updates https://github.com/tailscale/corp/issues/33537

Change-Id: I91df887ea0c5a5b00cb6caced85aeffa2a4b24ee
Signed-off-by: Alex Chan <alexc@tailscale.com>
3 weeks ago
James Tucker c09c95ef67 types/key,wgengine/magicsock,control/controlclient,ipn: add debug disco key rotation
Adds the ability to rotate discovery keys on running clients, needed for
testing upcoming disco key distribution changes.

Introduces key.DiscoKey, an atomic container for a disco private key,
public key, and the public key's ShortString, replacing the prior
separate atomic fields.

magicsock.Conn has a new RotateDiscoKey method, and access to this is
provided via localapi and a CLI debug command.

Note that this implementation is primarily for testing as it stands, and
regular use should likely introduce an additional mechanism that allows
the old key to be used for some time, to provide a seamless key rotation
rather than one that invalidates all sessions.

Updates tailscale/corp#34037

Signed-off-by: James Tucker <james@tailscale.com>
3 weeks ago
Brad Fitzpatrick bd29b189fe types/netmap,*: remove some redundant fields from NetMap
Updates #12639

Change-Id: Ia50b15529bd1c002cdd2c937cdfbe69c06fa2dc8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 weeks ago
Brad Fitzpatrick f1cddc6ecf ipn{,/local},cmd/tailscale: add "sync" flag and pref to disable control map poll
For manual (human) testing, this lets the user disable control plane
map polls with "tailscale set --sync=false" (which survives restarts)
and "tailscale set --sync" to restore.

A high severity health warning is shown while this is active.

Updates #12639
Updates #17945

Change-Id: I83668fa5de3b5e5e25444df0815ec2a859153a6d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 weeks ago
Alex Chan 1723cb83ed ipn/ipnlocal: use an in-memory TKA store if FS is unavailable
This requires making the internals of LocalBackend a bit more generic,
and implementing the `tka.CompactableChonk` interface for `tka.Mem`.

Signed-off-by: Alex Chan <alexc@tailscale.com>

Updates https://github.com/tailscale/corp/issues/33599
3 weeks ago
Brad Fitzpatrick a5b2f18567 control/controlclient: remove some public API, move to Options & test-only
Includes adding StartPaused, which will be used in a future change to
enable netmap caching testing.

Updates #12639

Change-Id: Iec39915d33b8d75e9b8315b281b1af2f5d13a44a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 weeks ago
Brad Fitzpatrick 99b06eac49 syncs: add Mutex/RWMutex alias/wrappers for future mutex debugging
Updates #17852

Change-Id: I477340fb8e40686870e981ade11cd61597c34a20
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 weeks ago
Andrew Dunham 3a41c0c585 ipn/ipnlocal: add PROXY protocol support to Funnel/Serve
This adds the --proxy-protocol flag to 'tailscale serve' and
'tailscale funnel', which tells the Tailscale client to prepend a PROXY
protocol[1] header when making connections to the proxied-to backend.

I've verified that this works with our existing funnel servers without
additional work, since they pass along source address information via
PeerAPI already.

Updates #7747

[1]: https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt

Change-Id: I647c24d319375c1b33e995555a541b7615d2d203
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
3 weeks ago
Brad Fitzpatrick 653d0738f9 types/netmap: remove PrivateKey from NetworkMap
It's an unnecessary nuisance having it. We go out of our way to redact
it in so many places when we don't even need it there anyway.

Updates #12639

Change-Id: I5fc72e19e9cf36caeb42cf80ba430873f67167c3
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 weeks ago
James Tucker a96ef432cf control/controlclient,ipn/ipnlocal: replace State enum with boolean flags
Remove the State enum (StateNew, StateNotAuthenticated, etc.) from
controlclient and replace it with two explicit boolean fields:
- LoginFinished: indicates successful authentication
- Synced: indicates we've received at least one netmap

This makes the state more composable and easier to reason about, as
multiple conditions can be true independently rather than being
encoded in a single enum value.

The State enum was originally intended as the state machine for the
whole client, but that abstraction moved to ipn.Backend long ago.
This change continues moving away from the legacy state machine by
representing state as a combination of independent facts.

Also adds test helpers in ipnlocal that check independent, observable
facts (hasValidNetMap, needsLogin, etc.) rather than relying on
derived state enums, making tests more robust.

Updates #12639

Signed-off-by: James Tucker <james@tailscale.com>
3 weeks ago
Alex Chan 9134440008 various: adds missing apostrophes to comments
Updates #cleanup

Change-Id: I7bf29cc153c3c04e087f9bdb146c3437bed0129a
Signed-off-by: Alex Chan <alexc@tailscale.com>
3 weeks ago
Brad Fitzpatrick 052602752f control/controlclient: make Observer optional
As a baby step towards eventbus-ifying controlclient, make the
Observer optional.

This also means callers that don't care (like this network lock test,
and some tests in other repos) can omit it, rather than passing in a
no-op one.

Updates #12639

Change-Id: Ibd776b45b4425c08db19405bc3172b238e87da4e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 weeks ago
James 'zofrex' Sanderson 124301fbb6
ipn/ipnlocal: log prefs changes and reason in Start (#17876)
Updates tailscale/corp#34238

Signed-off-by: James Sanderson <jsanderson@tailscale.com>
3 weeks ago
Jordan Whited 9e4d1fd87f
feature/relayserver,ipn/ipnlocal,net/udprelay: plumb DERPMap (#17881)
This commit replaces usage of local.Client in net/udprelay with DERPMap
plumbing over the eventbus. This has been a longstanding TODO. This work
was also accelerated by a memory leak in net/http when using
local.Client over long periods of time. So, this commit also addresses
said leak.

Updates #17801

Signed-off-by: Jordan Whited <jordan@tailscale.com>
3 weeks ago
Brad Fitzpatrick 146ea42822 ipn/ipnlocal: remove all the weird locking (LockedOnEntry, UnlockEarly, etc)
Fixes #11649
Updates #16369

Co-authored-by: James Sanderson <jsanderson@tailscale.com>
Change-Id: I63eaa18fe870ddf81d84b949efac4d1b44c3db86
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 weeks ago
Brad Fitzpatrick f387b1010e wgengine/wgcfg: remove two unused Config fields
They distracted me in some refactoring. They're set but never used.

Updates #17858

Change-Id: I6ec7d6841ab684a55bccca7b7cbf7da9c782694f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 weeks ago
Jonathan Nobels e8d2f96449
ipn/ipnlocal, net/netns: add node cap to disable netns interface binding on netext Apple clients (#17691)
updates tailscale/corp#31571

It appears that on the latest macOS, iOS and tVOS versions, the work
that netns is doing to bind outgoing connections to the default interface (and all
of the trimmings and workarounds in netmon et al that make that work) are
not needed. The kernel is extension-aware and doing nothing, is the right
thing.  This is, however, not the case for tailscaled (which is not a
special process).

To allow us to test this assertion (and where it might break things), we add a
new node cap that turns this behaviour off only for network-extension equipped clients,
making it possible to turn this off tailnet-wide, without breaking any tailscaled
macos nodes.

Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
3 weeks ago
Brad Fitzpatrick 4650061326 ipn/ipnlocal: fix state_test data race seen in CI
Unfortunately I closed the tab and lost it in my sea of CI failures
I'm currently fighting.

Updates #cleanup

Change-Id: I4e3a652d57d52b75238f25d104fc1987add64191
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
4 weeks ago
Brad Fitzpatrick 8ed6bb3198 ipn/ipnlocal: move vipServiceHash etc to serve.go, out of local.go
Updates #12614

Change-Id: I3c16b94fcb997088ff18d5a21355e0279845ed7e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
4 weeks ago
Brad Fitzpatrick e0e8731130 feature, ipn/ipnlocal: add, use feature.CanSystemdStatus for more DCE
When systemd notification support was omitted from the build, or on
non-Linux systems, we were unnecessarily emitting code and generating
garbage stringifying addresses upon transition to the Running state.

Updates #12614

Change-Id: If713f47351c7922bb70e9da85bf92725b25954b9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
4 weeks ago
Andrew Lytvynov ae3dff15e4
ipn/ipnlocal: clean up some of the weird locking (#17802)
* lock released early just to call `b.send` when it can call
  `b.sendToLocked` instead
* `UnlockEarly` called to release the lock before trivially fast
  operations, we can wait for a defer there

Updates #11649

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
4 weeks ago
Brad Fitzpatrick de733c5951 tailcfg: kill off rest of HairPinning symbols
It was disabled in May 2024 in #12205 (9eb72bb51).

This removes the unused symbols.

Updates #188
Updates tailscale/corp#19106
Updates tailscale/corp#19116

Change-Id: I5208b7b750b18226ed703532ed58c4ea17195a8e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
4 weeks ago
Andrew Lytvynov db7dcd516f
Revert "control/controlclient: back out HW key attestation (#17664)" (#17732)
This reverts commit a760cbe33f.

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
1 month ago
Fernando Serboncini d68513b0db
ipn: add support for HTTP Redirects (#17594)
Adds a new Redirect field to HTTPHandler for serving HTTP redirects
from the Tailscale serve config. The redirect URL supports template
variables ${HOST} and ${REQUEST_URI} that are resolved per request.

By default, it redirects using HTTP Status 302 (Found). For another
redirect status, like 301 - Moved Permanently, pass the HTTP status
code followed by ':' on Redirect, like: "301:https://tailscale.com"

Updates #11252
Updates #11330

Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
1 month ago
M. J. Fromberger 06b092388e
ipn/ipnlocal: do not stall event processing for appc route updates (#17663)
A follow-up to #17411. Put AppConnector events into a task queue, as they may
take some time to process. Ensure that the queue is stopped at shutdown so that
cleanup will remain orderly.

Because events are delivered on a separate goroutine, slow processing of an
event does not cause an immediate problem; however, a subscriber that blocks
for a long time will push back on the bus as a whole. See
https://godoc.org/tailscale.com/util/eventbus#hdr-Expected_subscriber_behavior
for more discussion.

Updates #17192
Updates #15160

Change-Id: Ib313cc68aec273daf2b1ad79538266c81ef063e3
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
1 month ago
Gesa Stupperich d2e4a20f26 ipn/ipnlocal/serve: error when PeerCaps serialisation fails
Also consolidates variable and header naming and amends the
CLI behavior
* multiple app-caps have to be specified as comma-separated
  list
* simple regex-based validation of app capability names is
  carried out during flag parsing

Signed-off-by: Gesa Stupperich <gesa@tailscale.com>
1 month ago
Gesa Stupperich d6fa899eba ipn/ipnlocal/serve: remove grant header truncation logic
Given that we filter based on the usercaps argument now, truncation
should not be necessary anymore.

Updates tailscale/corp/#28372

Signed-off-by: Gesa Stupperich <gesa@tailscale.com>
1 month ago
Gesa Stupperich 576aacd459 ipn/ipnlocal/serve: add grant headers
Updates tailscale/corp/#28372

Signed-off-by: Gesa Stupperich <gesa@tailscale.com>
1 month ago
Patrick O'Doherty a760cbe33f
control/controlclient: back out HW key attestation (#17664)
Temporarily back out the TPM-based hw attestation code while we debug
Windows exceptions.

Updates tailscale/corp#31269

Signed-off-by: Patrick O'Doherty <patrick@tailscale.com>
1 month ago
Alex Chan d47c697748 ipn/ipnlocal: skip TKA bootstrap request if Tailnet Lock is unavailable
If you run tailscaled without passing a `--statedir`, Tailnet Lock is
unavailable -- we don't have a folder to store the AUMs in.

This causes a lot of unnecessary requests to bootstrap TKA, because
every time the node receives a NetMap with some TKA state, it tries to
bootstrap, fetches the bootstrap TKA state from the control plane, then
fails with the error:

    TKA sync error: bootstrap: network-lock is not supported in this
    configuration, try setting --statedir

We can't prevent the error, but we can skip the control plane request
that immediately gets dropped on the floor.

In local testing, a new node joining a tailnet caused *three* control
plane requests which were unused.

Updates tailscale/corp#19441

Signed-off-by: Alex Chan <alexc@tailscale.com>
1 month ago
Patrick O'Doherty 36ad24b20f
feature/tpm: check TPM family data for compatibility (#17624)
Check that the TPM we have opened is advertised as a 2.0 family device
before using it for state sealing / hardware attestation.

Updates #17622

Signed-off-by: Patrick O'Doherty <patrick@tailscale.com>
1 month ago
Alex Chan 2b448f0696 ipn, tka: improve the logging around TKA sync and AUM errors
*   When we do the TKA sync, log whether TKA is enabled and whether
    we want it to be enabled. This would help us see if a node is
    making bootstrap errors.

*   When we fail to look up an AUM locally, log the ID of the AUM
    rather than a generic "file does not exist" error.

    These AUM IDs are cryptographic hashes of the TKA state, which
    itself just contains public keys and signatures. These IDs aren't
    sensitive and logging them is safe.

Signed-off-by: Alex Chan <alexc@tailscale.com>

Updates https://github.com/tailscale/corp/issues/33594
2 months ago
M. J. Fromberger 3dde233cd3
ipn/ipnlocal: use eventbus.SubscribeFunc in LocalBackend (#17524)
This does not change which subscriptions are made, it only swaps them to use
the SubscribeFunc API instead of Subscribe.

Updates #15160
Updates #17487

Change-Id: Id56027836c96942206200567a118f8bcf9c07f64
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2 months ago
Patrick O'Doherty d8a6d0183c
ipn/ipnlocal: strip AttestationKey in redacted prefs view (#17527)
Updates tailscale/corp#31269

Signed-off-by: Patrick O'Doherty <patrick@tailscale.com>
2 months ago
Patrick O'Doherty e45557afc0
types/persist: add AttestationKey (#17281)
Extend Persist with AttestationKey to record a hardware-backed
attestation key for the node's identity.

Add a flag to tailscaled to allow users to control the use of
hardware-backed keys to bind node identity to individual machines.

Updates tailscale/corp#31269


Change-Id: Idcf40d730a448d85f07f1bebf387f086d4c58be3

Signed-off-by: Patrick O'Doherty <patrick@tailscale.com>
2 months ago
M. J. Fromberger 109cb50d5f ipn/ipnlocal: use eventbus.SubscribeFunc in expiryManager
Updates #15160
Updates #17487

Change-Id: I8721e3ac1af505630edca7c5cb50695b0aad832a
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2 months ago
James 'zofrex' Sanderson 2d1014ead1
ipn/ipnlocal: fix data race on captiveCtx in enterStateLockedOnEntry (#17495)
Updates #17491

Signed-off-by: James Sanderson <jsanderson@tailscale.com>
2 months ago
Brad Fitzpatrick 5c1e26b42f ipn/localapi: dead code eliminate unreachable/useless LocalAPI handlers when disabled
Saves ~94 KB from the min build.

Updates #12614

Change-Id: I3b0b8a47f80b9fd3b1038c2834b60afa55bf02c2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Alex Chan a9334576ea ipn/ipnlocal: use named arguments for `mockControl.send()`
Updates #cleanup

Signed-off-by: Alex Chan <alexc@tailscale.com>
2 months ago
James 'zofrex' Sanderson eabc62a9dd
ipn/ipnlocal: don't send LoginFinished unless auth was in progress (#17266)
Before we introduced seamless, the "blocked" state was used to track:

* Whether a login was required for connectivity, and therefore we should
  keep the engine deconfigured until that happened
* Whether authentication was in progress

"blocked" would stop authReconfig from running. We want this when a login is
required: if your key has expired we want to deconfigure the engine and keep
it down, so that you don't keep using exit nodes (which won't work because
your key has expired).

Taking the engine down while auth was in progress was undesirable, so we
don't do that with seamless renewal. However, not entering the "blocked"
state meant that we needed to change the logic for when to send
LoginFinished on the IPN bus after seeing StateAuthenticated from the
controlclient. Initially we changed the "if blocked" check to "if blocked or
seamless is enabled" which was correct in other places.

In this place however, it introduced a bug: we are sending LoginFinished
every time we see StateAuthenticated, which happens even on a down & up, or
a profile switch. This in turn made it harder for UI clients to track when
authentication is complete.

Instead we should only send it out if we were blocked (i.e. seamless is
disabled, or our key expired) or an auth was in progress.

Updates tailscale/corp#31476

Updates tailscale/corp#32645

Fixes #17363

Signed-off-by: James Sanderson <jsanderson@tailscale.com>
2 months ago
Brad Fitzpatrick 316afe7d02 util/checkchange: stop using deephash everywhere
Saves 45 KB from the min build, no longer pulling in deephash or
util/hashx, both with unsafe code.

It can actually be more efficient to not use deephash, as you don't
have to walk all bytes of all fields recursively to answer that two
things are not equal. Instead, you can just return false at the first
difference you see. And then with views (as we use ~everywhere
nowadays), the cloning the old value isn't expensive, as it's just a
pointer under the hood.

Updates #12614

Change-Id: I7b08616b8a09b3ade454bb5e0ac5672086fe8aec
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
M. J. Fromberger 0415a56b6c
ipn/ipnlocal: fix another racy test (#17472)
Some of the test cases access fields of the backend that are supposed to be
locked while the test is running, which can trigger the race detector.  I fixed
a few of these in #17411, but I missed these two cases.

Updates #15160
Updates #17192

Change-Id: I45664d5e34320ecdccd2844e0f8b228145aaf603
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2 months ago
M. J. Fromberger e0f222b686
appc,ipn/ipnlocal: receive AppConnector updates via the event bus (#17411)
Add subscribers for AppConnector events

Make the RouteAdvertiser interface optional We cannot yet remove it because
the tests still depend on it to verify correctness. We will need to separately
update the test fixtures to remove that dependency.

Publish RouteInfo via the event bus, so we do not need a callback to do that. 
Replace it with a flag that indicates whether to treat the route info the connector 
has as "definitive" for filtering purposes.

Update the tests to simplify the construction of AppConnector values now that a
store callback is no longer required. Also fix a couple of pre-existing racy tests that 
were hidden by not being concurrent in the same way production is.

Updates #15160
Updates #17192

Change-Id: Id39525c0f02184e88feaf0d8a3c05504850e47ee
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2 months ago
James 'zofrex' Sanderson 7407f404d9
ipn/ipnlocal: fix setAuthURL / setWgengineStatus race condition (#17408)
If we received a wg engine status while processing an auth URL, there was a
race condition where the authURL could be reset to "" immediately after we
set it.

To fix this we need to check that we are moving from a non-Running state to
a Running state rather than always resetting the URL when we "move" into a
Running state even if that is the current state.

We also need to make sure that we do not return from stopEngineAndWait until
the engine is stopped: before, we would return as soon as we received any
engine status update, but that might have been an update already in-flight
before we asked the engine to stop. Now we wait until we see an update that
is indicative of a stopped engine, or we see that the engine is unblocked
again, which indicates that the engine stopped and then started again while
we were waiting before we checked the state.

Updates #17388

Signed-off-by: James Sanderson <jsanderson@tailscale.com>
Co-authored-by: Nick Khyl <nickk@tailscale.com>
2 months ago
Brad Fitzpatrick 541a4ed5b4 all: use buildfeatures consts in a few more places
Saves ~25 KB.

Updates #12614

Change-Id: I7b976e57819a0d2692824d779c8cc98033df0d30
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago