This is for a future test scheduler, so it can run potentially flaky
tests separately, doing all the non-flaky ones together in one batch.
Updates tailscale/corp#28679
Change-Id: Ic4a11f9bf394528ef75792fd622f17bc01a4ec8a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
When TS_GO_NEXT=1 is set, update/use the
go.toolchain.next.{branch,rev} files instead.
This lets us do test deploys of Go release candidates on some
backends, without affecting all backends.
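
As a rough illustration of the selection logic only (the real tooling may well be shell scripts), here is a minimal Go sketch. The function names are hypothetical, and the default file names are assumed to be the "next" names without the `.next` component.

```go
package main

import "os"

// toolchainFiles returns the names of the files that pin the Go toolchain
// branch and revision. The "next" names come from the commit message; the
// default names are an assumption (the same names minus ".next").
func toolchainFiles(useNext bool) (branchFile, revFile string) {
	if useNext {
		return "go.toolchain.next.branch", "go.toolchain.next.rev"
	}
	return "go.toolchain.branch", "go.toolchain.rev"
}

// useNextToolchain reports whether TS_GO_NEXT=1 is set in the environment.
func useNextToolchain() bool {
	return os.Getenv("TS_GO_NEXT") == "1"
}
```

A caller would use `toolchainFiles(useNextToolchain())` wherever the pinned branch or revision is read or updated.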
Updates tailscale/corp#36382
Change-Id: I00dbde87b219b720be5ea142325c4711f101a364
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The Tailscale CLI has some methods to watch the IPN bus for
messages, say, the current netmap (`tailscale debug netmap`).
The Tailscale daemon supports this using a streaming HTTP
response. Sometimes, the client can close its connection
abruptly -- due to an interruption, or in the case of `debug netmap`,
intentionally after consuming one message.
If the server daemon is writing a response as the client closes
its end of the socket, the daemon typically encounters a "broken pipe"
error. The "Watch IPN Bus" handler currently logs such errors after
they're propagated by a JSON encoding/writer helper.
Since the Tailscale CLI nominally closes its socket with the daemon
in this slightly ungraceful way (viz. `debug netmap`), stop logging
these broken pipe errors as far as possible. This will help avoid
confounding users when they scan backend logs.
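
A minimal sketch of the idea, assuming a Unix-like system where the failed write surfaces as EPIPE somewhere in the error chain. The helper names (`isBrokenPipe`, `logWriteError`, `simulatedClientHangup`) and the log text are hypothetical and are not the handler's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"syscall"
)

// isBrokenPipe reports whether err, or any error it wraps, is EPIPE.
func isBrokenPipe(err error) bool {
	return errors.Is(err, syscall.EPIPE)
}

// logWriteError (hypothetical) logs err via logf unless it is nil or a broken
// pipe. It reports whether anything was logged.
func logWriteError(logf func(format string, args ...any), err error) bool {
	if err == nil || isBrokenPipe(err) {
		return false
	}
	logf("watch-ipn-bus: write failed: %v", err)
	return true
}

// simulatedClientHangup builds an error shaped like the one a JSON writer
// helper might propagate after writing to a socket the client has closed.
func simulatedClientHangup() error {
	opErr := &net.OpError{
		Op:  "write",
		Net: "unix",
		Err: os.NewSyscallError("write", syscall.EPIPE),
	}
	return fmt.Errorf("encoding notify: %w", opErr)
}
```

Using `errors.Is` rather than string matching keeps the check working however many layers wrap the underlying errno.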
Updates #18477
Signed-off-by: Amal Bansode <amal@tailscale.com>
This commit is based on part of #17925, reworked as a separate package.
Add a package that can store and load netmap.NetworkMap values in persistent
storage, using a basic columnar representation. This commit includes a default
storage interface based on plain files, but the interface can be implemented
with more structured storage if we want to later.
The tests are set up to require that all the fields of the NetworkMap are
handled, except those explicitly designated as not-cached, and check that a
fully-populated value can round-trip correctly through the cache. Adding or
removing fields, either in the NetworkMap or in the cached representation, will
trigger either build failures (e.g., for type mismatch) or test failures (e.g.,
for representation changes or missing fields). This isn't quite as nice as
automatically updating the representation, which I also prototyped, but is much
simpler to maintain and less code.
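
To illustrate the "every field must be handled or explicitly skipped" check, here is a small reflection-based sketch. `exampleMap` and its fields are hypothetical stand-ins for netmap.NetworkMap, and the real tests may be structured differently.

```go
package main

import (
	"reflect"
	"sort"
)

// exampleMap stands in for netmap.NetworkMap; its fields are hypothetical.
type exampleMap struct {
	Name     string
	Peers    []string
	CachedAt int64 // deliberately designated as not-cached in this sketch
	DNS      map[string]string
}

// unhandledFields returns the sorted names of the exported fields of v's
// struct type that appear in neither handled nor skipped. v must be a struct
// value; reflect panics otherwise.
func unhandledFields(v any, handled, skipped map[string]bool) []string {
	t := reflect.TypeOf(v)
	var missing []string
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if !f.IsExported() || handled[f.Name] || skipped[f.Name] {
			continue
		}
		missing = append(missing, f.Name)
	}
	sort.Strings(missing)
	return missing
}
```

A test can then fail whenever `unhandledFields` returns a non-empty slice, which is what forces a decision each time a field is added.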
This commit does not yet hook up the cache to the backend; that will be a
subsequent change.
Updates #12639
Change-Id: Icb48639e1d61f2aec59904ecd172c73e05ba7bf9
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Someone asked me if we use DNS-over-HTTPS if the system's resolver is an
IP address that supports DoH and there's no global nameserver set (i.e.
no "Override DNS servers" set). I didn't know the answer offhand, and it
took a while for me to figure it out. The answer is yes, in cases where
we take over the system's DNS configuration and read the base config, we
do upgrade any DoH-capable resolver to use DoH. Here's a test that
verifies this behaviour (and hopefully helps as documentation the next
time someone has this question).
Updates #cleanup
Signed-off-by: Andrew Dunham <andrew@tailscale.com>
If conn25 config is sent in the netmap: add split DNS entries to use
appropriately tagged peers' PeerAPI to resolve DNS requests for those
domains.
This will enable future work where we use the peers as connectors for
the configured domains.
Updates tailscale/corp#34252
Signed-off-by: Fran Bull <fran@tailscale.com>
Similarly to allowing link-local multicast in #13661, we should also allow broadcast traffic
on permitted interfaces when the killswitch is enabled due to exit node usage on Windows.
This always includes internal interfaces, such as Hyper-V/WSL2, and also the LAN when
"Allow local network access" is enabled in the client.
Updates #18504
Signed-off-by: Nick Khyl <nickk@tailscale.com>
This file was never truly necessary and has never actually been used in
the history of Tailscale's open source releases.
A Brief History of AUTHORS files
---
The AUTHORS file was a pattern developed at Google, originally for
Chromium, then adopted by Go and a bunch of other projects. The problem
was that Chromium originally had a copyright line only recognizing
Google as the copyright holder. Because Google (and most open source
projects) do not require copyright assignment for contributions, each
contributor maintains their copyright. Some large corporate contributors
then tried to add their own name to the copyright line in the LICENSE
file or in file headers. This quickly becomes unwieldy, and puts a
tremendous burden on anyone building on top of Chromium, since the
license requires that they keep all copyright lines intact.
The compromise was to create an AUTHORS file that would list all of the
copyright holders. The LICENSE file and source file headers would then
include that list by reference, listing the copyright holder as "The
Chromium Authors".
It also became cumbersome simply to keep the file up to date with a
high rate of new contributors. Plus it's not always obvious who the
copyright holder is. Sometimes it is the individual making the
contribution, but many times it may be their employer. There is no way
for the project maintainer to know.
Eventually, Google changed their policy to no longer recommend trying to
keep the AUTHORS file up to date proactively, and instead to only add to
it when requested: https://opensource.google/docs/releasing/authors.
They are also clear that:
> Adding contributors to the AUTHORS file is entirely within the
> project's discretion and has no implications for copyright ownership.
It was primarily added to appease a small number of large contributors
that insisted that they be recognized as copyright holders (which was
entirely their right to do). But it's not truly necessary, and not even
the most accurate way of identifying contributors and/or copyright
holders.
In practice, we've never added anyone to our AUTHORS file. It only lists
Tailscale, so it's not really serving any purpose. It also causes
confusion because Tailscalars put the "Tailscale Inc & AUTHORS" header
in other open source repos which don't actually have an AUTHORS file, so
it's ambiguous what that means.
Instead, we just acknowledge that the contributors to Tailscale (whoever
they are) are copyright holders for their individual contributions. We
also have the benefit of using the DCO (developercertificate.org) which
provides some additional certification of their right to make the
contribution.
The source file changes were purely mechanical with:
git ls-files | xargs sed -i -e 's/\(Tailscale Inc &\) AUTHORS/\1 contributors/g'
Updates #cleanup
Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
In order to better manage per-profile data resources on the client, add methods
to the LocalBackend to support creation of per-profile directory structures in
local storage. These methods build on the existing TailscaleVarRoot config, and
have the same limitation (i.e., if no local storage is available, it will
report an error when used).
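
A minimal sketch of what such a helper could look like. The function name, the `profile-data` directory layout, and the error value are all hypothetical; only the "fail when no local storage root is configured" behaviour comes from the description above.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// errNoVarRoot is returned when no local storage root is configured.
var errNoVarRoot = errors.New("no local storage available")

// profileDataDir returns, creating it if needed, the directory holding one
// kind of data for one profile. The layout
// <varRoot>/profile-data/<profileID>/<kind> is an assumption of this sketch.
func profileDataDir(varRoot, profileID, kind string) (string, error) {
	if varRoot == "" {
		return "", errNoVarRoot
	}
	// Reject elements that could escape the per-profile directory.
	for _, elem := range []string{profileID, kind} {
		if elem == "" || elem == "." || elem == ".." || strings.ContainsAny(elem, `/\`) {
			return "", fmt.Errorf("invalid path element %q", elem)
		}
	}
	dir := filepath.Join(varRoot, "profile-data", profileID, kind)
	if err := os.MkdirAll(dir, 0o700); err != nil {
		return "", err
	}
	return dir, nil
}
```

Validating the path elements before touching the filesystem keeps one profile's data from landing in another profile's directory.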
The immediate motivation is to support netmap caching, but we can also use this
mechanism for other per-profile resources including pending taildrop files and
Tailnet Lock authority caches.
This commit only adds the directory-management plumbing; later commits will
handle migrating taildrop, TKA, etc. to this mechanism, as well as caching
network maps.
Updates #12639
Change-Id: Ia75741955c7bf885e49c1ad99f856f669a754169
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
`dnf config-manager addrepo` will fail if the Tailscale repo is already
installed. Without the --overwrite flag, the installer will error out
instead of succeeding like with dnf3.
Fixes #18491
Signed-off-by: Francois Marier <francois@fmarier.org>
tsnet users can now provide a tun.Device, including any custom
implementation that conforms to the interface.
netstack has a new option CheckLocalTransportEndpoints that when used
alongside a TUN enables netstack listens and dials to correctly capture
traffic associated with those sockets. tsnet with a TUN sets this
option, while all other builds leave this at false to preserve existing
performance.
Updates #18423
Signed-off-by: James Tucker <james@tailscale.com>
Every other listen method on tsnet.Server makes this clarification, so
should ListenService.
Fixes tailscale/corp#36207
Signed-off-by: Harry Harpham <harry@tailscale.com>
When we have not yet communicated with a peer, send a
TSMPDiscoAdvertisement to let the peer know of our disco key. This is in
most cases redundant, but will allow us to set up direct connections
when the client cannot access control.
Some parts taken from: #18073
Updates #12639
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
New gauge reflects endpoints state via labels:
- open, when both peers are connected and ready to talk, and
- connecting, when at least one peer hasn't connected yet.
Corresponding client metrics are logged as
- udprelay_endpoints_connecting
- udprelay_endpoints_open
Updates tailscale/corp#30820
Change-Id: Idb1baa90a38c97847e14f9b2390093262ad0ea23
Signed-off-by: Alex Valiushko <alexvaliushko@tailscale.com>
This commit contains the implementation of multi-tailnet support within the Kubernetes Operator.
Each of our custom resources now exposes the `spec.tailnet` field. This field is a string that must match the name of an existing `Tailnet` resource. A `Tailnet` resource looks like this:
```yaml
apiVersion: tailscale.com/v1alpha1
kind: Tailnet
metadata:
  name: example # This is the name that must be referenced by other resources
spec:
  credentials:
    secretName: example-oauth
```
Each `Tailnet` references a `Secret` resource that contains a set of oauth credentials. This secret must be created in the same namespace as the operator:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: example-oauth # This is the name that's referenced by the Tailnet resource.
  namespace: tailscale
stringData:
  client_id: "client-id"
  client_secret: "client-secret"
```
When a `Tailnet` is created, the operator performs a basic check that the oauth client has access to all required scopes. This is done using read actions on devices, keys & services. While this doesn't capture a missing "write" permission, it catches completely missing permissions. Once this check passes, the `Tailnet` moves into a ready state and can be referenced. Attempting to use a `Tailnet` in a non-ready state will stall the deployment of `Connector`s, `ProxyGroup`s and `Recorder`s until the `Tailnet` becomes ready.
The `spec.tailnet` field informs the operator that a `Connector`, `ProxyGroup`, or `Recorder` must be given an auth key generated using the specified oauth client. For backwards compatibility, the set of credentials the operator is configured with are considered the default. That is, where `spec.tailnet` is not set, the resource will be deployed in the same tailnet as the operator.
Updates https://github.com/tailscale/corp/issues/34561
fixes tailscale/corp#27182
tailscale version --json now includes an osVariant field that will report
one of macsys, appstore or darwin. We can extend this to other
platforms where tailscaled can have multiple personalities.
This also adds the concept of a platform-specific callback for querying
an explicit application identifier. On Apple, we can use
CFBundleGetIdentifier(mainBundle) to get the bundle identifier via cgo.
This removes all the ambiguity and lets us remove other less direct
methods (like env vars, locations, etc).
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
Polls IMDS (currently only AWS) for extra IPs to advertise as udprelay.
Updates #17796
Change-Id: Iaaa899ef4575dc23b09a5b713ce6693f6a6a6964
Signed-off-by: Alex Valiushko <alexvaliushko@tailscale.com>
* k8s-operator,kube: removing enableSessionRecordings option. It seems
like it is going to create a confusing user experience and it's going to
be a very niche use case, so we have decided to defer this for now.
Updates tailscale/corp#35796
Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
* k8s-operator: adding metric for env var deprecation
Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
---------
Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
net/portmapper: Stop replacing the internal port with the upnp external port
This causes the UPnP mapping to break in the next recreation of the
mapping.
Fixes #18348
Signed-off-by: Eduardo Sorribas <eduardo@sorribas.org>
This change allows tsnet nodes to act as Service hosts by adding a new
function, tsnet.Server.ListenService. Invoking this function will
advertise the node as a host for the Service and create a listener to
receive traffic for the Service.
Fixes #17697
Fixes tailscale/corp#27200
Signed-off-by: Harry Harpham <harry@tailscale.com>
This change adds API to ipn.LocalBackend to retrieve the ETag when
querying for the current serve config. This allows consumers of
ipn.LocalBackend.SetServeConfig to utilize the concurrency control
offered by ETags. Prior to this change, utilizing serve config ETags
required copying the local backend's internal ETag calculation.
The local API server was previously copying the local backend's ETag
calculation as described above. With this change, the local API server
now uses the new ETag retrieval function instead. Serve config ETags are
therefore now opaque to clients, in line with best practices.
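
For readers unfamiliar with the pattern, here is a self-contained sketch of ETag-based optimistic concurrency. `configStore`, `serveConfig`, and the SHA-256-over-JSON tag are hypothetical stand-ins and do not reflect how ipn.LocalBackend actually computes or exposes its ETag.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"errors"
	"sync"
)

// serveConfig is a stand-in for the real serve config type.
type serveConfig struct {
	Handlers map[string]string `json:"handlers,omitempty"`
}

var errETagMismatch = errors.New("etag mismatch: config changed since it was read")

// configStore (hypothetical) guards a config with ETag-based optimistic
// concurrency control.
type configStore struct {
	mu  sync.Mutex
	cfg serveConfig
}

// etagOf derives an opaque tag from the JSON encoding of cfg. Callers should
// treat the result as opaque and never recompute it themselves.
func etagOf(cfg serveConfig) string {
	b, err := json.Marshal(cfg)
	if err != nil {
		panic(err) // a struct holding a map of strings always marshals
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// Get returns the current config together with its ETag.
func (s *configStore) Get() (serveConfig, string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.cfg, etagOf(s.cfg)
}

// Set replaces the config. If ifMatch is non-empty, it must equal the ETag of
// the config currently stored, otherwise errETagMismatch is returned.
func (s *configStore) Set(cfg serveConfig, ifMatch string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if ifMatch != "" && ifMatch != etagOf(s.cfg) {
		return errETagMismatch
	}
	s.cfg = cfg
	return nil
}
```

Because callers only ever pass back a tag they received from `Get`, the tag's derivation can change without breaking them.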
Fixes tailscale/corp#35857
Signed-off-by: Harry Harpham <harry@tailscale.com>
fixes tailscale/tailscale#18418
Both Serve and PeerAPI broke when we moved the TailscaleInterfaceName
into State, which is updated asynchronously and may not be
available when we configure the listeners.
This extracts the explicit interface name property from netmon.State
and adds it as a static struct with getters that have proper error
handling.
The bug is only found in sandboxed Darwin clients, where we
need to know the Tailscale interface details in order to set up the
listeners correctly (they must bind to our interface explicitly to escape
the network sandboxing that is applied by NECP).
Currently only sandboxed macOS and Plan9 set this, but it will
also be useful on Windows to simplify interface filtering in netns.
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
Policy editors, such as gpedit.msc and gpme.msc, rely on both the presence and the value of the
registry value to determine whether a policy is enabled. Unless an enabledValue is specified
explicitly, it defaults to REG_DWORD 1.
Therefore, we cannot rely on the same registry value to track the policy configuration state when
it is already used by a policy option, such as a dropdown. Otherwise, while the policy setting
will be written and function correctly, it will appear as Not Configured in the policy editor
due to the value mismatch (for example, REG_SZ "always" vs REG_DWORD 1).
In this PR, we update the DNSRegistration policy setting to use the DNSRegistrationConfigured
registry value for tracking. This change has no effect on the client side and exists solely to
satisfy ADMX and policy editor requirements.
Updates #14917
Signed-off-by: Nick Khyl <nickk@tailscale.com>
gocross-wrapper.ps1 is written to use the version of tar that ships with
Windows; we want to avoid conflicts with any other tar on the PATH, such
as ones installed by MSYS and/or Cygwin.
Updates https://github.com/tailscale/corp/issues/29940
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
Recently, the golangci-lint workflow has been taking longer and longer
to complete, causing it to time out after the default of 5 minutes.
Running error: context loading failed: failed to load packages: failed to load packages: failed to load with go/packages: context deadline exceeded
Timeout exceeded: try increasing it by passing --timeout option
Although PR #18398 enabled the Go module cache, bootstrapping with a
cold cache still takes too long.
This PR doubles the default 5 minute timeout for golangci-lint to 10
minutes so that golangci-lint can finish downloading all of its
dependencies.
Note that this doesn’t affect the 5 minute timeout configured in
.golangci.yml, since running golangci-lint on your local instance
should still be plenty fast.
Fixes #18366
Signed-off-by: Simon Law <sfllaw@tailscale.com>
Allow for optionally specifying an audience for containerboot. This is
passed to tailscale up to allow for containerboot to use automatic ID
token generation for authentication.
Updates https://github.com/tailscale/corp/issues/34430
Signed-off-by: Mario Minardi <mario@tailscale.com>
Allow for optionally specifying an audience for tsnet. This is passed
to the underlying identity federation logic to allow for tsnet auth to
use automatic ID token generation for authentication.
Updates https://github.com/tailscale/corp/issues/33316
Signed-off-by: Mario Minardi <mario@tailscale.com>
If a local tailscale/tailscale checkout is not available,
pull cigocacher remotely.
Fall back to ./tool/go if no other Go installation
is present.
Updates tailscale/corp#32493
Signed-off-by: Irbe Krumina <irbekrm@gmail.com>
Adds the ability to detect what provider the client is running on and tries to fetch the ID token to use with Workload Identity.
Updates https://github.com/tailscale/corp/issues/33316
Signed-off-by: Danni Popova <danni@tailscale.com>
Recently, the golangci-lint workflow has been taking longer and longer
to complete, causing it to time out after the default of 5 minutes.
Running error: context loading failed: failed to load packages: failed to load packages: failed to load with go/packages: context deadline exceeded
Timeout exceeded: try increasing it by passing --timeout option
This PR upgrades actions/setup-go to version 6, the latest, and
enables caching for Go modules and build outputs. This should speed up
linting because most packages won’t have to be downloaded over and
over again.
Fixes #18366
Signed-off-by: Simon Law <sfllaw@tailscale.com>
Fixes a bug where, for kube HA proxies, TLS certs for the replica
responsible for cert issuance were loaded in memory on startup,
although the in-memory store was not updated after renewal (to
avoid failing re-issuance for re-created Ingresses).
Now the 'write' replica always reads certs from the kube Secret.
Updates tailscale/tailscale#18394
Signed-off-by: Irbe Krumina <irbekrm@gmail.com>
Previously the funnel listener would leave artifacts in the serve
config. This caused weird out-of-sync effects like the admin panel
showing that funnel was enabled for a node, but the node rejecting
packets because the listener was closed.
This change resolves these synchronization issues by ensuring that
funnel listeners clean up the serve config when closed.
See also:
e109cf9fdd
Updates #cleanup
Signed-off-by: Harry Harpham <harry@tailscale.com>
Prior to this change, we were resetting the tsnet's serve config every
time tsnet.Server.Up was run. This is important to do on startup, to
prevent messy interactions with stale configuration when the code has
changed.
However, Up is frequently run as a just-in-case step (for example, by
Server.ListenTLS/ListenFunnel and possibly by consumers of tsnet). When
the serve config is reset on each of these calls to Up, this creates
situations in which the serve config disappears unexpectedly. The
solution is to reset the serve config only on the first call to Up.
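
The "only on the first call" behaviour maps naturally onto sync.Once. The sketch below uses a hypothetical `server` type rather than the real tsnet.Server, and a counter stands in for the actual reset.

```go
package main

import "sync"

// server is a hypothetical stand-in for tsnet.Server.
type server struct {
	resetOnce sync.Once
	resets    int // number of times the serve config was reset
}

// Up may be called many times, for example as a just-in-case step by listen
// helpers. Only the first call resets the serve config.
func (s *server) Up() {
	s.resetOnce.Do(func() {
		s.resets++ // stands in for clearing stale serve config
	})
	// ... the rest of startup would go here ...
}
```

Later calls to `Up` skip the reset, so configuration installed by an earlier listener is left alone.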
Fixes #8800
Updates tailscale/corp#27200
Signed-off-by: Harry Harpham <harry@tailscale.com>
Add support for authenticating the gitops-pusher using workload identity
federation.
Updates https://github.com/tailscale/corp/issues/34172
Signed-off-by: Mario Minardi <mario@tailscale.com>
QR codes are used by `tailscale up --qr` to provide an easy way to
open a web-page without transcribing a difficult URI. However, there’s
no need for this feature if the client will never be called
interactively. So this PR adds the `ts_omit_qrcodes` build tag.
Updates #18182
Signed-off-by: Simon Law <sfllaw@tailscale.com>
It's not worth adding the v2 client just for these e2e tests. Remove
that dependency for now to keep a clear separation, but we should revive
the v2 client version if we ever decide to take that dependency for the
tailscale/tailscale repo as a whole.
Updates tailscale/corp#32085
Change-Id: Ic51ce233d5f14ce2d25f31a6c4bb9cf545057dd0
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
* cmd/k8s-operator/e2e: run self-contained e2e tests with devcontrol
Adds orchestration for more of the e2e testing setup requirements to
make it easier to run them in CI, but also run them locally in a way
that's consistent with CI. Requires running devcontrol, but otherwise
supports creating all the scaffolding required to exercise the operator
and proxies.
Updates tailscale/corp#32085
Change-Id: Ia7bff38af3801fd141ad17452aa5a68b7e724ca6
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
* cmd/k8s-operator/e2e: being more specific on tmp dir cleanup
Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
---------
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
Co-authored-by: chaosinthecrd <tom@tmlabs.co.uk>
Raw Linux consoles support UTF-8, but we cannot assume that all UTF-8
characters are available. The default Fixed and Terminus fonts don’t
contain half-block characters (`▀` and `▄`), but do contain the
full-block character (`█`).
Sometimes, Linux doesn’t have a framebuffer, so it falls back to VGA.
When this happens, the full-block character could be anywhere in
the extended ASCII block, because we don’t know which code page is active.
This PR introduces `--qr-format=auto` which tries to heuristically
detect when Tailscale is printing to a raw Linux console, whether
UTF-8 is enabled, and which block characters have been mapped in the
console font.
If Unicode characters are unavailable, the new `--qr-format=ascii`
formatter uses `#` characters instead of full-block characters.
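
A minimal sketch of an ASCII fallback renderer, assuming the QR code is available as a matrix of dark and light modules. `renderASCII` is a hypothetical name, and the real formatter may differ in details such as the quiet zone or colour inversion.

```go
package main

import "strings"

// renderASCII draws a module matrix (true = dark) using only '#' and spaces.
// Each module is two characters wide so the result looks roughly square in a
// typical terminal font.
func renderASCII(modules [][]bool) string {
	var sb strings.Builder
	for _, row := range modules {
		for _, dark := range row {
			if dark {
				sb.WriteString("##")
			} else {
				sb.WriteString("  ")
			}
		}
		sb.WriteByte('\n')
	}
	return sb.String()
}
```

Unlike the half-block formats, this costs one text row per module row, so the code comes out about twice as tall.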
Fixes #12935
Signed-off-by: Simon Law <sfllaw@tailscale.com>
Moves magicsock.cloudInfo into util/cloudinfo with minimal changes.
Updates #17796
Change-Id: I83f32473b9180074d5cdbf00fa31e5b3f579f189
Signed-off-by: Alex Valiushko <alexvaliushko@tailscale.com>
Bump peter-evans/create-pull-request to 8.0.0 to ensure compatibility
with actions/checkout 6.x.
Updates #cleanup
Signed-off-by: Mario Minardi <mario@tailscale.com>
The funnel command is sort of an alias for the serve command. This means
that the subcommands added to serve to support Services appear as
subcommands for funnel as well, despite having no meaning for funnel.
This change removes all such Services-specific subcommands from funnel.
Fixes tailscale/corp#34167
Signed-off-by: Harry Harpham <harry@tailscale.com>
Ensure that hardware attestation keys are not added to tailscaled
state stores that are Kubernetes Secrets or AWS SSM as those Tailscale
devices should be able to be recreated on different nodes, for example,
when moving Pods between nodes.
Updates tailscale/tailscale#18302
Signed-off-by: Irbe Krumina <irbekrm@gmail.com>
TPM-based features have been incredibly painful due to the heterogeneous
devices in the wild, and many situations in which the TPM "changes" (is
reset or replaced). All of this leads to a lot of customer issues.
We hoped to iron out all the kinks and get all users to benefit from
state encryption and hardware attestation without manually opting in,
but the long tail of kinks is just too long.
This change disables TPM-based features on Windows and Linux by default.
Node state should get auto-decrypted on update, and old attestation keys
will be removed.
There's also tailscaled-on-macOS, but it won't have a TPM or Keychain
bindings anyway.
Updates #18302
Updates #15830
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
Soft-fail on initial unmarshal and try again, ignoring the
AttestationKey. This helps in cases where something about the
attestation key storage (usually a TPM) is messed up. The old key will
be lost, but at least the node can start again.
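
The retry can be sketched with encoding/json as follows. The types, the "tpm:" prefix check, and `loadState` are hypothetical and only model a key whose decoding can fail; the real persisted state has many more fields.

```go
package main

import (
	"encoding/json"
	"errors"
	"strings"
)

// attestationKey models a key whose decoding can fail, for example because
// the backing hardware is no longer usable.
type attestationKey struct{ handle string }

func (k *attestationKey) UnmarshalJSON(b []byte) error {
	var s string
	if err := json.Unmarshal(b, &s); err != nil {
		return err
	}
	if !strings.HasPrefix(s, "tpm:") {
		return errors.New("attestation key: unusable key material")
	}
	k.handle = s
	return nil
}

// persistedState is a much-reduced stand-in for the persisted node state.
type persistedState struct {
	NodeID         string
	AttestationKey *attestationKey `json:",omitempty"`
}

// loadState decodes data. If the first attempt fails, it retries while
// ignoring AttestationKey, and reports whether the key was dropped.
func loadState(data []byte) (persistedState, bool, error) {
	var st persistedState
	firstErr := json.Unmarshal(data, &st)
	if firstErr == nil {
		return st, false, nil
	}
	// Second pass: a shadow type without the key. encoding/json ignores
	// unknown fields, so the AttestationKey member is skipped.
	var fallback struct{ NodeID string }
	if err := json.Unmarshal(data, &fallback); err != nil {
		return persistedState{}, false, firstErr
	}
	return persistedState{NodeID: fallback.NodeID}, true, nil
}
```

If the second pass also fails, the original error is returned, since the key was not the only problem.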
Updates #18302
Updates #15830
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
Send LOGIN audit messages to the kernel audit subsystem on Linux
when users successfully authenticate to Tailscale SSH. This provides
administrators with audit trail integration via auditd or journald,
recording details about both the Tailscale user (whois) and the
mapped local user account.
The implementation uses raw netlink sockets to send AUDIT_USER_LOGIN
messages to the kernel audit subsystem. It requires CAP_AUDIT_WRITE
capability, which is checked at runtime. If the capability is not
present, audit logging is silently skipped.
Audit messages are sent to the kernel (pid 0) and consumed by either
auditd (written to /var/log/audit/audit.log) or journald (available
via journalctl _TRANSPORT=audit), depending on system configuration.
Note: This may result in duplicate messages on a system where
auditd/journald audit logs are enabled and the system has and supports
`login -h`. Sadly Linux login code paths are still an inconsistent wild
west so we accept the potential duplication rather than trying to avoid
it.
Fixes #18332
Signed-off-by: James Tucker <james@tailscale.com>
GCP Certificate Manager requires an email contact on ACME accounts.
Add --acme-email flag that is required for --certmode=gcp and
optional for --certmode=letsencrypt.
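
The flag rule can be sketched as a small validation function. `validateCertFlags` is a hypothetical name, and other cert modes are deliberately left unchecked here.

```go
package main

import "errors"

// validateCertFlags enforces the rule described above: an ACME contact email
// is mandatory for certmode "gcp" and optional for "letsencrypt". Other modes
// are outside the scope of this sketch and pass through untouched.
func validateCertFlags(certMode, acmeEmail string) error {
	if certMode == "gcp" && acmeEmail == "" {
		return errors.New("--acme-email is required with --certmode=gcp")
	}
	return nil
}
```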
Fixes #18277
Signed-off-by: Raj Singh <raj@tailscale.com>
An error returned by net.Listener.Accept() causes the owning http.Server to shut down.
With the deprecation of net.Error.Temporary(), there's no way for the http.Server to test
whether the returned error is temporary / retryable or not (see golang/go#66252).
Because of that, errors returned by (*safesocket.winIOPipeListener).Accept() cause the LocalAPI
server (aka ipnserver.Server) to shut down, and tailscaled process to exit.
While this might be acceptable in the case of non-recoverable errors, such as programmer errors,
we shouldn't shut down the entire tailscaled process for client- or connection-specific errors,
such as when we couldn't obtain the client's access token because the client attempts to connect
at the Anonymous impersonation level. Instead, the LocalAPI server should gracefully handle
these errors by denying access and returning a 401 Unauthorized to the client.
In tailscale/tscert#15, we fixed a known bug where Caddy and other apps using tscert would attempt
to connect at the Anonymous impersonation level and fail. However, we should also fix this on the tailscaled
side to prevent a potential DoS, where a local app could deliberately open the Tailscale LocalAPI named pipe
at the Anonymous impersonation level and cause tailscaled to exit.
In this PR, we defer token retrieval until (*WindowsClientConn).Token() is called and propagate the returned token
or error via ipnauth.GetConnIdentity() to ipnserver, which handles it the same way as other ipnauth-related errors.
Fixes #18212
Fixes tailscale/tscert#13
Signed-off-by: Nick Khyl <nickk@tailscale.com>
This gauge will be reworked to include endpoint state in future.
Updates tailscale/corp#30820
Change-Id: I66f349d89422b46eec4ecbaf1a99ad656c7301f9
Signed-off-by: Alex Valiushko <alexvaliushko@tailscale.com>
In dynamically changing environments where ACME account keys and certs
are stored separately, it can happen that the account key would get
deleted (and recreated) between issuances. If that is the case,
we currently fail renewals and the only way to recover is for users
to delete certs.
This adds a config knob to allow opting out of the replaces extension
and utilizes it in the Kubernetes operator where there are known
user workflows that could end up with this edge case.
Updates #18251
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Adding both user and client metrics for peer relay forwarded bytes and
packets, and the total endpoints gauge.
User metrics:
tailscaled_peer_relay_forwarded_packets_total{transport_in, transport_out}
tailscaled_peer_relay_forwarded_bytes_total{transport_in, transport_out}
tailscaled_peer_relay_endpoints_total{}
Where the transport labels can be of "udp4" or "udp6".
Client metrics:
udprelay_forwarded_(packets|bytes)_udp(4|6)_udp(4|6)
udprelay_endpoints
RELNOTE: Expose tailscaled metrics for peer relay.
Updates tailscale/corp#30820
Change-Id: I1a905d15bdc5ee84e28017e0b93210e2d9660259
Signed-off-by: Alex Valiushko <alexvaliushko@tailscale.com>
Adds support for targeting FQDNs that are a Tailscale Service. Uses the
same method of searching for Services as the tailscale configure
kubeconfig command. This fixes using the tailscale.com/tailnet-fqdn
annotation for Kubernetes Service when the specified FQDN is a Tailscale
Service.
Fixes #16534
Change-Id: I422795de76dc83ae30e7e757bc4fbd8eec21cc64
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Signed-off-by: Becky Pauley <becky@tailscale.com>
IsZero is required by the interface, so we should use that before trying
to serialize the key.
Updates #35412
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
When the TS_DEBUG_DNS_FORWARD_SEND envknob is turned on, also log the
source IP:port of the query that tailscaled is forwarding.
Updates tailscale/corp#35374
Signed-off-by: Andrew Dunham <andrew@tailscale.com>
updates tailscale/corp#33891
Addresses several of the older TODOs in netmon. This removes the
Major flag and precomputes the ChangeDelta state, rather than making
consumers of ChangeDeltas sort that out themselves. We're also seeing
a lot of ChangeDeltas being flagged as "Major" when they are
not interesting, triggering rebinds in wgengine that are not needed. This
cleans that up and adds a host of additional tests.
The dependencies are cleaned, notably removing dependency on netmon
itself for calculating what is interesting, and what is not. This includes letting
individual platforms set a bespoke global "IsInterestingInterface"
function. This is only used on Darwin.
RebindRequired now roughly follows how "Major" was historically
calculated but includes some additional checks for various
uninteresting events such as changes in interface addresses that
shouldn't trigger a rebind. This significantly reduces thrashing (by
roughly half on Darwin clients when switching between NICs). The individual
values that we roll into RebindRequired are also exposed so that
components consuming netmap.ChangeDelta can ask more
targeted questions.
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
The existing client metric methods only support incrementing (or
decrementing) a delta value. This new method allows setting the metric
to a specific value.
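
The difference between the two kinds of update, sketched with a hypothetical `metric` type built on sync/atomic (this is not the clientmetric package's actual API):

```go
package main

import "sync/atomic"

// metric is a hypothetical stand-in for a client metric.
type metric struct{ v atomic.Int64 }

// Add applies a delta, which is all the older methods allowed.
func (m *metric) Add(delta int64) { m.v.Add(delta) }

// Set overwrites the metric with a specific value.
func (m *metric) Set(val int64) { m.v.Store(val) }

// Value returns the current value.
func (m *metric) Value() int64 { return m.v.Load() }
```

`Set` suits gauges whose value is computed elsewhere, where working out the delta would mean first reading the old value.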
Updates tailscale/corp#35327
Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
This commit also introduces a sync.Mutex for guarding mutable fields
on serverEndpoint, now that it is no longer guarded by the sync.Mutex
in Server.
These changes reduce lock contention and, in effect, increase aggregate
throughput under high flow count load. A benchmark on Linux with AWS
c8gn instances showed a ~30% increase in aggregate throughput (37Gb/s
vs 28Gb/s) for 12 tailscaled flows.
Updates tailscale/corp#35264
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Add flags:
* --cigocached-host to support alternative host resolution in other
environments, like the corp repo.
* --stats to reduce the amount of bash script we need.
* --version to support a caching tool/cigocacher script that will
download from GitHub releases.
Updates tailscale/corp#10808
Change-Id: Ib2447bc5f79058669a70f2c49cef6aedd7afc049
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
tcpHandlerForVIPService was missing ProxyProtocol support that
tcpHandlerForServe already had. Extract the shared logic into
forwardTCPWithProxyProtocol helper and use it in both handlers.
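
For context, this is roughly what a PROXY protocol header looks like on the wire. The sketch builds the human-readable version 1 line purely for illustration; which protocol version the handlers emit is not stated here, and `proxyV1Header` is a hypothetical helper, not the extracted one.

```go
package main

import (
	"fmt"
	"net/netip"
)

// proxyV1Header builds a PROXY protocol version 1 header line for a TCP
// connection from src to dst, both given as "ip:port" strings.
func proxyV1Header(src, dst string) (string, error) {
	s, err := netip.ParseAddrPort(src)
	if err != nil {
		return "", fmt.Errorf("source: %w", err)
	}
	d, err := netip.ParseAddrPort(dst)
	if err != nil {
		return "", fmt.Errorf("destination: %w", err)
	}
	sa, da := s.Addr().Unmap(), d.Addr().Unmap()
	if sa.Is4() != da.Is4() {
		return "", fmt.Errorf("address family mismatch: %v vs %v", sa, da)
	}
	family := "TCP6"
	if sa.Is4() {
		family = "TCP4"
	}
	return fmt.Sprintf("PROXY %s %s %s %d %d\r\n", family, sa, da, s.Port(), d.Port()), nil
}
```

The header is written to the backend connection before any payload bytes, so the backend can recover the original client address.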
Fixes #18172
Signed-off-by: Raj Singh <raj@tailscale.com>
Add metrics about logtail uploading and underlying buffer.
Add metrics to the in-memory buffer implementation.
Updates tailscale/corp#21363
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
PR #18033 skipped tests for the versions of Linux 6.6 and 6.12 that
had a regression in /proc/net/tcp that causes seek operations to fail
with “illegal seek”.
This PR skips tests for Linux 6.14.0, which is the default Ubuntu
kernel and also contains this regression.
Updates #16966
Signed-off-by: Simon Law <sfllaw@tailscale.com>
The filch implementation is fairly broken:
* When Filch.cur exceeds MaxFileSize, it calls moveContents
to copy the entirety of cur into alt (while holding the write lock).
By nature, this is the movement of a lot of data in a hot path,
meaning that all log calls will be globally blocked!
It also means that log uploads will be blocked during the move.
* The implementation of moveContents is buggy in that
it copies data from cur into the start of alt,
but fails to truncate alt to the number of bytes copied.
Consequently, there are unrelated lines near the end,
leading to out-of-order lines when being read back.
* Data filched via stderr do not directly respect MaxFileSize,
which is only checked every 100 Filch.Write calls.
This means that it is possible that the file grows far beyond
the specified max file size before moveContents is called.
* If both log files have data when New is called,
it also copies the entirety of cur into alt.
This can block the startup of a process copying lots of data
before the process can do any useful work.
* TryReadLine is implemented using bufio.Scanner.
Unfortunately, it will choke on any lines longer than
bufio.MaxScanTokenSize, rather than gracefully skip over them.
The re-implementation avoids a lot of these problems
by fundamentally eliminating the need for moveContents.
We enforce MaxFileSize by simply rotating the log files
whenever the current file exceeds MaxFileSize/2.
This is a constant-time operation regardless of file size.
To more gracefully handle lines longer than bufio.MaxScanTokenSize,
we skip over these lines (without growing the read buffer)
and report an error. This allows subsequent lines to be read.
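
The long-line handling described above can be sketched roughly as follows. This is only an illustration built on bufio.Reader.ReadSlice, not the filch implementation; `readLine`, `collect`, and `errLineTooLong` are hypothetical names.

```go
package main

import (
	"bufio"
	"bytes"
	"errors"
	"fmt"
	"io"
	"strings"
)

// errLineTooLong is a hypothetical sentinel used only in this sketch.
var errLineTooLong = errors.New("line too long")

// readLine returns the next line without its trailing newline.
// If the line does not fit in r's fixed-size buffer, the rest of the line
// is discarded (the buffer never grows) and errLineTooLong is returned.
// The next call then starts at the following line.
func readLine(r *bufio.Reader) ([]byte, error) {
	line, err := r.ReadSlice('\n')
	if errors.Is(err, bufio.ErrBufferFull) {
		// Drain the remainder of the oversized line.
		for errors.Is(err, bufio.ErrBufferFull) {
			_, err = r.ReadSlice('\n')
		}
		if err != nil && err != io.EOF {
			return nil, err
		}
		return nil, errLineTooLong
	}
	if err != nil && (err != io.EOF || len(line) == 0) {
		return nil, err
	}
	return bytes.TrimSuffix(line, []byte("\n")), nil
}

// collect reads all of input through a reader with the given buffer size.
// It returns the lines that fit and the number of lines that were skipped.
func collect(input string, bufSize int) ([]string, int, error) {
	r := bufio.NewReaderSize(strings.NewReader(input), bufSize)
	var lines []string
	skipped := 0
	for {
		line, err := readLine(r)
		switch {
		case errors.Is(err, errLineTooLong):
			skipped++
		case err == io.EOF:
			return lines, skipped, nil
		case err != nil:
			return lines, skipped, err
		default:
			lines = append(lines, string(line))
		}
	}
}

func main() {
	input := "short\n" + strings.Repeat("x", 40) + "\nnext\n"
	lines, skipped, err := collect(input, 16)
	fmt.Println(lines, skipped, err)
}
```

Unlike a bufio.Scanner, which stops scanning once a token exceeds its limit, this approach reports the oversized line and keeps going.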
In order to improve debugging, we add a lot of metrics.
Note that the mechanism of dup2 with stderr
is inherently racy with the two-file approach.
The order of operations during a rotation is carefully chosen
to reduce the race window to be as short as possible.
Thus, this is slightly less racy than before.
Updates tailscale/corp#21363
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
When receiving a TSMPDiscoAdvertisement from a peer, update the discokey
for said peer.
Some parts taken from: https://github.com/tailscale/tailscale/pull/18073/
Updates #12639
Co-authored-by: James Tucker <james@tailscale.com>
Re-instate the linking of iptables installed in Tailscale container
to the legacy iptables version. In environments where the legacy
iptables is not needed, we should be able to run nftables instead,
but this will ensure that Tailscale keeps working in environments
that don't support nftables, such as some Synology NAS hosts.
Updates #17854
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Add --certmode=gcp for using Google Cloud Certificate Manager's
public CA instead of Let's Encrypt. GCP requires External Account
Binding (EAB) credentials for ACME registration, so this adds
--acme-eab-kid and --acme-eab-key flags.
The EAB key accepts both base64url and standard base64 encoding
to support both ACME spec format and gcloud output.
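
One way to accept both encodings is to try each alphabet in turn. The sketch below is an assumption about how such a decoder could look; `decodeEABKey` is a hypothetical name, not necessarily the function added by this change.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
)

// decodeEABKey is a hypothetical helper for this sketch. It accepts
// base64url (with or without padding) and standard base64 (with or
// without padding), returning the first successful decoding.
func decodeEABKey(s string) ([]byte, error) {
	encodings := []*base64.Encoding{
		base64.RawURLEncoding, // ACME spec format: base64url, no padding
		base64.URLEncoding,
		base64.StdEncoding,
		base64.RawStdEncoding,
	}
	for _, enc := range encodings {
		if b, err := enc.DecodeString(s); err == nil {
			return b, nil
		}
	}
	return nil, errors.New("EAB key is neither base64url nor standard base64")
}

func main() {
	// The same three bytes in the two alphabets.
	a, _ := decodeEABKey("-__-")
	b, _ := decodeEABKey("+//+")
	fmt.Printf("%x %x\n", a, b)
}
```

Strings that use only letters and digits decode identically under both alphabets, so trying base64url first does not change the result for them.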
Fixes tailscale/corp#34881
Signed-off-by: Raj Singh <raj@tailscale.com>
Co-authored-by: Brad Fitzpatrick <bradfitz@tailscale.com>
When using the resolv.conf file for setting DNS, it is possible that
some other services will trample the file and overwrite our set DNS
server. Experiments have shown this to be a racy error depending on how
quickly processes start.
Make an attempt to trample back the file a limited number of times if
the file is changed.
Updates #16635
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
When peers request an IP address mapping to be stored, the connector
stores it in memory.
Fixes tailscale/corp#34251
Signed-off-by: Fran Bull <fran@tailscale.com>
To save rebuilding cigocacher on each CI job, build it on-demand, and
publish a release similar to how we publish releases for tool/go to
consume. Once the first release is done, we can add a new
tool/cigocacher script that pins to a specific release for each branch
to download.
Updates tailscale/corp#10808
Change-Id: I7694b2c2240020ba2335eb467522cdd029469b6c
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
It appears (*controlclient.Auto).Shutdown() can still deadlock when called with b.mu held, and therefore the changes in #18127 are unsafe.
This reverts #18127 until we figure out what causes it.
This reverts commit d199ecac80.
Signed-off-by: Nick Khyl <nickk@tailscale.com>
This improves our test coverage of the Bootstrap() method, especially
around catching AUMs that shouldn't pass validation.
Updates #cleanup
Change-Id: Idc61fcbc6daaa98c36d20ec61e45ce48771b85de
Signed-off-by: Alex Chan <alexc@tailscale.com>
Previously, if users attempted to expose a headless Service to tailnet,
this just silently did not work.
This PR makes the operator emit a warning event and update the Service's
status with an error message.
Updates #18139
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
The event queue gets deleted events, which means that sometimes
the object that should be reconciled no longer exists.
Don't log user facing errors if that is the case.
Updates #18141
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
The service was starting after systemd itself, and while this
surprisingly worked for some situations, it broke for others.
Change it to start after a GUI has been initialized.
Updates #17656
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Previously we only set this when it updated, which was fine for the first
call to Start(), but after that point future updates would be skipped if
nothing had changed. If Start() was called again, it would wipe the peer API
endpoints and they wouldn't get added back again, breaking exit nodes (and
anything else requiring peer API to be advertised).
Updates tailscale/corp#27173
Signed-off-by: James Sanderson <jsanderson@tailscale.com>
Based on PR #16700 by @lox, adapted to current codebase.
Adds support for proxying HTTP requests to Unix domain sockets via
tailscale serve unix:/path/to/socket, enabling exposure of services
like Docker, containerd, PHP-FPM over Tailscale without TCP bridging.
The implementation includes reasonable protections against exposure of
tailscaled's own socket.
Adaptations from original PR:
- Use net.Dialer.DialContext instead of net.Dial for context propagation
- Use http.Transport with Protocols API (current h2c approach, not http2.Transport)
- Resolve conflicts with hasScheme variable in ExpandProxyTargetValue
Updates #9771
Signed-off-by: Peter A. <ink.splatters@pm.me>
Co-authored-by: Lachlan Donald <lachlan@ljd.cc>
If a packet arrives while WireGuard is being reconfigured with b.mu held, such as during a profile switch,
calling back into (*LocalBackend).GetPeerAPIPort from (*Wrapper).filterPacketInboundFromWireGuard
may deadlock when it tries to acquire b.mu.
This occurs because a peer cannot be removed while an inbound packet is being processed.
The reconfig and profile switch wait for (*Peer).RoutineSequentialReceiver to return, but it never finishes
because GetPeerAPIPort needs b.mu, which the waiting goroutine already holds.
In this PR, we make peerAPIPorts a new syncs.AtomicValue field that is written with b.mu held
but can be read by GetPeerAPIPort without holding the mutex, which fixes the deadlock.
There might be other long-term ways to address the issue, such as moving peer API listeners
from LocalBackend to nodeBackend so they can be accessed without holding b.mu,
but these changes are too large and risky at this stage in the v1.92 release cycle.
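
The pattern can be sketched as below. This is a simplified illustration that uses the standard library's atomic.Pointer in place of syncs.AtomicValue; the `backend` type, its fields, and the string-keyed map are hypothetical and much reduced compared with LocalBackend.

```go
package main

import (
	"fmt"
	"maps"
	"sync"
	"sync/atomic"
)

// backend is a hypothetical, much-reduced stand-in for LocalBackend.
type backend struct {
	mu sync.Mutex // held by writers, and for long stretches during reconfig

	// ports is written with mu held but may be read without it.
	// The stored map is never mutated after being published.
	ports atomic.Pointer[map[string]int]
}

// setPorts publishes a new snapshot of the peer API ports.
func (b *backend) setPorts(m map[string]int) {
	b.mu.Lock()
	defer b.mu.Unlock()
	snapshot := maps.Clone(m) // copy so later caller edits are not visible
	b.ports.Store(&snapshot)
}

// peerAPIPort reads the latest snapshot without acquiring mu, so it
// cannot block behind a goroutine that holds mu while waiting on the caller.
func (b *backend) peerAPIPort(ip string) (int, bool) {
	m := b.ports.Load()
	if m == nil {
		return 0, false
	}
	port, ok := (*m)[ip]
	return port, ok
}

func main() {
	var b backend
	b.setPorts(map[string]int{"100.64.0.1": 4242})

	b.mu.Lock() // simulate a reconfig holding the mutex
	port, ok := b.peerAPIPort("100.64.0.1")
	b.mu.Unlock()
	fmt.Println(port, ok)
}
```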
Updates #18124
Signed-off-by: Nick Khyl <nickk@tailscale.com>
Previously, callers of (*LocalBackend).resetControlClientLocked were supposed
to call Shutdown on the returned controlclient.Client after releasing b.mu.
In #17804, we started calling Shutdown while holding b.mu, which caused
deadlocks during profile switches due to the (*ExecQueue).RunSync implementation.
We first patched this in #18053 by calling Shutdown in a new goroutine,
which avoided the deadlocks but made TestStateMachine flaky because
the shutdown order was no longer guaranteed.
In #18070, we updated (*ExecQueue).RunSync to allow shutting down
the queue without waiting for RunSync to return. With that change,
shutting down the control client while holding b.mu became safe.
Therefore, this PR updates (*LocalBackend).resetControlClientLocked
to shut down the old client synchronously during the reset, instead of
returning it and shifting that responsibility to the callers.
This fixes the flaky tests and simplifies the code.
Fixes #18052
Signed-off-by: Nick Khyl <nickk@tailscale.com>
This commit uses SO_REUSEPORT (when supported) to bind multiple sockets
per address family. Increasing the number of sockets can increase
aggregate throughput when serving many peer relay client flows.
Benchmarks show 3x improvement in max aggregate bitrate in some
environments.
Updates tailscale/corp#34745
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Add support for pinning specific Tailscale versions during installation
via the TAILSCALE_VERSION environment variable.
Example usage:
curl -fsSL https://tailscale.com/install.sh | TAILSCALE_VERSION=1.88.4 sh
Fixes #17776
Signed-off-by: Raj Singh <raj@tailscale.com>
111 is 3 years old, and there have been a lot of speed improvements
since then. We run wasm-opt twice as part of the CI wasm job, and it
currently takes about 3 minutes each time. With 125, it takes ~40
seconds, a 4.5x speed-up.
Updates #cleanup
Change-Id: I671ae6cefa3997a23cdcab6871896b6b03e83a4f
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Implements a new disk put function for cigocacher that does not cause
locking issues on Windows when there are multiple processes reading and
writing the same files concurrently. Integrates cigocacher into test.yml
for Windows where we are running on larger runners that support
connecting to private Azure vnet resources where cigocached is hosted.
Updates tailscale/corp#10808
Change-Id: I0d0e9b670e49e0f9abf01ff3d605cd660dd85ebb
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
The cache artifacts from a full run of test.yml are 14GB. Only save
artifacts from the main branch to ensure we don't thrash too much. Most
branches should get decent performance with a hit from recent main.
Fixes tailscale/corp#34739
Change-Id: Ia83269d878e4781e3ddf33f1db2f21d06ea2130f
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Thanks to seamless key renewal, you can now do a force-reauth without
losing your connection in all circumstances. We softened the interactive
warning (see #17262) so let's soften the help text as well.
Updates https://github.com/tailscale/corp/issues/32429
Signed-off-by: Alex Chan <alexc@tailscale.com>
* cmd/k8s-operator: add support for tailscale.com/http-redirect
The k8s-operator now supports a tailscale.com/http-redirect annotation
on Ingress resources. When enabled, this creates port 80
handlers that automatically redirect to the equivalent HTTPS location.
Fixes #11252
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
* Fix for permanent redirect
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
* lint
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
* warn for redirect+endpoint
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
* tests
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
---------
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
Restrict running the golangci-lint workflow to when the workflow file
itself or a .go file, go.mod, or go.sum have actually been modified.
Updates #cleanup
Signed-off-by: Mario Minardi <mario@tailscale.com>
Skip the "request review" workflows for PRs that are in draft to reduce
noise / skip adding reviewers to PRs that are intentionally marked as
not ready to review.
Updates #cleanup
Signed-off-by: Mario Minardi <mario@tailscale.com>
Adds an observation point that may identify potentially abusive traffic
patterns at outlier values.
Updates tailscale/corp#24681
Signed-off-by: James Tucker <james@tailscale.com>
We don't hold q.mu while running normal ExecQueue.Add funcs, so we
shouldn't in RunSync either. Otherwise code it calls can't shut down
the queue, as seen in #18502.
Updates #18052
Co-authored-by: Nick Khyl <nickk@tailscale.com>
Change-Id: Ic5e53440411eca5e9fabac7f4a68a9f6ef026de1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This patch adds an integration test for Tailnet Lock, checking that a node can't
talk to peers in the tailnet until it becomes signed.
This patch also introduces a new package `tstest/tkatest`, which has some helpers
for constructing a mock control server that responds to TKA requests. This allows
us to reduce boilerplate in the IPN tests.
Updates tailscale/corp#33599
Signed-off-by: Alex Chan <alexc@tailscale.com>
In preparation for exposing its configuration via ipn.ConfigVAlpha,
change {Masked}Prefs.RelayServerPort from *int to *uint16. This takes a
defensive stance against invalid inputs at JSON decode time.
'tailscale set --relay-server-port' is currently the only input to this
pref, and has always sanitized input to fit within a uint16.
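
To illustrate the decode-time check, the sketch below shows encoding/json rejecting out-of-range values for a *uint16 field. The `prefs` struct and `parsePrefs` helper are hypothetical; only the field type mirrors the change.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// prefs is a hypothetical, reduced stand-in for the real prefs type.
type prefs struct {
	RelayServerPort *uint16
}

// parsePrefs decodes data and returns an error if any field is invalid,
// including port numbers that do not fit in a uint16.
func parsePrefs(data string) (*prefs, error) {
	var p prefs
	if err := json.Unmarshal([]byte(data), &p); err != nil {
		return nil, err
	}
	return &p, nil
}

func main() {
	for _, in := range []string{
		`{"RelayServerPort": 40000}`,
		`{"RelayServerPort": 70000}`,
		`{"RelayServerPort": -1}`,
	} {
		p, err := parsePrefs(in)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", *p.RelayServerPort)
	}
}
```

With an *int field, the second and third inputs would decode successfully and need a separate range check later.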
Updates tailscale/corp#34591
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Adds a new type of TSMP message for advertising disco keys
to/from a peer, and implements the advertising triggered by a TSMP ping.
Needed as part of the effort to cache the netmap and still let clients
connect without control being reachable.
Updates #12639
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Co-authored-by: James Tucker <james@tailscale.com>
In suggestExitNodeLocked, if no exit node candidates have a home DERP or
valid location info, `bestCandidates` is an empty slice. This slice is
passed to `selectNode` (`randomNode` in prod):
```go
func randomNode(nodes views.Slice[tailcfg.NodeView], …) tailcfg.NodeView {
…
return nodes.At(rand.IntN(nodes.Len()))
}
```
An empty slice becomes a call to `rand.IntN(0)`, which panics.
This patch changes the behaviour, so if we've filtered out all the
candidates before calling `selectNode`, reset the list and then pick
from any of the available candidates.
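
The fallback can be sketched as follows. This is a simplified illustration over plain string slices; `pickRandom` and `suggest` are hypothetical names and do not reflect the real function signatures.

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

// pickRandom returns a random element of items. For an empty slice it
// reports false instead of calling rand.IntN(0), which would panic.
func pickRandom[T any](items []T) (T, bool) {
	var zero T
	if len(items) == 0 {
		return zero, false
	}
	return items[rand.IntN(len(items))], true
}

// suggest prefers the filtered candidates in best. If filtering removed
// every candidate, it resets the list and picks from all of them.
func suggest(best, all []string) (string, bool) {
	if len(best) == 0 {
		best = all
	}
	return pickRandom(best)
}

func main() {
	all := []string{"node-a", "node-b", "node-c"}
	node, ok := suggest(nil, all) // nothing survived filtering
	fmt.Println(node, ok)
}
```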
This patch also updates our tests to give us more coverage of `randomNode`,
so we can spot other potential issues.
Updates #17661
Change-Id: I63eb5e4494d45a1df5b1f4b1b5c6d5576322aa72
Signed-off-by: Alex Chan <alexc@tailscale.com>
And fix up the TestAutoUpdateDefaults integration tests as they
weren't testing reality: the DefaultAutoUpdate is supposed to only be
relevant on the first MapResponse in the stream, but the tests weren't
testing that. They were instead injecting a 2nd+ MapResponse.
This changes the test control server to add a hook to modify the first
map response, and then makes the test control when the node goes up
and down to make new map responses.
Also, the test now runs on macOS where the auto-update feature being
disabled would've previously t.Skipped the whole test.
Updates #11502
Change-Id: If2319bd1f71e108b57d79fe500b2acedbc76e1a6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In PR tailscale/corp#34401, the `traffic-steering` feature flag does
not automatically enable traffic steering for all nodes. Instead, an
admin must add the `traffic-steering` node attribute to each client
node that they want opted-in.
For backwards compatibility with older clients, tailscale/corp#34401
strips out the `traffic-steering` node attribute if the feature flag
is not enabled, even if it is set in the policy file. This lets us
safely disable the feature flag.
This PR adds a missing test case for suggested exit nodes that have no
priority.
Updates tailscale/corp#34399
Signed-off-by: Simon Law <sfllaw@tailscale.com>
This commit fixes a bug in our HA ingress reconciler where ingress resources would
be stuck in a deleting state should their associated VIP service be deleted within
control.
The reconciliation loop would check for the existence of the VIP service and if not
found perform no additional cleanup steps. The code has been modified to continue
onwards even if the VIP service is not found.
Fixes: https://github.com/tailscale/tailscale/issues/18049
Signed-off-by: David Bond <davidsbond93@gmail.com>
This commit replaces crypto/rand challenge generation with a blake2s-256
MAC. This enables the peer relay server to respond to multiple forward
disco.BindUDPRelayEndpoint messages per handshake generation without
sacrificing the proof of IP ownership properties of the handshake.
Responding to multiple forward disco.BindUDPRelayEndpoint messages per
handshake generation improves client address/path selection where
lowest client->server path/addr one-way delay does not necessarily
equate to lowest client<->server round trip delay.
It also improves situations where outbound traffic is filtered
independent of input, and the first reply
disco.BindUDPRelayEndpointChallenge message is dropped on the reply
path, but a later reply using a different source would make it through.
Reduction in serverEndpoint state saves 112 bytes per instance, trading
for slightly more expensive crypto ops: 277ns/op vs 321ns/op on an M1
MacBook Pro.
Updates tailscale/corp#34414
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Adds cmd/cigocacher as the client to cigocached for Go caching over
HTTP. The HTTP cache is best-effort only, and builds will fall back to
disk-only cache if it's not available, much like regular builds.
Not yet used in CI; that will follow in another PR once we have runners
available in this repo with the right network setup for reaching
cigocached.
Updates tailscale/corp#10808
Change-Id: I13ae1a12450eb2a05bd9843f358474243989e967
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
When the underlying transport returns a network error, the RoundTrip
method returns (nil, error). The defer was attempting to access resp
without checking if it was nil first, causing a panic. Fix this by
checking for nil in the defer.
Also changes driveTransport.tr from *http.Transport to http.RoundTripper
and adds a test.
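
The shape of the bug and its fix can be sketched as below. `loggingTransport`, `failingTransport`, `okTransport`, and `tryRequest` are hypothetical names for illustration and are not the drive code itself.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// loggingTransport is a hypothetical wrapper that records the last status.
type loggingTransport struct {
	tr         http.RoundTripper
	lastStatus int
}

func (t *loggingTransport) RoundTrip(req *http.Request) (resp *http.Response, err error) {
	defer func() {
		// On a transport error resp is nil, so check before dereferencing.
		if resp != nil {
			t.lastStatus = resp.StatusCode
		}
	}()
	return t.tr.RoundTrip(req)
}

// failingTransport simulates a network error: it returns (nil, error).
type failingTransport struct{}

func (failingTransport) RoundTrip(*http.Request) (*http.Response, error) {
	return nil, errors.New("network unreachable")
}

// okTransport returns a canned response without touching the network.
type okTransport struct{}

func (okTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	return &http.Response{StatusCode: http.StatusNoContent, Body: http.NoBody, Request: req}, nil
}

// tryRequest sends one request through a loggingTransport wrapping tr.
func tryRequest(tr http.RoundTripper) (int, error) {
	lt := &loggingTransport{tr: tr}
	req, err := http.NewRequest("GET", "http://example.invalid/", nil)
	if err != nil {
		return 0, err
	}
	resp, err := lt.RoundTrip(req)
	if resp != nil {
		resp.Body.Close()
	}
	return lt.lastStatus, err
}

func main() {
	fmt.Println(tryRequest(failingTransport{}))
	fmt.Println(tryRequest(okTransport{}))
}
```

Holding the wrapped transport as an http.RoundTripper rather than a concrete *http.Transport is what lets a test substitute a failing implementation like the one above.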
Fixes #17306
Signed-off-by: Andrew Dunham <andrew@tailscale.com>
Change-Id: Icf38a020b45aaa9cfbc1415d55fd8b70b978f54c
SetSubnetRoutes was not sending update notifications to nodes when their
approved routes changed, causing nodes to not fetch updated netmaps with
PrimaryRoutes populated. This resulted in TestUserMetricsRouteGauges
flaking because it waited for PrimaryRoutes to be set, which only happened
if the node happened to poll for other reasons.
Now send updateSelfChanged notification to affected nodes so they fetch
an updated netmap immediately.
Fixes #17962
Signed-off-by: Andrew Dunham <andrew@tailscale.com>
Linux kernel versions 6.6.102-104 and 6.12.42-45 have a regression
in /proc/net/tcp that causes seek operations to fail with "illegal seek".
This breaks portlist tests on these kernels.
Add kernel version detection for Linux systems and a SkipOnKernelVersions
helper to tstest. Use it to skip affected portlist tests on the broken
kernel versions.
Thanks to philiptaron for the list of kernels with the issue and fix.
Updates #16966
Signed-off-by: Andrew Dunham <andrew@tailscale.com>
Bounded DeliveredEvent queues reduce memory usage, but they can deadlock under load.
Two common scenarios trigger deadlocks when the number of events published in a short
period exceeds twice the queue capacity (there's a PublishedEvent queue of the same size):
- a subscriber tries to acquire the same mutex as held by a publisher, or
- a subscriber for A events publishes B events
Avoiding these scenarios is not practical and would limit eventbus usefulness and reduce its adoption,
pushing us back to callbacks and other legacy mechanisms. These deadlocks already occurred in customer
devices, dev machines, and tests. They also make it harder to identify and fix slow subscribers and similar
issues we have been seeing recently.
Choosing an arbitrary large fixed queue capacity would only mask the problem. A client running
on a sufficiently large and complex customer environment can exceed any meaningful constant limit,
since event volume depends on the number of peers and other factors. Behavior also changes
based on scheduling of publishers and subscribers by the Go runtime, OS, and hardware, as the issue
is essentially a race between publishers and subscribers. Additionally, on lower-end devices,
an unreasonably high constant capacity is practically the same as using unbounded queues.
Therefore, this PR changes the event queue implementation to be unbounded by default.
The PublishedEvent queue keeps its existing capacity of 16 items, while subscribers'
DeliveredEvent queues become unbounded.
This change fixes known deadlocks and makes the system stable under load,
at the cost of higher potential memory usage, including cases where a queue grows
during an event burst and does not shrink when load decreases.
Further improvements can be implemented in the future as needed.
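
For intuition only, an unbounded queue can be as simple as a mutex-guarded slice whose Push never waits for a consumer. The sketch below is a hypothetical illustration and not the eventbus queue implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// unboundedQueue is a hypothetical illustration of an unbounded FIFO.
type unboundedQueue[T any] struct {
	mu    sync.Mutex
	items []T
}

// Push appends v. It never blocks waiting for a consumer, so a publisher
// cannot be stalled by a slow or temporarily stuck subscriber.
func (q *unboundedQueue[T]) Push(v T) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.items = append(q.items, v)
}

// Pop removes and returns the oldest item, or false if the queue is empty.
func (q *unboundedQueue[T]) Pop() (T, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	var zero T
	if len(q.items) == 0 {
		return zero, false
	}
	v := q.items[0]
	q.items[0] = zero // drop the reference so it can be collected
	q.items = q.items[1:]
	return v, true
}

// Len reports the number of queued items.
func (q *unboundedQueue[T]) Len() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return len(q.items)
}

func main() {
	var q unboundedQueue[int]
	// Publish a burst with no subscriber running; nothing blocks.
	for i := 0; i < 100; i++ {
		q.Push(i)
	}
	first, _ := q.Pop()
	fmt.Println(q.Len(), first)
}
```

The trade-off is the one described above: memory use is limited only by how far the consumer falls behind.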
Fixes #17973
Fixes #18012
Signed-off-by: Nick Khyl <nickk@tailscale.com>
As of 2025-11-20, publishing more events than the eventbus's
internal queues can hold may deadlock if a subscriber tries
to publish events itself.
This commit adds a test that demonstrates this deadlock,
and skips it until the bug is fixed.
Updates #18012
Signed-off-by: Nick Khyl <nickk@tailscale.com>
As of 2025-11-20, publishing more events than the eventbus's
internal queues can hold may deadlock if a subscriber tries
to acquire a mutex that can also be held by a publisher.
This commit adds a test that demonstrates this deadlock,
and skips it until the bug is fixed.
Updates #17973
Signed-off-by: Nick Khyl <nickk@tailscale.com>
This is causing confusing panics in tailscale/corp#34485. We'll keep
using the tka.ChonkMem constructor as much as we can, but don't panic
if you create a tka.Mem directly -- we know what the sensible thing is.
Updates #cleanup
Signed-off-by: Alex Chan <alexc@tailscale.com>
Change-Id: I49309f5f403fc26ce4f9a6cf0edc8eddf6a6f3a4
With the introduction of node sealing, store.New fails in some cases due
to the TPM device being reset or unavailable. Currently it results in
tailscaled crashing at startup, which is not obvious to the user until
they check the logs.
Instead of crashing tailscaled at startup, start with an in-memory store
with a health warning about state initialization and a link to (future)
docs on what to do. When this health message is set, also block any
login attempts to avoid masking the problem with an ephemeral node
registration.
Updates #15830
Updates #17654
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
These validations were previously performed in the CLI frontend. There
are two motivations for moving these to the local backend:
1. The backend controls synchronization around the relevant state, so
only the backend can guarantee many of these validations.
2. Doing these validations in the back-end avoids the need to repeat
them across every frontend (e.g. the CLI and tsnet).
Updates tailscale/corp#27200
Signed-off-by: Harry Harpham <harry@tailscale.com>
This commit adds the `spec.replicas` field to the `Recorder` custom
resource that allows for a highly available deployment of `tsrecorder`
within a kubernetes cluster.
Many changes were required here as the code hard-coded the assumption
of a single replica. This has required a few loops, similar to what we
do for the `Connector` resource to create auth and state secrets. It
was also required to add a check to remove dangling state and auth
secrets should the recorder be scaled down.
Updates: https://github.com/tailscale/tailscale/issues/17965
Signed-off-by: David Bond <davidsbond93@gmail.com>
Fixes tailscale/tailscale#17990
The logging for the netns caps is spammy. Log only on changes
to the values and don't log Darwin-specific stuff on non-Darwin
clients.
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
This commit modifies the kubernetes operator to use the "stable" version
of `k8s-nameserver` by default.
Updates: https://github.com/tailscale/corp/issues/19028
Signed-off-by: David Bond <davidsbond93@gmail.com>
This commit enables users to set a service backend to a remote destination, which can be a
partial URL or a full URL. It also prevents users from setting remote destinations on Linux
systems where socket marks are not working. Users on any version of the Mac extension cannot
serve a service either. Socket mark usability is determined by a new local API.
Fixes tailscale/corp#24783
Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com>
Now that we support using an in-memory backend for TKA state (#17946),
this function always returns `nil` – we can always support Network Lock.
We don't need it any more.
Plus, clean up a couple of errant TODOs from that PR.
Updates tailscale/corp#33599
Change-Id: Ief93bb9adebb82b9ad1b3e406d1ae9d2fa234877
Signed-off-by: Alex Chan <alexc@tailscale.com>
Our style guide recommends avoiding Latin abbreviations in technical
documentation, which includes the CLI help text. This is causing linter
issues for the docs site, because this help text is copied into the docs.
See http://go/style-guide/kb/language-and-grammar/abbreviations#latin-abbreviations
Updates #cleanup
Change-Id: I980c28d996466f0503aaaa65127685f4af608039
Signed-off-by: Alex Chan <alexc@tailscale.com>
ArgoCD sends boolean values but the template expects strings, causing
"incompatible types for comparison" errors. Wrap values with toString
so both work.
Fixes #17158
Signed-off-by: Raj Singh <raj@tailscale.com>
Previously a TKA compaction would only run when a node starts, which means a long-running node could use unbounded storage as it accumulates ever-increasing amounts of TKA state. This patch changes TKA so it runs a compaction after every sync.
Updates https://github.com/tailscale/corp/issues/33537
Change-Id: I91df887ea0c5a5b00cb6caced85aeffa2a4b24ee
Signed-off-by: Alex Chan <alexc@tailscale.com>
This commit modifies the helm/static manifest configuration for the
k8s-operator to prefer the stable image tag. This prevents those
using static manifests from seeing unstable behaviour by default if they
do not manually make the change.
This is managed for us when using helm but not when generating the
static manifests.
Updates https://github.com/tailscale/tailscale/issues/10655
Signed-off-by: David Bond <davidsbond93@gmail.com>
(trying to get in smaller obvious chunks ahead of later PRs to make
them smaller)
Updates #17925
Change-Id: I184002001055790484e4792af8ffe2a9a2465b2e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We now embed node information into network flow logs.
By default, netlogfmt still prints out using Tailscale IP addresses.
Support a "--resolve-addrs=TYPE" flag that can be used to specify
resolving IP addresses as node IDs, hostnames, users, or tags.
Updates tailscale/corp#33352
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Adds the ability to rotate discovery keys on running clients, needed for
testing upcoming disco key distribution changes.
Introduces key.DiscoKey, an atomic container for a disco private key,
public key, and the public key's ShortString, replacing the prior
separate atomic fields.
magicsock.Conn has a new RotateDiscoKey method, and access to this is
provided via localapi and a CLI debug command.
Note that this implementation is primarily for testing as it stands, and
regular use should likely introduce an additional mechanism that allows
the old key to be used for some time, to provide a seamless key rotation
rather than one that invalidates all sessions.
Updates tailscale/corp#34037
Signed-off-by: James Tucker <james@tailscale.com>
As part of the conn25 work we will want to be able to keep track of a
pool of IP Addresses and know which have been used and which have not.
Fixes tailscale/corp#34247
Signed-off-by: Fran Bull <fran@tailscale.com>
We use `tka.AUMHash` in `netmap.NetworkMap`, and we serialise it as JSON
in the `/debug/netmap` C2N endpoint. If the binary omits Tailnet Lock support,
the debug endpoint returns an error because it's unable to marshal the
AUMHash.
This patch adds a sentinel value so this marshalling works, and we can
use the debug endpoint.
Updates https://github.com/tailscale/tailscale/issues/17115
Signed-off-by: Alex Chan <alexc@tailscale.com>
Change-Id: I51ec1491a74e9b9f49d1766abd89681049e09ce4
Existing compaction logic seems to have had an assumption that
markActiveChain would cover a longer part of the chain than
markYoungAUMs. This prevented long, but fresh, chains, from being
compacted correctly.
Updates tailscale/corp#33537
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
6a73c0bdf5 added a feature tag but didn't re-run go generate on ./feature/buildfeatures.
Updates #9192
Change-Id: I7819450453e6b34c60cad29d2273e3e118291643
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
I added a RemoveAll() method on tka.Chonk in #17946, but it's only used
in the node to purge local AUMs. We don't need it in the SQLite storage,
which currently implements tka.Chonk, so move it to CompactableChonk
instead.
Also add some automated tests, as a safety net.
Updates tailscale/corp#33599
Change-Id: I54de9ccf1d6a3d29b36a94eccb0ebd235acd4ebc
Signed-off-by: Alex Chan <alexc@tailscale.com>
The REST API does not return a node name
with a trailing dot, while the internal node name
reported in the netmap does have one.
In order to be consistent with the API,
strip the dot when recording node information.
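
A minimal sketch of the normalization, assuming a helper named `normalizeNodeName` (a hypothetical name for this example):

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeNodeName is a hypothetical helper that removes a single
// trailing dot so the name matches what the REST API reports.
func normalizeNodeName(name string) string {
	return strings.TrimSuffix(name, ".")
}

func main() {
	fmt.Println(normalizeNodeName("host.example.ts.net.")) // prints "host.example.ts.net"
}
```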
Updates tailscale/corp#33352
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Perform a path check first before attempting exec of `true`.
Try /usr/bin/true first, as it is now, and increasingly, the more
common and more portable path.
Fixes tests on macOS arm64 where exec was returning a different kind of
path error than previously checked.
Updates #16569
Signed-off-by: James Tucker <james@tailscale.com>
DA protection is not super helpful because we don't set an authorization
password on the key. But if authorization fails for other reasons (like
TPM being reset), we will eventually cause DA lockout with tailscaled
trying to load the key. DA lockout then leads to (1) issues for other
processes using the TPM and (2) the underlying authorization error being
masked in logs.
Updates #17654
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
For manual (human) testing, this lets the user disable control plane
map polls with "tailscale set --sync=false" (which survives restarts)
and "tailscale set --sync" to restore.
A high severity health warning is shown while this is active.
Updates #12639
Updates #17945
Change-Id: I83668fa5de3b5e5e25444df0815ec2a859153a6d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Let's fix all the typos, which lets the code be more readable, lest we
confuse our readers.
Updates #cleanup
Change-Id: I4954601b0592b1fda40269009647bb517a4457be
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This requires making the internals of LocalBackend a bit more generic,
and implementing the `tka.CompactableChonk` interface for `tka.Mem`.
Signed-off-by: Alex Chan <alexc@tailscale.com>
Updates https://github.com/tailscale/corp/issues/33599
Pick up a fix for https://pkg.go.dev/vuln/GO-2025-4116 (even though
we're not affected).
Updates #cleanup
Change-Id: I9f2571b17c1f14db58ece8a5a34785805217d9dd
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
Includes adding StartPaused, which will be used in a future change to
enable netmap caching testing.
Updates #12639
Change-Id: Iec39915d33b8d75e9b8315b281b1af2f5d13a44a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This patch changes the behaviour of `tailscale lock log --json` to make
it more useful for users. It also introduces versioning of our JSON output.
## Changes to `tailscale lock log --json`
Previously this command would print the hash and base64-encoded bytes of
each AUM, and users would need their own CBOR decoder to interpret it in
a useful way:
```json
[
{
"Hash": [
80,
136,
151,
…
],
"Change": "checkpoint",
"Raw": "pAEFAvYFpQH2AopYIAkPN+8V3cJpkoC5ZY2+RI2Bcg2q5G7tRAQQd67W3YpnWCDPOo4KGeQBd8hdGsjoEQpSXyiPdlm+NXAlJ5dS1qEbFlggylNJDQM5ZQ2ULNsXxg2ZBFkPl/D93I1M56/rowU+UIlYIPZ/SxT9EA2Idy9kaCbsFzjX/s3Ms7584wWGbWd/f/QAWCBHYZzYiAPpQ+NXN+1Wn2fopQYk4yl7kNQcMXUKNAdt1lggcfjcuVACOH0J9pRNvYZQFOkbiBmLOW1hPKJsbC1D1GdYIKrJ38XMgpVMuTuBxM4YwoLmrK/RgXQw1uVEL3cywl3QWCA0FilVVv8uys8BNhS62cfNvCew1Pw5wIgSe3Prv8d8pFggQrwIt6ldYtyFPQcC5V18qrCnt7VpThACaz5RYzpx7RNYIKskOA7UoNiVtMkOrV2QoXv6EvDpbO26a01lVeh8UCeEA4KjAQECAQNYIORIdNHqSOzz1trIygnP5w3JWK2DtlY5NDIBbD7SKcjWowEBAgEDWCD27LpxiZNiA19k0QZhOWmJRvBdK2mz+dHu7rf0iGTPFwQb69Gt42fKNn0FGwRUiav/k6dDF4GiAVgg5Eh00epI7PPW2sjKCc/nDclYrYO2Vjk0MgFsPtIpyNYCWEDzIAooc+m45ay5PB/OB4AA9Fdki4KJq9Ll+PF6IJHYlOVhpTbc3E0KF7ODu1WURd0f7PXnW72dr89CSfGxIHAF"
}
]
```
Now we print the AUM in an expanded form that can be easily read by scripts,
although we include the raw bytes for verification and auditing.
```json
{
"SchemaVersion": "1",
"Messages": [
{
"Hash": "KCEJPRKNSXJG2TPH3EHQRLJNLIIK2DV53FUNPADWA7BZJWBDRXZQ",
"AUM": {
"MessageKind": "checkpoint",
"PrevAUMHash": null,
"Key": null,
"KeyID": null,
"State": {
…
},
"Votes": null,
"Meta": null,
"Signatures": [
{
"KeyID": "tlpub:e44874d1ea48ecf3d6dac8ca09cfe70dc958ad83b656393432016c3ed229c8d6",
"Signature": "8yAKKHPpuOWsuTwfzgeAAPRXZIuCiavS5fjxeiCR2JTlYaU23NxNChezg7tVlEXdH+z151u9na/PQknxsSBwBQ=="
}
]
},
"Raw": "pAEFAvYFpQH2AopYIAkPN-8V3cJpkoC5ZY2-RI2Bcg2q5G7tRAQQd67W3YpnWCDPOo4KGeQBd8hdGsjoEQpSXyiPdlm-NXAlJ5dS1qEbFlggylNJDQM5ZQ2ULNsXxg2ZBFkPl_D93I1M56_rowU-UIlYIPZ_SxT9EA2Idy9kaCbsFzjX_s3Ms7584wWGbWd_f_QAWCBHYZzYiAPpQ-NXN-1Wn2fopQYk4yl7kNQcMXUKNAdt1lggcfjcuVACOH0J9pRNvYZQFOkbiBmLOW1hPKJsbC1D1GdYIKrJ38XMgpVMuTuBxM4YwoLmrK_RgXQw1uVEL3cywl3QWCA0FilVVv8uys8BNhS62cfNvCew1Pw5wIgSe3Prv8d8pFggQrwIt6ldYtyFPQcC5V18qrCnt7VpThACaz5RYzpx7RNYIKskOA7UoNiVtMkOrV2QoXv6EvDpbO26a01lVeh8UCeEA4KjAQECAQNYIORIdNHqSOzz1trIygnP5w3JWK2DtlY5NDIBbD7SKcjWowEBAgEDWCD27LpxiZNiA19k0QZhOWmJRvBdK2mz-dHu7rf0iGTPFwQb69Gt42fKNn0FGwRUiav_k6dDF4GiAVgg5Eh00epI7PPW2sjKCc_nDclYrYO2Vjk0MgFsPtIpyNYCWEDzIAooc-m45ay5PB_OB4AA9Fdki4KJq9Ll-PF6IJHYlOVhpTbc3E0KF7ODu1WURd0f7PXnW72dr89CSfGxIHAF"
}
]
}
```
This output was previously marked as unstable, and it wasn't very useful,
so changing it should be fine.
## Versioning our JSON output
This patch introduces a way to version our JSON output on the CLI, so we
can make backwards-incompatible changes in future without breaking existing
scripts or integrations.
You can run this command in two ways:
```
tailscale lock log --json
tailscale lock log --json=1
```
Passing an explicit version number allows you to pick a specific JSON schema.
If we ever want to change the schema, we increment the version number and
users must opt-in to the new output.
A bare `--json` flag will always return schema version 1, for compatibility
with existing scripts.
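One way such a flag could be wired up with Go's standard flag package is sketched below. This is an illustration under assumptions, not the CLI's real implementation: the jsonVersion type and parseJSONFlag helper are hypothetical, and only schema version 1 is accepted.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"strconv"
)

// jsonVersion (hypothetical) is a flag.Value that accepts a bare --json
// or an explicit --json=N.
type jsonVersion struct {
	set     bool
	version int
}

func (j *jsonVersion) String() string {
	if j == nil || !j.set {
		return ""
	}
	return strconv.Itoa(j.version)
}

// IsBoolFlag tells package flag that the flag may appear without a value,
// in which case Set is called with "true".
func (j *jsonVersion) IsBoolFlag() bool { return true }

func (j *jsonVersion) Set(s string) error {
	v := 1 // a bare --json selects schema version 1
	if s != "true" {
		n, err := strconv.Atoi(s)
		if err != nil {
			return fmt.Errorf("invalid JSON schema version %q", s)
		}
		v = n
	}
	if v != 1 { // assumption: only version 1 exists so far
		return fmt.Errorf("unsupported JSON schema version %d", v)
	}
	j.set, j.version = true, v
	return nil
}

// parseJSONFlag parses args and returns the selected schema version.
func parseJSONFlag(args []string) (*jsonVersion, error) {
	fs := flag.NewFlagSet("lock log", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the example output
	var jv jsonVersion
	fs.Var(&jv, "json", "output JSON, optionally with a schema version")
	err := fs.Parse(args)
	return &jv, err
}

func main() {
	for _, args := range [][]string{{"--json"}, {"--json=1"}, {"--json=2"}} {
		jv, err := parseJSONFlag(args)
		fmt.Println(args, jv.set, jv.version, err)
	}
}
```

Because new versions are rejected until explicitly added, a script that passes `--json=1` keeps getting the same schema even after later versions appear.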
Updates https://github.com/tailscale/tailscale/issues/17613
Updates https://github.com/tailscale/corp/issues/23258
Signed-off-by: Alex Chan <alexc@tailscale.com>
Change-Id: I897f78521cc1a81651f5476228c0882d7b723606
This adds the --proxy-protocol flag to 'tailscale serve' and
'tailscale funnel', which tells the Tailscale client to prepend a PROXY
protocol[1] header when making connections to the proxied-to backend.
I've verified that this works with our existing funnel servers without
additional work, since they pass along source address information via
PeerAPI already.
Updates #7747
[1]: https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
Change-Id: I647c24d319375c1b33e995555a541b7615d2d203
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
It's an unnecessary nuisance having it. We go out of our way to redact
it in so many places when we don't even need it there anyway.
Updates #12639
Change-Id: I5fc72e19e9cf36caeb42cf80ba430873f67167c3
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Remove the State enum (StateNew, StateNotAuthenticated, etc.) from
controlclient and replace it with two explicit boolean fields:
- LoginFinished: indicates successful authentication
- Synced: indicates we've received at least one netmap
This makes the state more composable and easier to reason about, as
multiple conditions can be true independently rather than being
encoded in a single enum value.
The State enum was originally intended as the state machine for the
whole client, but that abstraction moved to ipn.Backend long ago.
This change continues moving away from the legacy state machine by
representing state as a combination of independent facts.
Also adds test helpers in ipnlocal that check independent, observable
facts (hasValidNetMap, needsLogin, etc.) rather than relying on
derived state enums, making tests more robust.
Updates #12639
Signed-off-by: James Tucker <james@tailscale.com>
The key.NewEmptyHardwareAttestationKey hook returns a non-nil empty
attestationKey, which means that the nil check in Clone doesn't trigger
and proceeds to try and clone an empty key. Check IsZero instead to
reduce log spam from Clone.
As a drive-by, make tpmAvailable check a sync.Once because the result
won't change.
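The nil-versus-IsZero distinction can be sketched as follows. All names here (attestationKey, tpmKey, shouldClone, the probe body) are stand-ins for illustration and are not the real key package API.

```go
package main

import (
	"fmt"
	"sync"
)

// attestationKey stands in for a hardware key interface (hypothetical).
type attestationKey interface {
	IsZero() bool
}

type tpmKey struct{ handle []byte }

func (k *tpmKey) IsZero() bool { return k == nil || len(k.handle) == 0 }

// shouldClone reports whether k holds real key material. A plain nil check
// is not enough: a non-nil interface can wrap an empty key.
func shouldClone(k attestationKey) bool {
	return k != nil && !k.IsZero()
}

var (
	tpmOnce   sync.Once
	tpmOK     bool
	tpmProbes int // counts how often the probe body actually ran
)

// tpmAvailable caches its result; the probe body is a placeholder.
func tpmAvailable() bool {
	tpmOnce.Do(func() {
		tpmProbes++
		tpmOK = false // assumption: no TPM in this example
	})
	return tpmOK
}

func main() {
	var empty attestationKey = &tpmKey{}
	fmt.Println(empty != nil, shouldClone(empty)) // non-nil, yet nothing to clone
	tpmAvailable()
	tpmAvailable()
	fmt.Println("probes run:", tpmProbes)
}
```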
Updates #17882
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
Most /etc/os-release files set the VERSION_ID to a `MAJOR.MINOR`
string, but we were trying to compare this numerically against a major
version number. I can only assume that Linux Mint switched from a
plain integer, since shells only do integer comparisons.
This patch extracts a VERSION_MAJOR from the VERSION_ID using
parameter expansion and unifies all the other ad-hoc comparisons to
use it.
Fixes #15841
Signed-off-by: Simon Law <sfllaw@tailscale.com>
Co-authored-by: Xavier <xhienne@users.noreply.github.com>
LinkChangeLogLimiter keeps a subscription to track rate limits for log
messages. But when its context ended, it would exit the subscription loop,
leaving the subscriber still alive. Ensure the subscriber gets cleaned up
when the context ends, so we don't stall event processing.
Updates tailscale/corp#34311
Change-Id: I82749e482e9a00dfc47f04afbc69dd0237537cb2
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
On the corp tailnet (using Mullvad exit nodes + bunch of expired
devices + subnet routers), these were generating big ~35 KB blobs of
logging regularly.
This logging shouldn't even exist at this level, and should be rate
limited at a higher level, but for now as a bandaid, make it less
spammy.
Updates #cleanup
Change-Id: I0b5e9e6e859f13df5f982cd71cd5af85b73f0c0a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
When TS_LOG_TARGET is set to an invalid URL, url.Parse returns an error
and nil pointer, which caused a panic when accessing u.Host.
Now we check the error from url.Parse and log a helpful message while
falling back to the default log host.
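A sketch of the check-then-fall-back pattern, assuming a placeholder default host. The logHost helper and defaultLogHost value are hypothetical, and the empty-host fallback is an extra guard added here rather than something stated in the commit.

```go
package main

import (
	"fmt"
	"net/url"
)

// defaultLogHost is a placeholder, not the real default.
const defaultLogHost = "log.example.com"

// logHost (hypothetical) returns the host from target, or the default when
// target is empty or does not parse.
func logHost(target string) string {
	if target == "" {
		return defaultLogHost
	}
	u, err := url.Parse(target)
	if err != nil {
		// u is nil here, so it must not be dereferenced.
		fmt.Printf("invalid log target %q: %v; using %s\n", target, err, defaultLogHost)
		return defaultLogHost
	}
	if u.Host == "" {
		// Assumption for this sketch: a URL without a host also falls back.
		return defaultLogHost
	}
	return u.Host
}

func main() {
	fmt.Println(logHost("https://logs.example.net/c/collection"))
	fmt.Println(logHost(":not-a-url"))
}
```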
Fixes #17792
Signed-off-by: Andrew Dunham <andrew@tailscale.com>
As a baby step towards eventbus-ifying controlclient, make the
Observer optional.
This also means callers that don't care (like this network lock test,
and some tests in other repos) can omit it, rather than passing in a
no-op one.
Updates #12639
Change-Id: Ibd776b45b4425c08db19405bc3172b238e87da4e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This commit replaces usage of local.Client in net/udprelay with DERPMap
plumbing over the eventbus. This has been a longstanding TODO. This work
was also accelerated by a memory leak in net/http when using
local.Client over long periods of time. So, this commit also addresses
said leak.
Updates #17801
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Instead of trying to call View() on something that's already a View
type (or trying to Clone the view unnecessarily), we can re-use the
existing View values in a map[T]ViewType.
Fixes #17866
Signed-off-by: Andrew Dunham <andrew@tailscale.com>
They distracted me in some refactoring. They're set but never used.
Updates #17858
Change-Id: I6ec7d6841ab684a55bccca7b7cbf7da9c782694f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
updates tailscale/corp#31571
It appears that on the latest macOS, iOS and tvOS versions, the work
that netns is doing to bind outgoing connections to the default interface (and all
of the trimmings and workarounds in netmon et al that make that work) is
not needed. The kernel is extension-aware and doing nothing is the right
thing. This is, however, not the case for tailscaled (which is not a
special process).
To allow us to test this assertion (and where it might break things), we add a
new node cap that turns this behaviour off only for network-extension equipped clients,
making it possible to turn this off tailnet-wide, without breaking any tailscaled
macOS nodes.
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
I noticed a deadlock in a test in an in-development PR where during a
shutdown storm of things (from a tsnet.Server.Close), LocalBackend was
trying to call magicsock.Conn.Synchronize but the magicsock and/or
eventbus was already shut down and no longer processing events.
Updates #16369
Change-Id: I58b1f86c8959303c3fb46e2e3b7f38f6385036f1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Unfortunately I closed the tab and lost it in my sea of CI failures
I'm currently fighting.
Updates #cleanup
Change-Id: I4e3a652d57d52b75238f25d104fc1987add64191
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
So they're not all run N times on the sharded oss builders
and are only run one time each.
Updates tailscale/corp#28679
Change-Id: Ie21e84b06731fdc8ec3212eceb136c8fc26b0115
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
When systemd notification support was omitted from the build, or on
non-Linux systems, we were unnecessarily emitting code and generating
garbage stringifying addresses upon transition to the Running state.
Updates #12614
Change-Id: If713f47351c7922bb70e9da85bf92725b25954b9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This removes one of the O(n=peers) allocs in getStatus, as
Engine.getStatus happens more often than Reconfig.
Updates #17814
Change-Id: I8a87fbebbecca3aedadba38e46cc418fd163c2b0
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Previously if `chains` was empty, it would be passed to `computeActiveAncestor()`,
which would fail with the misleading error "multiple distinct chains".
Updates tailscale/corp#33846
Signed-off-by: Alex Chan <alexc@tailscale.com>
Change-Id: Ib93a755dbdf4127f81cbf69f3eece5a388db31c8
* lock released early just to call `b.send` when it can call
`b.sendToLocked` instead
* `UnlockEarly` called to release the lock before trivially fast
operations, we can wait for a defer there
Updates #11649
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
It was disabled in May 2024 in #12205 (9eb72bb51).
This removes the unused symbols.
Updates #188
Updates tailscale/corp#19106
Updates tailscale/corp#19116
Change-Id: I5208b7b750b18226ed703532ed58c4ea17195a8e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Use GetGlobalAddrs() to discover all STUN endpoints, handling bad NATs
that create multiple mappings. When MappingVariesByDestIP is true, also
add the first STUN IPv4 address with the relay's local port for static
port mapping scenarios.
Updates #17796
Signed-off-by: Raj Singh <raj@tailscale.com>
The feature is currently in private alpha, so requires a tailnet feature
flag. Initially focuses on supporting the operator's own auth, because the
operator is the only device we maintain that uses static long-lived
credentials. All other operator-created devices use single-use auth keys.
Testing steps:
* Create a cluster with an API server accessible over public internet
* kubectl get --raw /.well-known/openid-configuration | jq '.issuer'
* Create a federated OAuth client in the Tailscale admin console with:
* The issuer from the previous step
* Subject claim `system:serviceaccount:tailscale:operator`
* Write scopes services, devices:core, auth_keys
* Tag tag:k8s-operator
* Allow the Tailscale control plane to get the public portion of
the ServiceAccount token signing key without authentication:
* kubectl create clusterrolebinding oidc-discovery \
--clusterrole=system:service-account-issuer-discovery \
--group=system:unauthenticated
* helm install --set oauth.clientId=... --set oauth.audience=...
Updates #17457
Change-Id: Ib29c85ba97b093c70b002f4f41793ffc02e6c6e9
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Now that the feature is in beta, no one should encounter this error.
Updates #cleanup
Change-Id: I69ed3f460b7f28c44da43ce2f552042f980a0420
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
This starts running the jsontags vet checker on the module.
All existing findings are added to an allowlist.
Updates tailscale/corp#791
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
The cmd/jsontags is non-idiomatic since it is not a main binary.
Move it to a vet directory, which will eventually contain a vettool binary.
Updates tailscale/corp#791
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Include the node's OS with network flow log information.
Refactor the JSON-length computation to be a bit more precise.
Updates tailscale/corp#33352
Fixes tailscale/corp#34030
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Prior to this change a SubscriberFunc treated the call to the subscriber's
function as the completion of delivery. But that means when we are closing the
subscriber, that callback could continue to execute for some time after the
close returns.
For channel-based subscribers that works OK because the close takes effect
before the subscriber ever sees the event. To make the two subscriber types
symmetric, we should also wait for the callback to finish before returning.
This ensures that a Close of the client means the same thing with both kinds of
subscriber.
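The "Close waits for the callback" behaviour can be sketched as below. This is a generic illustration, not the eventbus implementation: funcSubscriber, Deliver, and the channel layout are assumptions.

```go
package main

import "fmt"

// funcSubscriber (hypothetical) runs a callback for each delivered event on
// its own goroutine.
type funcSubscriber struct {
	events chan string
	stop   chan struct{}
	done   chan struct{} // closed when the delivery goroutine has exited
}

func newFuncSubscriber(fn func(string)) *funcSubscriber {
	s := &funcSubscriber{
		events: make(chan string),
		stop:   make(chan struct{}),
		done:   make(chan struct{}),
	}
	go func() {
		defer close(s.done)
		for {
			select {
			case ev := <-s.events:
				fn(ev) // delivery is complete only when fn returns
			case <-s.stop:
				return
			}
		}
	}()
	return s
}

// Deliver hands one event to the subscriber. It must not be called after
// Close in this sketch.
func (s *funcSubscriber) Deliver(ev string) { s.events <- ev }

// Close stops delivery and waits for any in-flight callback to finish.
func (s *funcSubscriber) Close() {
	close(s.stop)
	<-s.done
}

func main() {
	handled := 0
	s := newFuncSubscriber(func(ev string) { handled++ })
	s.Deliver("a")
	s.Deliver("b")
	s.Close()
	fmt.Println("handled after Close:", handled)
}
```

Waiting on the done channel in Close is what guarantees no callback is still running once Close returns.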
Updates #17638
Change-Id: I82fd31bcaa4e92fab07981ac0e57e6e3a7d9d60b
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Add options to the eventbus.Bus to plumb in a logger.
Route that logger in to the subscriber machinery, and trigger a log message to
it when a subscriber fails to respond to its delivered events for 5s or more.
The log message includes the package, filename, and line number of the call
site that created the subscription.
Add tests that verify this works.
Updates #17680
Change-Id: I0546516476b1e13e6a9cf79f19db2fe55e56c698
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
In particular on Windows, the `transport.TPMCloser` we get is not safe
for concurrent use. This is especially noticeable because
`tpm.attestationKey.Clone` uses the same open handle as the original
key. So wrap the operations on ak.tpm with a mutex and make a deep copy
with a new connection in Clone.
Updates #15830
Updates #17662
Updates #17644
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
Specify the app capability that failed the test, instead of the
entire comma-separated list.
Fixes #cleanup
Signed-off-by: Gesa Stupperich <gesa@tailscale.com>
In #17639 we moved the subscription into NewLogger to ensure we would not race
subscribing with shutdown of the eventbus client. Doing so fixed that problem,
but exposed another: As we were only servicing events occasionally when waiting
for the network to come up, we could leave the eventbus to stall in cases where
a number of network deltas arrived later and weren't processed.
To address that, let's separate the concerns: As before, we'll Subscribe early
to avoid conflicts with shutdown; but instead of using the subscriber directly
to determine readiness, we'll keep track of the last-known network state in a
selectable condition that the subscriber updates for us. When we want to wait,
we'll wait on that condition (or until our context ends), ensuring all the
events get processed in a timely manner.
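A minimal sketch of a selectable condition of this kind, assuming a boolean "network up" state. The netState type and its methods are hypothetical names, not the ones used in the change.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// netState (hypothetical) records the last-known network state and exposes
// a channel that waiters can select on.
type netState struct {
	mu    sync.Mutex
	up    bool
	ready chan struct{} // closed while up is true
}

func newNetState() *netState {
	return &netState{ready: make(chan struct{})}
}

// Set records the latest state. It never blocks, so the subscriber can call
// it for every event and keep draining the bus.
func (s *netState) Set(up bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if up == s.up {
		return
	}
	s.up = up
	if up {
		close(s.ready)
	} else {
		s.ready = make(chan struct{})
	}
}

// Ready returns a channel that is closed once the network is up.
func (s *netState) Ready() <-chan struct{} {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.ready
}

// AwaitUp blocks until the network is up or ctx ends.
func (s *netState) AwaitUp(ctx context.Context) error {
	select {
	case <-s.Ready():
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	s := newNetState()
	go s.Set(true) // stands in for the subscriber observing a change
	fmt.Println("await result:", s.AwaitUp(context.Background()))
}
```

The subscriber side only ever calls Set, so event processing is decoupled from whether anyone is currently waiting.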
Updates #17638
Updates #15160
Change-Id: I28339a372be4ab24be46e2834a218874c33a0d2d
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Adds a new Redirect field to HTTPHandler for serving HTTP redirects
from the Tailscale serve config. The redirect URL supports template
variables ${HOST} and ${REQUEST_URI} that are resolved per request.
By default, it redirects using HTTP Status 302 (Found). For another
redirect status, like 301 - Moved Permanently, pass the HTTP status
code followed by ':' on Redirect, like: "301:https://tailscale.com"
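The format described above could be parsed roughly as follows. This is a sketch, not the serve implementation: parseRedirect and expandRedirect are hypothetical helpers, and restricting the prefix to 3xx codes is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
)

// parseRedirect (hypothetical) splits an optional "CODE:" prefix from a
// redirect value, defaulting to 302 (Found).
func parseRedirect(v string) (code int, target string) {
	if prefix, rest, ok := strings.Cut(v, ":"); ok && len(prefix) == 3 {
		// Assumption: only 3xx status codes count as a prefix.
		if n, err := strconv.Atoi(prefix); err == nil && n >= 300 && n <= 399 {
			return n, rest
		}
	}
	return http.StatusFound, v
}

// expandRedirect (hypothetical) substitutes the per-request variables.
func expandRedirect(target, host, requestURI string) string {
	r := strings.NewReplacer("${HOST}", host, "${REQUEST_URI}", requestURI)
	return r.Replace(target)
}

func main() {
	code, target := parseRedirect("301:https://${HOST}${REQUEST_URI}")
	fmt.Println(code, expandRedirect(target, "example.ts.net", "/docs?x=1"))
}
```

A value such as "https://tailscale.com" has no numeric prefix before its first colon, so it falls through to the default status.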
Updates #11252
Updates #11330
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
In 3f5c560fd4 I changed to use std net/http's HTTP/2 support,
instead of pulling in x/net/http2.
But I forgot to update DialTLSContext to DialContext, which meant it
was falling back to using the std net.Dialer for its dials, instead
of the passed-in one.
The tests only passed because they were using localhost addresses, so
the std net.Dialer worked. But in prod, where a tsnet Dialer would be
needed, it didn't work, and would time out for 10 seconds before
resorting to the old protocol.
So this fixes the tests to use an isolated in-memory network to prevent
that class of problem in the future. With the test change, the old code
fails and the new code passes.
Thanks to @jasonodonnell for debugging!
Updates #17304
Updates 3f5c560fd4
Change-Id: I3602bafd07dc6548e2c62985af9ac0afb3a0e967
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Single letter 'l' variables can eventually become confusing when
they're rendered in some fonts that make them similar to 1 or I.
Updates #cleanup
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
A follow-up to #17411. Put AppConnector events into a task queue, as they may
take some time to process. Ensure that the queue is stopped at shutdown so that
cleanup will remain orderly.
Because events are delivered on a separate goroutine, slow processing of an
event does not cause an immediate problem; however, a subscriber that blocks
for a long time will push back on the bus as a whole. See
https://godoc.org/tailscale.com/util/eventbus#hdr-Expected_subscriber_behavior
for more discussion.
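A generic sketch of such a queue, assuming a bounded buffer. taskQueue and its methods are hypothetical and are not the queue type the change uses.

```go
package main

import (
	"fmt"
	"sync"
)

// taskQueue (hypothetical) runs submitted tasks in order on one worker
// goroutine.
type taskQueue struct {
	tasks chan func()
	done  chan struct{}
	once  sync.Once
}

func newTaskQueue(size int) *taskQueue {
	q := &taskQueue{tasks: make(chan func(), size), done: make(chan struct{})}
	go func() {
		defer close(q.done)
		for t := range q.tasks {
			t()
		}
	}()
	return q
}

// Add enqueues a task. It returns immediately while the buffer has room and
// blocks once it is full. It must not be called after Close.
func (q *taskQueue) Add(t func()) { q.tasks <- t }

// Close stops accepting tasks and waits for the queued ones to finish, so
// shutdown stays orderly.
func (q *taskQueue) Close() {
	q.once.Do(func() { close(q.tasks) })
	<-q.done
}

func main() {
	q := newTaskQueue(8)
	for i := 0; i < 3; i++ {
		q.Add(func() { fmt.Println("processing event", i) })
	}
	q.Close()
}
```

The subscriber callback only pays the cost of Add, so slow event handling no longer holds up delivery to other subscribers.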
Updates #17192
Updates #15160
Change-Id: Ib313cc68aec273daf2b1ad79538266c81ef063e3
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
This migrates an internal tool to open source
so that we can run it on the tailscale.com module as well.
This PR does not yet set up a CI to run this analyzer.
Updates tailscale/corp#791
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
This rewrites the netlog package to support embedding node information in network flow logs.
Some bit of complexity comes in trying to pre-compute the expected size of the log message
after JSON serialization to ensure that we can respect maximum body limits in log uploading.
We also fix a bug in tstun, where we were recording the IP address after SNAT,
which was resulting in non-sensible connection flows being logged.
Updates tailscale/corp#33352
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
This migrates an internal tool to open source
so that we can run it on the tailscale.com module as well.
We add the "util/safediff" also as a dependency of the tool.
This PR does not yet set up a CI to run this analyzer.
Updates tailscale/corp#791
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Found by staticcheck, the test was calling derphttp.NewClient but not checking
its error result before doing other things to it.
Updates #cleanup
Change-Id: I4ade35a7de7c473571f176e747866bc0ab5774db
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
This reverts commit 4346615d77.
We averted the shutdown race, but will need to service the subscriber even when
we are not waiting for a change so that we do not delay the bus as a whole.
Updates #17638
Change-Id: I5488466ed83f5ad1141c95267f5ae54878a24657
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Drop usage of the branches filter with a single asterisk as this matches
against zero or more characters but not a forward slash, resulting in
PRs to branch names with forward slashes in them not having these
workflows run against them as expected.
Updates https://github.com/tailscale/corp/issues/33523
Signed-off-by: Mario Minardi <mario@tailscale.com>
Also consolidates variable and header naming and amends the
CLI behavior
* multiple app-caps have to be specified as comma-separated
list
* simple regex-based validation of app capability names is
carried out during flag parsing
Signed-off-by: Gesa Stupperich <gesa@tailscale.com>
Given that we filter based on the usercaps argument now, truncation
should not be necessary anymore.
Updates tailscale/corp#28372
Signed-off-by: Gesa Stupperich <gesa@tailscale.com>
Temporarily back out the TPM-based hw attestation code while we debug
Windows exceptions.
Updates tailscale/corp#31269
Signed-off-by: Patrick O'Doherty <patrick@tailscale.com>
When the eventbus is enabled, set up the subscription for change deltas at the
beginning when the client is created, rather than waiting for the first
awaitInternetUp check.
Otherwise, it is possible for a check to race with the client close in
Shutdown, which triggers a panic.
Updates #17638
Change-Id: I461c07939eca46699072b14b1814ecf28eec750c
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
This compares the warnings we actually care about and skips the unstable
warnings and the changes with no warnings.
Fixes #17635
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
If you run tailscaled without passing a `--statedir`, Tailnet Lock is
unavailable -- we don't have a folder to store the AUMs in.
This causes a lot of unnecessary requests to bootstrap TKA, because
every time the node receives a NetMap with some TKA state, it tries to
bootstrap, fetches the bootstrap TKA state from the control plane, then
fails with the error:
TKA sync error: bootstrap: network-lock is not supported in this
configuration, try setting --statedir
We can't prevent the error, but we can skip the control plane request
that immediately gets dropped on the floor.
In local testing, a new node joining a tailnet caused *three* control
plane requests which were unused.
Updates tailscale/corp#19441
Signed-off-by: Alex Chan <alexc@tailscale.com>
This fixes a regression from dd615c8fdd that moved the
newIPTablesRunner constructor from an any-Linux-GOARCH file to one that
was only amd64 and arm64, thus breaking iptables on other platforms
(notably 32-bit "arm", as seen on older Pis running Buster with
iptables)
Tested by hand on a Raspberry Pi 2 w/ Buster + iptables for now, for
lack of automated 32-bit arm tests at the moment. But filed #17629.
Fixes #17623
Updates #17629
Change-Id: Iac1a3d78f35d8428821b46f0fed3f3717891c1bd
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
On some platforms e.g. ChromeOS the owner hierarchy might not always be
available to us. To avoid stale sealing exceptions later we probe to
confirm it's working rather than rely solely on family indicator status.
Updates #17622
Signed-off-by: Patrick O'Doherty <patrick@tailscale.com>
Check that the TPM we have opened is advertised as a 2.0 family device
before using it for state sealing / hardware attestation.
Updates #17622
Signed-off-by: Patrick O'Doherty <patrick@tailscale.com>
This reformats the existing text to have line breaks at sentences. This
commit contains no textual changes to the code of conduct, but is done
to make any subsequent changes easier to review. (sembr.org)
Also apply prettier formatting for consistency.
Updates #cleanup
Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
* When we do the TKA sync, log whether TKA is enabled and whether
we want it to be enabled. This would help us see if a node is
making bootstrap errors.
* When we fail to look up an AUM locally, log the ID of the AUM
rather than a generic "file does not exist" error.
These AUM IDs are cryptographic hashes of the TKA state, which
itself just contains public keys and signatures. These IDs aren't
sensitive and logging them is safe.
Signed-off-by: Alex Chan <alexc@tailscale.com>
Updates https://github.com/tailscale/corp/issues/33594
Service hosts must be tagged nodes, meaning it is only valid to
advertise a Service from a machine which has at least one ACL tag.
Fixes tailscale/corp#33197
Signed-off-by: Harry Harpham <harry@tailscale.com>
If users start the application with sudo, DBUS is likely not available
or will not have the correct endpoints. We want to warn users when doing
this.
Closes #17593
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
This does not change which subscriptions are made, it only swaps them to use
the SubscribeFunc API instead of Subscribe.
Updates #15160
Updates #17487
Change-Id: Id56027836c96942206200567a118f8bcf9c07f64
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
This patch creates a set of tests that should be true for all implementations of Chonk and CompactableChonk, which we can share with the SQLite implementation in corp.
It includes all the existing tests, plus a test for LastActiveAncestor which was in corp but not in oss.
Updates https://github.com/tailscale/corp/issues/33465
Signed-off-by: Alex Chan <alexc@tailscale.com>
Previously, running `tailscale lock log` in a tailnet without Tailnet
Lock enabled would return a potentially confusing error:
$ tailscale lock log
2025/10/20 11:07:09 failed to connect to local Tailscale service; is Tailscale running?
It would return this error even if Tailscale was running.
This patch fixes the error to be:
$ tailscale lock log
Tailnet Lock is not enabled
Fixes #17586
Signed-off-by: Alex Chan <alexc@tailscale.com>
Add new arguments to `tailscale up` so authkeys can be generated dynamically via identity federation.
Updates #9192
Signed-off-by: mcoulombe <max@tailscale.com>
* Remove a couple of single-letter `l` variables
* Use named struct parameters in the test cases for readability
* Delete `wantAfterInactivityForFn` parameter when it returns the
default zero
Updates #cleanup
Signed-off-by: Alex Chan <alexc@tailscale.com>
We soft-delete AUMs when they're purged, but when we call `ChildAUMs()`,
we look up soft-deleted AUMs to find the `Children` field.
This patch changes the behaviour of `ChildAUMs()` so it only looks at
not-deleted AUMs. This means we don't need to record child information
on AUMs any more, which is a minor space saving for any newly-recorded
AUMs.
Updates https://github.com/tailscale/tailscale/issues/17566
Updates https://github.com/tailscale/corp/issues/27166
Signed-off-by: Alex Chan <alexc@tailscale.com>
This method was added in cca25f6 in the initial in-memory implementation
of Chonk, but it's not part of the Chonk interface and isn't implemented
or used anywhere else. Let's get rid of it.
Updates https://github.com/tailscale/corp/issues/33465
Signed-off-by: Alex Chan <alexc@tailscale.com>
This commit modifies the k8s-operator's api proxy implementation to only
enable forwarding of api requests to tsrecorder when an environment
variable is set.
This new environment variable is named `TS_EXPERIMENTAL_KUBE_API_EVENTS`.
Updates https://github.com/tailscale/corp/issues/32448
Signed-off-by: David Bond <davidsbond93@gmail.com>
Merge the connstats package into the netlog package
and unexport all of its declarations.
Remove the buildfeatures.HasConnStats and use HasNetLog instead.
Updates tailscale/corp#33352
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
The connstats package was an unnecessary layer of indirection.
It was separated out of wgengine/netlog so that net/tstun and
wgengine/magicsock wouldn't need a dependency on the concrete
implementation of network flow logging.
Instead, we simply register a callback for counting connections.
This PR does the bare minimum work to prepare tstun and magicsock
to only care about that callback.
A future PR will delete connstats and merge it into netlog.
Updates tailscale/corp#33352
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Remove CBOR representation since it was never used.
We should support CBOR in the future, but remove it
for now so that it is less work to add more fields.
Also, rely on just omitzero for JSON now that it is supported in Go 1.24.
Updates tailscale/corp#33352
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
I was debugging a customer issue and saw in their 1.88.3 logs:
TPM: error opening: stat /dev/tpm0: no such file or directory
That's unnecessary output. The lack of TPM will be reported by
them having a nil Hostinfo.TPM, which is plenty elsewhere in logs.
Let's only write out an "error opening" line if it's an interesting
error. (perhaps permissions, or EIO, etc)
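A sketch of filtering out the uninteresting case with errors.Is. The helper name openErrorLogLine is hypothetical, and using os.Stat as the probe is an assumption based on the log line quoted above.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// openErrorLogLine (hypothetical) returns the line to log for a failed
// probe of path, or "" when there is nothing interesting to report.
func openErrorLogLine(path string) string {
	_, err := os.Stat(path)
	if err == nil || errors.Is(err, fs.ErrNotExist) {
		// A missing device just means there is no TPM; stay quiet.
		return ""
	}
	// Anything else (permissions, EIO, ...) is worth surfacing.
	return fmt.Sprintf("TPM: error opening: %v", err)
}

func main() {
	if line := openErrorLogLine("/dev/tpm0"); line != "" {
		fmt.Println(line)
	}
}
```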
Updates #cleanup
Change-Id: I3f987f6bf1d3ada03473ca3eef555e9cfafc7677
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Before synctest, timers were needed to allow the events to flow into the
test bus. There is still a timer, but this one is not derived from the
test deadline and it is mostly arbitrary as synctest will render it
practically non-existent.
With this approach, tests that do not need to test for the absence of
events do not rely on synctest.
Updates #15160
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
On Windows arm64 we are going to need to ship two different GUI builds;
one for Win10 (GOARCH=386) and one for Win11 (GOARCH=amd64, tags +=
winui). Due to quirks in MSI packaging, they cannot both share the
same filename. This requires some fixes in places where we have
hardcoded "tailscale-ipn" as the GUI filename.
We also do some cleanup in clientupdate to ensure that autoupdates
will continue to work correctly with the temporary "-winui" package
variant.
Fixes #17480
Updates https://github.com/tailscale/corp/issues/29940
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
Extend Persist with AttestationKey to record a hardware-backed
attestation key for the node's identity.
Add a flag to tailscaled to allow users to control the use of
hardware-backed keys to bind node identity to individual machines.
Updates tailscale/corp#31269
Change-Id: Idcf40d730a448d85f07f1bebf387f086d4c58be3
Signed-off-by: Patrick O'Doherty <patrick@tailscale.com>
The default representation of time.Duration has different
JSON representation between v1 and v2.
Apply an explicit format flag that uses the v1 representation
so that this behavior does not change if serialized with v2.
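For reference, the v1 representation being pinned can be seen with the standard encoding/json package, which encodes a time.Duration as its integer nanosecond count. The probeConfig type is a hypothetical example; the explicit format flag itself is not reproduced here because its exact spelling is not given above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// probeConfig is a hypothetical type with a duration field.
type probeConfig struct {
	Interval time.Duration
}

// encodeV1 marshals c with the v1 encoding/json package.
func encodeV1(c probeConfig) string {
	b, err := json.Marshal(c)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// v1 has no special case for time.Duration, so it is written as an
	// int64 number of nanoseconds.
	fmt.Println(encodeV1(probeConfig{Interval: 1500 * time.Millisecond}))
	// prints {"Interval":1500000000}
}
```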
Updates tailscale/corp#791
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
updates tailscale/tailscale#16836
Android's altNetInterfaces implementation now returns net.IPAddr
types which netmon wasn't handling.
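Handling both address types usually comes down to a type switch, sketched below. addrIP and exampleIPs are hypothetical helpers, not netmon's code, and the sketch ignores any IPv6 zone.

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
)

// addrIP (hypothetical) extracts the IP from the address types an interface
// lookup may return.
func addrIP(a net.Addr) (netip.Addr, bool) {
	switch v := a.(type) {
	case *net.IPNet:
		ip, ok := netip.AddrFromSlice(v.IP)
		return ip.Unmap(), ok
	case *net.IPAddr:
		// The case that was previously unhandled; the zone is dropped here.
		ip, ok := netip.AddrFromSlice(v.IP)
		return ip.Unmap(), ok
	}
	return netip.Addr{}, false
}

// exampleIPs runs addrIP over one address of each supported type.
func exampleIPs() []string {
	addrs := []net.Addr{
		&net.IPNet{IP: net.ParseIP("192.0.2.1"), Mask: net.CIDRMask(24, 32)},
		&net.IPAddr{IP: net.ParseIP("2001:db8::1")},
	}
	var out []string
	for _, a := range addrs {
		if ip, ok := addrIP(a); ok {
			out = append(out, ip.String())
		}
	}
	return out
}

func main() {
	fmt.Println(exampleIPs())
}
```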
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
With a channel subscriber, the subscription processing always occurs on another
goroutine. The SubscriberFunc (prior to this commit) runs its callbacks on the
client's own goroutine. This changes the semantics, though: In addition to more
directly pushing back on the publisher, a publisher and subscriber can deadlock
in a SubscriberFunc but succeed on a Subscriber. They should behave
equivalently regardless which interface they use.
Arguably the caller should deal with this by creating its own goroutine if it
needs to. However, that loses much of the benefit of the SubscriberFunc API, as
it will need to manage the lifecycle of that goroutine. So, for practical
ergonomics, let's make the SubscriberFunc do this management on the user's
behalf. (We discussed doing this in #17432, but decided not to do it yet). We
can optimize this approach further, if we need to, without changing the API.
Updates #17487
Change-Id: I19ea9e8f246f7b406711f5a16518ef7ff21a1ac9
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
This commit adds the subcommands `get-config` and `set-config` to Serve,
which can be used to read the current Tailscale Services configuration
in a standard syntax and provide a configuration to declaratively apply
with that same syntax.
Both commands must be provided with either `--service=svc:service` for
one service, or `--all` for all services. When writing a config,
`set-config --all` will overwrite all existing Services configuration,
and `set-config --service=svc:service` will overwrite all
configuration for that particular Service. Incremental changes are not
supported.
Fixes tailscale/corp#30983.
cmd/tailscale/cli: hide serve "get-config"/"set-config" commands for now
tailscale/corp#33152 tracks unhiding them when docs exist.
Signed-off-by: Naman Sood <mail@nsood.in>
When tsrecorder receives events, it populates this field with
information about the node the request was sent to.
Updates #17141
Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
The lazy init led to confusion and a belief that something was
wrong. It's reasonable to expect the daemon to listen on the port at the
time it's configured.
Updates tailscale/corp#33094
Signed-off-by: Jordan Whited <jordan@tailscale.com>
I got sidetracked apparently and never finished writing this Clone
code in 316afe7d02 (#17448). (It really should use views instead.)
And then I missed one of the users of "routerChanged" that was broken up
into "routerChanged" vs "dnsChanged".
This broke integration tests elsewhere.
Fixes #17506
Change-Id: I533bf0fcf3da9ac6eb4a6cdef03b8df2c1fb4c8e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Update Nix flake to use go 1.25.2
Create the hash from the toolchain rev file automatically from
update-flake.sh
Updates tailscale/go#135
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
This patch fixes several issues related to printing login and device
approval URLs, especially when `tailscale up` is interrupted:
1. Only print a login URL that will cause `tailscale up` to complete.
Don't print expired URLs or URLs from previous login attempts.
2. Print the device approval URL if you run `tailscale up` after
previously completing a login, but before approving the device.
3. Use the correct control URL for device approval if you run a bare
`tailscale up` after previously completing a login, but before
approving the device.
4. Don't print the device approval URL more than once (or at least,
not consecutively).
Updates tailscale/corp#31476
Updates #17361
## How these fixes work
This patch went through a lot of trial and error, and there may still
be bugs! These notes capture the different scenarios and considerations
as we wrote it, which are also captured by integration tests.
1. We were getting stale login URLs from the initial IPN state
notification.
When the IPN watcher was moved to before Start() in c011369, we
mistakenly continued to request the initial state. This is only
necessary if you start watching after you call Start(), because
you may have missed some notifications.
By getting the initial state before calling Start(), we'd get
a stale login URL. If you clicked that URL, you could complete
the login in the control server (if it wasn't expired), but your
instance of `tailscale up` would hang, because it's listening for
login updates from a different login URL.
In this patch, we no longer request the initial state, and so we
don't print a stale URL.
2. Once you skip the initial state from IPN, the following sequence:
* Run `tailscale up`
* Log into a tailnet with device approval
* ^C after the device approval URL is printed, but without approving
* Run `tailscale up` again
means that nothing would ever be printed.
`tailscale up` would send tailscaled the pref `WantRunning: true`,
but that was already the case so nothing changes. You never get any
IPN notifications, and in particular you never get a state change to
`NeedsMachineAuth`. This means we'd never print the device approval URL.
In this patch, we add a hard-coded rule that if you're doing a simple up
(which won't trigger any other IPN notifications) and you start in the
`NeedsMachineAuth` state, we print the device approval message without
waiting for an IPN notification.
3. Consider the following sequence:
* Run `tailscale up --login-server=<custom server>`
* Log into a tailnet with device approval
* ^C after the device approval URL is printed, but without approving
* Run `tailscale up` again
We'd print the device approval URL for the default control server,
rather than the real control server, because we were using the `prefs`
from the CLI arguments (which are all the defaults) rather than the
`curPrefs` (which contain the custom login server).
In this patch, we use the `prefs` if the user has specified any settings
(and other code will ensure this is a complete set of settings) or
`curPrefs` if it's a simple `tailscale up`.
4. Consider the following sequence: you've logged in, but not completed
device approval, and you run `down` and `up` in quick succession.
* `up`: sees state=NeedsMachineAuth
* `up`: sends `{wantRunning: true}`, prints out the device approval URL
* `down`: changes state to Stopped
* `up`: changes state to Starting
* tailscaled: changes state to NeedsMachineAuth
* `up`: gets an IPN notification with the state change, and prints
a second device approval URL
Either URL works, but this is annoying for the user.
In this patch, we track whether the last printed URL was the device
approval URL, and if so, we skip printing it a second time.
Signed-off-by: Alex Chan <alexc@tailscale.com>
This patch extends the integration tests for `tailscale up` to include tailnets
where new devices need to be approved. It doesn't change the CLI, because it's
mostly working correctly already -- these tests are just to prevent future
regressions.
I've added support for `MachineAuthorized` to mock control, and I've refactored
`TestOneNodeUpAuth` to be more flexible. It now takes a sequence of steps to
run and asserts whether we got a login URL and/or machine approval URL after
each step.
Updates tailscale/corp#31476
Updates #17361
Signed-off-by: Alex Chan <alexc@tailscale.com>
This commit also shuffles the hasPeerRelayServers atomic load
to happen sooner, reducing the cost for clients with no peer relay
servers.
Updates tailscale/corp#33099
Signed-off-by: Jordan Whited <jordan@tailscale.com>
The hijacker on k8s-proxy's reverse proxy is used to stream recordings
to tsrecorder as they pass through the proxy to the kubernetes api
server. The connection to the recorder was using the client's
(e.g., kubectl) context, rather than a dedicated one. This was causing
the recording stream to get cut off in scenarios where the client
cancelled the context before streaming could be completed.
By using a dedicated context, we can continue streaming even if the
client cancels the context (for example if the client request
completes).
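One way to get this behavior in Go, shown here only as a sketch, is
context.WithoutCancel (Go 1.21 and later). Whether the actual change
uses this function or a separately created context is not stated in
this message; detachedAlive is a hypothetical helper.

```go
package main

import (
	"context"
	"fmt"
)

// detachedAlive cancels a parent context and returns the Err values of
// the parent and of a context derived from it with WithoutCancel.
func detachedAlive() (parentErr, detachedErr error) {
	parent, cancel := context.WithCancel(context.Background())
	// detached keeps the parent's values but ignores its cancellation.
	detached := context.WithoutCancel(parent)
	cancel() // e.g. the client request completes
	return parent.Err(), detached.Err()
}

func main() {
	p, d := detachedAlive()
	fmt.Println("parent:", p)
	fmt.Println("detached:", d)
}
```

A detached context needs its own timeout or cancel function if the
stream should not be allowed to run indefinitely.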
Fixes #17404
Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
Originally proposed by @bradfitz in #17413.
In practice, a lot of subscribers have only one event type of interest, or a
small number of mostly independent ones. In that case, the overhead of running
and maintaining a goroutine to select on multiple channels winds up being more
noisy than we'd like for the user of the API.
For this common case, add a new SubscriberFunc[T] type that delivers events to
a callback owned by the subscriber, directly on the goroutine belonging to the
client itself. This frees the consumer from the need to maintain their own
goroutine to pull events from the channel, and to watch for closure of the
subscriber.
Before:
    s := eventbus.Subscribe[T](eventClient)
    go func() {
        for {
            select {
            case <-s.Done():
                return
            case e := <-s.Events():
                doSomethingWith(e)
            }
        }
    }()
    // ...
    s.Close()
After:
    func doSomethingWithT(e T) { ... }
    s := eventbus.SubscribeFunc(eventClient, doSomethingWithT)
    // ...
    s.Close()
Moreover, unless the caller wants to explicitly stop the subscriber separately
from its governing client, it need not capture the SubscriberFunc value at all.
One downside of this approach is that a slow or deadlocked callback could block
the client's service routine and thus stall all other subscriptions on that
client. However, this can already happen more broadly if a subscriber fails to
service its delivery channel in a timely manner; it just feeds back more
immediately.
Updates #17487
Change-Id: I64592d786005177aa9fd445c263178ed415784d5
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Since #17376, containerboot crashes on startup in k8s because state
encryption is enabled by default without first checking that it's
compatible with the selected state store. Make sure we only default
state encryption to enabled if it's not going to immediately clash with
other bits of tailscaled config.
Updates tailscale/corp#32909
Change-Id: I76c586772750d6da188cc97b647c6e0c1a8734f0
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Saves ~94 KB from the min build.
Updates #12614
Change-Id: I3b0b8a47f80b9fd3b1038c2834b60afa55bf02c2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Part of making all netlink monitoring code optional.
Updates #17311 (how I got started down this path)
Updates #12614
Change-Id: Ic80d8a7a44dc261c4b8678b3c2241c3b3778370d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Also pull out interface method only needed in Linux.
Instead of having userspace do the call into the router, just let the
router pick up the change itself.
Updates #15160
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Before we introduced seamless, the "blocked" state was used to track:
* Whether a login was required for connectivity, and therefore we should
keep the engine deconfigured until that happened
* Whether authentication was in progress
"blocked" would stop authReconfig from running. We want this when a login is
required: if your key has expired we want to deconfigure the engine and keep
it down, so that you don't keep using exit nodes (which won't work because
your key has expired).
Taking the engine down while auth was in progress was undesirable, so we
don't do that with seamless renewal. However, not entering the "blocked"
state meant that we needed to change the logic for when to send
LoginFinished on the IPN bus after seeing StateAuthenticated from the
controlclient. Initially we changed the "if blocked" check to "if blocked or
seamless is enabled" which was correct in other places.
In this place however, it introduced a bug: we are sending LoginFinished
every time we see StateAuthenticated, which happens even on a down & up, or
a profile switch. This in turn made it harder for UI clients to track when
authentication is complete.
Instead we should only send it out if we were blocked (i.e. seamless is
disabled, or our key expired) or an auth was in progress.
Updates tailscale/corp#31476
Updates tailscale/corp#32645
Fixes #17363
Signed-off-by: James Sanderson <jsanderson@tailscale.com>
Saves 45 KB from the min build, no longer pulling in deephash or
util/hashx, both with unsafe code.
It can actually be more efficient to not use deephash, as you don't
have to walk all bytes of all fields recursively to answer that two
things are not equal. Instead, you can just return false at the first
difference you see. And then with views (as we use ~everywhere
nowadays), cloning the old value isn't expensive, as it's just a
pointer under the hood.
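The early-exit comparison described above can be sketched like this.
The config type and its fields are hypothetical; the point is only that
Equal stops at the first difference instead of hashing every field of
both values.

```go
package main

import "fmt"

// config is a hypothetical value type used only for illustration.
type config struct {
	Name   string
	Routes []string
}

// Equal reports whether a and b are the same, returning false at the
// first difference it finds rather than walking all fields.
func (a config) Equal(b config) bool {
	if a.Name != b.Name || len(a.Routes) != len(b.Routes) {
		return false
	}
	for i := range a.Routes {
		if a.Routes[i] != b.Routes[i] {
			return false
		}
	}
	return true
}

func main() {
	old := config{Name: "a", Routes: []string{"10.0.0.0/8"}}
	cur := config{Name: "b", Routes: []string{"10.0.0.0/8"}}
	fmt.Println(old.Equal(cur), old.Equal(old))
}
```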
Updates #12614
Change-Id: I7b08616b8a09b3ade454bb5e0ac5672086fe8aec
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Historically, and until recently, --extra-small produced a usable build.
When I recently made osrouter be modular in 39e35379d4 (which is
useful in, say, tsnet builds) after also making netstack modular, that
meant --min now lacked both netstack support for routing and system
support for routing, leaving no way to get packets into
wireguard. That's not a nice default for users. (We've documented
build_dist.sh in our KB.)
Restore --extra-small to making a usable build, and add --min for
benchmarking purposes.
Updates #12614
Change-Id: I649e41e324a36a0ca94953229c9914046b5dc497
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Some of the test cases access fields of the backend that are supposed to be
locked while the test is running, which can trigger the race detector. I fixed
a few of these in #17411, but I missed these two cases.
Updates #15160
Updates #17192
Change-Id: I45664d5e34320ecdccd2844e0f8b228145aaf603
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Saves ~53 KB from the min build.
Updates #12614
Change-Id: I73f9544a9feea06027c6ebdd222d712ada851299
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Add subscribers for AppConnector events
Make the RouteAdvertiser interface optional. We cannot yet remove it because
the tests still depend on it to verify correctness. We will need to separately
update the test fixtures to remove that dependency.
Publish RouteInfo via the event bus, so we do not need a callback to do that.
Replace it with a flag that indicates whether to treat the route info the connector
has as "definitive" for filtering purposes.
Update the tests to simplify the construction of AppConnector values now that a
store callback is no longer required. Also fix a couple of pre-existing racy tests that
were hidden by not being concurrent in the same way production is.
Updates #15160
Updates #17192
Change-Id: Id39525c0f02184e88feaf0d8a3c05504850e47ee
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
If we received a wg engine status while processing an auth URL, there was a
race condition where the authURL could be reset to "" immediately after we
set it.
To fix this we need to check that we are moving from a non-Running state to
a Running state rather than always resetting the URL when we "move" into a
Running state even if that is the current state.
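A minimal sketch of that transition check, with hypothetical names
(state, backend, enterState) that are not taken from the real code:

```go
package main

import "fmt"

type state int

const (
	stopped state = iota
	starting
	running
)

// backend is a hypothetical stand-in holding only the fields needed here.
type backend struct {
	state   state
	authURL string
}

// enterState records the new state and clears authURL only on a real
// transition from a non-running state into running.
func (b *backend) enterState(next state) {
	prev := b.state
	b.state = next
	if next == running && prev != running {
		b.authURL = ""
	}
}

func main() {
	b := &backend{state: running, authURL: "https://login.example/abc"}
	b.enterState(running) // re-entering the current state keeps the URL
	fmt.Println(b.authURL != "")
}
```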
We also need to make sure that we do not return from stopEngineAndWait until
the engine is stopped: before, we would return as soon as we received any
engine status update, but that might have been an update already in-flight
before we asked the engine to stop. Now we wait until we see an update that
is indicative of a stopped engine, or we see that the engine is unblocked
again, which indicates that the engine stopped and then started again while
we were waiting before we checked the state.
Updates #17388
Signed-off-by: James Sanderson <jsanderson@tailscale.com>
Co-authored-by: Nick Khyl <nickk@tailscale.com>
Saves ~102 KB from the min build.
Updates #12614
Change-Id: Ie1d4f439321267b9f98046593cb289ee3c4d6249
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Due to iOS memory limitations in 2020 (see
https://tailscale.com/blog/go-linker, etc) and wireguard-go using
multiple goroutines per peer, commit 16a9cfe2f4 introduced some
convoluted pathways through Tailscale to look at packets before
they're delivered to wireguard-go and lazily reconfigure wireguard on
the fly before delivering a packet, only telling wireguard about peers
that are active.
We eventually want to remove that code and integrate wireguard-go's
configuration with Tailscale's existing netmap tracking.
To make it easier to find that code later, this makes it modular. It
saves 12 KB (of disk) to turn it off (at the expense of lots of RAM),
but that's not really the point. The point is rather making it obvious
(via the new constants) where this code even is.
Updates #12614
Change-Id: I113b040f3e35f7d861c457eaa710d35f47cee1cb
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Explain that this file stays forked from coder/websocket until we can
depend on an upstream release for the helper.
Updates #cleanup
Signed-off-by: kscooo <kscowork@gmail.com>
Switching to a Geneve-encapsulated (peer relay) path in
endpoint.handlePongConnLocked is expected around port rebinds, which end
up clearing endpoint.bestAddr.
Fixes tailscale/corp#33036
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Saves only 12 KB, but notably removes some deps on packages that future
changes can then eliminate entirely.
Updates #12614
Change-Id: Ibf830d3ee08f621d0a2011b1d4cd175427ef50df
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
c2n was already a conditional feature, but it didn't have a
feature/c2n directory before (rather, it was using consts + DCE). This
adds it, and moves some code, which removes the httprec dependency.
Also, remove some unnecessary code from our httprec fork.
Updates #12614
Change-Id: I2fbe538e09794c517038e35a694a363312c426a2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
As found by @cmol in #17423.
Updates #17423
Change-Id: I1492501f74ca7b57a8c5278ea6cb87a56a4086b9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Saves 86 KB.
And stop depending on expvar and usermetrics when disabled,
in prep to removing all the expvar/metrics/tsweb stuff.
Updates #12614
Change-Id: I35d2479ddd1d39b615bab32b1fa940ae8cbf9b11
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This patch removes some code that didn’t get removed before merging
the changes in #16580.
Updates #cleanup
Updates #16551
Signed-off-by: Simon Law <sfllaw@tailscale.com>
kubestore init function has now been moved to a more explicit path of
ipn/store/kubestore meaning we can now avoid the generic import of
feature/condregister.
Updates #12614
Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
When running integration tests on macOS, we get a panic from a nil
pointer dereference when calling `ci.creds.PID()`.
This panic occurs because the `ci.creds != nil` check is insufficient
after a recent refactoring (c45f881) that changed `ci.creds` from a
pointer to the `PeerCreds` interface. Now `ci.creds` always compares as
non-nil, so we enter this block even when the underlying value is nil.
The integration tests fail on macOS when `peercred.Get()` returns the
error `unix.GetsockoptInt: socket is not connected`. This error isn't
new, and the previous code was ignoring it correctly.
Since we trust that `peercred` returns either a usable value or an error,
checking for a nil error is a sufficient and correct gate to prevent the
method call and avoid the panic.
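The underlying Go behavior can be reproduced in isolation. In this
sketch PeerCreds is reduced to one method, and unixCreds, getCreds,
interfaceIsNil, and safePID are hypothetical names; only the nil
semantics are the point.

```go
package main

import "fmt"

// PeerCreds is a simplified stand-in for the real interface.
type PeerCreds interface {
	PID() (int, bool)
}

// unixCreds is a hypothetical concrete implementation.
type unixCreds struct{ pid int }

// PID dereferences c, so calling it on a nil *unixCreds panics.
func (c *unixCreds) PID() (int, bool) { return c.pid, true }

// getCreds mimics a lookup that fails: a nil pointer plus an error.
func getCreds() (*unixCreds, error) {
	return nil, fmt.Errorf("socket is not connected")
}

// interfaceIsNil stores the nil pointer in an interface and compares the
// interface to nil. The interface still carries a type, so this is false.
func interfaceIsNil() bool {
	c, _ := getCreds()
	var creds PeerCreds = c
	return creds == nil
}

// safePID gates the method call on the error instead of on a nil check.
func safePID() (int, bool) {
	c, err := getCreds()
	if err != nil {
		return 0, false
	}
	return c.PID()
}

func main() {
	fmt.Println(interfaceIsNil()) // false
	fmt.Println(safePID())        // 0 false
}
```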
Fixes #17421
Signed-off-by: Alex Chan <alexc@tailscale.com>
In the earlier http2 package migration (1d93bdce20, #17394) I had
removed Direct.Close's tracking of the connPool, thinking it wasn't
necessary.
Some tests (in another repo) are strict and like it to tear down the
world and wait, to check for leaked goroutines. And they caught this
letting some goroutines idle past Close, even if they'd eventually
close down on their own.
This restores the connPool accounting and the aggressive close.
Updates #17305
Updates #17394
Change-Id: I5fed283a179ff7c3e2be104836bbe58b05130cc7
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The control plane will sometimes determine that a node is not online,
while the node is still able to connect to its peers. This patch
doesn’t solve this problem, but it does mitigate it.
This PR introduces the `client-side-reachability` node attribute that
switches the node to completely ignore the online signal from control.
In the future, the client itself should collect reachability data from
active Wireguard flows and Tailscale pings.
Updates #17366
Updates tailscale/corp#30379
Updates tailscale/corp#32686
Signed-off-by: Simon Law <sfllaw@tailscale.com>
A recent change (009d702adf) introduced a deadlock where the
/machine/update-health network request to report the client's health
status update to the control plane was moved to being synchronous
within the eventbus's pump machinery.
I started to instead make the health reporting be async, but then we
realized in the three years since we added that, it's barely been used
and doesn't pay for itself, for how many HTTP requests it makes.
Instead, delete it all and replace it with a c2n handler, which
provides much more helpful information.
Fixes tailscale/corp#32952
Change-Id: I9e8a5458269ebfdda1c752d7bbb8af2780d71b04
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Saves 262 KB so far. I'm sure I missed some places, but shotizam says
these were the low hanging fruit.
Updates #12614
Change-Id: Ia31c01b454f627e6d0470229aae4e19d615e45e3
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Maybe it matters? At least globally across all nodes?
Fixes #17343
Change-Id: I3f61758ea37de527e16602ec1a6e453d913b3195
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Add and wire up event publishers for these two event types in the AppConnector.
Nothing currently subscribes to them, so this is harmless. Subscribers for
these events will be added in a near-future commit.
As part of this, move the appc.RouteInfo type to the types/appctype package.
It does not contain any package-specific details from appc. Beside it, add
appctype.RouteUpdate to carry route update event state, likewise not specific
to appc. Update all usage of the appc.* types throughout to use appctype.*
instead, and update depaware files to reflect these changes.
Add a Close method to the AppConnector to make sure the client gets cleaned up
when the connector is dropped (we re-create connectors).
Update the unit tests in the appc package to also check the events published
alongside calls to the RouteAdvertiser.
For now the tests still rely on the RouteAdvertiser for correctness; this is OK
because the two methods are always performed together. In the near future,
we need to rework the tests to not require that, but that will require building
some more test fixtures that we can handle separately.
Updates #15160
Updates #17192
Change-Id: I184670ba2fb920e0d2cb2be7c6816259bca77afe
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Instead of using separate channels to manage the lifecycle of the eventbus
client, use the recently-added eventbus.Monitor, which handles signaling the
processing loop to stop and waiting for it to complete. This allows us to
simplify some of the setup and cleanup code in the relay server.
Updates #15160
Change-Id: Ia1a47ce2e5a31bc8f546dca4c56c3141a40d67af
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Saves 352 KB, removing one of our two HTTP/2 implementations linked
into the binary.
Fixes #17305
Updates #15015
Change-Id: I53a04b1f2687dca73c8541949465038b69aa6ade
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Add a .gitignore for the chart version of the CRDs that we never commit,
because the static manifest CRD files are the canonical version. This
makes it easier to deploy the CRDs via the helm chart in a way that
reflects the production workflow without making the git checkout
"dirty".
Given that the chart CRDs are ignored, we can also now safely generate
them for the kube-generate-all Makefile target without being a nuisance
to the state of the git checkout. Added a slightly more robust repo root
detection to the generation logic to make sure the command works from
the context of both the Makefile and the image builder command we run
for releases in corp.
Updates tailscale/corp#32085
Change-Id: Id44a4707c183bfaf95a160911ec7a42ffb1a1287
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
mkctr already has support for including extra files in the built
container image. Wire up a new optional environment variable to thread
that through to mkctr. The operator e2e tests will use this to bake
additional trusted CAs into the test image without significantly
departing from the normal build or deployment process for our
containers.
Updates tailscale/corp#32085
Change-Id: Ica94ed270da13782c4f5524fdc949f9218f79477
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Using memnet and synctest removes flakiness caused by real networking
and subtle timing differences.
Additionally, remove the `t.Logf` call inside the server's shutdown
goroutine that was causing a false positive data race detection.
The race detector is flagging a double write during this `t.Logf` call.
This is a common pattern, noted in golang/go#40343 and elsewhere in
this file, where using `t.Logf` after a test has finished can interact
poorly with the test runner.
This is a long-standing issue which became more common after rewriting
this test to use memnet and synctest.
Fixes #17355
Signed-off-by: Alex Chan <alexc@tailscale.com>
Whenever running on a platform that has a TPM (and tailscaled can access
it), default to encrypting the state. The user can still explicitly set
this flag to disable encryption.
Updates https://github.com/tailscale/corp/issues/32909
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
A following change will split out the controlclient.NoiseClient type,
away from the rest of the controlclient package which is
relatively dependency heavy.
A question was where to move it, and whether to make a new (a fifth!)
package in the ts2021 dependency chain.
@creachadair and I brainstormed and decided to merge
internal/noiseconn and controlclient.NoiseClient into one package,
with names ts2021.Conn and ts2021.Client.
For ease of reviewing the subsequent PR, this is the first step that
just renames the internal/noiseconn package to control/ts2021.
Updates #17305
Change-Id: Ib5ea162dc1d336c1d805bdd9548d1702dd6e1468
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
depaware was merging golang.org/x/foo and std's
vendor/golang.org/x/foo packages (which could both be in the binary!),
leading to confusing output, especially when I was working on
eliminating duplicate packages imported under different names.
This makes the depaware output longer and grosser, but doesn't hide
reality from us.
Updates #17305
Change-Id: I21cc3418014e127f6c1a81caf4e84213ce84ab57
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Require the presence of the bus, but do not use it yet. Check for required
fields and update tests and production use to plumb the necessary arguments.
Updates #15160
Updates #17192
Change-Id: I8cefd2fdb314ca9945317d3320bd5ea6a92e8dcb
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
The callback itself is not removed as it is used in other repos, making
it simpler for those to slowly transition to the eventbus.
Updates #15160
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Replace the positional arguments to NewAppConnector with a Config struct.
Update the existing uses. Other than the API change, there are no functional
changes in this commit.
Updates #15160
Updates #17192
Change-Id: Ibf37f021372155a4db8aaf738f4b4f2c746bf623
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
It never launched and I've lost hope of it launching and it's in my
way now, so I guess it's time to say goodbye.
Updates tailscale/corp#4383
Updates #17305
Change-Id: I2eb551d49f2fb062979cc307f284df4b3dfa5956
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This permits other programs (in other repos) to conditionally
import ipn/store/awsstore and/or ipn/store/kubestore and have them
register themselves, rather than feature/condregister doing it.
Updates tailscale/corp#32922
Change-Id: I2936229ce37fd2acf9be5bf5254d4a262d090ec1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The `put` callback runs on a different goroutine to the test, so calling
t.Fatalf in put had no effect. `drain` is always called when checking what
was put and is called from the test goroutine, so that's a good place to
fail the test if the channel was too full.
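The pattern is roughly the following. This sketch avoids the testing
package so it can run standalone: recorder, put, and drain are
hypothetical names, and drain returns an error where the real test
helper would fail the test from the test goroutine.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// recorder is a hypothetical stand-in for the test's notification sink.
type recorder struct {
	ch       chan string
	overflow atomic.Bool
}

// put may run on any goroutine. It only records that the channel was
// full instead of trying to fail the test from here.
func (r *recorder) put(s string) {
	select {
	case r.ch <- s:
	default:
		r.overflow.Store(true)
	}
}

// drain is meant to be called from the test goroutine. It returns what
// was queued plus an error if any earlier put had to drop a value.
func (r *recorder) drain() ([]string, error) {
	var out []string
	for {
		select {
		case s := <-r.ch:
			out = append(out, s)
		default:
			if r.overflow.Load() {
				return out, errors.New("channel was too full")
			}
			return out, nil
		}
	}
}

func main() {
	r := &recorder{ch: make(chan string, 1)}
	r.put("a")
	r.put("b") // dropped: capacity is 1
	got, err := r.drain()
	fmt.Println(got, err)
}
```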
Updates #17363
Signed-off-by: James Sanderson <jsanderson@tailscale.com>
https://github.com/tailscale/tailscale/pull/17346 moved the kube and aws
arn store initializations to feature/condregister, under the assumption
that anything using it would use kubestore.New. Unfortunately,
cmd/k8s-proxy makes use of store.New, which compares the `<prefix>:`
supplied in the provided `path string` argument against known stores. If
it doesn't find it, it falls back to using a FileStore.
Since cmd/k8s-proxy uses store.New to try and initialize a kube store in
some cases (without importing feature/condregister), it silently creates
a FileStore and that leads to misleading errors further along in
execution.
This fixes this issue by importing condregister, and successfully
initializes a kube store.
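To illustrate the failure mode, here is a simplified model of a
prefix-based store registry with a file fallback. All names (store,
fileStore, kubeStore, registry, register, newStore) are hypothetical
and much smaller than the real ipn/store package.

```go
package main

import (
	"fmt"
	"strings"
)

// store is a hypothetical minimal interface for illustration.
type store interface{ Kind() string }

type fileStore struct{ path string }

func (fileStore) Kind() string { return "file" }

type kubeStore struct{ secret string }

func (kubeStore) Kind() string { return "kube" }

// registry maps a "<prefix>:" string to a constructor.
type registry map[string]func(arg string) store

// register is what an imported store package would call from init.
func (r registry) register(prefix string, ctor func(string) store) {
	r[prefix] = ctor
}

// newStore picks a constructor by prefix and silently falls back to a
// file store when no registered prefix matches.
func (r registry) newStore(path string) store {
	for prefix, ctor := range r {
		if strings.HasPrefix(path, prefix) {
			return ctor(strings.TrimPrefix(path, prefix))
		}
	}
	return fileStore{path: path}
}

func main() {
	r := registry{}
	fmt.Println(r.newStore("kube:my-secret").Kind()) // file: nothing registered

	r.register("kube:", func(arg string) store { return kubeStore{secret: arg} })
	fmt.Println(r.newStore("kube:my-secret").Kind()) // kube
}
```

The first call shows why the bug was silent: an unregistered prefix is
not an error, it just produces a different kind of store.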
Updates #12614
Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
Saves 442 KB. Lock it with a new min test.
Updates #12614
Change-Id: Ia7bf6f797b6cbf08ea65419ade2f359d390f8e91
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We will need this for unmarshaling node prefs: use the zero
HardwareAttestationKey implementation when parsing and later check
`IsZero` to see if anything was loaded.
Updates #15830
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
I'm trying to remove the "regexp" and "regexp/syntax" packages from
our minimal builds. But tsweb pulls in regexp (via net/http/pprof etc)
and util/eventbus was importing the tsweb for no reason.
Updates #12614
Change-Id: Ifa8c371ece348f1dbf80d6b251381f3ed39d5fbd
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Even with ts_omit_drive, the drive package is currently still imported
for some types. So it should be light. But it was depending on the
"regexp" package, which I'd like to remove from our minimal builds.
Updates #12614
Change-Id: I5bf85d8eb15a739793723b1da11c370d3fcd2f32
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The Tailscale CLI is the primary configuration interface and as such it
is used in scripts, container setups, and many other places that do not
have a terminal available and should not be made to respond to prompts.
The default is set to false where the "risky" API is being used by the
CLI and true otherwise. This means that the `--yes` flags are only
required under interactive runs, and scripts do not need to be concerned
with prompts or extra flags.
Updates #19445
Signed-off-by: James Tucker <james@tailscale.com>
Saves 139 KB.
Also Synology support, which I saw had its own large-ish proxy parsing
support on Linux, but support for proxies without Synology proxy
support is reasonable, so I pulled that out as its own thing.
Updates #12614
Change-Id: I22de285a3def7be77fdcf23e2bec7c83c9655593
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In Dec 2021 in d3d503d997 I had grand plans to make exit node DNS
cheaper by using HTTP/2 over PeerAPI, at least on some platforms. I
only did server-side support though and never made it to the client.
In the ~4 years since, some things have happened:
* Go 1.24 got support for http.Protocols (https://pkg.go.dev/net/http#Protocols)
and doing UnencryptedHTTP2 ("HTTP2 with prior knowledge")
* The old h2c upgrade mechanism was deprecated; see https://github.com/golang/go/issues/63565
and https://github.com/golang/go/issues/67816
* Go plans to deprecate x/net/http2 and move everything to the standard library.
So this drops our use of the x/net/http2/h2c package and instead
enables h2c (on all platforms now) using the standard library.
This does mean we lose the deprecated h2c Upgrade support, but that's
fine.
If/when we do the h2c client support for ExitDNS, we'll have to probe
the peer to see whether it supports it. Or have it reply with a header
saying that future requests can use h2c. (It's tempting to use capver,
but maybe people will disable that support anyway, so we should
discover it at runtime instead.)
Also do the same in the sessionrecording package.
Updates #17305
Change-Id: If323f5ef32486effb18ed836888aa05c0efb701e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Saves 328 KB (2.5%) off the minimal binary.
For IoT devices that don't need MagicDNS (e.g. they don't make
outbound connections), this provides a knob to disable all the DNS
functionality.
Rather than a massive refactor today, this uses constant false values
as a deadcode sledgehammer, guided by shotizam to find the largest DNS
functions which survived deadcode.
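The technique looks roughly like this. The hasDNS constant, resolve,
and expensiveLookup are hypothetical; in a real build the constant
would be defined in two files selected by a build tag, which a
single-file sketch cannot show.

```go
package main

import "fmt"

// hasDNS is a hypothetical feature constant. When it is a compile-time
// false, the compiler can drop the guarded branch, and the linker can
// then drop functions that are only reachable from that branch.
const hasDNS = false

// resolve returns early when DNS support is compiled out.
func resolve(name string) (string, error) {
	if !hasDNS {
		return "", fmt.Errorf("DNS support not compiled in")
	}
	return expensiveLookup(name), nil
}

// expensiveLookup stands in for the large code we want eliminated.
func expensiveLookup(name string) string {
	return "resolved:" + name
}

func main() {
	_, err := resolve("example.ts.net")
	fmt.Println(err)
}
```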
A future refactor could make it so that the net/dns/resolver and
publicdns packages don't even show up in the import graph (along with
their imports) but really it's already pretty good looking with just
these consts, so it's not at the top of my list to refactor it more
soon.
Also do the same in a few places with the ACME (cert) functionality,
as I saw those while searching for DNS stuff.
Updates #12614
Change-Id: I8e459f595c2fde68ca16503ff61c8ab339871f97
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Add DNS configuration support to ProxyClass, allowing users to customize DNS resolution for Tailscale proxy pods.
Fixes #16886
Signed-off-by: Raj Singh <raj@tailscale.com>
When I added dependency support to featuretag, I broke the handling of
the non-omit build tags (as used by the "box" support for bundling the
CLI into tailscaled). That then affected depaware: since that change,
depaware-minbox.txt has not included the CLI.
So fix that, and also add a new depaware variant that's only the
daemon, without the CLI.
Updates #12614
Updates #17139
Change-Id: I4a4591942aa8c66ad8e3242052e3d9baa42902ca
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Otherwise they're uselessly imported by tsnet applications, even
though they do nothing. tsnet applications wanting to use these
already had to explicitly import them and use kubestore.New or
awsstore.New and assign those to their tsnet.Server.Store fields.
Updates #12614
Change-Id: I358e3923686ddf43a85e6923c3828ba2198991d4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Listen address reuse is allowed as soon as the previous listener is
closed. There is no attempt made to emulate more complex address reuse
logic.
Updates tailscale/corp#28078
Change-Id: I56be1c4848e7b3f9fc97fd4ef13a2de9dcfab0f2
Signed-off-by: Brian Palmer <brianp@tailscale.com>
So wgengine/router is just the docs + entrypoint + types, and then
underscore importing wgengine/router/osrouter registers the constructors
with the wgengine/router package.
Then tsnet can not pull those in.
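The registration pattern described above can be sketched in a single file as
follows. This is only an illustration: Router, Register, New and
fakeLinuxRouter are hypothetical names, and in a real split the Register call
would live in an init function of the separately imported implementation
package.

```go
package main

import (
	"errors"
	"fmt"
)

// Router is a stand-in for the interface the entry-point package exposes.
type Router interface{ Name() string }

// newOSRouter is the hook. It stays nil unless an implementation registers a
// constructor, so binaries that never import one link none of that code.
var newOSRouter func() (Router, error)

// Register installs the constructor. A second registration is a programming
// error, so it panics.
func Register(f func() (Router, error)) {
	if newOSRouter != nil {
		panic("router constructor already registered")
	}
	newOSRouter = f
}

// New builds a Router using whatever constructor was registered, if any.
func New() (Router, error) {
	if newOSRouter == nil {
		return nil, errors.New("no OS router linked into this binary")
	}
	return newOSRouter()
}

// fakeLinuxRouter plays the role of the OS-specific implementation.
type fakeLinuxRouter struct{}

func (fakeLinuxRouter) Name() string { return "fake-linux" }

func main() {
	if _, err := New(); err != nil {
		fmt.Println("before registration:", err)
	}
	Register(func() (Router, error) { return fakeLinuxRouter{}, nil })
	r, _ := New()
	fmt.Println("after registration:", r.Name())
}
```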
Updates #17313
Change-Id: If313226f6987d709ea9193c8f16a909326ceefe7
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Allow the user to access information about routes an app connector has
learned, such as how many routes for each domain.
Fixes tailscale/corp#32624
Signed-off-by: Fran Bull <fran@tailscale.com>
Removes 434 KB from the minimal Linux binary, or ~3%.
Primarily this comes from not linking in the zstd encoding code.
Fixes #17323
Change-Id: I0a90de307dfa1ad7422db7aa8b1b46c782bfaaf7
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This commit modifies the `DNSConfig` custom resource to allow specifying
a replica count when deploying a nameserver. This allows deploying
nameservers in a HA configuration.
Updates https://github.com/tailscale/corp/issues/32589
Signed-off-by: David Bond <davidsbond93@gmail.com>
As of the earlier 85febda86d, our new preferred zstd API of choice
is zstdframe.
Updates #cleanup
Updates tailscale/corp#18514
Change-Id: I5a6164d3162bf2513c3673b6d1e34cfae84cb104
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
It has nothing to do with logtail and is confusing named like that.
Updates #cleanup
Updates #17323
Change-Id: Idd34587ba186a2416725f72ffc4c5778b0b9db4a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Now cmd/derper doesn't depend on iptables, nftables, and netlink code :)
But this is really just a cleanup step I noticed on the way to making
tsnet applications able to not link all the OS router code which they
don't use.
Updates #17313
Change-Id: Ic7b4e04e3a9639fd198e9dbeb0f7bae22a4a47a9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This PR cleans up a bunch of things in ./tstest/integration/vms:
- Bumps version of Ubuntu that's actually run from CI 20.04 -> 24.04
- Removes Ubuntu 18.04 test
- Bumps NixOS 21.05 -> 25.05
Updates #cleanup
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
The dnstype package is used by tailcfg, which tries to be light and
leafy. But it brings in dnstype. So dnstype shouldn't bring in
x/net/dns/dnsmessage.
Updates #12614
Change-Id: I043637a7ce7fed097e648001f13ca1927a781def
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
I noticed this while modularizing clientupdate. With this in first,
moving clientupdate to be modular removes a bunch more stuff from
the minimal build + tsnet.
Updates #17115
Change-Id: I44bd055fca65808633fd3a848b0bbc09b00ad4fa
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
As part of making Tailscale's gvisor dependency optional for small builds,
this was one of the last places left that depended on gvisor. Just copy
the couple of functions we were using.
Updates #17283
Change-Id: Id2bc07ba12039afe4c8a3f0b68f4d76d1863bbfe
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Baby steps. This permits building without much of gvisor, but not all of it.
Updates #17283
Change-Id: I8433146e259918cc901fe86b4ea29be22075b32c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This only saves ~32KB in the minimal linux/amd64 binary, but it's a
step towards permitting not depending on gvisor for small builds.
Updates #17283
Change-Id: Iae8da5e9465127de354dbcaf25e794a6832d891b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We can only register one key implementation per process. When running on
macOS or Android, trying to register a separate key implementation from
feature/tpm causes a panic.
Updates #15830
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
On platforms that are causing EPIPE at a high frequency this is
resulting in non-working connections, for example when Apple decides to
forcefully close UDP sockets due to an unsolicited packet rejection in the
firewall.
Too frequent rebinds cause a failure to solicit the endpoints triggering
the rebinds, that would normally happen via CallMeMaybe.
Updates #14551
Updates tailscale/corp#25648
Signed-off-by: James Tucker <james@tailscale.com>
This commit fixes a race condition where `tailscale up --force-reauth` would
exit prematurely on an already-logged in device.
Previously, the CLI would wait for IPN to report the "Running" state and then
exit. However, this could happen before the new auth URL was printed, leading
to two distinct issues:
* **Without seamless key renewal:** The CLI could exit immediately after
the `StartLoginInteractive` call, before IPN has time to switch into
the "Starting" state or send a new auth URL back to the CLI.
* **With seamless key renewal:** IPN stays in the "Running" state
throughout the process, so the CLI exits immediately without performing
any reauthentication.
The fix is to change the CLI's exit condition.
Instead of waiting for the "Running" state, if we're doing a `--force-reauth`
we now wait to see the node key change, which is a more reliable indicator
that a successful authentication has occurred.
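To make the changed exit condition concrete, here is a small sketch. It does
not use the real IPN types: notify is a hypothetical, cut-down stand-in for a
bus notification, and waitForExit is an illustrative helper that reports how
many notifications were consumed before the CLI would exit.

```go
package main

import "fmt"

// notify is a hypothetical stand-in for an IPN bus notification.
type notify struct {
	State   string // e.g. "Running" or "Starting"; empty if unchanged
	NodeKey string // node key reported with this notification, if any
}

// waitForExit reads notifications until the exit condition holds and returns
// the number of notifications it consumed.
func waitForExit(ch <-chan notify, forceReauth bool, origKey string) int {
	n := 0
	for msg := range ch {
		n++
		if forceReauth {
			// With --force-reauth, "Running" proves nothing: the node may
			// have been running all along. Wait for a new node key instead.
			if msg.NodeKey != "" && msg.NodeKey != origKey {
				return n
			}
			continue
		}
		if msg.State == "Running" {
			return n
		}
	}
	return n
}

// feed returns a closed channel preloaded with the given notifications.
func feed(msgs ...notify) <-chan notify {
	ch := make(chan notify, len(msgs))
	for _, m := range msgs {
		ch <- m
	}
	close(ch)
	return ch
}

func main() {
	msgs := []notify{
		{State: "Running", NodeKey: "old"},
		{State: "Starting"},
		{State: "Running", NodeKey: "new"},
	}
	fmt.Println("plain up exits after:", waitForExit(feed(msgs...), false, "old"))
	fmt.Println("force-reauth exits after:", waitForExit(feed(msgs...), true, "old"))
}
```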
Updates tailscale/corp#31476
Updates tailscale/tailscale#17108
Signed-off-by: Alex Chan <alexc@tailscale.com>
This partially reverts f3d2fd2.
When that patch was written, the goroutine that responds to IPN notifications
could call `StartLoginInteractive`, creating a race condition that led to
flaky integration tests. We no longer call `StartLoginInteractive` in that
goroutine, so the race is now impossible.
Moving the `WatchIPNBus` call earlier ensures the CLI gets all necessary
IPN notifications, preventing a reauth from hanging.
Updates tailscale/corp#31476
Signed-off-by: Alex Chan <alexc@tailscale.com>
A customer wants to allow their employees to restart tailscaled at will, when access rights and MDM policy allow it,
as a way to fully reset client state and re-create the tunnel in case of connectivity issues.
On Windows, the main tailscaled process runs as a child of a service process. The service restarts the child
when it exits (or crashes) until the service itself is stopped. Regular (non-admin) users can't stop the service,
and allowing them to do so isn't ideal, especially in managed or multi-user environments.
In this PR, we add a LocalAPI endpoint that instructs ipnserver.Server, and by extension the tailscaled process,
to shut down. The service then restarts the child tailscaled. Shutting down tailscaled requires LocalAPI write access
and an enabled policy setting.
Updates tailscale/corp#32674
Updates tailscale/corp#32675
Signed-off-by: Nick Khyl <nickk@tailscale.com>
And yay: tsnet (and thus k8s-operator etc) no longer depends on
portlist! And LocalBackend is smaller.
Removes 50 KB from the minimal binary.
Updates #12614
Change-Id: Iee04057053dc39305303e8bd1d9599db8368d926
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We made changes to ipnext callback registration/unregistration/invocation in #15780
that made resetting b.exthost to a nil, no-op host in (*LocalBackend).Shutdown() unnecessary.
But resetting it is also racy: b.exthost must be safe for concurrent use with or without b.mu held,
so it shouldn't be written after NewLocalBackend returns. This PR removes it.
Fixes #17279
Signed-off-by: Nick Khyl <nickk@tailscale.com>
This change adds full IPv6 support to the Kubernetes operator's DNS functionality,
enabling dual-stack and IPv6-only cluster support.
Fixes #16633
Signed-off-by: Raj Singh <raj@tailscale.com>
Expand the integration tests to cover a wider range of scenarios, including:
* Before and after a successful initial login
* Auth URLs and auth keys
* With and without the `--force-reauth` flag
* With and without seamless key renewal
These tests expose a race condition when using `--force-reauth` on an
already-logged in device. The command completes too quickly, preventing
the auth URL from being displayed. This issue is identified and will be
fixed in a separate commit.
Updates #17108
Signed-off-by: Alex Chan <alexc@tailscale.com>
Ideally we would remove this warning entirely, as it is now possible to
reauthenticate without losing connectivity. However, it is still possible to
lose SSH connectivity if the user changes the ownership of the machine when
they do a force-reauth, and we have no way of knowing if they are going to
do that before they do it.
For now, let's just reduce the strength of the warning to warn them that
they "may" lose their connection, rather than they "will".
Updates tailscale/corp#32429
Signed-off-by: James Sanderson <jsanderson@tailscale.com>
I think this was originally a brain-o in 9380e2dfc6. It's
disabling the port _poller_, listing what open ports (i.e. services)
are open, not PMP/PCP/UPnP port mapping.
While there, drop in some more testenv.AssertInTest() in a few places.
Updates #cleanup
Change-Id: Ia6f755ad3544f855883b8a7bdcfc066e8649547b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
PR #17258 extracted `derp.Server` into `derp/derpserver.Server`.
This followup patch adds the following cleanups:
1. Rename `derp_server*.go` files to `derpserver*.go` to match
the package name.
2. Rename the `derpserver.NewServer` constructor to `derpserver.New`
to reduce stuttering.
3. Remove the unnecessary `derpserver.Conn` type alias.
Updates #17257
Updates #cleanup
Signed-off-by: Simon Law <sfllaw@tailscale.com>
Sidestep cmd/viewer incompatibility hiccups with
HardwareAttestationPublic type due to its *ecdsa.PublicKey inner member
by serializing the key to a byte slice instead.
Updates tailscale/corp#31269
Signed-off-by: Patrick O'Doherty <patrick@tailscale.com>
This exports a number of things from the derp (generic + client) package
to be used by the new derpserver package, as now used by cmd/derper.
And then enough other misc changes to lock in that cmd/tailscaled can
be configured to not bring in tailscale.com/client/local. (The webclient
in particular, even when disabled, was bringing it in, so that's now fixed)
Fixes #17257
Change-Id: I88b6c7958643fb54f386dd900bddf73d2d4d96d5
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Some systems need to tell whether the monitored goroutine has finished
alongside other channel operations (notably in this case the relay server, but
there seem likely to be others similarly situated).
Updates #15160
Change-Id: I5f0f3fae827b07f9b7102a3b08f60cda9737fe28
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Help out the linker's dead code elimination.
Updates #12614
Change-Id: I6c13cb44d3250bf1e3a01ad393c637da4613affb
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This doesn't yet fully pull it out into a feature/captiveportal package.
This is the usual first step, moving the code to its own files within
the same packages.
Updates #17254
Change-Id: Idfaec839debf7c96f51ca6520ce36ccf2f8eec92
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Updates tailscale/corp#32600
A localAPI/cli call to reload-config can end up leaving magicsock's mutex
locked. We were missing an unlock for the early exit where there's no change in
the static endpoints when the disk-based config is loaded. This is not likely
the root cause of the linked issue - just noted during investigation.
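One common way to rule out this class of bug is to defer the unlock so that
every return path releases the mutex. The sketch below is illustrative only:
conn and setStaticEndpoints are hypothetical stand-ins, and the actual fix may
simply have added the missing explicit unlock.

```go
package main

import (
	"fmt"
	"slices"
	"sync"
)

// conn is a hypothetical stand-in for the type guarding the endpoints.
type conn struct {
	mu              sync.Mutex
	staticEndpoints []string
}

// setStaticEndpoints stores eps and reports whether anything changed.
func (c *conn) setStaticEndpoints(eps []string) bool {
	c.mu.Lock()
	defer c.mu.Unlock() // runs on every return path, including the early exit

	if slices.Equal(c.staticEndpoints, eps) {
		return false // early exit: no change, but the mutex is still released
	}
	c.staticEndpoints = slices.Clone(eps)
	return true
}

func main() {
	c := &conn{}
	eps := []string{"192.0.2.1:41641"}
	fmt.Println("changed:", c.setStaticEndpoints(eps))
	fmt.Println("changed:", c.setStaticEndpoints(eps))

	// If the early exit had leaked the lock, TryLock would fail here.
	ok := c.mu.TryLock()
	fmt.Println("mutex free afterwards:", ok)
	if ok {
		c.mu.Unlock()
	}
}
```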
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
In macOS GUI apps, users have to select folders to share via the GUI. This is both because
the GUI app keeps its own record of shares, and because the sandboxed version of the GUI
app needs to gain access to the shared folders by having the user pick them in a file
selector.
The new build tag `ts_mac_gui` allows the macOS GUI app build to signal that this
is a macOS GUI app, which causes the `drive` subcommand to be omitted so that people
do not mistakenly attempt to use it.
Updates tailscale/tailscale#17210
Signed-off-by: Percy Wegmann <percy@tailscale.com>
Also update to use the new DisplayNameOrDefault.
Updates tailscale/corp#30456
Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
This commit does not change the order or meaning of any eventbus activity, it
only updates the way the plumbing is set up.
Updates #15160
Change-Id: I61b863f9c05459d530a4c34063a8bad9046c0e27
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Add a last seen time on the cli's status command, similar to the web
portal.
Before:
```
100.xxx.xxx.xxx tailscale-operator tagged-devices linux offline
```
After:
```
100.xxx.xxx.xxx tailscale-operator tagged-devices linux offline, last seen 20d ago
```
Fixes #16584
Signed-off-by: Mahyar Mirrashed <mah.mirr@gmail.com>
This commit does not change the order or meaning of any eventbus activity, it
only updates the way the plumbing is set up.
Updates #15160
Change-Id: I06860ac4e43952a9bb4d85366138c9d9a17fd9cd
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
It is a programming error to Publish or Subscribe on a closed Client, but now
the way you discover that is by getting a panic from down in the machinery of
the bus after the client state has been cleaned up.
To provide a more helpful error, let's panic explicitly when that happens and
say what went wrong ("the client is closed"), by preventing subscriptions from
interleaving with closure of the client. With this change, either an attachment
fails outright (because the client is already closed) or completes and then
shuts down in good order in the normal course.
This does not change the semantics of the client, publishers, or subscribers,
it's just making the failure more eager so we can attach explanatory text.
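A minimal sketch of the idea, assuming a mutex that serializes attachment
against closure. The client type here is a hypothetical stand-in, not the
eventbus implementation, and tryAttach exists only to show the panic text
without crashing the example.

```go
package main

import (
	"fmt"
	"sync"
)

// client is a hypothetical stand-in for a bus client.
type client struct {
	mu     sync.Mutex
	closed bool
	subs   []string
}

// subscribe attaches a subscription, or panics with an explanatory message
// if the client is already closed. Holding mu means an attachment can never
// interleave with close.
func (c *client) subscribe(topic string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.closed {
		panic("the client is closed")
	}
	c.subs = append(c.subs, topic)
}

// close marks the client closed and drops its subscriptions.
func (c *client) close() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.closed = true
	c.subs = nil
}

// tryAttach calls subscribe and converts a panic into a string, purely so
// the example can print it.
func tryAttach(c *client, topic string) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	c.subscribe(topic)
	return "ok"
}

func main() {
	c := &client{}
	fmt.Println("before close:", tryAttach(c, "T"))
	c.close()
	fmt.Println("after close:", tryAttach(c, "U"))
}
```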
Updates #15160
Change-Id: Ia492f4c1dea7535aec2cdcc2e5ea5410ed5218d2
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Only changes how the goroutine consuming the events starts and stops,
not what it does.
Updates #15160
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
This commit modifies the k8s operator to wrap its logger using the logtail
logger provided via the tsnet server. This causes any logs written by
the operator to make their way to Tailscale in the same fashion as
wireguard logs to be used by support.
This functionality can also be opted-out of entirely using the
"TS_NO_LOGS_NO_SUPPORT" environment variable.
Updates https://github.com/tailscale/corp/issues/32037
Signed-off-by: David Bond <davidsbond93@gmail.com>
We never implemented the peercred package on OpenBSD (and I just tried
again and failed), but we've always documented that the creds pointer
can be nil for operating systems where we can't map the unix socket
back to its UID. On those platforms, we set the default unix socket
permissions such that only the admin can open it anyway and we don't
have a read-only vs read-write distinction. OpenBSD was always in that
camp, where any access to Tailscale's unix socket meant full access.
But during some refactoring, we broke OpenBSD in that we started
assuming during one logging path (during login) that Creds was non-nil
when looking up an ipnauth.Actor's username, which wasn't relevant (it
was called from a function "maybeUsernameOf" anyway, which threw away
errors).
Verified on an OpenBSD VM. We don't have any OpenBSD integration tests yet.
Fixes #17209
Updates #17221
Change-Id: I473c5903dfaa645694bcc75e7f5d484f3dd6044d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
controlhttp has the responsibility of dialing a set of candidate control
endpoints in a way that minimizes user facing latency. If one control
endpoint is unavailable we promptly dial another, racing across the
dimensions of: IPv6, IPv4, port 80, and port 443, over multiple server
endpoints.
In the case that the top priority endpoint was not available, the prior
implementation would hang waiting for other results, so as to try to
return the highest priority successful connection to the rest of the
client code. With a large dialplan and sufficient client-to-endpoint
latency, this hang could last long enough for the server to time out
the connection due to inactivity in the intermediate state.
Instead of trying to prioritize non-ideal candidate connections, the
first successful connection is now used unconditionally, improving user
facing latency and avoiding any delays that would encroach on the
server-side timeout.
The tests are converted to memnet and synctest, running on all
platforms.
Fixes #8442
Fixes tailscale/corp#32534
Co-authored-by: James Tucker <james@tailscale.com>
Change-Id: I4eb57f046d8b40403220e40eb67a31c41adb3a38
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: James Tucker <james@tailscale.com>
The controlhttp dialer with a ControlDialPlan IPv6 entry was hitting a
case where the dnscache Resolver was returning a netip.Addr zero
value, where it should've been returning the IPv6 address.
We then tried to dial "invalid IP:80", which would immediately fail,
at least locally.
Mostly this was causing spammy logs when debugging other stuff.
Updates tailscale/corp#32534
Change-Id: If8b9a20f10c1a6aa8a662c324151d987fe9bd2f8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
tsnet apps in particular never use the Linux DNS OSManagers, so they don't need
DBus, etc. I started to pull that all out into separate features so tsnet doesn't
need to bring in DBus, but hit this first.
Here you can see that tsnet (and the k8s-operator) no longer pulls in inotify.
Updates #17206
Change-Id: I7af0f391f60c5e7dbeed7a080346f83262346591
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This commit does not change the order or meaning of any eventbus activity, it
only updates the way the plumbing is set up.
Updates #15160
Change-Id: I0a175e67e867459daaedba0731bf68bd331e5ebc
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
This commit does not change the order or meaning of any eventbus activity, it
only updates the way the plumbing is set up.
Updates #15160
Change-Id: I40c23b183c2a6a6ea3feec7767c8e5417019fc07
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
A common pattern in event bus usage is to run a goroutine to service a
collection of subscribers on a single bus client. To have an orderly shutdown,
however, we need a way to wait for such a goroutine to be finished.
This commit adds a Monitor type that makes this pattern easier to wire up:
rather than having to track all the subscribers and an extra channel, the
component need only track the client and the monitor. For example:
    cli := bus.Client("example")
    m := cli.Monitor(func(c *eventbus.Client) {
        s1 := eventbus.Subscribe[T](cli)
        s2 := eventbus.Subscribe[U](cli)
        for {
            select {
            case <-c.Done():
                return
            case t := <-s1.Events():
                processT(t)
            case u := <-s2.Events():
                processU(u)
            }
        }
    })
To shut down the client and wait for the goroutine, the caller can write:
    m.Close()
which closes cli and waits for the goroutine to finish. Or, separately:
    cli.Close()
    // do other stuff
    m.Wait()
While the goroutine management is not explicitly tied to subscriptions, it is a
common enough pattern that this seems like a useful simplification in use.
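To show the mechanics behind Close and Wait without depending on the eventbus
package, here is a self-contained sketch. The client and monitor types are
simplified, hypothetical stand-ins, and a plain channel replaces the
subscribers.

```go
package main

import (
	"fmt"
	"sync"
)

// client is a simplified stand-in with only a closure signal.
type client struct {
	once sync.Once
	done chan struct{}
}

func newClient() *client { return &client{done: make(chan struct{})} }

// Done is closed when the client has been closed.
func (c *client) Done() <-chan struct{} { return c.done }

// Close may be called more than once; only the first call has an effect.
func (c *client) Close() { c.once.Do(func() { close(c.done) }) }

// monitor tracks one goroutine serving the client.
type monitor struct {
	cli      *client
	finished chan struct{}
}

// Monitor runs f in a new goroutine and returns a handle to wait on it.
func (c *client) Monitor(f func(*client)) *monitor {
	m := &monitor{cli: c, finished: make(chan struct{})}
	go func() {
		defer close(m.finished)
		f(c)
	}()
	return m
}

// Wait blocks until the monitored goroutine has returned.
func (m *monitor) Wait() { <-m.finished }

// Close closes the client and then waits for the goroutine.
func (m *monitor) Close() { m.cli.Close(); m.Wait() }

// runDemo feeds two events to a monitored goroutine and returns their sum.
func runDemo() int {
	cli := newClient()
	events := make(chan int)
	sum := 0
	m := cli.Monitor(func(c *client) {
		for {
			select {
			case <-c.Done():
				return
			case v := <-events:
				sum += v
			}
		}
	})
	events <- 1
	events <- 2
	m.Close() // after this returns, the goroutine no longer touches sum
	return sum
}

func main() {
	fmt.Println("sum:", runDemo())
}
```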
Updates #15160
Change-Id: I657afda1cfaf03465a9dce1336e9fd518a968bca
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Pulls out the last callback logic and ensures timers are still running.
The eventbustest package is updated to support the absence of events.
Updates #15160
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
And another case of the same typo in a comment elsewhere.
Updates #cleanup
Change-Id: Iaa9d865a1cf83318d4a30263c691451b5d708c9c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
* tsnet,internal/client/tailscale: resolve OAuth into authkeys in tsnet
Updates #8403.
* internal/client/tailscale: omit OAuth library via build tag
Updates #12614.
Signed-off-by: Naman Sood <mail@nsood.in>
Expand TestRedactNetmapPrivateKeys to cover all sub-structs of
NetworkMap and confirm that a) all fields are annotated as private or
public, and b) all private fields are getting redacted.
Updates tailscale/corp#32095
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
For debugging purposes, add a new C2N endpoint returning the current
netmap. Optionally, coordination server can send a new "candidate" map
response, which the client will generate a separate netmap for.
Coordination server can later compare two netmaps, detecting unexpected
changes to the client state.
Updates tailscale/corp#32095
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
Instead of a single hard-coded C2N handler, add support for calling
arbitrary C2N endpoints via a node roundtripper.
Updates tailscale/corp#32095
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
When tests run in parallel, events from multiple tests on the same bus can
interfere with each other. This is working as intended, but for the test cases
we want to control exactly what goes through the bus.
To fix that, allocate a fresh bus for each subtest.
Fixes #17197
Change-Id: I53f285ebed8da82e72a2ed136a61884667ef9a5e
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
When developing (and debugging) tests, it is useful to be able to see all the
traffic that transits the event bus during the execution of a test.
Updates #15160
Change-Id: I929aee62ccf13bdd4bd07d786924ce9a74acd17a
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Previously, seamless key renewal was an opt-in feature. Customers had
to set a `seamless-key-renewal` node attribute in their policy file.
This patch enables seamless key renewal by default for all clients.
It includes a `disable-seamless-key-renewal` node attribute we can set
in Control, so we can manage the rollout and disable the feature for
clients with known bugs. This new attribute makes the feature opt-out.
Updates tailscale/corp#31479
Signed-off-by: Alex Chan <alexc@tailscale.com>
This makes the `switch` command use the helper `matchProfile` function
that was introduced in the `remove` sub command.
Signed-off-by: Esteban-Bermudez <esteban@bermudezaguirre.com>
Fixes #12255
Add a new subcommand to `switch` for removing a profile from the local
client. This does not delete the profile from the Tailscale account, but
removes it from the local machine. This functionality is available on
the GUIs, but not yet on the CLI.
Signed-off-by: Esteban-Bermudez <esteban@bermudezaguirre.com>
It doesn't really pull its weight: it adds 577 KB to the binary and
is rarely useful.
Also, we now have static IPs and other connectivity paths coming
soon enough.
Updates #5853
Updates #1278
Updates tailscale/corp#32168
Change-Id: If336fed00a9c9ae9745419e6d81f7de6da6f7275
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
For a common case of events being simple struct types with some exported
fields, add a helper to check (reflectively) for equal values using cmp.Diff so
that a failed comparison gives a useful diff in the test output.
More complex uses will still want to provide their own comparisons; this
(intentionally) does not export diff options or other hooks from the cmp
package.
Updates #15160
Change-Id: I86bee1771cad7debd9e3491aa6713afe6fd577a6
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
This makes things work slightly better over the eventbus.
Also switches ipnlocal to use the event over the eventbus instead of the
direct callback.
Updates #15160
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Extend the Expect method of a Watcher to allow filter functions that report
only an error value, and which "pass" when the reported error is nil.
Updates #15160
Change-Id: I582d804554bd1066a9e499c1f3992d068c9e8148
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
This fixes a flaky test which has been occasionally timing out in CI.
In particular, this test times out if `watchFile` receives multiple
notifications from inotify before we cancel the test context. We block
processing the second notification, because we've stopped listening to
the `callbackDone` channel.
This patch changes the test so we only send on the first notification.
Testing this locally with `stress` confirms that the test is no longer
flaky.
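One way to send only on the first notification is sync.Once, sketched below.
This is an assumption about the shape of the fix rather than the test's actual
code; notifyOnce is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sync"
)

// notifyOnce returns a callback that signals done on its first call only.
// Later calls return immediately, so they cannot block once the receiver
// has stopped listening.
func notifyOnce(done chan<- struct{}) func() {
	var once sync.Once
	return func() {
		once.Do(func() { done <- struct{}{} })
	}
}

func main() {
	done := make(chan struct{})
	cb := notifyOnce(done)

	go cb() // first notification: delivered to the receiver below
	<-done

	cb() // second and third notifications: no send, no blocking
	cb()
	fmt.Println("extra notifications did not block")
}
```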
Fixes #17172
Updates #14699
Signed-off-by: Alex Chan <alexc@tailscale.com>
I'd started to do this in the earlier ts_omit_server PR but
decided to split it into this separate PR.
Updates #17128
Change-Id: Ief8823a78d1f7bbb79e64a5cab30a7d0a5d6ff4b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Instead of waiting for a designated subscription to close as a canary for the
bus being stopped, use the bus Client's own signal for closure added in #17118.
Updates #cleanup
Change-Id: I384ea39f3f1f6a030a6282356f7b5bdcdf8d7102
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
The Tracker was using direct callbacks to ipnlocal. This PR moves those
to be triggered via the eventbus.
Additionally, the eventbus is now closed on exit from tailscaled
explicitly, and health is now a SubSystem in tsd.
Updates #15160
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Subscribers already have a Done channel that the caller can use to detect when
the subscriber has been closed. Typically this happens when the governing
Client closes, which in turn is typically because the Bus closed.
But clients and subscribers can stop at other times too, and a caller has no
good way to tell the difference between "this subscriber closed but the rest
are OK" and "the client closed and all these subscribers are finished".
We've worked around this in practice by knowing the closure of one subscriber
implies the fate of the rest, but we can do better: Add a Done method to the
Client that allows us to tell when that has been closed explicitly, after all
the publishers and subscribers associated with that client have been closed.
This allows the caller to be sure that, by the time that occurs, no further
pending events are forthcoming on that client.
Updates #15160
Change-Id: Id601a79ba043365ecdb47dd035f1fdadd984f303
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Remove the need for the caller to hold on to and call an unregister
function. Both callers (one real, one test) already have a context
they can use. Use context.AfterFunc instead. There are no observable
side effects from scheduling too late if the goroutine doesn't run sync.
Updates #17148
Change-Id: Ie697dae0e797494fa8ef27fbafa193bfe5ceb307
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This test ostensibly checks whether we record an error metric if a packet
is dropped because the network is down, but the network connectivity is
irrelevant -- the send error is actually because the arguments to Send()
are invalid:
    RebindingUDPConn.WriteWireGuardBatchTo:
    [unexpected] offset (0) != Geneve header length (8)
This patch changes the test so we try to send a valid packet, and we
verify this by sending it once before taking the network down. The new
error is:
    magicsock: network down
which is what we're trying to test.
We then test sending an invalid payload as a separate test case.
Updates tailscale/corp#22075
Signed-off-by: Alex Chan <alexc@tailscale.com>
endpointState is used for tracking UDP direct connection candidate
addresses. If it contains a DERP addr, then direct connection path
discovery will always send a wasteful disco ping over it. Additionally,
CLI "tailscale ping" via peer relay will race over DERP, leading to a
misleading result if pong arrives via DERP first.
Disco pongs arriving via DERP never influence path selection. Disco
ping/pong via DERP only serves "tailscale ping" reporting.
Updates #17121
Signed-off-by: Jordan Whited <jordan@tailscale.com>
When you say --features=foo,bar, that was supposed to mean
to only show features "foo" and "bar" in the table.
But it was also being used as the set of all features that are
omittable, which was wrong, leading to misleading numbers
when --features was non-empty.
Updates #12614
Change-Id: Idad2fa67fb49c39454032e84a3dede967890fdf5
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The Unix implementation of doExec propagates error codes by virtue of
the fact that it does an execve; the replacement binary will return the
exit code.
On non-Unix, we need to simulate these semantics by checking for an
ExitError and, when present, passing that value on to os.Exit.
We also add error handling to the doExec call for the benefit of
handling any errors where doExec fails before being able to execute
the desired binary.
Updates https://github.com/tailscale/corp/issues/29940
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
This renames the package+symbols in the earlier 17ffa80138 to be
in their own package ("buildfeatures") and start with the word "Has"
like "if buildfeatures.HasFoo {".
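A single-file sketch of how such a constant is used. HasDNS, resolve and
expensiveResolve are hypothetical names; in the real package the constant's
value would be selected per build tag rather than written inline.

```go
package main

import "fmt"

// HasDNS is a hypothetical stand-in for a buildfeatures constant. Because it
// is a compile-time constant, code guarded by it can be eliminated when it
// is false.
const HasDNS = false

// resolve returns early when the feature is compiled out, which leaves the
// call to expensiveResolve unreachable.
func resolve(name string) (string, error) {
	if !HasDNS {
		return "", fmt.Errorf("resolve %q: DNS support not linked into this build", name)
	}
	return expensiveResolve(name)
}

// expensiveResolve stands in for code that pulls in large dependencies.
func expensiveResolve(name string) (string, error) {
	return "100.64.0.1", nil
}

func main() {
	addr, err := resolve("example.ts.net")
	fmt.Println(addr, err)
}
```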
Updates #12614
Change-Id: I510e5f65993e5b76a0e163e3aa4543755213cbf6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Extend the client state management to generate a hardware attestation
key if none exists.
Extend MapRequest with HardwareAttestationKey{,Signature} fields that
optionally contain the public component of the hardware attestation key
and a signature of the node's node key using it. This will be used by
control to associate hardware attestation keys with node identities on a
TOFU basis.
Updates tailscale/corp#31269
Signed-off-by: Patrick O'Doherty <patrick@tailscale.com>
So code (in upcoming PRs) can test for the build tags with consts and
get dead code elimination from the compiler+linker.
Updates #12614
Change-Id: If6160453ffd01b798f09894141e7631a93385941
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This is a small introduction of the eventbus into controlclient that
communicates with mainly ipnlocal. While ipnlocal is a complicated part
of the codebase, the subscribers here are from the perspective of
ipnlocal already called async.
Updates #15160
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
This commit fixes an issue within the service reconciler where we end
up in a constant reconciliation loop. When reconciling, the loadbalancer
status is appended to but not reset between each reconciliation, leading
to an ever growing slice of duplicate statuses.
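The bug and its remedy can be reduced to a few lines. The types below are
hypothetical, cut-down stand-ins for the Kubernetes status types, and
reconcile is illustrative rather than the operator's actual reconciler.

```go
package main

import "fmt"

// ingress and status are simplified stand-ins for the load balancer status.
type ingress struct{ Hostname string }

type status struct{ Ingress []ingress }

// reconcile rebuilds the status from the desired hostnames. Resetting the
// slice first keeps repeated reconciliations idempotent; without the reset,
// every pass would append duplicates and the object would never settle.
func reconcile(st *status, hostnames []string) {
	st.Ingress = nil
	for _, h := range hostnames {
		st.Ingress = append(st.Ingress, ingress{Hostname: h})
	}
}

func main() {
	var st status
	for i := 0; i < 3; i++ {
		reconcile(&st, []string{"svc.example.ts.net"})
	}
	fmt.Println("entries after three passes:", len(st.Ingress))
}
```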
Fixes https://github.com/tailscale/tailscale/issues/17105
Fixes https://github.com/tailscale/tailscale/issues/17107
Signed-off-by: David Bond <davidsbond93@gmail.com>
This commit adds a new method to the tsnet.Server type named `Logger`
that returns the underlying logtail instance's Logf method.
This is intended to be used within the Kubernetes operator to wrap its
existing logger in a way such that operator specific logs can also be
sent to control for support & debugging purposes.
Updates https://github.com/tailscale/corp/issues/32037
Signed-off-by: David Bond <davidsbond93@gmail.com>
As of this commit (per the issue), the Taildrive code remains where it
was, but in new files that are protected by the new ts_omit_drive
build tag. Future commits will move it.
Updates #17058
Change-Id: Idf0a51db59e41ae8da6ea2b11d238aefc48b219e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Its doc said its signature matched a std signature, but it used
Tailscale-specific types.
Nowadays it's the caller (func control) that curries the logf/netmon
and returns the std-matching signature.
Updates #cleanup (while answering a question on Slack)
Change-Id: Ic99de41fc6a1c720575a7f33c564d0bcfd9a2c30
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
To support integration testing of client features that rely on it, e.g.
peer relay.
Updates tailscale/corp#30903
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Removes ACL edits from e2e tests in favour of trying to simplify the
tests and separate the actual test logic from the environment setup
logic as much as possible. Also aims to fit in with the requirements
that will generally be filled anyway for most devs working on the
operator; in particular using tags that fit in with our documentation.
Updates tailscale/corp#32085
Change-Id: I7659246e39ec0b7bcc4ec0a00c6310f25fe6fac2
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
This adds a file that's not compiled by default that exists just to
make it easier to do binary size checks, probing what a binary would
be like if it included reflect methods (as used by html/template, etc).
As an example, once tailscaled uses reflect.Type.MethodByName(non-const-string) anywhere,
the build jumps up by about 14.6 MB:
$ GOOS=linux GOARCH=amd64 ./tool/go build -tags=ts_include_cli,ts_omit_webclient,ts_omit_systray,ts_omit_debugeventbus -o before ./cmd/tailscaled
$ GOOS=linux GOARCH=amd64 ./tool/go build -tags=ts_include_cli,ts_omit_webclient,ts_omit_systray,ts_omit_debugeventbus,ts_debug_forcereflect -o after ./cmd/tailscaled
$ ls -l before after
-rwxr-xr-x@ 1 bradfitz staff 41011861 Sep 9 07:28 before
-rwxr-xr-x@ 1 bradfitz staff 55610948 Sep 9 07:29 after
This is particularly pronounced with large deps like the AWS SDK. If you compare using ts_omit_aws:
-rwxr-xr-x@ 1 bradfitz staff 38284771 Sep 9 07:40 no-aws-no-reflect
-rwxr-xr-x@ 1 bradfitz staff 45546491 Sep 9 07:41 no-aws-with-reflect
That means adding AWS to a non-reflect binary adds 2.7 MB but adding
AWS to a reflect binary adds 10 MB.
Updates #17063
Updates #12614
Change-Id: I18e9b77c9cf33565ce5bba65ac5584fa9433f7fb
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
* cmd/tailscale/cli: use client/local instead of deprecated client/tailscale
Updates tailscale/corp#22748
Signed-off-by: Alex Chan <alexc@tailscale.com>
* derp: use client/local instead of deprecated client/tailscale
Updates tailscale/corp#22748
Signed-off-by: Alex Chan <alexc@tailscale.com>
---------
Signed-off-by: Alex Chan <alexc@tailscale.com>
I probably could've deflaked this without synctest, but might as well use
it now that Go 1.25 has it.
Fixes #15348
Change-Id: I81c9253fcb7eada079f3e943ab5f1e29ba8e8e31
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
* utils/expvarx: mark TestSafeFuncHappyPath as known flaky
Updates #15348
Signed-off-by: Alex Chan <alexc@tailscale.com>
* tstest/integration: mark TestCollectPanic as known flaky
Updates #15865
Signed-off-by: Alex Chan <alexc@tailscale.com>
---------
Signed-off-by: Alex Chan <alexc@tailscale.com>
It was a bit confusing that the provided history did not include the
current probe results.
Updates tailscale/corp#20583
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
We should never use the real syspolicy implementation in tests by
default. (the machine's configuration shouldn't affect tests)
You either specify a test policy, or you get a no-op one.
Updates #16998
Change-Id: I3350d392aad11573a5ad7caab919bb3bbaecb225
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This commit modifies containerboot's state reset process to handle the
state secret not existing. During other parts of the boot process we
gracefully handle the state secret not being created yet, but missed
that check within `resetContainerbootState`.
Fixes https://github.com/tailscale/tailscale/issues/16804
Signed-off-by: David Bond <davidsbond93@gmail.com>
Fix "file not found" errors when WebDAV clients access files/dirs inside
directories with spaces.
The issue occurred because StatCache was mixing URL-escaped and
unescaped paths, causing cache key mismatches.
Specifically, StatCache.set() parsed WebDAV responses containing
URL-escaped paths (ex. "Dir%20Space/file1.txt") and stored them
alongside unescaped cache keys (ex. "Dir Space/file1.txt").
This mismatch prevented StatCache.get() from correctly determining whether
a child file existed.
See https://github.com/tailscale/tailscale/issues/13632#issuecomment-3243522449
for the full explanation of the issue.
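A minimal sketch of the normalization, assuming a simplified cache (the
statCache type and its methods here are hypothetical, not the real StatCache):
paths from WebDAV responses are unescaped once on the way in, so lookups with
already-unescaped request paths hit the same keys.

```go
package main

import (
	"fmt"
	"net/url"
)

// statCache is a hypothetical, simplified cache keyed by unescaped paths.
type statCache struct{ entries map[string]bool }

// set stores a path taken from a WebDAV response. Response paths are
// URL-escaped, so they are unescaped before being used as a key.
func (c *statCache) set(escapedPath string) error {
	p, err := url.PathUnescape(escapedPath)
	if err != nil {
		return err
	}
	if c.entries == nil {
		c.entries = make(map[string]bool)
	}
	c.entries[p] = true
	return nil
}

// has looks up an already-unescaped path, such as http.Request.URL.Path.
func (c *statCache) has(path string) bool { return c.entries[path] }

func main() {
	var c statCache
	if err := c.set("Dir%20Space/file1.txt"); err != nil {
		panic(err)
	}
	fmt.Println(c.has("Dir Space/file1.txt")) // prints "true"
}
```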
The decision to keep all path references unescaped inside the StatCache
is consistent with net/http.Request.URL.Path and rewrite.go (sole consumer).
Update unit test to detect this directory space mishandling.
Fixes tailscale#13632
Signed-off-by: Craig Hesling <craig@hesling.com>
There's a TODO to delete all of handler.go, but part of it's
still used in another repo.
But this deletes some.
Updates #17022
Change-Id: Ic5a8a5a694ca258440307436731cd92b45ee2d21
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Before:
$ tailscale ip -4
1.2.3.4
$ tailscale set --exit-node=1.2.3.4
no node found in netmap with IP 1.2.3.4
After:
$ tailscale set --exit-node=1.2.3.4
cannot use 1.2.3.4 as an exit node as it is a local IP address to this machine; did you mean --advertise-exit-node?
The new error message already existed in the code, but would only be
triggered if the backend wasn't running -- which means, in practice,
it would almost never be triggered.
The old error message is technically true, but could be confusing if you
don't know the distinction between "netmap" and "tailnet" -- it could
sound like the exit node isn't part of your tailnet. A node is never in
its own netmap, but it is part of your tailnet.
This error confused me when I was doing some local dev work, and it's
confused customers before (e.g. #7513). Using the more specific error
message should reduce confusion.
Updates #7513
Updates https://github.com/tailscale/corp/issues/23596
Signed-off-by: Alex Chan <alexc@tailscale.com>
Now that we have policytest and the policyclient.Client interface, we
can de-global-ify many of the tests, letting them run concurrently
with each other, and just removing global variable complexity.
This does ~half of the LocalBackend ones.
Updates #16998
Change-Id: Iece754e1ef4e49744ccd967fa83629d0dca6f66a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This is step 4 of making syspolicy a build-time feature.
This adds a policyclient.Get() accessor to return the correct
implementation to use: either the real one, or the no-op one. (A third
type, a static one for testing, also exists, so in general a
policyclient.Client should be plumbed around and not always fetched
via policyclient.Get whenever possible, especially if tests need to use
alternate syspolicy)
Updates #16998
Updates #12614
Change-Id: Iaf19670744a596d5918acfa744f5db4564272978
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Step 4 of N. See earlier commits in the series (via the issue) for the
plan.
This adds the missing methods to policyclient.Client and then uses it
everywhere in ipn/ipnlocal and locks it in with a new dep test.
Still plenty of users of the global syspolicy elsewhere in the tree,
but this is a lot of them.
Updates #16998
Updates #12614
Change-Id: I25b136539ae1eedbcba80124de842970db0ca314
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Step 3 in the series. See earlier cc532efc20 and d05e6dc09e.
This step moves some types into a new leaf "ptype" package out of the
big "settings" package. The policyclient.Client will later get new
methods to return those things (as well as Duration and Uint64, which
weren't done at the time of the earlier prototype).
Updates #16998
Updates #12614
Change-Id: I4d72d8079de3b5351ed602eaa72863372bd474a2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Previously, when attempting a risky action, the CLI printed a 5 second countdown saying
"Continuing in 5 seconds...". When the countdown finished, the CLI aborted rather than
continuing.
To avoid confusion, but also avoid accidentally continuing if someone (or an automated
process) fails to manually abort within the countdown, we now explicitly prompt for a
y/n response on whether or not to continue.
Updates #15445
Co-authored-by: Kot C <kot@kot.pink>
Signed-off-by: Percy Wegmann <percy@tailscale.com>
This commit adds a `replicas` field to the `Connector` custom resource that
allows users to specify the number of desired replicas deployed for their
connectors.
This allows users to deploy exit nodes, subnet routers and app connectors
in a highly available fashion.
Fixes #14020
Signed-off-by: David Bond <davidsbond93@gmail.com>
This is step 2 of ~4, breaking up #14720 into reviewable chunks, with
the aim to make syspolicy be a build-time configurable feature.
Step 1 was #16984.
In this second step, the util/syspolicy/policyclient package is added
with the policyclient.Client interface. This is the interface that's
always present (regardless of build tags), and is what code around the
tree uses to ask syspolicy/MDM questions.
There are two implementations of policyclient.Client for now:
1) NoPolicyClient, which only returns default values.
2) the unexported, temporary 'globalSyspolicy', which is implemented
in terms of the global functions we wish to later eliminate.
This then starts to plumb around the policyclient.Client to most callers.
Future changes will plumb it more. When the last of the global func
callers are gone, then we can unexport the global functions and make a
proper policyclient.Client type and constructor in the syspolicy
package, removing the globalSyspolicy impl out of tsd.
The final change will sprinkle build tags in a few more places and
lock it in with dependency tests to make sure the dependencies don't
later creep back in.
Updates #16998
Updates #12614
Change-Id: Ib2c93d15c15c1f2b981464099177cd492d50391c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This is step 1 of ~3, breaking up #14720 into reviewable chunks, with
the aim to make syspolicy be a build-time configurable feature.
In this first (very noisy) step, all the syspolicy string key
constants move to a new constant-only (code-free) package. This will
make future steps more reviewable, without this movement noise.
There are no code or behavior changes here.
The future steps of this series can be seen in #14720: removing global
funcs from syspolicy resolution and using an interface that's plumbed
around instead. Then adding build tags.
Updates #12614
Change-Id: If73bf2c28b9c9b1a408fe868b0b6a25b03eeabd1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Apparently, #16989 introduced a bug in request-dataplane-review.yml:
> you may only define one of `paths` and `paths-ignore` for a single event
Related #16372
Updates #cleanup
Signed-off-by: Simon Law <sfllaw@tailscale.com>
@tailscale/dataplane almost never needs to review depaware.txt, when
it is the only change to the DERP implementation.
Related #16372
Updates #cleanup
Signed-off-by: Simon Law <sfllaw@tailscale.com>
If the DERP queue is full, drop the oldest item first, rather than the
youngest, on the assumption that older data is more likely to be
unanswerable.
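The drop policy can be illustrated with a small bounded FIFO. This is a
sketch under assumptions: the queue type below is hypothetical and much
simpler than DERP's real per-client queue.

```go
package main

import "fmt"

// queue is a hypothetical bounded FIFO, not DERP's actual queue type.
// max must be greater than zero.
type queue struct {
	items []string
	max   int
}

// push appends v. If the queue is full, it first drops the oldest item
// (the head) and reports what was dropped.
func (q *queue) push(v string) (dropped string, ok bool) {
	if len(q.items) == q.max {
		dropped, ok = q.items[0], true
		q.items = q.items[1:]
	}
	q.items = append(q.items, v)
	return dropped, ok
}

func main() {
	q := &queue{max: 2}
	for _, p := range []string{"p1", "p2", "p3"} {
		if d, ok := q.push(p); ok {
			fmt.Println("dropped", d) // prints "dropped p1" on the third push
		}
	}
	fmt.Println(q.items) // prints "[p2 p3]"
}
```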
Updates tailscale/corp#31762
Signed-off-by: James Tucker <james@tailscale.com>
Add a ternary flag that, unless set explicitly to false, keeps the
insecure behavior of TSIDP.
If the flag is false, add functionality on startup to migrate
oidc-funnel-clients.json to oauth-clients.json if it doesn’t exist.
If the flag is false, modify endpoints to behave similarly regardless
of funnel, tailnet, or localhost. They will all verify client ID & secret
when appropriate per RFC 6749. The authorize endpoint will no longer change
based on funnel status or nodeID.
Add extra tests verifying TSIDP endpoints behave as expected
with the new flag.
Safely create the redirect URL from what's passed into the
authorize endpoint.
Fixes #16880
Signed-off-by: Remy Guercio <remy@tailscale.com>
Doesn't look to affect us, but pacifies security scanners.
See 88ddf1d0d9
It's for decoding. We only use this package for encoding (via
github.com/google/rpmpack / github.com/goreleaser/nfpm/v2).
Updates #8043
Change-Id: I87631aa5048f9514bb83baf1424f6abb34329c46
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Our own WaitGroup wrapper type was a prototype implementation
for the Go method on the standard sync.WaitGroup type.
Now that there is first-class support for Go,
we should migrate over to using it and delete syncs.WaitGroup.
Updates #cleanup
Updates tailscale/tailscale#16330
Change-Id: Ib52b10f9847341ce29b4ca0da927dc9321691235
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
DERP writes go via TCP and the host OS will have plenty of buffer space.
In the wild, we've observed kernel-side buffers of >2.4MB on a backed-up
TCP socket. The DERP internal queue being larger causes an
increase in the probability that the contents of the backbuffer are
"dead letters" - packets that were assumed to be lost.
A first step to improvement is to size this queue only large enough to
avoid some of the initial connect stall problem, but not large enough
that it is contributing in a substantial way to buffer bloat /
dead-letter retention.
Updates tailscale/corp#31762
Signed-off-by: James Tucker <james@tailscale.com>
I need a ringbuffer in the more traditional sense, one that has a notion
of item removal as well as tail loss on overrun. This implementation is
really a clearable log window, and is used as such where it is used.
Updates #cleanup
Updates tailscale/corp#31762
Signed-off-by: James Tucker <james@tailscale.com>
Bump Go 1.25 release to include a go/types patch and resolve govulncheck
CI exceptions.
Updates tailscale/corp#31755
Signed-off-by: Patrick O'Doherty <patrick@tailscale.com>
Extract field comments from AST and include them in generated view
methods. Comments are preserved from the original struct fields to
provide documentation for the view accessors.
Fixes #16958
Signed-off-by: Maisem Ali <3953239+maisem@users.noreply.github.com>
Fixes tailscale/corp#26369
The suggested exit node is currently only calculated during a localAPI request.
For older UIs, this wasn't a bad choice - we could just fetch it on-demand when a menu
presented itself. For newer incarnations however, this is an always-visible field
that needs to react to changes in the suggested exit node's value.
This change recalculates the suggested exit node ID on netmap updates and
broadcasts it on the IPN bus. The localAPI version of this remains intact for the
time being.
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
Updates tailscale/corp#29841
Adds a node cap macOS UIs can query to determine
whether they should enable the new windowed UI.
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
Pull the lock-bearing code into a closure, and use a clone rather than a
shallow copy of the hostinfo record.
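Why a clone matters can be shown with a toy record. The hostinfo type below
is a hypothetical stand-in, not tailcfg.Hostinfo: a shallow copy still shares
slice backing storage with the original, so writes through the copy can race
with or corrupt the original once the lock is released.

```go
package main

import (
	"fmt"
	"slices"
)

// hostinfo is a hypothetical record with a slice field; it is not
// tailcfg.Hostinfo.
type hostinfo struct {
	OS       string
	Services []string
}

// clone returns a deep copy, so the result shares no memory with h.
func (h *hostinfo) clone() *hostinfo {
	c := *h // shallow copy: c.Services still aliases h.Services
	c.Services = slices.Clone(h.Services)
	return &c
}

func main() {
	orig := &hostinfo{OS: "linux", Services: []string{"ssh"}}

	// A shallow copy aliases the slice, so this write is visible via orig.
	shallow := *orig
	shallow.Services[0] = "changed"
	fmt.Println(orig.Services[0]) // prints "changed"

	// A clone has its own slice, so orig is unaffected by this write.
	deep := orig.clone()
	deep.Services[0] = "isolated"
	fmt.Println(orig.Services[0]) // prints "changed"
}
```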
Updates #11649
Change-Id: I4f1d42c42ce45e493b204baae0d50b1cbf82b102
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
The early unlock on this branch was required because the "send" method goes on
to acquire the mutex itself. Rather than release the lock just to acquire it
again, call the underlying locked helper directly.
Updates #11649
Change-Id: I50d81864a00150fc41460b7486a9c65655f282f5
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
In places where we are locking the LocalBackend and immediately deferring an
unlock, and where there is no shortcut path in the control flow below the
deferral, we do not need the unlockOnce helper. Replace all these with use of
the lock directly.
Updates #11649
Change-Id: I3e6a7110dfc9ec6c1d38d2585c5367a0d4e76514
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Instead of referring to groups, which is a term of art for a different entity,
update the doc comments to more accurately describe what tags are in reference
to the policy document.
Updates #cleanup
Change-Id: Iefff6f84981985f834bae7c6a6c34044f53f2ea2
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
There are several methods within the LocalBackend that used an unusual and
error-prone lock discipline whereby they require the caller to hold the backend
mutex on entry, but release it on the way out.
In #11650 we added some support code to make this pattern more visible.
Now it is time to eliminate the pattern (at least within this package).
This is intended to produce no semantic changes, though I am relying on
integration tests and careful inspection to achieve that.
To the extent possible I preserved the existing control flow. In a few places,
however, I replaced this with an unlock/lock closure. This means we will
sometimes reacquire a lock only to release it again one frame up the stack, but
these operations are not performance sensitive and the legibility gain seems
worthwhile.
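As a sketch of the unlock/lock closure shape (the backend type and its
methods are hypothetical, not LocalBackend): the method holds the mutex for
its whole body via defer, and the one step that must run unlocked is wrapped
in a closure that releases and then reacquires the lock.

```go
package main

import (
	"fmt"
	"sync"
)

// backend is a hypothetical stand-in for a type guarded by one mutex.
type backend struct {
	mu    sync.Mutex
	state string
}

// notify stands in for work that must run without b.mu held.
func (b *backend) notify(s string) { fmt.Println("notify:", s) }

// setState holds b.mu for its whole body via defer. The step that must
// not run under the lock is wrapped in a closure that unlocks, does the
// work, and relocks, so the lock is always held at entry to and exit
// from the closure.
func (b *backend) setState(s string) {
	b.mu.Lock()
	defer b.mu.Unlock()

	b.state = s
	func() {
		b.mu.Unlock()
		defer b.mu.Lock()
		b.notify(s)
	}()
	// b.mu is held again here; the deferred Unlock releases it.
}

func main() {
	b := &backend{}
	b.setState("Running")

	// Locking succeeds because setState released the lock on return.
	b.mu.Lock()
	fmt.Println(b.state)
	b.mu.Unlock()
}
```

The closure reacquires the lock only for the deferred Unlock to release it
one frame up, which is the extra lock traffic described above.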
We can probably also pull some of these out into separate methods, but I did
not do that here so as to avoid other variable scope changes that might be hard
to see. I would like to do some more cleanup separately.
As a follow-up, we could also remove the unlockOnce helper, but I did not do
that here either.
Updates #11649
Change-Id: I4c92d4536eca629cfcd6187528381c33f4d64e20
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
The serve code leaves it up to the system's DNS resolver and netstack to
figure out how to reach the proxy destination. Combined with k8s-proxy
running in userspace mode, this means we can't rely on MagicDNS being
available or tailnet IPs being routable. I'd like to implement that as a
feature for serve in userspace mode, but for now the safer fix to get
kube-apiserver ProxyGroups consistently working in all environments is to
switch to using localhost as the proxy target instead.
This has a small knock-on in the code that does WhoIs lookups, which now
needs to check the X-Forwarded-For header that serve populates to get
the correct tailnet IP to look up, because the request's remote address
will be loopback.
Fixes #16920
Change-Id: I869ddcaf93102da50e66071bb00114cc1acc1288
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
This increases throughput over long fat networks, and in the presence
of crypto/syscall-induced delay.
Updates tailscale/corp#31164
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Update oidc-funnel-clients.json to take a path. This
allows setting the location of the file and prevents
it from landing in the root directory or the user's home directory.
Move setting of rootPath until after tsnet has started.
Previously this was added for the lazy creation of the
oidc-key.json. It's now needed earlier in the flow.
Updates #16734
Fixes #16844
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
Add the ability for operators of natc in consensus mode to remove
servers from the raft cluster config, without losing other state.
Updates #14667
Signed-off-by: Fran Bull <fran@tailscale.com>
Currently consensus has a bootstrap routine where a tsnet node tries to
join each other node with the cluster tag, and if it is not able to join
any other node it starts its own cluster.
That algorithm is racy, and can result in split brain (more than one
leader/cluster) if all the nodes for a cluster are started at the same
time.
Add a FollowOnly argument to the bootstrap function. If provided this
tsnet node will never lead, it will try (and retry with exponential back
off) to follow any node it can contact.
Add a --follow-only flag to cmd/natc that uses this new tsconsensus
functionality.
Also slightly reorganize some arguments into opts structs.
Updates #14667
Signed-off-by: Fran Bull <fran@tailscale.com>
This significantly improves throughput of a peer relay server on Linux.
Server.packetReadLoop no longer passes sockets down the stack. Instead,
packet handling methods return a netip.AddrPort and []byte, which
packetReadLoop gathers together for eventual batched writes on the
appropriate socket(s).
Updates tailscale/corp#31164
Signed-off-by: Jordan Whited <jordan@tailscale.com>
We have been unintentionally ignoring errors from calling bootstrap.
bootstrap sometimes calls raft.BootstrapCluster which sometimes returns
a safe-to-ignore error; handle that case appropriately.
Updates #14667
Signed-off-by: Fran Bull <fran@tailscale.com>
This has come up in a few situations recently and adding these helpers
is much better than copying the slice (calling AsSlice()) in order to
use slices.Max and friends.
Updates #cleanup
Change-Id: Ib289a07d23c3687220c72c4ce341b9695cd875bf
Signed-off-by: Adrian Dewhurst <adrian@tailscale.com>
Update the runall handler to be more generic, with an
exclude param that lets the requester specify multiple
probes to exclude.
Updates tailscale/corp#27370
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
Clean up nix support, make flake easier to read with nix-systems.
This also harmonizes with golinks flake setup and reduces an input
dependency by 1.
Update deps test to ensure the vendor hash stays harmonized
with go.mod.
Update make tidy to ensure vendor hash stays current.
Overlay the current version of Go; Tailscale adopts
recent releases faster than nixpkgs can update them in
the unstable branch.
Updates #16637
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
The -Environment argument to Start-Process is essentially being treated
as a delta; removing a particular variable from the argument's hash
table does not indicate to delete. Instead we must set the value of each
unwanted variable to $null.
Updates https://github.com/tailscale/corp/issues/29940
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
Some of the operations of the local API need an event bus to correctly
instantiate other components (notably including the portmapper).
This commit adds that, and as the parameter list is starting to get a bit long
and hard to read, I took the opportunity to move the arguments to a config
type. Only a few call sites needed to be updated and this API is not intended
for general use, so I did not bother to stage the change.
Updates #15160
Updates #16842
Change-Id: I7b057d71161bd859f5acb96e2f878a34c85be0ef
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
gocross-wrapper.ps1 is a PowerShell core script that is essentially a
straight port of gocross-wrapper.sh. It requires PowerShell 7.4, which
is the latest LTS release of PSCore.
Why use PowerShell Core instead of Windows PowerShell? Essentially
because the former is much better to script with and is the edition
that is currently maintained.
Because we're using PowerShell Core, but many people will be running
scripts from a machine that only has Windows PowerShell, go.cmd has
been updated to prompt the user for PowerShell core installation if
necessary.
gocross-wrapper.sh has also been updated to utilize the PSCore script
when running under cygwin or msys.
gocross itself required a couple of updates:
We update gocross to output the PowerShell Core wrapper alongside the
bash wrapper, which will propagate the revised scripts to other repos
as necessary.
We also fix a couple of things in gocross that didn't work on Windows:
we change the toolchain resolution code to use os.UserHomeDir instead
of directly referencing the HOME environment variable, and we fix a
bug in the way arguments were being passed into exec.Command on
non-Unix systems.
Updates https://github.com/tailscale/corp/issues/29940
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
Add a Run all probes handler that executes all
probes except those that are continuous or the derpmap
probe.
This is leveraged by other tooling to confirm DERP
stability after a deploy.
Updates tailscale/corp#27370
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
Fixes tailscale/corp#31299
Fixes two issues:
getInterfaceIndex would occasionally race with netmon's state, returning
the cached default interface index after it had been changed by NWNetworkMonitor.
This had the potential to cause connections to bind to the prior default. The fix
here is to preferentially use the interface index provided by NWNetworkMonitor.
When no interfaces are available, macOS will set the tunnel as the default
interface when an exit node is enabled, potentially causing getInterfaceIndex
to return utun's index. We now guard against this when taking the
defaultIdx path.
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
This pulls in a change from github.com/tailscale/QDK to verify code signing
when using QNAP_SIGNING_SCRIPT.
It also upgrades to the latest Google Cloud PKCS#11 library, and reorders
the Dockerfile to allow for more efficient future upgrades to the included QDK.
Updates tailscale/corp#23528
Signed-off-by: Percy Wegmann <percy@tailscale.com>
Define the HardwareAttestationKey interface describing a platform-specific
hardware-backed node identity attestation key. Clients will register the
key type implementations for their platform.
Updates tailscale/corp#31269
Signed-off-by: Patrick O'Doherty <patrick@tailscale.com>
dnstype.Resolver adds a boolean UseWithExitNode that controls
whether the resolver should be used in tailscale exit node contexts
(not wireguard exit nodes). If UseWithExitNode resolvers are found,
they are installed as the global resolvers. If no UseWithExitNode resolvers
are found, the exit node resolver continues to be installed as the global
resolver. Split DNS Routes referencing UseWithExitNode resolvers are also
installed.
Updates #8237
Fixes tailscale/corp#30906
Fixes tailscale/corp#30907
Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
We already show a message in the menu itself, this just adds it to the
CLI output as well.
Updates #1708
Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
This adds support for having every viewer type implement
jsonv2.MarshalerTo and jsonv2.UnmarshalerFrom.
This provides a significant boost in performance
as the json package no longer needs to validate
the entirety of the JSON value outputted by MarshalJSON,
nor does it need to identify the boundaries of a JSON value
in order to call UnmarshalJSON.
For deeply nested and recursive MarshalJSON or UnmarshalJSON calls,
this can improve runtime from O(N²) to O(N).
This still references "github.com/go-json-experiment/json"
instead of the experimental "encoding/json/v2" package
now available in Go 1.25 under goexperiment.jsonv2
so that code still builds without the experiment tag.
Of note, the "github.com/go-json-experiment/json" package
aliases the standard library under the right build conditions.
Updates tailscale/corp#791
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Adds a setter for proxyFunc to allow macOS to pull defined
system proxies. Disallows overriding if proxyFunc is set via config.
Updates tailscale/corp#30668
Signed-off-by: Will Hannah <willh@tailscale.com>
This affects the 1.87.33 unstable release.
Updates #16842
Updates #15160
Change-Id: Ie6d1b2c094d1a6059fbd1023760567900f06e0ad
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Expected when Peer Relay'ing via self. These disco messages never get
sealed, and never leave the process.
Updates tailscale/corp#30527
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Update some logging to help debug future failures.
Improve test shutdown concurrency issues.
Fixes #16722
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
Peer Relay is dependent on crypto routing, therefore crypto routing is
now mandatory.
Updates tailscale/corp#20732
Updates tailscale/corp#31083
Signed-off-by: Jordan Whited <jordan@tailscale.com>
This commit also extends the updateRelayServersSet unit tests to cover
onNodeViewsUpdate.
Fixes tailscale/corp#31080
Signed-off-by: Jordan Whited <jordan@tailscale.com>
One of these tests highlighted a Geneve encap bug, which is also fixed
in this commit.
looksLikeInitMsg was passed a packet post Geneve header stripping with
slice offsets that had not been updated to account for the stripping.
Updates tailscale/corp#30903
Signed-off-by: Jordan Whited <jordan@tailscale.com>
* Update installer.sh add FreeBSD ver 15
this should fix the issue on https://github.com/tailscale/tailscale/issues/16740
Signed-off-by: TheBigBear <471105+TheBigBear@users.noreply.github.com>
* scripts/installer.sh: small indentation change
Signed-off-by: Erisa A <erisa@tailscale.com>
Fixes #16740
---------
Signed-off-by: TheBigBear <471105+TheBigBear@users.noreply.github.com>
Signed-off-by: Erisa A <erisa@tailscale.com>
Co-authored-by: Erisa A <erisa@tailscale.com>
Pass a local.Client to systray.Run, so we can use the existing global
localClient in the cmd/tailscale CLI. Add socket flag to cmd/systray.
Updates #1708
Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
Adds the eventbus to the router subsystem.
The event is currently only used on linux.
Also includes facilities to inject events into the bus.
Updates #15160
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
This will start including the systray app in unstable builds for Linux,
unless the `ts_omit_systray` build flag is specified.
If we decide not to include it in the v1.88 release, we can pull it
back out or restrict it to unstable builds.
Updates #1708
Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
In Android, we are prompting the user to select a Taildrop directory when they first receive a Taildrop: we block writes on Taildrop dir selection. This means that we cannot use Dir inside managerOptions, since the http request would not get the new Taildrop extension. This PR removes, in the Android case, the reliance on m.opts.Dir, and instead has FileOps hold the correct directory.
This expands FileOps to be the Taildrop interface for all file system operations.
Updates tailscale/corp#29211
Signed-off-by: kari-ts <kari@tailscale.com>
restore tstest
* cmd/k8s-operator,k8s-operator: allow setting a `priorityClassName`
Fixes #16682
Signed-off-by: Lee Briggs <lee@leebriggs.co.uk>
* Update k8s-operator/apis/v1alpha1/types_proxyclass.go
Co-authored-by: Tom Proctor <tomhjp@users.noreply.github.com>
Signed-off-by: Lee Briggs <jaxxstorm@users.noreply.github.com>
* run make kube-generate-all
Change-Id: I5f8f16694fdc181b048217b9f05ec2ee2aa04def
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
---------
Signed-off-by: Lee Briggs <lee@leebriggs.co.uk>
Signed-off-by: Lee Briggs <jaxxstorm@users.noreply.github.com>
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Co-authored-by: Tom Proctor <tomhjp@users.noreply.github.com>
The tsidp oidc-key.json ended up in the root directory
or home dir of the user process running it.
Update this to store it in a known location respecting
the TS_STATE_DIR and flagDir options.
Fixes #16734
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
Also adds a test to kube/kubeclient to defend against the error type
returned by the client changing in future.
Fixes tailscale/corp#30855
Change-Id: Id11d4295003e66ad5c29a687f1239333c21226a4
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Some systems have `sudo`, some have `su`. This tries both, increasing
the chance that we can run the file server as an unprivileged user.
Updates #14629
Signed-off-by: Percy Wegmann <percy@tailscale.com>
If a conn.Close call raced conn.ReadFromUDPAddrPort before it could
"register" itself as an active read, the conn.ReadFromUDPAddrPort would
never return.
This commit replaces all the activeRead and breakActiveReads machinery
with a channel. These constructs were only depended upon by
SetReadDeadline, and SetReadDeadline was unused.
Updates #16707
Signed-off-by: Jordan Whited <jordan@tailscale.com>
This commit updates the message recommending the clear command after running serve for a service.
Instead of a flag, we pass the service name as a parameter.
Fixes tailscale/corp#30846
Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com>
In the components where an event bus is already plumbed through, remove the
exceptions that allow it to be omitted, and update all the tests that relied on
those workarounds to execute properly.
This change applies only to the places where we're already using the bus; it
does not enforce the existence of a bus in other components (yet).
Updates #15160
Change-Id: Iebb92243caba82b5eb420c49fc3e089a77454f65
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
jsonv2 now returns an error when you marshal or unmarshal a time.Duration
without an explicit format flag. This is an intentional, temporary choice until
the default [time.Duration] representation is decided (see golang/go#71631).
setting.Snapshot can hold time.Duration values inside a map[string]any,
so the jsonv2 update breaks marshaling. In this PR, we start using
a custom marshaler until that decision is made or golang/go#71664
lets us specify the format explicitly.
This fixes `tailscale syspolicy list` failing when KeyExpirationNotice
or any other time.Duration policy setting is configured.
Fixes #16683
Signed-off-by: Nick Khyl <nickk@tailscale.com>
Ideally when we attempt to create a new port mapping, we should not return
without error when no mapping is available. We already log these cases as
unexpected, so this change is just to avoid panicking in dispatch on the
invalid result in those cases. We still separately need to fix the underlying
control flow.
Updates #16662
Change-Id: I51e8a116b922b49eda45e31cd27f6b89dd51abc8
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
This occasionally panics waiting on a nil ctx, but was missed in the
previous PR because it's quite a rare flake as it needs to progress to a
specific point in the parser.
Updates #16678
Change-Id: Ifd36dfc915b153aede36b8ee39eff83750031f95
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
When kubectl starts an interactive attach session, it sends 2 resize
messages in quick succession. It seems that particularly in HTTP mode,
we often receive both of these WebSocket frames from the underlying
connection in a single read. However, our parser currently assumes 0-1
frames per read, and leaves the second frame in the read buffer until
the next read from the underlying connection. It doesn't take long after
that before we end up failing to skip a control message as we normally
should, and then we parse a control message as though it will have a
stream ID (part of the Kubernetes protocol) and error out.
Instead, we should keep parsing frames from the read buffer for as long
as we're able to parse complete frames, so this commit refactors the
messages parsing logic into a loop based on the contents of the read
buffer being non-empty.
See k/k staging/src/k8s.io/kubectl/pkg/cmd/attach/attach.go for full
details of the resize messages.
There are at least a couple more multiple-frame read edge cases we
should handle, but this commit is very conservatively fixing a single
observed issue to make it a low-risk candidate for cherry picking.
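The shape of the loop can be sketched with a deliberately simplified framing: one length byte followed by the payload. This is not real WebSocket framing, and `parseFrames` is a hypothetical name; the point is only that parsing continues while the buffer still holds a complete frame.

```go
package main

import "fmt"

// parseFrames consumes every complete frame in buf and returns the
// payloads plus any unconsumed remainder. Framing here is a
// hypothetical simplification: one length byte, then the payload.
func parseFrames(buf []byte) (frames [][]byte, rest []byte) {
	for len(buf) > 0 {
		n := int(buf[0])
		if len(buf) < 1+n {
			break // incomplete frame: keep it for the next read
		}
		frames = append(frames, buf[1:1+n])
		buf = buf[1+n:]
	}
	return frames, buf
}

func main() {
	// Two complete frames and the start of a third in a single read.
	read := []byte{2, 'h', 'i', 3, 'f', 'o', 'o', 4, 'p'}
	frames, rest := parseFrames(read)
	for _, f := range frames {
		fmt.Printf("frame %q\n", f)
	}
	fmt.Printf("%d bytes left over\n", len(rest))
}
```

A parser that handled at most one frame per read would leave the second frame in the buffer here until more data arrived.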
Updates #13358
Change-Id: Iafb91ad1cbeed9c5231a1525d4563164fc1f002f
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
This update introduces support for DNS records associated with ProxyGroup egress services, ensuring that the ClusterIP Service IP is used instead of Pod IPs.
Fixes #15945
Signed-off-by: Raj Singh <raj@tailscale.com>
We as members, contributors, and leaders pledge to make participation
in our community a harassment-free experience for everyone, regardless
of age, body size, visible or invisible disability, ethnicity, sex
characteristics, gender identity and expression, level of experience,
education, socio-economic status, nationality, personal appearance,
race, religion, or sexual identity and orientation.
We pledge to act and interact in ways that contribute to an open,
welcoming, diverse, inclusive, and healthy community.
We are committed to creating an open, welcoming, diverse, inclusive, healthy and respectful community.
Unacceptable, harmful and inappropriate behavior will not be tolerated.
## Our Standards
Examples of behavior that contributes to a positive environment for
our community include:
* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our
mistakes, and learning from the experience
* Focusing on what is best not just for us as individuals, but for the
overall community
Examples of unacceptable behavior include:
* The use of sexualized language or imagery, and sexual attention or
advances of any kind
* Trolling, insulting or derogatory comments, and personal or
political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email
address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in
a professional setting
## Enforcement Responsibilities
Community leaders are responsible for clarifying and enforcing our
standards of acceptable behavior and will take appropriate and fair
corrective action in response to any behavior that they deem
inappropriate, threatening, offensive, or harmful.
Community leaders have the right and responsibility to remove, edit,
or reject comments, commits, code, wiki edits, issues, and other
contributions that are not aligned to this Code of Conduct, and will
communicate reasons for moderation decisions when appropriate.
## Our Standards
Examples of behavior that contributes to a positive environment for our community include:
- Demonstrating empathy and kindness toward other people.
- Being respectful of differing opinions, viewpoints, and experiences.
- Giving and gracefully accepting constructive feedback.
- Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience.
- Focusing on what is best not just for us as individuals, but for the overall community.
Examples of unacceptable behavior include without limitation:
- The use of language, imagery or emojis (collectively "content") that is racist, sexist, homophobic, transphobic, or otherwise harassing or discriminatory based on any protected characteristic.
- The use of sexualized content and sexual attention or advances of any kind.
- The use of violent, intimidating or bullying content.
- Trolling, concern trolling, insulting or derogatory comments, and personal or political attacks.
- Public or private harassment.
- Publishing others' personal information, such as a photo, physical address, email address, online profile information, or other personal information, without their explicit permission or with the intent to bully or harass the other person.
- Posting deep fake or other AI generated content about or involving another person without the explicit permission.
- Spamming community channels and members, such as sending repeat messages, low-effort content, or automated messages.
- Phishing or any similar activity.
- Distributing or promoting malware.
- The use of any coded or suggestive content to hide or provoke otherwise unacceptable behavior.
- Other conduct which could reasonably be considered harmful, illegal, or inappropriate in a professional setting.
Please also see the Tailscale Acceptable Use Policy, available at [tailscale.com/tailscale-aup](https://tailscale.com/tailscale-aup).
## Scope
This Code of Conduct applies within all community spaces, and also
applies when an individual is officially representing the community in
public spaces. Examples of representing our community include using an
official e-mail address, posting via an official social media account,
or acting as an appointed representative at an online or offline
event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior
may be reported to the community leaders responsible for enforcement
at [info@tailscale.com](mailto:info@tailscale.com). All complaints
will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and
security of the reporter of any incident.
## Enforcement Guidelines
Community leaders will follow these Community Impact Guidelines in
determining the consequences for any action they deem in violation of
this Code of Conduct:
## Reporting Incidents
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to Tailscale directly via <info@tailscale.com>, or to the community leaders or moderators via DM or similar.
All complaints will be reviewed and investigated promptly and fairly.
We will respect the privacy and safety of the reporter of any issues.
Please note that this community is not moderated by staff 24/7, and we do not have, and do not undertake, any obligation to prescreen, monitor, edit, or remove any content or data, or to actively seek facts or circumstances indicating illegal activity.
While we strive to keep the community safe and welcoming, moderation may not be immediate at all hours.
If you encounter any issues, report them using the appropriate channels.
## Enforcement Guidelines
Community leaders and moderators are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful.
Community leaders and moderators have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Community Code of Conduct.
Tailscale retains full discretion to take action (or not) in response to a violation of these guidelines with or without notice or liability to you.
We will interpret our policies and resolve disputes in favor of protecting users, customers, the public, our community and our company, as a whole.
Community leaders will follow these community enforcement guidelines in determining the consequences for any action they deem in violation of this Code of Conduct,
and retain full discretion to apply the enforcement guidelines as necessary depending on the circumstances:
### 1. Correction
**Community Impact**: Use of inappropriate language or other behavior
deemed unprofessional or unwelcome in the community.
**Consequence**: A private, written warning from community leaders,
providing clarity around the nature of the violation and an
explanation of why the behavior was inappropriate. A public apology
may be requested.
### 1. Correction
Community Impact: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community.
Consequence: A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate.
A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series
of actions.
**Consequence**: A warning with consequences for continued
behavior. No interaction with the people involved, including
unsolicited interaction with those enforcing the Code of Conduct, for
a specified period of time. This includes avoiding interactions in
community spaces as well as external channels like social
media. Violating these terms may lead to a temporary or permanent ban.
### 2. Warning
Community Impact: A violation through a single incident or series of actions.
Consequence: A warning with consequences for continued behavior.
No interaction with the people involved, including unsolicited interaction with those enforcing this Community Code of Conduct, for a specified period of time.
This includes avoiding interactions in community spaces as well as external channels like social media.
Violating these terms may lead to a temporary or permanent ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards,
including sustained inappropriate behavior.
**Consequence**: A temporary ban from any sort of interaction or
public communication with the community for a specified period of
time. No public or private interaction with the people involved,
including unsolicited interaction with those enforcing the Code of
Conduct, is allowed during this period. Violating these terms may lead
to a permanent ban.
### 3. Temporary Ban
Community Impact: A serious violation of community standards, including sustained inappropriate behavior.
Consequence: A temporary ban from any sort of interaction or public communication with the community for a specified period of time.
No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of
community standards, including sustained inappropriate behavior,
harassment of an individual, or aggression toward or disparagement of
classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction
within the community.
### 4. Permanent Ban
Community Impact: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals.
Consequence: A permanent ban from any sort of public interaction within the community.
## Acceptable Use Policy
Violation of this Community Code of Conduct may also violate the Tailscale Acceptable Use Policy, which may result in suspension or termination of your Tailscale account.
For more information, please see the Tailscale Acceptable Use Policy, available at [tailscale.com/tailscale-aup](https://tailscale.com/tailscale-aup).
## Privacy
Please see the Tailscale [Privacy Policy](https://tailscale.com/privacy-policy) for more information about how Tailscale collects, uses, discloses and protects information.
## Attribution
## Attribution
This Code of Conduct is adapted from the [Contributor
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 2.0, available at <https://www.contributor-covenant.org/version/2/0/code_of_conduct.html>.