Commit Graph

7406 Commits (bradfitz/login_retry)
 

Author SHA1 Message Date
Brad Fitzpatrick ae3d9cde4c ipn/ipnlocal: avoid StartLoginInteractive crash with hacky retry loop
This adds to the pile of terrible locking in LocalBackend/controlclient
but deflakes integration tests, so meh.

The real boss is #11649.

Fixes #7036

Change-Id: I46d382aa2d55c20db1d1c72ba4219a1e93fa9c64
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 month ago
Brad Fitzpatrick 7ec0dc3834 ipn/ipnlocal: make StartLoginInteractive take (yet unused) context
In prep for future fix to undermentioned issue.

Updates tailscale/tailscale#7036

Change-Id: Ide114db917dcba43719482ffded6a9a54630d99e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 month ago
Claire Wang 9171b217ba
cmd/tailscale, ipn/ipnlocal: add suggest exit node CLI option (#11407)
Updates tailscale/corp#17516

Signed-off-by: Claire Wang <claire@tailscale.com>
1 month ago
Charlotte Brandhorst-Satzkorn 449f46c207
wgengine/magicsock: rebind/restun if a syscall.EPERM error is returned (#11711)
We have seen in macOS client logs that the "operation not permitted", a
syscall.EPERM error, is being returned when traffic is attempted to be
sent. This may be caused by security software on the client.

This change will perform a rebind and restun if we receive a
syscall.EPERM error on clients running darwin. Rebinds will only be
called if we haven't performed one specifically for an EPERM error in
the past 5 seconds.

Updates #11710

Signed-off-by: Charlotte Brandhorst-Satzkorn <charlotte@tailscale.com>
1 month ago
Will Norris 14c8b674ea Revert "licenses: add gliderlabs/ssh license"
The gliderlabs/ssh license is actually already included in the standard
package listing.  I'm not sure why I thought it wasn't.

Updates tailscale/corp#5780

This reverts commit 11dca08e93.

Signed-off-by: Will Norris <will@tailscale.com>
1 month ago
Brad Fitzpatrick 952e06aa46 wgengine/router: don't attempt route cleanup on Synology
Trying to run iptables/nftables on Synology pauses for minutes with
lots of errors and ultimately does nothing as it's not used and we
lack permissions.

This fixes a regression from db760d0bac (#11601) that landed
between Synology testing on unstable 1.63.110 and 1.64.0 being cut.

Fixes #11737

Change-Id: Iaf9563363b8e45319a9b6fe94c8d5ffaecc9ccef
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 month ago
Irbe Krumina 38fb23f120
cmd/k8s-operator,k8s-operator: allow users to configure proxy env vars via ProxyClass (#11743)
Adds new ProxyClass.spec.statefulSet.pod.{tailscaleContainer,tailscaleInitContainer}.Env field
that allow users to provide key, value pairs that will be set as env vars for the respective containers.
Allow overriding all containerboot env vars,
but warn that this is not supported and might break (in docs + a warning when validating ProxyClass).

Updates tailscale/tailscale#10709

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
1 month ago
Brad Fitzpatrick 9258bcc360 Makefile: fix default SYNO_ARCH in Makefile
It was broken with the move to dist in 32e0ba5e68 which doesn't accept
amd64 anymore.

Updates #cleanup

Change-Id: Iaaaba2d73c6a09a226934fe8e5c18b16731ee7a6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 month ago
Brad Fitzpatrick b9aa7421d6 ipn/ipnlocal: remove some dead code (legacyBackend methods) from LocalBackend
Nothing used it.

Updates #11649

Change-Id: Ic1c331d947974cd7d4738ff3aafe9c498853689e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 month ago
Brad Fitzpatrick a6739c49df paths: set default state path on AIX
Updates #11361

Change-Id: I196727a540be6b7c75303f9958490b1d76189fd6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 month ago
Brad Fitzpatrick 271cfdb3d3 util/syspolicy: clean up doc grammar and consistency
Updates #cleanup

Change-Id: I912574cbd5ef4d8b7417b8b2a9b9a2ccfef88840
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 month ago
Brad Fitzpatrick bad3159b62 ipn/ipnlocal: delete useless SetControlClientGetterForTesting use
Updates #11649

Change-Id: I56c069b9c97bd3e30ff87ec6655ec57e1698427c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 month ago
Brad Fitzpatrick 8186cd0349 ipn/ipnlocal: delete redundant TestStatusWithoutPeers
We have tstest/integration nowadays.

And this test was one of the lone holdouts using the to-be-nuked
SetControlClientGetterForTesting.

Updates #11649

Change-Id: Icf8a6a2e9b8ae1ac534754afa898c00dc0b7623b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 month ago
Brad Fitzpatrick 68043a17c2 ipn/ipnlocal: centralize assignments to cc + ccAuto in new method
cc vs ccAuto is a mess. It needs to go. But this is a baby step towards
getting there.

Updates #11649

Change-Id: I34f33934844e580bd823a7d8f2b945cf26c87b3b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 month ago
Brad Fitzpatrick 970b1e21d0 ipn/ipnlocal: inline assertClientLocked into its now sole caller
Updates #11649

Change-Id: I8e2a5e59125a0cad5c0a8c9ed8930585f1735d03
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 month ago
Brad Fitzpatrick 170c618483 ipn/ipnlocal: remove dead code now that Android uses LocalAPI instead
The new Android app and its libtailscale don't use this anymore;
it uses LocalAPI like other clients now.

Updates #11649

Change-Id: Ic9f42b41e0e0280b82294329093dc6c275f41d50
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 month ago
Flakes Updater 65f215115f go.mod.sri: update SRI hash for go.mod changes
Signed-off-by: Flakes Updater <noreply+flakes-updater@tailscale.com>
1 month ago
Brad Fitzpatrick a1abd12f35 cmd/tailscaled, net/tstun: build for aix/ppc64
At least in userspace-networking mode.

Fixes #11361

Change-Id: I78d33f0f7e05fe9e9ee95b97c99b593f8fe498f2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 month ago
kari-ts 1cd51f95c7
ipnlocal: enable allow LAN for android (#11709)
Updates tailscale/corp#18984
Updates tailscale/corp#18202
1 month ago
Claire Wang 976d3c7b5f
tailcfg: add exit destination for network flow logs node attribute (#11698)
Updates tailscale/corp#18625

Signed-off-by: Claire Wang <claire@tailscale.com>
1 month ago
Joe Tsai 7a77a2edf1
logtail: optimize JSON processing (#11671)
Changes made:

* Avoid "encoding/json" for JSON processing, and instead use
"github.com/go-json-experiment/json/jsontext".
Use jsontext.Value.IsValid for validation, which is much faster.
Use jsontext.AppendQuote instead of our own JSON escaping.

* In drainPending, use a different maxLen depending on lowMem.
In lowMem mode, it is better to perform multiple uploads
than it is to construct a large body that OOMs the process.

* In drainPending, if an error is encountered draining,
construct an error message in the logtail JSON format
rather than something that is invalid JSON.

* In appendTextOrJSONLocked, use jsontext.Decoder to check
whether the input is a valid JSON object. This is faster than
the previous approach of unmarshaling into map[string]any and
then re-marshaling that data structure.
This is especially beneficial for network flow logging,
which produces relatively large JSON objects.

* In appendTextOrJSONLocked, enforce maxSize on the input.
If too large, then we may end up in a situation where the logs
can never be uploaded because it exceeds the maximum body size
that the Tailscale logs service accepts.

* Use "tailscale.com/util/truncate" to properly truncate a string
on valid UTF-8 boundaries.

* In general, remove unnecessary spaces in JSON output.

Performance:

    name       old time/op    new time/op    delta
    WriteText     776ns ± 2%     596ns ± 1%   -23.24%  (p=0.000 n=10+10)
    WriteJSON     110µs ± 0%       9µs ± 0%   -91.77%  (p=0.000 n=8+8)

    name       old alloc/op   new alloc/op   delta
    WriteText      448B ± 0%        0B       -100.00%  (p=0.000 n=10+10)
    WriteJSON    37.9kB ± 0%     0.0kB ± 0%   -99.87%  (p=0.000 n=10+10)

    name       old allocs/op  new allocs/op  delta
    WriteText      1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
    WriteJSON     1.08k ± 0%     0.00k ± 0%   -99.91%  (p=0.000 n=10+10)

For text payloads, this is 1.30x faster.
For JSON payloads, this is 12.2x faster.

Updates #cleanup
Updates tailscale/corp#18514

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
1 month ago
Aaron Klotz 4d5d669cd5 net/dns: unconditionally write NRPT rules to local settings
We were being too aggressive when deciding whether to write our NRPT rules
to the local registry key or the group policy registry key.

After once again reviewing the document which calls itself a spec
(see issue), it is clear that the presence of the DnsPolicyConfig subkey
is the important part, not the presence of values set in the DNSClient
subkey. Furthermore, a footnote indicates that the presence of
DnsPolicyConfig in the GPO key will always override its counterpart in
the local key. The implication of this is important: we may unconditionally
write our NRPT rules to the local key. We copy our rules to the policy
key only when it contains NRPT rules belonging to somebody other than us.

Fixes https://github.com/tailscale/corp/issues/19071

Signed-off-by: Aaron Klotz <aaron@tailscale.com>
1 month ago
License Updater 9d021579e7 licenses: update license notices
Signed-off-by: License Updater <noreply+license-updater@tailscale.com>
1 month ago
Will Norris 11dca08e93 licenses: add gliderlabs/ssh license
This package is included in the tempfork directory, rather than as a go
module dependency, so is not included in the normal package list.

Updates tailscale/corp#5780

Signed-off-by: Will Norris <will@tailscale.com>
1 month ago
Jenny Zhang 2207643312 VERSION.txt: this is v1.65.0
Signed-off-by: Jenny Zhang <jz@tailscale.com>
1 month ago
Jenny Zhang 09524b58f3 VERSION.txt: this is v1.64.0
Signed-off-by: Jenny Zhang <jz@tailscale.com>
1 month ago
James Tucker a2eb1c22b0 wgengine/magicsock: allow disco communication without known endpoints
Just because we don't have known endpoints for a peer does not mean that
the peer should become unreachable. If we know the peers key, it should
be able to call us, then we can talk back via whatever path it called us
on. First step - don't drop the packet in this context.

Updates tailscale/corp#19106

Signed-off-by: James Tucker <james@tailscale.com>
1 month ago
Patrick O'Doherty 7f4cda23ac
scripts/installer.sh: add rpm GPG key import (#11686)
Extend the `zypper` install to import importing the GPG key used to sign
the repository packages.

Updates #11635

Signed-off-by: Patrick O'Doherty <patrick@tailscale.com>
1 month ago
James Tucker 8fa3026614 tsweb: switch to fastuuid for request ID generation
Request ID generation appears prominently in some services cumulative
allocation rate, and while this does not eradicate this issue (the API
still makes UUID objects), it does improve the overhead of this API and
reduce the amount of garbage that it produces.

Updates tailscale/corp#18266
Updates tailscale/corp#19054

Signed-off-by: James Tucker <james@tailscale.com>
1 month ago
James Tucker d0f3fa7d7e util/fastuuid: add a more efficient uuid generator
This still generates github.com/google/uuid UUID objects, but does so
using a ChaCha8 CSPRNG from the stdlib rand/v2 package. The public API
is backed by a sync.Pool to provide good performance in highly
concurrent operation.

Under high load the read API produces a lot of extra garbage and
overhead by way of temporaries and syscalls. This implementation reduces
both to minimal levels, and avoids any long held global lock by
utilizing sync.Pool.

Updates tailscale/corp#18266
Updates tailscale/corp#19054

Signed-off-by: James Tucker <james@tailscale.com>
1 month ago
James Tucker db760d0bac cmd/tailscaled: move cleanup to an implicit action during startup
This removes a potentially increased boot delay for certain boot
topologies where they block on ExecStartPre that may have socket
activation dependencies on other system services (such as
systemd-resolved and NetworkManager).

Also rename cleanup to clean up in affected/immediately nearby places
per code review commentary.

Fixes #11599

Signed-off-by: James Tucker <james@tailscale.com>
1 month ago
Nick Khyl 8d83adde07 util/winutil/winenv: add package for current Windows environment details
Package winenv provides information about the current Windows environment.
This includes details such as whether the device is a server or workstation,
and if it is AD domain-joined, MDM-registered, or neither.

Updates tailscale/corp#18342

Signed-off-by: Nick Khyl <nickk@tailscale.com>
1 month ago
Paul Scott da4e92bf01 cmd/tailscale/cli: prefix all --help usages with "tailscale ...", some tidying
Also capitalises the start of all ShortHelp, allows subcommands to be hidden
with a "HIDDEN: " prefix in their ShortHelp, and adds a TS_DUMP_HELP envknob
to look at all --help messages together.

Fixes #11664

Signed-off-by: Paul Scott <paul@tailscale.com>
1 month ago
Percy Wegmann 9da135dd64 cmd/tailscale/cli: moved share.go to drive.go
Updates tailscale/corp#16827

Signed-off-by: Percy Wegmann <percy@tailscale.com>
1 month ago
Percy Wegmann 1e0ebc6c6d cmd/tailscale/cli: rename share command to drive
Updates tailscale/corp#16827

Signed-off-by: Percy Wegmann <percy@tailscale.com>
1 month ago
Joe Tsai b4ba492701
logtail: require Buffer.Write to not retain the provided slice (#11617)
Buffer.Write has the exact same signature of io.Writer.Write.
The latter requires that implementations to never retain
the provided input buffer, which is an expectation that most
users will have when they see a Write signature.

The current behavior of Buffer.Write where it does retain
the input buffer is a risky precedent to set.
Switch the behavior to match io.Writer.Write.

There are only two implementations of Buffer in existence:
* logtail.memBuffer
* filch.Filch

The former can be fixed by cloning the input to Write.
This will cause an extra allocation in every Write,
but we can fix that will pooling on the caller side
in a follow-up PR.

The latter only passes the input to os.File.Write,
which does respect the io.Writer.Write requirements.

Updates #cleanup
Updates tailscale/corp#18514

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
1 month ago
Irbe Krumina 231e44e742
Revert "cmd/{k8s-nameserver,k8s-operator},k8s-operator: add a kube nameserver, make operator deploy it (#11017)" (#11669)
Temporarily reverting this PR to avoid releasing
half finished featue.

This reverts commit 9e2f58f846.

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
1 month ago
Andrea Gottardo 0001237253 docs/policy: update ADMX and ADML files with new Windows 1.62 syspolicies
Updates ENG-2776

Updates the .admx and .adml files to include the new ManagedByOrganizationName, ManagedByCaption and ManagedByURL system policies, added in Tailscale v1.62 for Windows.

Co-authored-by: Andrea Gottardo <andrea@gottardo.me>
Signed-off-by: Nick Khyl <nickk@tailscale.com>
1 month ago
Brad Fitzpatrick b27238b654 derp/derphttp: don't block in LocalAddr method
The derphttp.Client mutex is held during connects (for up to 10
seconds) so this LocalAddr method (blocking on said mutex) could also
block for up to 10 seconds, causing a pileup upstream in
magicsock/wgengine and ultimately a watchdog timeout resulting in a
crash.

Updates #11519

Change-Id: Idd1d94ee00966be1b901f6899d8b9492f18add0f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 month ago
Brad Fitzpatrick e6983baa73
cmd/tailscale/cli: fix macOS crash reading envknob in init (#11667)
And add a test.

Regression from a5e1f7d703

Fixes tailscale/corp#19036

Change-Id: If90984049af0a4820c96e1f77ddf2fce8cb3043f

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 month ago
Chloé Vulquin 0f3a292ebd
cli/configure: respect $KUBECONFIG (#11604)
cmd/tailscale/cli: respect $KUBECONFIG

* `$KUBECONFIG` is a `$PATH`-like: it defines a *list*.
`tailscale config kubeconfig` works like the rest of the
ecosystem so that if $KUBECONFIG is set it will write to the first existant file in the list, if none exist then
the final entry in the list.
* if `$KUBECONFIG` is an empty string, the old logic takes over.

Notes:

* The logic for file detection is inlined based on what `kind` does.
Technically it's a race condition, since the file could be removed/added
in between the processing steps, but the fallout shouldn't be too bad.
https://github.com/kubernetes-sigs/kind/blob/v0.23.0-alpha/pkg/cluster/internal/kubeconfig/internal/kubeconfig/paths.go

* The sandboxed (App Store) variant relies on a specific temporary
entitlement to access the ~/.kube/config file.
The entitlement is only granted to specific files, and so is not
applicable to paths supplied by the user at runtime.
While there may be other ways to achieve this access to arbitrary
kubeconfig files, it's out of scope for now.

Updates #11645

Signed-off-by: Chloé Vulquin <code@toast.bunkerlabs.net>
1 month ago
Brad Fitzpatrick c71e8db058 cmd/tailscale/cli: stop spamming os.Stdout/os.Stderr in tests
After:

    bradfitz@book1pro tailscale.com % ./tool/go test -c ./cmd/tailscale/cli
    bradfitz@book1pro tailscale.com % ./cli.test
    bradfitz@book1pro tailscale.com %

Before:

    bradfitz@book1pro tailscale.com % ./tool/go test -c ./cmd/tailscale/cli
    bradfitz@book1pro tailscale.com % ./cli.test

    Warning: funnel=on for foo.test.ts.net:443, but no serve config
             run: `tailscale serve --help` to see how to configure handlers

    Warning: funnel=on for foo.test.ts.net:443, but no serve config
             run: `tailscale serve --help` to see how to configure handlers
    USAGE
      funnel <serve-port> {on|off}
      funnel status [--json]

    Funnel allows you to publish a 'tailscale serve'
    server publicly, open to the entire internet.

    Turning off Funnel only turns off serving to the internet.
    It does not affect serving to your tailnet.

    SUBCOMMANDS
      status  show current serve/funnel status
    error: path must be absolute

    error: invalid TCP source "localhost:5432": missing port in address

    error: invalid TCP source "tcp://somehost:5432"
    must be one of: localhost or 127.0.0.1

    tcp://somehost:5432error: invalid TCP source "tcp://somehost:0"
    must be one of: localhost or 127.0.0.1

    tcp://somehost:0error: invalid TCP source "tcp://somehost:65536"
    must be one of: localhost or 127.0.0.1

    tcp://somehost:65536error: path must be absolute

    error: cannot serve web; already serving TCP

    You don't have permission to enable this feature.

This also moves the color handling up to a generic spot so it's
not just one subcommand doing it itself. See
https://github.com/tailscale/tailscale/issues/11626#issuecomment-2041795129

Fixes #11643
Updates #11626

Change-Id: I3a49e659dcbce491f4a2cb784be20bab53f72303
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 month ago
Anton Tolchanov 5336362e64 prober: export probe class and metrics from bandwidth prober
- Wrap each prober function into a probe class that allows associating
  metric labels and custom metrics with a given probe;
- Make sure all existing probe classes set a `class` metric label;
- Move bandwidth probe size from being a metric label to a separate
  gauge metric; this will make it possible to use it to calculate
  average used bandwidth using a PromQL query;
- Also export transfer time for the bandwidth prober (more accurate than
  the total probe time, since it excludes connection establishment
  time).

Updates tailscale/corp#17912

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
1 month ago
Anton Tolchanov 21671ca374 prober: remove unused notification code
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
1 month ago
Brad Fitzpatrick b0fbd85592 net/tsdial: partially fix "tailscale nc" (UserDial) on macOS
At least in the case of dialing a Tailscale IP.

Updates #4529

Change-Id: I9fd667d088a14aec4a56e23aabc2b1ffddafa3fe
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 month ago
Brad Fitzpatrick a5e1f7d703 ipn/{ipnlocal,localapi}: add API to toggle use of exit node
This is primarily for GUIs, so they don't need to remember the most
recently used exit node themselves.

This adds some CLI commands, but they're disabled and behind the WIP
envknob, as we need to consider naming (on/off is ambiguous with
running an exit node, etc) as well as automatic exit node selection in
the future. For now the CLI commands are effectively developer debug
things to test the LocalAPI.

Updates tailscale/corp#18724

Change-Id: I9a32b00e3ffbf5b29bfdcad996a4296b5e37be7e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 month ago
Maisem Ali 3f4c5daa15 wgengine/netstack: remove SubnetRouterWrapper
It was used when we only supported subnet routers on linux
and would nil out the SubnetRoutes slice as no other router
worked with it, but now we support subnet routers on ~all platforms.

The field it was setting to nil is now only used for network logging
and nowhere else, so keep the field but drop the SubnetRouterWrapper
as it's not useful.

Updates #cleanup

Change-Id: Id03f9b6ec33e47ad643e7b66e07911945f25db79
Signed-off-by: Maisem Ali <maisem@tailscale.com>
1 month ago
alexelisenko fe22032fb3
net/dns/{publicdns,resolver}: add start of Control D support
Updates #7946

[@bradfitz fixed up version of #8417]

Change-Id: I1dbf6fa8d525b25c0d7ad5c559a7f937c3cd142a
Signed-off-by: alexelisenko <39712468+alexelisenko@users.noreply.github.com>
Signed-off-by: Alex Paguis <alex@windscribe.com>
1 month ago
Brad Fitzpatrick aa084a29c6 ipn/ipnlocal: name the unlockOnce type, plumb more, add Unlock method
This names the func() that Once-unlocked LocalBackend.mu. It does so
both for docs and because it can then have a method: Unlock, for the
few points that need to explicitly unlock early (the cause of all this
mess). This makes those ugly points easy to find, and also can then
make them stricter, panicking if the mutex is already unlocked. So a
normal call to the func just once-releases the mutex, returning false
if it's already done, but the Unlock method is the strict one.

Then this uses it more, so most the b.mu.Unlock calls remaining are
simple cases and usually defers.

Updates #11649

Change-Id: Ia070db66c54a55e59d2f76fdc26316abf0dd4627
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 month ago
Brad Fitzpatrick 5e7c0b025c ipn/ipnlocal: add some "lockedOnEntry" helpers + guardrails, fix bug
A number of methods in LocalBackend (with suffixed "LockedOnEntry")
require b.mu be held but unlock it on the way out. That's asymmetric
and atypical and error prone.

This adds a helper method to LocalBackend that locks the mutex and
returns a sync.OnceFunc that unlocks the mutex. Then we pass around
that unlocker func down the chain to make it explicit (and somewhat
type check the passing of ownership) but also let the caller defer
unlock it, in the case of errors/panics that happen before the callee
gets around to calling the unlock.

This revealed a latent bug in LocalBackend.DeleteProfile which double
unlocked the mutex.

Updates #11649

Change-Id: I002f77567973bd77b8906bfa4ec9a2049b89836a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 month ago