tailscale


Author	SHA1	Message	Date
Andrea Gottardo	3f3edeec07	health: drop unnecessary logging in TestSetUnhealthyWithTimeToVisible (#12795) Fixes tailscale/tailscale#12794 We were printing some leftover debug logs within a callback function that would be executed after the test completion, causing the test to fail. This change drops the log calls to address the issue. Signed-off-by: Andrea Gottardo <andrea@gottardo.me>	5 months ago
Andrea Gottardo	b7c3cfe049	health: support delayed Warnable visibility (#12783) Updates tailscale/tailscale#4136 To reduce the likelihood of presenting spurious warnings, add the ability to delay the visibility of certain Warnables, based on a TimeToVisible time.Duration field on each Warnable. The default is zero, meaning that a Warnable is immediately visible to the user when it enters an unhealthy state. Signed-off-by: Andrea Gottardo <andrea@gottardo.me>	5 months ago
Andrea Gottardo	309afa53cf	health: send ImpactsConnectivity value over LocalAPI (#12700) Updates tailscale/tailscale#4136 We should make sure to send the value of ImpactsConnectivity over to the clients using LocalAPI as they need it to display alerts in the GUI properly. Signed-off-by: Andrea Gottardo <andrea@gottardo.me>	5 months ago
Andrea Gottardo	732af2f6e0	health: reduce severity of some warnings, improve update messages (#12689) Updates tailscale/tailscale#4136 High severity health warning = a system notification will appear, which can be quite disruptive to the user and cause unnecessary concern in the event of a temporary network issue. Per design decision (@sonovawolf), the severity of all warnings but "network is down" should be tuned down to medium/low. ImpactsConnectivity should be set, to change the icon to an exclamation mark in some cases, but without a notification bubble. I also tweaked the messaging for update-available, to reflect how each platform gets updates in different ways. Signed-off-by: Andrea Gottardo <andrea@gottardo.me>	5 months ago
Andrew Lytvynov	2064dc20d4	health,ipn/ipnlocal: hide update warning when auto-updates are enabled (#12631) When auto-updates are enabled, we don't need to nag users to update after a new release, before we release auto-updates. Updates https://github.com/tailscale/corp/issues/20081 Signed-off-by: Andrew Lytvynov <awly@tailscale.com>	5 months ago
Andrea Gottardo	6e55d8f6a1	health: add warming-up warnable (#12553)	5 months ago
Andrea Gottardo	d7619d273b	health: fix nil DERPMap dereference panic Looks like a DERPmap might not be available when we try to get the name associated with a region ID, and that was causing an intermittent panic in CI. Fixes #12534 Change-Id: I4ace53681bf004df46c728cff830b27339254243 Signed-off-by: Andrea Gottardo <andrea@gottardo.me>	5 months ago
Andrea Gottardo	d6a8fb20e7	health: include DERP region name in bad derp notifications (#12530) Fixes tailscale/corp#20971 We added some Warnables for DERP failure situations, but their Text currently spits out the DERP region ID ("10") in the UI, which is super ugly. It would be better to provide the RegionName of the DERP region that is failing. We can do so by storing a reference to the last-known DERP map in the health package whenever we fetch one, and using it when generating the notification text. This way, the following message... > Tailscale could not connect to the relay server '10'. The server might be temporarily unavailable, or your Internet connection might be down. becomes: > Tailscale could not connect to the 'Seattle' relay server. The server might be temporarily unavailable, or your Internet connection might be down. which is a lot more user-friendly. Signed-off-by: Andrea Gottardo <andrea@gottardo.me>	5 months ago
Andrea Gottardo	d55b105dae	health: expose DependsOn to local API via UnhealthyState (#12513) Updates #4136 Small PR to expose the health Warnables dependencies to the GUI via LocalAPI, so that we can only show warnings for root cause issues, and filter out unnecessary messages before user presentation. Signed-off-by: Andrea Gottardo <andrea@gottardo.me>	5 months ago
Brad Fitzpatrick	7bc9d453c2	health: fix data race in new warnable code Fixes #12479 Change-Id: Ice84d5eb12d835eeddf6fc8cc337ea6b4dddcf6c Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	5 months ago
Andrea Gottardo	a8ee83e2c5	health: begin work to use structured health warnings instead of strings, pipe changes into ipn.Notify (#12406) Updates tailscale/tailscale#4136 This PR is the first round of work to move from encoding health warnings as strings and use structured data instead. The current health package revolves around the idea of Subsystems. Each subsystem can have (or not have) a Go error associated with it. The overall health of the backend is given by the concatenation of all these errors. This PR polishes the concept of Warnable introduced by @bradfitz a few weeks ago. Each Warnable is a component of the backend (for instance, things like 'dns' or 'magicsock' are Warnables). Each Warnable has a unique identifying code. A Warnable is an entity we can warn the user about, by setting (or unsetting) a WarningState for it. Warnables have: - an identifying Code, so that the GUI can track them as their WarningStates come and go - a Title, which the GUIs can use to tell the user what component of the backend is broken - a Text, which is a function that is called with a set of Args to generate a more detailed error message to explain the unhappy state Additionally, this PR also begins to send Warnables and their WarningStates through LocalAPI to the clients, using ipn.Notify messages. An ipn.Notify is only issued when a warning is added or removed from the Tracker. In a next PR, we'll get rid of subsystems entirely, and we'll start using structured warnings for all errors affecting the backend functionality. Signed-off-by: Andrea Gottardo <andrea@gottardo.me>	5 months ago
Brad Fitzpatrick	96712e10a7	health, ipn/ipnlocal: move more health warning code into health.Tracker In prep for making health warnings rich objects with metadata rather than a bunch of strings, start moving it all into the same place. We'll still ultimately need the stringified form for the CLI and LocalAPI for compatibility but we'll next convert all these warnings into Warnables that have severity levels and such, and legacy stringification will just be something each Warnable thing can do. Updates #4136 Change-Id: I83e189435daae3664135ed53c98627c66e9e53da Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	7 months ago
Brad Fitzpatrick	7f587d0321	health, wgengine/magicsock: remove last of health package globals Fixes #11874 Updates #4136 Change-Id: Ib70e6831d4c19c32509fe3d7eee4aa0e9f233564 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	7 months ago
Brad Fitzpatrick	745931415c	health, all: remove health.Global, finish plumbing health.Tracker Updates #11874 Updates #4136 Change-Id: I414470f71d90be9889d44c3afd53956d9f26cd61 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	7 months ago
Brad Fitzpatrick	6d69fc137f	ipn/{ipnlocal,localapi},wgengine{,/magicsock}: plumb health.Tracker Down to 25 health.Global users. After this remains controlclient & net/dns & wgengine/router. Updates #11874 Updates #4136 Change-Id: I6dd1856e3d9bf523bdd44b60fb3b8f7501d5dc0d Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	7 months ago
Brad Fitzpatrick	723c775dbb	tsd, ipnlocal, etc: add tsd.System.HealthTracker, start some plumbing This adds a health.Tracker to tsd.System, accessible via a new tsd.System.HealthTracker method. In the future, that new method will return a tsd.System-specific HealthTracker, so multiple tsnet.Servers in the same process are isolated. For now, though, it just always returns the temporary health.Global value. That permits incremental plumbing over a number of changes. When the second to last health.Global reference is gone, then the tsd.System.HealthTracker implementation can return a private Tracker. The primary plumbing this does is adding it to LocalBackend and its dozen and change health calls. A few misc other callers are also plumbed. Subsequent changes will flesh out other parts of the tree (magicsock, controlclient, etc). Updates #11874 Updates #4136 Change-Id: Id51e73cfc8a39110425b6dc19d18b3975eac75ce Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	7 months ago
Brad Fitzpatrick	cb66952a0d	health: permit Tracker method calls on nil receiver In prep for tsd.System Tracker plumbing throughout tailscaled, defensively permit all methods on Tracker to accept a nil receiver without crashing, lest I screw something up later. (A health tracking system that itself causes crashes would be no good.) Methods on nil receivers should not be called, so a future change will also collect their stacks (and panic during dev/test), but we should at least not crash in prod. This also locks that in with a test using reflect to automatically call all methods on a nil receiver and check they don't crash. Updates #11874 Updates #4136 Change-Id: I8e955046ebf370ec8af0c1fb63e5123e6282a9d3 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	7 months ago
Brad Fitzpatrick	5b32264033	health: break Warnable into a global and per-Tracker value halves Previously it was both metadata about the class of warnable item as well as the value. Now it's only metadata and the value is per-Tracker. Updates #11874 Updates #4136 Change-Id: Ia1ed1b6c95d34bc5aae36cffdb04279e6ba77015 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	7 months ago
Brad Fitzpatrick	ebc552d2e0	health: add Tracker type, in prep for removing global variables This moves most of the health package global variables to a new `health.Tracker` type. But then rather than plumbing the Tracker in tsd.System everywhere, this only goes halfway and makes one new global Tracker (`health.Global`) that all the existing callers now use. A future change will eliminate that global. Updates #11874 Updates #4136 Change-Id: I6ee27e0b2e35f68cb38fecdb3b2dc4c3f2e09d68 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	7 months ago
Brad Fitzpatrick	7c1d6e35a5	all: use Go 1.22 range-over-int Updates #11058 Change-Id: I35e7ef9b90e83cac04ca93fd964ad00ed5b48430 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	7 months ago
Anton Tolchanov	8cc5c51888	health: warn about reverse path filtering and exit nodes When reverse path filtering is in strict mode on Linux, using an exit node blocks all network connectivity. This change adds a warning about this to `tailscale status` and the logs. Example in `tailscale status`: ``` - not connected to home DERP region 22 - The following issues on your machine will likely make usage of exit nodes impossible: [interface "eth0" has strict reverse-path filtering enabled], please set rp_filter=2 instead of rp_filter=1; see https://github.com/tailscale/tailscale/issues/3310 ``` Example in the logs: ``` 2024/02/21 21:17:07 health("overall"): error: multiple errors: not in map poll The following issues on your machine will likely make usage of exit nodes impossible: [interface "eth0" has strict reverse-path filtering enabled], please set rp_filter=2 instead of rp_filter=1; see https://github.com/tailscale/tailscale/issues/3310 ``` Updates #3310 Signed-off-by: Anton Tolchanov <anton@tailscale.com>	9 months ago
Andrew Dunham	727acf96a6	net/netcheck: use DERP frames as a signal for home region liveness This uses the fact that we've received a frame from a given DERP region within a certain time as a signal that the region is still present (and thus can still be a node's PreferredDERP / home region) even if we don't get a STUN response from that region during a netcheck. This should help avoid DERP flaps that occur due to losing STUN probes while still having a valid and active TCP connection to the DERP server. RELNOTE=Reduce home DERP flapping when there's still an active connection Updates #8603 Signed-off-by: Andrew Dunham <andrew@du.nham.ca> Change-Id: If7da6312581e1d434d5c0811697319c621e187a0	12 months ago
Brad Fitzpatrick	4d196c12d9	health: don't report a warning in DERP homeless mode Updates #3363 Updates tailscale/corp#396 Change-Id: Ibfb0496821cb58a78399feb88d4206d81e95ca0f Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	1 year ago
Brad Fitzpatrick	dc7aa98b76	all: use set.Set consistently instead of map[T]struct{} I didn't clean up the more idiomatic map[T]bool with true values, at least yet. I just converted the relatively awkward struct{}-valued maps. Updates #cleanup Change-Id: I758abebd2bb1f64bc7a9d0f25c32298f4679c14f Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	1 year ago
Brad Fitzpatrick	14320290c3	control/controlclient: merge, simplify two health check calls I'm trying to remove some stuff from the netmap update path. Updates #1909 Change-Id: Iad2c728dda160cd52f33ef9cf0b75b4940e0ce64 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	1 year ago
Tom DNetto	abcb7ec1ce	cmd/tailscale: warn if node is locked out on bringup Updates https://github.com/tailscale/corp/issues/12718 Signed-off-by: Tom DNetto <tom@tailscale.com>	1 year ago
Andrew Dunham	2755f3843c	health, net/tlsdial: add healthcheck for self-signed cert When we make a connection to a server, we previously would verify with the system roots, and then fall back to verifying with our baked-in Let's Encrypt root if the system root cert verification failed. We now explicitly check for, and log a health error on, self-signed certificates. Additionally, we now always verify against our baked-in Let's Encrypt root certificate and log an error if that isn't successful. We don't consider this a health failure, since if we ever change our server certificate issuer in the future older non-updated versions of Tailscale will no longer be healthy despite being able to connect. Updates #3198 Change-Id: I00be5ceb8afee544ee795e3c7a2815476abc4abf Signed-off-by: Andrew Dunham <andrew@du.nham.ca>	2 years ago
Will Norris	71029cea2d	all: update copyright and license headers This updates all source files to use a new standard header for copyright and license declaration. Notably, copyright no longer includes a date, and we now use the standard SPDX-License-Identifier header. This commit was done almost entirely mechanically with perl, and then some minimal manual fixes. Updates #6865 Signed-off-by: Will Norris <will@tailscale.com>	2 years ago
Tom DNetto	0088c5ddc0	health,ipn/ipnlocal: report the node being locked out as a health issue Signed-off-by: Tom DNetto <tom@tailscale.com>	2 years ago
Aaron Klotz	659e7837c6	health, ipn/ipnlocal: when -no-logs-no-support is enabled, deny access to tailnets that have network logging enabled We want users to have the freedom to start tailscaled with `-no-logs-no-support`, but that is obviously in direct conflict with tailnets that have network logging enabled. When we detect that condition, we record the issue in health, notify the client, set WantRunning=false, and bail. We clear the item in health when a profile switch occurs, since it is a per-tailnet condition that should not propagate across profiles. Signed-off-by: Aaron Klotz <aaron@tailscale.com>	2 years ago
Brad Fitzpatrick	ea25ef8236	util/set: add new set package for SetHandle type We use this pattern in a number of places (in this repo and elsewhere) and I was about to add a fourth to this repo which was crossing the line. Add this type instead so they're all the same. Also, we have another Set type (SliceSet, which tracks its keys in order) in another repo we can move to this package later. Change-Id: Ibbdcdba5443fae9b6956f63990bdb9e9443cefa9 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2 years ago
Brad Fitzpatrick	3f8e185003	health: add Warnable, move ownership of warnable items to callers The health package was turning into a rando dumping ground. Make a new Warnable type instead that callers can request an instance of, and then Set it locally in their code without the health package being aware of all the things that are warnable. (For plenty of things the health package will want to know details of how Tailscale works so it can better prioritize/suppress errors, but lots of the warnings are pretty leaf-y and unrelated) This just moves two of the health warnings. Can probably move more later. Change-Id: I51e50e46eb633f4e96ced503d3b18a1891de1452 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2 years ago
Brad Fitzpatrick	08e110ebc5	cmd/tailscale: make "up", "status" warn if routes and --accept-routes off Example output: # Health check: # - Some peers are advertising routes but --accept-routes is false Also, move "tailscale status" health checks to the bottom, where they won't be lost in large netmaps. Updates #2053 Updates #6266 Change-Id: I5ae76a0cd69a452ce70063875cd7d974bfeb8f1a Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2 years ago
Brad Fitzpatrick	e55ae53169	tailcfg: add Node.UnsignedPeerAPIOnly to let server mark node as peerapi-only capver 48 Change-Id: I20b2fa81d61ef8cc8a84e5f2afeefb68832bd904 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2 years ago
Brad Fitzpatrick	3562b5bdfa	envknob, health: support Synology, show parse errors in status Updates #5114 Change-Id: I8ac7a22a511f5a7d0dcb8cac470d4a403aa8c817 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2 years ago
Brad Fitzpatrick	74674b110d	envknob: support changing envknobs post-init Updates #5114 Change-Id: Ia423fc7486e1b3f3180a26308278be0086fae49b Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2 years ago
Jordan Whited	43f9c25fd2	cmd/tailscale: surface authentication errors in status.Health (#4748) Fixes #3713 Signed-off-by: Jordan Whited <jordan@tailscale.com>	3 years ago
Brad Fitzpatrick	2ff481ff10	net/dns: add health check for particular broken-ish Linux DNS config Updates #3937 (need to write docs before closing) Change-Id: I1df7244cfbb0303481e2621ee750d21358bd67c6 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	3 years ago
Brad Fitzpatrick	41fd4eab5c	envknob: add new package for all the strconv.ParseBool(os.Getenv(..)) A new package can also later record/report which knobs are checked and set. It also makes the code cleaner & easier to grep for env knobs. Change-Id: Id8a123ab7539f1fadbd27e0cbeac79c2e4f09751 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	3 years ago
Brad Fitzpatrick	0aa4c6f147	net/dns/resolver: add debug HTML handler to see what DNS traffic was forwarded Change-Id: I6b790e92dcc608515ac8b178f2271adc9fd98f78 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	3 years ago
Brad Fitzpatrick	8f43ddf1a2	ipn/ipnlocal, health: populate self node's Online bit in tailscale status One option was to just hide "offline" in the text output, but that doesn't fix the JSON output. The next option was to lie and say it's online in the JSON (which then fixes the "offline" in the text output). But instead, this sets the self node's "Online" to whether we're in an active map poll. Fixes #3564 Change-Id: I9b379989bd14655198959e37eec39bb570fb814a Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	3 years ago
David Anderson	6c82cebe57	health: add a health state for net/dns.OSConfigurator. Lets the systemd-resolved OSConfigurator report health changes for out of band config resyncs. Updates #3327 Signed-off-by: David Anderson <danderson@tailscale.com>	3 years ago
Josh Bleecher Snyder	3fd5f4380f	util/multierr: new package github.com/go-multierror/multierror served us well. But we need a few features from it (implement Is), and it's not worth maintaining a fork of such a small module. Instead, I did a clean room implementation inspired by its API. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>	3 years ago
Brad Fitzpatrick	09e692e318	health: don't look for UDP goroutines in js/wasm health check Updates #3157 Change-Id: I43d97e6876eeb2d1936fc567835134568bb8615c Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	3 years ago
Brad Fitzpatrick	aae622314e	tailcfg, health: add way for control plane to add problems to health check So if the control plane knows that something's broken about the node, it can include problem(s) in MapResponse and "tailscale status" will show it. (and GUIs in the future, as it's in ipnstate.Status/JSON) This also bumps the MapRequest.Version, though it's not strictly required. Doesn't hurt. Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	3 years ago
Brad Fitzpatrick	4549d3151c	cmd/tailscale: make status show health check problems Fixes #2775 RELNOTE=tailscale status now shows health check problems Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	3 years ago
Brad Fitzpatrick	5bacbf3744	wgengine/magicsock, health, ipn/ipnstate: track DERP-advertised health And add health check errors to ipnstate.Status (tailscale status --json). Updates #2746 Updates #2775 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	3 years ago
Josh Bleecher Snyder	9d542e08e2	wgengine/magicsock: always run ReceiveIPv6 One of the consequences of the bind refactoring in `6f23087175` is that attempting to bind an IPv6 socket will always result in c.pconn6.pconn being non-nil. If the bind fails, it'll be set to a placeholder packet conn that blocks forever. As a result, we can always run ReceiveIPv6 and health check it. This removes IPv4/IPv6 asymmetry and also will allow health checks to detect any IPv6 receive func failures. Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>	4 years ago
Josh Bleecher Snyder	fe50ded95c	health: track whether we have a functional udp4 bind Suggested-by: Brad Fitzpatrick <bradfitz@tailscale.com> Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>	4 years ago
Josh Bleecher Snyder	744de615f1	health, wgenegine: fix receive func health checks for the fourth time The old implementation knew too much about how wireguard-go worked. As a result, it missed genuine problems that occurred due to unrelated bugs. This fourth attempt to fix the health checks takes a black box approach. A receive func is healthy if one (or both) of these conditions holds: * It is currently running and blocked. * It has been executed recently. The second condition is required because receive functions are not continuously executing. wireguard-go calls them and then processes their results before calling them again. There is a theoretical false positive if wireguard-go takes longer than one minute to process the results of a receive func execution. If that happens, we have other problems. Updates #1790 Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>	4 years ago


60 Commits (f0b9d3f477bf4c03f4377d240dea21c383c2570a)