Commit Graph

79 Commits (8161024176ce3adaf410295f4b6eee738cba0471)

Author SHA1 Message Date
Jonathan Nobels 27033c6277
net/dns: recheck DNS config on SERVFAIL errors (#12547)
Fixes tailscale/corp#20677

Replaces the original attempt to rectify this (by injecting a netMon
event) which was both heavy handed, and missed cases where the
netMon event was "minor".

On apple platforms, the fetching the interface's nameservers can
and does return an empty list in certain situations.   Apple's API
in particular is very limiting here.  The header hints at notifications
for dns changes which would let us react ahead of time, but it's all
private APIs.

To avoid remaining in the state where we end up with no
nameservers but we absolutely need them, we'll react
to a lack of upstream nameservers by attempting to re-query
the OS.

We'll rate limit this to space out the attempts.   It seems relatively
harmless to attempt a reconfig every 5 seconds (triggered
by an incoming query) if the network is in this broken state.

Missing nameservers might possibly be a persistent condition
(vs a transient error), but that would  also imply that something
out of our control is badly misconfigured.

Tested by randomly returning [] for the nameservers.   When switching
between Wifi networks, or cell->wifi, this will randomly trigger
the bug, and we appear to reliably heal the DNS state.

Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
5 months ago
Aaron Klotz d7a4f9d31c net/dns: ensure multiple hosts with the same IP address are combined into a single HostEntry
This ensures that each line has a unique IP address.

Fixes #11939

Signed-off-by: Aaron Klotz <aaron@tailscale.com>
5 months ago
Nick Khyl c32efd9118 various: create a catch-all NRPT rule when "Override local DNS" is enabled on Windows
Without this rule, Windows 8.1 and newer devices issue parallel DNS requests to DNS servers
associated with all network adapters, even when "Override local DNS" is enabled and/or
a Mullvad exit node is being used, resulting in DNS leaks.

This also adds "disable-local-dns-override-via-nrpt" nodeAttr that can be used to disable
the new behavior if needed.

Fixes tailscale/corp#20718

Signed-off-by: Nick Khyl <nickk@tailscale.com>
5 months ago
Brad Fitzpatrick 916c4db75b net/dns: fix crash in tests
Looks like #12346 as submitted with failing tests.

Updates #12346

Change-Id: I582cd0dfb117686330d935d763d972373c5ae598
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
5 months ago
Andrea Gottardo b65221999c
tailcfg,net/dns: add controlknob to disable battery split DNS on iOS (#12346)
Updates corp#15802.

Adds the ability for control to disable the recently added change that uses split DNS in more cases on iOS. This will allow us to disable the feature if it leads to regression in production. We plan to remove this knob once we've verified that the feature works properly.

Signed-off-by: Andrea Gottardo <andrea@gottardo.me>
5 months ago
Andrea Gottardo d636407f14
net/dns: don't set MatchDomains on Apple platforms when no upstream nameservers available (#12334)
This PR addresses a DNS issue on macOS as discussed this morning.

Signed-off-by: Andrea Gottardo <andrea@gottardo.me>
5 months ago
Andrea Gottardo dd77111462
xcode/iOS: set MatchDomains when no route requires a custom DNS resolver (#10576)
Updates https://github.com/tailscale/corp/issues/15802.

On iOS exclusively, this PR adds logic to use a split DNS configuration in more cases, with the goal of improving battery life. Acting as the global DNS resolver on iOS should be avoided, as it leads to frequent wakes of IPNExtension.

We try to determine if we can have Tailscale only handle DNS queries for resources inside the tailnet, that is, all routes in the DNS configuration do not require a custom resolver (this is the case for app connectors, for instance).

If so, we set all Routes as MatchDomains. This enables a split DNS configuration which will help preserve battery life. Effectively, for the average Tailscale user who only relies on MagicDNS to resolve *.ts.net domains, this means that Tailscale DNS will only be used for those domains.

This PR doesn't affect users with Override Local DNS enabled. For these users, there should be no difference and Tailscale will continue acting as a global DNS resolver.

Signed-off-by: Andrea Gottardo <andrea@tailscale.com>
6 months ago
Brad Fitzpatrick 3672f29a4e net/netns, net/dns/resolver, etc: make netmon required in most places
The goal is to move more network state accessors to netmon.Monitor
where they can be cheaper/cached. But first (this change and others)
we need to make sure the one netmon.Monitor is plumbed everywhere.

Some notable bits:

* tsdial.NewDialer is added, taking a now-required netmon

* because a tsdial.Dialer always has a netmon, anything taking both
  a Dialer and a NetMon is now redundant; take only the Dialer and
  get the NetMon from that if/when needed.

* netmon.NewStatic is added, primarily for tests

Updates tailscale/corp#10910
Updates tailscale/corp#18960
Updates #7967
Updates #3299

Change-Id: I877f9cb87618c4eb037cee098241d18da9c01691
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
7 months ago
Brad Fitzpatrick 745931415c health, all: remove health.Global, finish plumbing health.Tracker
Updates #11874
Updates #4136

Change-Id: I414470f71d90be9889d44c3afd53956d9f26cd61
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
7 months ago
Brad Fitzpatrick ebc552d2e0 health: add Tracker type, in prep for removing global variables
This moves most of the health package global variables to a new
`health.Tracker` type.

But then rather than plumbing the Tracker in tsd.System everywhere,
this only goes halfway and makes one new global Tracker
(`health.Global`) that all the existing callers now use.

A future change will eliminate that global.

Updates #11874
Updates #4136

Change-Id: I6ee27e0b2e35f68cb38fecdb3b2dc4c3f2e09d68
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
7 months ago
James Tucker db760d0bac cmd/tailscaled: move cleanup to an implicit action during startup
This removes a potentially increased boot delay for certain boot
topologies where they block on ExecStartPre that may have socket
activation dependencies on other system services (such as
systemd-resolved and NetworkManager).

Also rename cleanup to clean up in affected/immediately nearby places
per code review commentary.

Fixes #11599

Signed-off-by: James Tucker <james@tailscale.com>
7 months ago
Andrew Lytvynov 2716250ee8
all: cleanup unused code, part 2 (#10670)
And enable U1000 check in staticcheck.

Updates #cleanup

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
11 months ago
Ryan Petris c4855fe0ea Fix Empty Resolver Set
Config.singleResolverSet returns true if all routes have the same resolvers,
even if the routes have no resolvers. If none of the routes have a specific
resolver, the default should be used instead. Therefore, check for more than
0 instead of nil.

Signed-off-by: Ryan Petris <ryan@petris.net>
1 year ago
Andrew Dunham 530aaa52f1 net/dns: retry forwarder requests over TCP
We weren't correctly retrying truncated requests to an upstream DNS
server with TCP. Instead, we'd return a truncated request to the user,
even if the user was querying us over TCP and thus able to handle a
large response.

Also, add an envknob and controlknob to allow users/us to disable this
behaviour if it turns out to be buggy ( DNS ).

Updates #9264

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ifb04b563839a9614c0ba03e9c564e8924c1a2bfd
1 year ago
Brad Fitzpatrick 3b32d6c679 wgengine/magicsock, controlclient, net/dns: reduce some logspam
Updates #cleanup

Change-Id: I78b0697a01e94baa33f3de474b591e616fa5e6af
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 year ago
Brad Fitzpatrick e8551d6b40 all: use Go 1.21 slices, maps instead of x/exp/{slices,maps}
Updates #8419

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 year ago
David Anderson 52212f4323 all: update exp/slices and fix call sites
slices.SortFunc suffered a late-in-cycle API breakage.

Updates #cleanup

Signed-off-by: David Anderson <danderson@tailscale.com>
1 year ago
Mihai Parparita 7330aa593e all: avoid repeated default interface lookups
On some platforms (notably macOS and iOS) we look up the default
interface to bind outgoing connections to. This is both duplicated
work and results in logspam when the default interface is not available
(i.e. when a phone has no connectivity, we log an error and thus cause
more things that we will try to upload and fail).

Fixed by passing around a netmon.Monitor to more places, so that we can
use its cached interface state.

Fixes #7850
Updates #7621

Signed-off-by: Mihai Parparita <mihai@tailscale.com>
2 years ago
Mihai Parparita 4722f7e322 all: move network monitoring from wgengine/monitor to net/netmon
We're using it in more and more places, and it's not really specific to
our use of Wireguard (and does more just link/interface monitoring).

Also removes the separate interface we had for it in sockstats -- it's
a small enough package (we already pull in all of its dependencies
via other paths) that it's not worth the extra complexity.

Updates #7621
Updates #7850

Signed-off-by: Mihai Parparita <mihai@tailscale.com>
2 years ago
Will Norris 71029cea2d all: update copyright and license headers
This updates all source files to use a new standard header for copyright
and license declaration.  Notably, copyright no longer includes a date,
and we now use the standard SPDX-License-Identifier header.

This commit was done almost entirely mechanically with perl, and then
some minimal manual fixes.

Updates #6865

Signed-off-by: Will Norris <will@tailscale.com>
2 years ago
Tom DNetto 673b3d8dbd net/dns,userspace: remove unused DNS paths, normalize query limit on iOS
With a42a594bb3, iOS uses netstack and
hence there are no longer any platforms which use the legacy MagicDNS path. As such, we remove it.

We also normalize the limit for max in-flight DNS queries on iOS (it was 64, now its 256 as per other platforms).
It was 64 for the sake of being cautious about memory, but now we have 50Mb (iOS-15 and greater) instead of 15Mb
so we have the spare headroom.

Signed-off-by: Tom DNetto <tom@tailscale.com>
2 years ago
Andrew Dunham 0372e14d79 net/dns: bump DNS-over-TCP size limit to 4k
We saw a few cases where we hit this limit; bumping to 4k seems
relatively uncontroversial.

Change-Id: I218fee3bc0d2fa5fde16eddc36497a73ebd7cbda
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
2 years ago
Andrew Dunham 3f4d51c588 net/dns: don't send on closed channel when message too large
Previously, if a DNS-over-TCP message was received while there were
existing queries in-flight, and it was over the size limit, we'd close
the 'responses' channel. This would cause those in-flight queries to
send on the closed channel and panic.

Instead, don't close the channel at all and rely on s.ctx being
canceled, which will ensure that in-flight queries don't hang.

Fixes #6725

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I8267728ac37ed7ae38ddd09ce2633a5824320097
2 years ago
Maisem Ali 3555a49518 net/dns: always attempt to read the OS config on macOS/iOS
Also reconfigure DNS on iOS/macOS on link changes.

Signed-off-by: Maisem Ali <maisem@tailscale.com>
2 years ago
Mihai Parparita 8343b243e7 all: consistently initialize Logf when creating tsdial.Dialers
Most visible when using tsnet.Server, but could have resulted in dropped
messages in a few other places too.

Fixes #5743

Signed-off-by: Mihai Parparita <mihai@tailscale.com>
2 years ago
Josh Soref d4811f11a0 all: fix spelling mistakes
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>
2 years ago
Brad Fitzpatrick 58abae1f83 net/dns/{publicdns,resolver}: add NextDNS DoH support
NextDNS is unique in that users create accounts and then get
user-specific DNS IPs & DoH URLs.

For DoH, the customer ID is in the URL path.

For IPv6, the IP address includes the customer ID in the lower bits.

For IPv4, there's a fragile "IP linking" mechanism to associate your
public IPv4 with an assigned NextDNS IPv4 and that tuple maps to your
customer ID.

We don't use the IP linking mechanism.

Instead, NextDNS is DoH-only. Which means using NextDNS necessarily
shunts all DNS traffic through 100.100.100.100 (programming the OS to
use 100.100.100.100 as the global resolver) because operating systems
can't usually do DoH themselves.

Once it's in Tailscale's DoH client, we then connect out to the known
NextDNS IPv4/IPv6 anycast addresses.

If the control plane sends the client a NextDNS IPv6 address, we then
map it to the corresponding NextDNS DoH with the same client ID, and
we dial that DoH server using the combination of v4/v6 anycast IPs.

Updates #2452

Change-Id: I3439d798d21d5fc9df5a2701839910f5bef85463
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 years ago
Brad Fitzpatrick 9f6c8517e0 net/dns: set OS DNS to 100.100.100.100 for route-less ExtraRecords [cap 41]
If ExtraRecords (Hosts) are specified without a corresponding split
DNS route and global DNS is specified, then program the host OS DNS to
use 100.100.100.100 so it can blend in those ExtraRecords.

Updates #1543

Change-Id: If49014a5ecc8e38978ff26e54d1f74fe8dbbb9bc
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 years ago
Maisem Ali 9197dd14cc net/dns: [win] add MagicDNS entries to etc/hosts
This works around the 2.3s delay in short name lookups when SNR is
enabled.
C:\Windows\System32\drivers\etc\hosts file. We only add known hosts that
match the search domains, and we populate the list in order of
Search Domains so that our matching algorithm mimics what Windows would
otherwise do itself if SNR was off.

Updates #1659

Signed-off-by: Maisem Ali <maisem@tailscale.com>
2 years ago
Maisem Ali 1cff719015 net/dns: [win] respond with SERVFAIL queries when no resolvers
Currently we forward unmatched queries to the default resolver on
Windows. This results in duplicate queries being issued to the same
resolver which is just wasted.

Updates #1659

Signed-off-by: Maisem Ali <maisem@tailscale.com>
2 years ago
Brad Fitzpatrick a12aad6b47 all: convert more code to use net/netip directly
perl -i -npe 's,netaddr.IPPrefixFrom,netip.PrefixFrom,' $(git grep -l -F netaddr.)
    perl -i -npe 's,netaddr.IPPortFrom,netip.AddrPortFrom,' $(git grep -l -F netaddr. )
    perl -i -npe 's,netaddr.IPPrefix,netip.Prefix,g' $(git grep -l -F netaddr. )
    perl -i -npe 's,netaddr.IPPort,netip.AddrPort,g' $(git grep -l -F netaddr. )
    perl -i -npe 's,netaddr.IP\b,netip.Addr,g' $(git grep -l -F netaddr. )
    perl -i -npe 's,netaddr.IPv6Raw\b,netip.AddrFrom16,g' $(git grep -l -F netaddr. )
    goimports -w .

Then delete some stuff from the net/netaddr shim package which is no
longer neeed.

Updates #5162

Change-Id: Ia7a86893fe21c7e3ee1ec823e8aba288d4566cd8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 years ago
Brad Fitzpatrick 7eaf5e509f net/netaddr: start migrating to net/netip via new netaddr adapter package
Updates #5162

Change-Id: Id7bdec303b25471f69d542f8ce43805328d56c12
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 years ago
Aaron Klotz 1cae618b03 net/dns: add Windows group policy notifications to the NRPT rule manager
As discussed in previous PRs, we can register for notifications when group
policies are updated and act accordingly.

This patch changes nrptRuleDatabase to receive notifications that group policy
has changed and automatically move our NRPT rules between the local and
group policy subkeys as needed.

Signed-off-by: Aaron Klotz <aaron@tailscale.com>
2 years ago
Tom acfe5bd33b
net/dns{., resolver}: time out DNS queries after 10 seconds (#4690)
Fixes https://github.com/tailscale/corp/issues/5198

The upstream forwarder will block indefinitely on `udpconn.ReadFrom` if no
reply is recieved, due to the lack of deadline on the connection object.

There still isn't a deadline on the connection object, but the automatic closing
of the context on deadline expiry will close the connection via `closeOnCtxDone`,
unblocking the read and resulting in a normal teardown.

Signed-off-by: Tom DNetto <tom@tailscale.com>
3 years ago
Maisem Ali fd99c54e10 tailcfg,all: change structs to []*dnstype.Resolver
Signed-off-by: Maisem Ali <maisem@tailscale.com>
3 years ago
Tom d1d6ab068e
net/dns, wgengine: implement DNS over TCP (#4598)
* net/dns, wgengine: implement DNS over TCP

Signed-off-by: Tom DNetto <tom@tailscale.com>

* wgengine/netstack: intercept only relevant port/protocols to quad-100

Signed-off-by: Tom DNetto <tom@tailscale.com>
3 years ago
Tom DNetto 2a0b5c21d2 net/dns/{., resolver}, wgengine: fix goroutine leak on shutdown
Signed-off-by: Tom DNetto <tom@tailscale.com>
3 years ago
Tom DNetto 7f45734663 assorted: documentation and readability fixes
This were intended to be pushed to #4408, but in my excitement I
forgot to git push :/ better late than never.

Signed-off-by: Tom DNetto <tom@tailscale.com>
3 years ago
Tom DNetto 5b85f848dd net/dns,net/dns/resolver: refactor channels/magicDNS out of Resolver
Moves magicDNS-specific handling out of Resolver & into dns.Manager. This
greatly simplifies the Resolver to solely issuing queries and returning
responses, without channels.

Enforcement of max number of in-flight magicDNS queries, assembly of
synthetic UDP datagrams, and integration with wgengine for
recieving/responding to magicDNS traffic is now entirely in Manager.
This path is being kept around, but ultimately aims to be deleted and
replaced with a netstack-based path.

This commit is part of a series to implement magicDNS using netstack.

Signed-off-by: Tom DNetto <tom@tailscale.com>
3 years ago
Brad Fitzpatrick cc575fe4d6 net/dns: schedule DoH upgrade explicitly, fix Resolver.Addr confusion
Two changes in one:

* make DoH upgrades an explicitly scheduled send earlier, when we come
  up with the resolvers-and-delay send plan. Previously we were
  getting e.g.  four Google DNS IPs and then spreading them out in
  time (for back when we only did UDP) but then later we added DoH
  upgrading at the UDP packet layer, which resulted in sometimes
  multiple DoH queries to the same provider running (each doing happy
  eyeballs dialing to 4x IPs themselves) for each of the 4 source IPs.
  Instead, take those 4 Google/Cloudflare IPs and schedule 5 things:
  first the DoH query (which can use all 4 IPs), and then each of the
  4 IPs as UDP later.

* clean up the dnstype.Resolver.Addr confusion; half the code was
  using it as an IP string (as documented) as half was using it as
  an IP:port (from some prior type we used), primarily for tests.
  Instead, document it was being primarily an IP string but also
  accepting an IP:port for tests, then add an accessor method on it
  to get the IPPort and use that consistently everywhere.

Change-Id: Ifdd72b9e45433a5b9c029194d50db2b9f9217b53
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Brad Fitzpatrick d9efbd97cb net/dns: remove an unused function
Change-Id: I7c920c76223ffac37954ef2a18754afc52177598
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Tom 24bdcbe5c7
net/dns, net/dns/resolver, wgengine: refactor DNS request path (#4364)
* net/dns, net/dns/resolver, wgengine: refactor DNS request path

Previously, method calls into the DNS manager/resolver types handled DNS
requests rather than DNS packets. This is fine for UDP as one packet
corresponds to one request or response, however will not suit an
implementation that supports DNS over TCP.

To support PRs implementing this in the future, wgengine delegates
all handling/construction of packets to the magic DNS endpoint, to
the DNS types themselves. Handling IP packets at this level enables
future support for both UDP and TCP.

Signed-off-by: Tom DNetto <tom@tailscale.com>
3 years ago
Brad Fitzpatrick 506c727e30 ipnlocal, net/{dns,tsaddr,tstun}, wgengine: support MagicDNS on IPv6
Fixes #3660

RELNOTE=MagicDNS now works over IPv6 when CGNAT IPv4 is disabled.

Change-Id: I001e983df5feeb65289abe5012dedd177b841b45
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Brad Fitzpatrick c37af58ea4 net/tsdial: move more weirdo dialing into new tsdial package, plumb
Not done yet, but this move more of the outbound dial special casing
from random packages into tsdial, which aspires to be the one unified
place for all outbound dialing shenanigans.

Then this plumbs it all around, so everybody is ultimately
holding on to the same dialer.

As of this commit, macOS/iOS using an exit node should be able to
reach to the exit node's DoH DNS proxy over peerapi, doing the sockopt
to stay within the Network Extension.

A number of steps remain, including but limited to:

* move a bunch more random dialing stuff

* make netstack-mode tailscaled be able to use exit node's DNS proxy,
  teaching tsdial's resolver to use it when an exit node is in use.

Updates #1713

Change-Id: I1e8ee378f125421c2b816f47bc2c6d913ddcd2f5
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Brad Fitzpatrick 283ae702c1 ipn/ipnlocal: start adding DoH DNS server to peerapi when exit node
Updates #1713

Change-Id: I8d9c488f779e7acc811a9bc18166a2726198a429
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
David Anderson 6c82cebe57 health: add a health state for net/dns.OSConfigurator.
Lets the systemd-resolved OSConfigurator report health changes
for out of band config resyncs.

Updates #3327

Signed-off-by: David Anderson <danderson@tailscale.com>
3 years ago
David Anderson cf9169e4be net/dns: remove unused Config struct element.
Signed-off-by: David Anderson <danderson@tailscale.com>
3 years ago
Denton Gentry 93c2882a2f wgengine: flush DNS cache after major link change.
Windows has a public dns.Flush used in router_windows.go.
However that won't work for platforms like Linux, where
we need a different flush mechanism for resolved versus
other implementations.

We're instead adding a FlushCaches method to the dns Manager,
which can be made to work on all platforms as needed.

Fixes https://github.com/tailscale/tailscale/issues/2132

Signed-off-by: Denton Gentry <dgentry@tailscale.com>
3 years ago
David Crawshaw 9502b515f1 net/dns: replace resolver IPs with type for DoH
We currently plumb full URLs for DNS resolvers from the control server
down to the client. But when we pass the values into the net/dns
package, we throw away any URL that isn't a bare IP. This commit
continues the plumbing, and gets the URL all the way to the built in
forwarder. (It stops before plumbing URLs into the OS configurations
that can handle them.)

For #2596

Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
3 years ago
Brad Fitzpatrick e74d37d30f net/dns{,/resolver}: quiet DNS output logging
It was a huge chunk of the overall log output and made debugging
difficult. Omit and summarize the spammy *.arpa parts instead.

Fixes tailscale/corp#2066 (to which nobody had opinions, so)
3 years ago