Commit Graph

26 Commits (017dcd520f3cb485faea037b2a6e9fad438afa43)

Author SHA1 Message Date
Josh Bleecher Snyder 1271e135cd wgengine/tstun: initialize wireguard-go TUN parameters
This will enable us to remove the corresponding code from
our fork of wireguard-go.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years ago
David Anderson cb96b14bf4 net/packet: remove the custom IP4/IP6 types in favor of netaddr.IP.
Upstream netaddr has a change that makes it alloc-free, so it's safe to
use in hot codepaths. This gets rid of one of the many IP types in our
codebase.

Performance is currently worse across the board. This is likely due in
part to netaddr.IP being a larger value type (4b -> 24b for IPv4,
16b -> 24b for IPv6), and in other part due to missing low-hanging fruit
optimizations in netaddr. However, the regression is less bad than
it looks at first glance, because we'd micro-optimized packet.IP* in
the past few weeks. This change drops us back to roughly where we
were at the 1.2 release, but with the benefit of a significant
code and architectural simplification.

name                   old time/op    new time/op    delta
pkg:tailscale.com/net/packet goos:linux goarch:amd64
Decode/tcp4-8            12.2ns ± 5%    29.7ns ± 2%  +142.32%  (p=0.008 n=5+5)
Decode/tcp6-8            12.6ns ± 3%    65.1ns ± 2%  +418.47%  (p=0.008 n=5+5)
Decode/udp4-8            11.8ns ± 3%    30.5ns ± 2%  +157.94%  (p=0.008 n=5+5)
Decode/udp6-8            27.1ns ± 1%    65.7ns ± 2%  +142.36%  (p=0.016 n=4+5)
Decode/icmp4-8           24.6ns ± 2%    30.5ns ± 2%   +23.65%  (p=0.016 n=4+5)
Decode/icmp6-8           22.9ns ±51%    65.5ns ± 2%  +186.19%  (p=0.008 n=5+5)
Decode/igmp-8            18.1ns ±44%    30.2ns ± 1%   +66.89%  (p=0.008 n=5+5)
Decode/unknown-8         20.8ns ± 1%    10.6ns ± 9%   -49.11%  (p=0.016 n=4+5)
pkg:tailscale.com/wgengine/filter goos:linux goarch:amd64
Filter/icmp4-8           30.5ns ± 1%    77.9ns ± 3%  +155.01%  (p=0.008 n=5+5)
Filter/tcp4_syn_in-8     43.7ns ± 3%   123.0ns ± 3%  +181.72%  (p=0.008 n=5+5)
Filter/tcp4_syn_out-8    24.5ns ± 2%    45.7ns ± 6%   +86.22%  (p=0.008 n=5+5)
Filter/udp4_in-8         64.8ns ± 1%   210.0ns ± 2%  +223.87%  (p=0.008 n=5+5)
Filter/udp4_out-8         119ns ± 0%     278ns ± 0%  +133.78%  (p=0.016 n=4+5)
Filter/icmp6-8           40.3ns ± 2%   204.4ns ± 4%  +407.70%  (p=0.008 n=5+5)
Filter/tcp6_syn_in-8     35.3ns ± 3%   199.2ns ± 2%  +464.95%  (p=0.008 n=5+5)
Filter/tcp6_syn_out-8    32.8ns ± 2%    81.0ns ± 2%  +147.10%  (p=0.008 n=5+5)
Filter/udp6_in-8          106ns ± 2%     290ns ± 2%  +174.48%  (p=0.008 n=5+5)
Filter/udp6_out-8         184ns ± 2%     314ns ± 3%   +70.43%  (p=0.016 n=4+5)
pkg:tailscale.com/wgengine/tstun goos:linux goarch:amd64
Write-8                  9.02ns ± 3%    8.92ns ± 1%      ~     (p=0.421 n=5+5)

name                   old alloc/op   new alloc/op   delta
pkg:tailscale.com/net/packet goos:linux goarch:amd64
Decode/tcp4-8             0.00B          0.00B           ~     (all equal)
Decode/tcp6-8             0.00B          0.00B           ~     (all equal)
Decode/udp4-8             0.00B          0.00B           ~     (all equal)
Decode/udp6-8             0.00B          0.00B           ~     (all equal)
Decode/icmp4-8            0.00B          0.00B           ~     (all equal)
Decode/icmp6-8            0.00B          0.00B           ~     (all equal)
Decode/igmp-8             0.00B          0.00B           ~     (all equal)
Decode/unknown-8          0.00B          0.00B           ~     (all equal)
pkg:tailscale.com/wgengine/filter goos:linux goarch:amd64
Filter/icmp4-8            0.00B          0.00B           ~     (all equal)
Filter/tcp4_syn_in-8      0.00B          0.00B           ~     (all equal)
Filter/tcp4_syn_out-8     0.00B          0.00B           ~     (all equal)
Filter/udp4_in-8          0.00B          0.00B           ~     (all equal)
Filter/udp4_out-8         16.0B ± 0%     64.0B ± 0%  +300.00%  (p=0.008 n=5+5)
Filter/icmp6-8            0.00B          0.00B           ~     (all equal)
Filter/tcp6_syn_in-8      0.00B          0.00B           ~     (all equal)
Filter/tcp6_syn_out-8     0.00B          0.00B           ~     (all equal)
Filter/udp6_in-8          0.00B          0.00B           ~     (all equal)
Filter/udp6_out-8         48.0B ± 0%     64.0B ± 0%   +33.33%  (p=0.008 n=5+5)

name                   old allocs/op  new allocs/op  delta
pkg:tailscale.com/net/packet goos:linux goarch:amd64
Decode/tcp4-8              0.00           0.00           ~     (all equal)
Decode/tcp6-8              0.00           0.00           ~     (all equal)
Decode/udp4-8              0.00           0.00           ~     (all equal)
Decode/udp6-8              0.00           0.00           ~     (all equal)
Decode/icmp4-8             0.00           0.00           ~     (all equal)
Decode/icmp6-8             0.00           0.00           ~     (all equal)
Decode/igmp-8              0.00           0.00           ~     (all equal)
Decode/unknown-8           0.00           0.00           ~     (all equal)
pkg:tailscale.com/wgengine/filter goos:linux goarch:amd64
Filter/icmp4-8             0.00           0.00           ~     (all equal)
Filter/tcp4_syn_in-8       0.00           0.00           ~     (all equal)
Filter/tcp4_syn_out-8      0.00           0.00           ~     (all equal)
Filter/udp4_in-8           0.00           0.00           ~     (all equal)
Filter/udp4_out-8          1.00 ± 0%      1.00 ± 0%      ~     (all equal)
Filter/icmp6-8             0.00           0.00           ~     (all equal)
Filter/tcp6_syn_in-8       0.00           0.00           ~     (all equal)
Filter/tcp6_syn_out-8      0.00           0.00           ~     (all equal)
Filter/udp6_in-8           0.00           0.00           ~     (all equal)
Filter/udp6_out-8          1.00 ± 0%      1.00 ± 0%      ~     (all equal)

Signed-off-by: David Anderson <danderson@tailscale.com>
4 years ago
David Anderson 891110e64c wgengine: expand lazy config to work with dual-stacked peers.
Lazy wg configuration now triggers if a peer has only endpoint
addresses (/32 for IPv4, /128 for IPv6). Subnet routers still
trigger eager configuration to avoid the need for a CIDR match
in the hot packet path.

Signed-off-by: David Anderson <danderson@tailscale.com>
4 years ago
David Anderson 89894c6930 net/packet: add IPv6 source and destination IPs to Parsed.
Signed-off-by: David Anderson <danderson@tailscale.com>
4 years ago
David Anderson 093431f5dd net/packet: s/ParsedPacket/Parsed/ to avoid package stuttering.
Signed-off-by: David Anderson <danderson@tailscale.com>
4 years ago
David Anderson 7a54910990 wgengine/filter: remove helper vars, mark NewAllowAll test-only.
Signed-off-by: David Anderson <danderson@tailscale.com>
4 years ago
David Anderson 76d99cf01a wgengine/filter: remove the Matches type.
It only served to obscure the underlying slice type without
adding much value.

Signed-off-by: David Anderson <danderson@tailscale.com>
4 years ago
David Anderson b3634f020d wgengine/filter: use netaddr types in public API.
We still use the packet.* alloc-free types in the data path, but
the compilation from netaddr to packet happens within the filter
package.

Signed-off-by: David Anderson <danderson@tailscale.com>
4 years ago
David Anderson 427bf2134f net/packet: rename from wgengine/packet.
Signed-off-by: David Anderson <danderson@tailscale.com>
4 years ago
David Anderson 19df6a2ee2 wgengine/packet: rename types to reflect their v4-only-ness, document.
Signed-off-by: David Anderson <danderson@tailscale.com>
4 years ago
Avery Pennarun 5041800ac6 wgengine/tstun/faketun: it's a null tunnel, not a loopback.
At some point faketun got implemented as a loopback (put a packet in
from wireguard, the same packet goes back to wireguard) which is not
useful. It's supposed to be an interface that just sinks all packets,
and then wgengine adds *only* and ICMP Echo responder as a layer on
top.

This caused extremely odd bugs on darwin, where the special case that
reinjects packets from local->local was filling the loopback channel
and creating an infinite loop (which became jammed since the reader and
writer were in the same goroutine).

Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
4 years ago
Brad Fitzpatrick 01098f41d0 wgengine/tstun: fix typo in comment 4 years ago
Brad Fitzpatrick 64a24e796b wgengine/tstun: fix 32-bit alignment again 4 years ago
Brad Fitzpatrick abe095f036 wgengine/tstun: make Close safe for concurrent use 4 years ago
Brad Fitzpatrick 16a9cfe2f4 wgengine: configure wireguard peers lazily, as needed
wireguard-go uses 3 goroutines per peer (with reasonably large stacks
& buffers).

Rather than tell wireguard-go about all our peers, only tell it about
peers we're actively communicating with. That means we need hooks into
magicsock's packet receiving path and tstun's packet sending path to
lazily create a wireguard peer on demand from the network map.

This frees up lots of memory for iOS (where we have almost nothing
left for larger domains with many users).

We should ideally do this in wireguard-go itself one day, but that'd
be a pretty big change.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
4 years ago
Brad Fitzpatrick a89d610a3d wgengine/tstun: move sync.Pool to package global
sync.Pools should almost always be packate globals, even though in this
case we only have exactly 1 TUN device anyway, so it matters less.
Still, it's unusual to see a Pool that's not a package global, so move it.
4 years ago
Brad Fitzpatrick 10ac066013 all: fix vet warnings 4 years ago
Dmytro Shynkevych 19867b2b6d
tstun: remove buggy-looking log line.
This log line looks buggy, even though lacking a filter is expected during bringup.
We already know if we forget to SetFilter: it breaks the magicsock test,
so no useful information is lost.

Resolves #559.

Signed-off-by: Dmytro Shynkevych <dmytro@tailscale.com>
4 years ago
Brad Fitzpatrick 724ad13fe1 wgengine/tstun: fix alignment of 64-bit atomic field
We had a test for it, but no 32-bit builder apparently. :(

Fixes #529
4 years ago
Brad Fitzpatrick 23e74a0f7a wgengine, magicsock, tstun: don't regularly STUN when idle (mobile only for now)
If there's been 5 minutes of inactivity, stop doing STUN lookups. That
means NAT mappings will expire, but they can resume later when there's
activity again.

We'll do this for all platforms later.

Updates tailscale/corp#320

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
4 years ago
Dmytro Shynkevych 511840b1f6
tsdns: initial implementation of a Tailscale DNS resolver (#396)
Signed-off-by: Dmytro Shynkevych <dmytro@tailscale.com>
4 years ago
Dmytro Shynkevych 02231e968e
wgengine/tstun: add tests and benchmarks (#436)
Signed-off-by: Dmytro Shynkevych <dmytro@tailscale.com>
4 years ago
Dmytro Shynkevych 059b1d10bb
wgengine/packet: refactor and expose UDP header marshaling (#408)
Signed-off-by: Dmytro Shynkevych <dmytro@tailscale.com>
4 years ago
Dmytro Shynkevych 737124ef70 tstun: tolerate zero reads
Signed-off-by: Dmytro Shynkevych <dmytro@tailscale.com>
4 years ago
Dmytro Shynkevych 635f7b99f1 wgengine: pass tun.NativeDevice to router
Signed-off-by: Dmytro Shynkevych <dmytro@tailscale.com>
4 years ago
Dmytro Shynkevych 33b2f30cea
wgengine: wrap tun.Device to support filtering and packet injection (#358)
Right now, filtering and packet injection in wgengine depend
on a patch to wireguard-go that probably isn't suitable for upstreaming.

This need not be the case: wireguard-go/tun.Device is an interface.
For example, faketun.go implements it to mock a TUN device for testing.

This patch implements the same interface to provide filtering
and packet injection at the tunnel device level,
at which point the wireguard-go patch should no longer be necessary.

This patch has the following performance impact on i7-7500U @ 2.70GHz,
tested in the following namespace configuration:
┌────────────────┐    ┌─────────────────────────────────┐     ┌────────────────┐
│      $ns1      │    │               $ns0              │     │      $ns2      │
│    client0     │    │      tailcontrol, logcatcher    │     │     client1    │
│  ┌─────┐       │    │  ┌──────┐         ┌──────┐      │     │  ┌─────┐       │
│  │vethc│───────┼────┼──│vethrc│         │vethrs│──────┼─────┼──│veths│       │
│  ├─────┴─────┐ │    │  ├──────┴────┐    ├──────┴────┐ │     │  ├─────┴─────┐ │
│  │10.0.0.2/24│ │    │  │10.0.0.1/24│    │10.0.1.1/24│ │     │  │10.0.1.2/24│ │
│  └───────────┘ │    │  └───────────┘    └───────────┘ │     │  └───────────┘ │
└────────────────┘    └─────────────────────────────────┘     └────────────────┘
Before:
---------------------------------------------------
| TCP send               | UDP send               |
|------------------------|------------------------|
| 557.0 (±8.5) Mbits/sec | 3.03 (±0.02) Gbits/sec |
---------------------------------------------------
After:
---------------------------------------------------
| TCP send               | UDP send               |
|------------------------|------------------------|
| 544.8 (±1.6) Mbits/sec | 3.13 (±0.02) Gbits/sec |
---------------------------------------------------
The impact on receive performance is similar.

Signed-off-by: Dmytro Shynkevych <dmytro@tailscale.com>
4 years ago