Commit Graph

84 Commits (c2e474e729b4b665cf5acafa29f89c11af71ac35)

Author SHA1 Message Date
Alex Chan c2e474e729 all: rename variables with lowercase-l/uppercase-I
See http://go/no-ell

Signed-off-by: Alex Chan <alexc@tailscale.com>

Updates #cleanup

Change-Id: I8c976b51ce7a60f06315048b1920516129cc1d5d
2 weeks ago
M. J. Fromberger 95426b79a9
logtail: avoid racing eventbus subscriptions with shutdown (#17695)
In #17639 we moved the subscription into NewLogger to ensure we would not race
subscribing with shutdown of the eventbus client. Doing so fixed that problem,
but exposed another: As we were only servicing events occasionally when waiting
for the network to come up, we could leave the eventbus to stall in cases where
a number of network deltas arrived later and weren't processed.

To address that, let's separate the concerns: As before, we'll Subscribe early
to avoid conflicts with shutdown; but instead of using the subscriber directly
to determine readiness, we'll keep track of the last-known network state in a
selectable condition that the subscriber updates for us.  When we want to wait,
we'll wait on that condition (or until our context ends), ensuring all the
events get processed in a timely manner.

Updates #17638
Updates #15160

Change-Id: I28339a372be4ab24be46e2834a218874c33a0d2d
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
1 month ago
M. J. Fromberger db5815fb97
Revert "logtail: avoid racing eventbus subscriptions with Shutdown (#17639)" (#17684)
This reverts commit 4346615d77.
We averted the shutdown race, but will need to service the subscriber even when
we are not waiting for a change so that we do not delay the bus as a whole.

Updates #17638

Change-Id: I5488466ed83f5ad1141c95267f5ae54878a24657
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
1 month ago
M. J. Fromberger 4346615d77
logtail: avoid racing eventbus subscriptions with Shutdown (#17639)
When the eventbus is enabled, set up the subscription for change deltas at the
beginning when the client is created, rather than waiting for the first
awaitInternetUp check.

Otherwise, it is possible for a check to race with the client close in
Shutdown, which triggers a panic.

Updates #17638

Change-Id: I461c07939eca46699072b14b1814ecf28eec750c
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
1 month ago
Claus Lensbøl ce752b8a88
net/netmon: remove usage of direct callbacks from netmon (#17292)
The callback itself is not removed as it is used in other repos, making
it simpler for those to slowly transition to the eventbus.

Updates #15160

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2 months ago
Brad Fitzpatrick 11b770fbc9 feature/logtail: pull logtail + netlog out to modular features
Removes 434 KB from the minimal Linux binary, or ~3%.

Primarily this comes from not linking in the zstd encoding code.

Fixes #17323

Change-Id: I0a90de307dfa1ad7422db7aa8b1b46c782bfaaf7
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 months ago
Joe Tsai f19409482d logtail: delete AppendTextOrJSONLocked
This was accidentally added in #11671 for testing.
Nothing uses it.

Updates tailscale/corp#21363

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2 months ago
Brad Fitzpatrick 4fa9411e3f logtail: remove unneeded IP redaction code
Updates tailscale/corp#15664

Change-Id: I9523a43860685048548890cf1931ee6cbd60452c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
7 months ago
Joe Tsai 0b7087c401
logpolicy: expose MaxBufferSize and MaxUploadSize options (#14903)
Updates tailscale/corp#26342

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
10 months ago
Joe Tsai b62a013ecb
Switch logging service from log.tailscale.io to log.tailscale.com (#14398)
Updates tailscale/corp#23617

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
12 months ago
Joe Tsai bac3af06f5
logtail: avoid bytes.Buffer allocation (#11858)
Re-use a pre-allocated bytes.Buffer struct and
shallow the copy the result of bytes.NewBuffer into it
to avoid allocating the struct.

Note that we're only reusing the bytes.Buffer struct itself
and not the underling []byte temporarily stored within it.

Updates #cleanup
Updates tailscale/corp#18514
Updates golang/go#67004

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
1 year ago
Anton Tolchanov 7403d8e9a8 logtail: close idle HTTP connections on shutdown
Fixes tailscale/corp#21609

Co-authored-by: Maisem Ali <maisem@tailscale.com>
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
1 year ago
Maisem Ali 4a8cb1d9f3 all: use math/rand/v2 more
Updates #11058

Signed-off-by: Maisem Ali <maisem@tailscale.com>
2 years ago
Brad Fitzpatrick 3672f29a4e net/netns, net/dns/resolver, etc: make netmon required in most places
The goal is to move more network state accessors to netmon.Monitor
where they can be cheaper/cached. But first (this change and others)
we need to make sure the one netmon.Monitor is plumbed everywhere.

Some notable bits:

* tsdial.NewDialer is added, taking a now-required netmon

* because a tsdial.Dialer always has a netmon, anything taking both
  a Dialer and a NetMon is now redundant; take only the Dialer and
  get the NetMon from that if/when needed.

* netmon.NewStatic is added, primarily for tests

Updates tailscale/corp#10910
Updates tailscale/corp#18960
Updates #7967
Updates #3299

Change-Id: I877f9cb87618c4eb037cee098241d18da9c01691
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 years ago
Joe Tsai 7a77a2edf1
logtail: optimize JSON processing (#11671)
Changes made:

* Avoid "encoding/json" for JSON processing, and instead use
"github.com/go-json-experiment/json/jsontext".
Use jsontext.Value.IsValid for validation, which is much faster.
Use jsontext.AppendQuote instead of our own JSON escaping.

* In drainPending, use a different maxLen depending on lowMem.
In lowMem mode, it is better to perform multiple uploads
than it is to construct a large body that OOMs the process.

* In drainPending, if an error is encountered draining,
construct an error message in the logtail JSON format
rather than something that is invalid JSON.

* In appendTextOrJSONLocked, use jsontext.Decoder to check
whether the input is a valid JSON object. This is faster than
the previous approach of unmarshaling into map[string]any and
then re-marshaling that data structure.
This is especially beneficial for network flow logging,
which produces relatively large JSON objects.

* In appendTextOrJSONLocked, enforce maxSize on the input.
If too large, then we may end up in a situation where the logs
can never be uploaded because it exceeds the maximum body size
that the Tailscale logs service accepts.

* Use "tailscale.com/util/truncate" to properly truncate a string
on valid UTF-8 boundaries.

* In general, remove unnecessary spaces in JSON output.

Performance:

    name       old time/op    new time/op    delta
    WriteText     776ns ± 2%     596ns ± 1%   -23.24%  (p=0.000 n=10+10)
    WriteJSON     110µs ± 0%       9µs ± 0%   -91.77%  (p=0.000 n=8+8)

    name       old alloc/op   new alloc/op   delta
    WriteText      448B ± 0%        0B       -100.00%  (p=0.000 n=10+10)
    WriteJSON    37.9kB ± 0%     0.0kB ± 0%   -99.87%  (p=0.000 n=10+10)

    name       old allocs/op  new allocs/op  delta
    WriteText      1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
    WriteJSON     1.08k ± 0%     0.00k ± 0%   -99.91%  (p=0.000 n=10+10)

For text payloads, this is 1.30x faster.
For JSON payloads, this is 12.2x faster.

Updates #cleanup
Updates tailscale/corp#18514

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2 years ago
Brad Fitzpatrick ec87e219ae logtail: delete unused code from old way to configure zstd
Updates #cleanup

Change-Id: I666ecf08ea67e461adf2a3f4daa9d1753b2dc1e4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 years ago
Joe Tsai e2586bc674
logtail: always zstd compress with FastestCompression and LowMemory (#11583)
This is based on empirical testing using actual logs data.

FastestCompression only incurs a marginal <1% compression ratio hit
for a 2.25x reduction in memory use for small payloads
(which are common if log uploads happen at a decently high frequency).
The memory savings for large payloads is much lower
(less than 1.1x reduction).

LowMemory only incurs a marginal <5% hit on performance
for a 1.6-2.0x reduction in memory use for small or large payloads.

The memory gains for both settings justifies the loss of benefits,
which are arguably minimal.

tailscale/corp#18514

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2 years ago
Brad Fitzpatrick e7599c1f7e logtail: prevent js/wasm clients from picking TLS client cert
Corp details:
https://github.com/tailscale/corp/issues/18177#issuecomment-2026598715
https://github.com/tailscale/corp/pull/18775#issuecomment-2027505036

Updates tailscale/corp#18177

Change-Id: I7c03a4884540b8519e0996088d085af77991f477
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 years ago
Brad Fitzpatrick 6d90966c1f logtail: move a scratch buffer to Logger
Rather than pass around a scratch buffer, put it on the Logger.

This is a baby step towards removing the background uploading
goroutine and starting it as needed.

Updates tailscale/corp#18514 (insofar as it led me to look at this code)

Change-Id: I6fd94581c28bde40fdb9fca788eb9590bcedae1b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 years ago
Joe Tsai 85febda86d
all: use zstdframe where sensible (#11491)
Use the zstdframe package where sensible instead of plumbing
around our own zstd.Encoder just for stateless operations.

This causes logtail to have a dependency on zstd,
but that's arguably okay since zstd support is implicit
to the protocol between a client and the logging service.
Also, virtually every caller to logger.NewLogger was
manually setting up a zstd.Encoder anyways,
meaning that zstd was functionally always a dependency.

Updates #cleanup
Updates tailscale/corp#18514

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2 years ago
as2643 3fb6ee7fdb
tailscale/logtail: redact public ipv6 and ipv4 ip addresses within tailscaled. (#10531)
Updates #15664

Signed-off-by: Anishka Singh <anishkasingh66@gmail.com>
2 years ago
Brad Fitzpatrick d852c616c6 logtail: fix Logger.Write return result
io.Writer says you need to write completely on err=nil. (the result
int should be the same as the input buffer length)

We weren't doing that. We used to, but at some point the verbose
filtering was modifying buf before the final return of len(buf).

We've been getting lucky probably, that callers haven't looked at our
results and turned us into a short write error.

Updates #cleanup
Updates tailscale/corp#15664

Change-Id: I01e765ba35b86b759819e38e0072eceb9d10d75c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 years ago
Brad Fitzpatrick 9089efea06 net/netmon: make ChangeFunc's signature take new ChangeDelta, not bool
Updates #9040

Change-Id: Ia43752064a1a6ecefc8802b58d6eaa0b71cf1f84
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 years ago
Claire Wang e1bcecc393
logtail: use tstime (#8607)
Updates #8587
Signed-off-by: Claire Wang <claire@tailscale.com>
2 years ago
Joe Tsai 49015b00fe
logtail: fix race condition with sockstats label (#8578)
Updates tailscale/corp#8427

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2 years ago
Joe Tsai 84c99fe0d9
logtail: be less aggressive about re-uploads (#8117)
The retry logic was pathological in the following ways:

* If we restarted the logging service, any pending uploads
would be placed in a retry-loop where it depended on backoff.Backoff,
which was too aggresive. It would retry failures within milliseconds,
taking at least 10 retries to hit a delay of 1 second.

* In the event where a logstream was rate limited,
the aggressive retry logic would severely exacerbate the problem
since each retry would also log an error message.
It is by chance that the rate of log error spam
does not happen to exceed the rate limit itself.

We modify the retry logic in the following ways:

* We now respect the "Retry-After" header sent by the logging service.

* Lacking a "Retry-After" header, we retry after a hard-coded period of
30 to 60 seconds. This avoids the thundering-herd effect when all nodes
try reconnecting to the logging service at the same time after a restart.

* We do not treat a status 400 as having been uploaded.
This is simply not the behavior of the logging service.

Updates #tailscale/corp#11213

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
3 years ago
Mihai Parparita 4722f7e322 all: move network monitoring from wgengine/monitor to net/netmon
We're using it in more and more places, and it's not really specific to
our use of Wireguard (and does more just link/interface monitoring).

Also removes the separate interface we had for it in sockstats -- it's
a small enough package (we already pull in all of its dependencies
via other paths) that it's not worth the extra complexity.

Updates #7621
Updates #7850

Signed-off-by: Mihai Parparita <mihai@tailscale.com>
3 years ago
Mihai Parparita edb02b63f8 net/sockstats: pass in logger to sockstats.WithSockStats
Using log.Printf may end up being printed out to the console, which
is not desirable. I noticed this when I was investigating some client
logs with `sockstats: trace "NetcheckClient" was overwritten by another`.
That turns to be harmless/expected (the netcheck client will fall back
to the DERP client in some cases, which does its own sockstats trace).

However, the log output could be visible to users if running the
`tailscale netcheck` CLI command, which would be needlessly confusing.

Updates tailscale/corp#9230

Signed-off-by: Mihai Parparita <mihai@tailscale.com>
3 years ago
Will Norris 62a1e9a44f log/sockstatlog: add delay before writing logs to disk
Split apart polling of sockstats and logging them to disk.  Add a 3
second delay before writing logs to disk to prevent an infinite upload
loop when uploading stats to logcatcher.

Fixes #7719

Signed-off-by: Will Norris <will@tailscale.com>
3 years ago
Mihai Parparita a2be1aabfa logtail: remove unncessary response read
Effectively reverts #249, since the server side was fixed (with #251?)
to send a 200 OK/content-length 0 response.

Signed-off-by: Mihai Parparita <mihai@tailscale.com>
3 years ago
Mihai Parparita 6ac6ddbb47 sockstats: switch label to enum
Makes it cheaper/simpler to persist values, and encourages reuse of
labels as opposed to generating an arbitrary number.

Updates tailscale/corp#9230
Updates #3363

Signed-off-by: Mihai Parparita <mihai@tailscale.com>
3 years ago
Joe Tsai 7e4788e383
logtail: delete ID types and functions (#7412)
These have been moved to the types/logid package.

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
3 years ago
Mihai Parparita 9cb332f0e2 sockstats: instrument networking code paths
Uses the hooks added by tailscale/go#45 to instrument the reads and
writes on the major code paths that do network I/O in the client. The
convention is to use "<package>.<type>:<label>" as the annotation for
the responsible code path.

Enabled on iOS, macOS and Android only, since mobile platforms are the
ones we're most interested in, and we are less sensitive to any
throughput degradation due to the per-I/O callback overhead (macOS is
also enabled for ease of testing during development).

For now just exposed as counters on a /v0/sockstats PeerAPI endpoint.

We also keep track of the current interface so that we can break out
the stats by interface.

Updates tailscale/corp#9230
Updates #3363

Signed-off-by: Mihai Parparita <mihai@tailscale.com>
3 years ago
David Crawshaw 46467e39c2 logtail: allow multiple calls to Shutdown
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
3 years ago
Mihai Parparita f0f2b2e22b logtail: increase maximum log line size in low memory mode
The 255 byte limit was chosen more than 3 years ago (tailscale/corp@929635c9d9),
when iOS was operating under much more significant memory constraints.
With iOS 15 the network extension has an increased limit, so increasing
it to 4K should be fine.

The motivating factor was that the network interfaces being logged
by linkChange in wgengine/userspace.go were getting truncated, and it
would be useful to know why in some cases we're choosing the pdp_ip1
cell interface instead of the pdp_ip0 one.

Updates #7184
Updates #7188

Signed-off-by: Mihai Parparita <mihai@tailscale.com>
3 years ago
Will Norris 71029cea2d all: update copyright and license headers
This updates all source files to use a new standard header for copyright
and license declaration.  Notably, copyright no longer includes a date,
and we now use the standard SPDX-License-Identifier header.

This commit was done almost entirely mechanically with perl, and then
some minimal manual fixes.

Updates #6865

Signed-off-by: Will Norris <will@tailscale.com>
3 years ago
Brad Fitzpatrick b657187a69 cmd/tailscale, logtail: add 'tailscale debug daemon-logs' logtail mechanism
Fixes #6836

Change-Id: Ia6eb39ff8972e1aa149aeeb63844a97497c2cf04
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Brad Fitzpatrick 69c0b7e712 ipn/ipnlocal: add c2n handler to flush logtail for support debugging
Updates tailscale/corp#8564

Change-Id: I0c619d4007069f90cffd319fba66bd034d63e84d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Mihai Parparita 8f2bc0708b logtail: make logs flush delay dynamic
Instead of a static FlushDelay configuration value, use a FlushDelayFn
function that we invoke every time we decide send logs. This will allow
mobile clients to be more dynamic about when to send logs.

Updates #6768

Signed-off-by: Mihai Parparita <mihai@tailscale.com>
3 years ago
Brad Fitzpatrick a04f1ff9e6 logtail: default to 2s log flush delay on all platforms
Per chat. This is close enough to realtime but massively reduces
number of HTTP requests. (which you can verify with
TS_DEBUG_LOGTAIL_WAKES and watching tailscaled run at start)

By contrast, this is set to 2 minutes on mobile.

Change-Id: Id737c7924d452de5c446df3961f5e94a43a33f1f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Brad Fitzpatrick a315336287 logtail: change batched upload mechanism to not use CPU when idle
The mobile implementation had a 2 minute ticker going all the time
to do a channel send. Instead, schedule it as needed based on activity.

Then we can be actually idle for long periods of time.

Updates #3363

Change-Id: I0dba4150ea7b94f74382fbd10db54a82f7ef6c29
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Joe Tsai 3af0d4d0f2
logtail: always record timestamps in UTC (#5732)
Upstream optimizations to the Go time package will make
unmarshaling of time.Time 3-6x faster. See:
* https://go.dev/cl/425116
* https://go.dev/cl/425197
* https://go.dev/cl/429862

The last optimization avoids a []byte -> string allocation
if the timestamp string less than than 32B.
Unfortunately, the presence of a timezone breaks that optimization.
Drop recording of timezone as this is non-essential information.

Most of the performance gains is upon unmarshal,
but there is also a slight performance benefit to
not marshaling the timezone as well.

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
3 years ago
Joe Tsai c321363d2c
logtail: support a copy ID (#5851)
The copy ID operates similar to a CC in email where
a message is sent to both the primary ID and also the copy ID.
A given log message is uploaded once, but the log server
records it twice for each ID.

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
3 years ago
Josh Soref d4811f11a0 all: fix spelling mistakes
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>
3 years ago
Eng Zer Jun f0347e841f refactor: move from io/ioutil to io and os packages
The io/ioutil package has been deprecated as of Go 1.16 [1]. This commit
replaces the existing io/ioutil functions with their new definitions in
io and os packages.

Reference: https://golang.org/doc/go1.16#ioutil
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
3 years ago
Joe Tsai 21cd402204
logtail: do not log when backing off (#5485) 3 years ago
Brad Fitzpatrick 4950fe60bd syncs, all: move to using Go's new atomic types instead of ours
Fixes #5185

Change-Id: I850dd532559af78c3895e2924f8237ccc328449d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Brad Fitzpatrick 5381437664 logtail, net/portmapper, wgengine/magicsock: use fmt.Appendf
Fixes #5206

Change-Id: I490bb92e774ce7c044040537e2cd864fcf1dbe5a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Brad Fitzpatrick 48e73e147a logtail,logpolicy: tweak minor cosmetic things
Just reading the code again in prep for some alloc reductions.

Change-Id: I065226ea794b7ec7144c2b15942d35131c9313a8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Brad Fitzpatrick 02e580c1d2 logtail: use http.NewRequestWithContext
Saves some allocs. Not hot, but because we can now.

And a const instead of a var.

Change-Id: Ieb2b64534ed38051c36b2c0aa2e82739d9d0e015
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
4 years ago