tailscale/util/zstdframe/zstd.go

util/zstdframe: add package for stateless zstd compression (#11481)

The Go zstd package is not friendly for stateless zstd compression.
Passing around multiple zstd.Encoder values just for stateless compression
is a waste of memory, since the memory is never freed and seldom used
if no compression operations are happening.

For performance, we pool the relevant Encoder/Decoder with the specific
options set. Functionally, this package is a wrapper over the Go zstd
package with a more ergonomic API for stateless operations.

This package can be used to clean up various pre-existing zstd.Encoder
pools or one-off handlers spread throughout our codebases.

Performance:

    BenchmarkEncode/Best                1690  610926 ns/op   25.78 MB/s    1 B/op  0 allocs/op
        zstd_test.go:137: memory: 50.336 MiB
        zstd_test.go:138: ratio: 3.269x
    BenchmarkEncode/Better             10000  100939 ns/op  156.04 MB/s    0 B/op  0 allocs/op
        zstd_test.go:137: memory: 20.399 MiB
        zstd_test.go:138: ratio: 3.131x
    BenchmarkEncode/Default            15775   74976 ns/op  210.08 MB/s  105 B/op  0 allocs/op
        zstd_test.go:137: memory: 1.586 MiB
        zstd_test.go:138: ratio: 3.064x
    BenchmarkEncode/Fastest            23222   53977 ns/op  291.81 MB/s   26 B/op  0 allocs/op
        zstd_test.go:137: memory: 599.458 KiB
        zstd_test.go:138: ratio: 2.898x
    BenchmarkEncode/FastestLowMemory   23361   50789 ns/op  310.13 MB/s   15 B/op  0 allocs/op
        zstd_test.go:137: memory: 334.458 KiB
        zstd_test.go:138: ratio: 2.898x
    BenchmarkEncode/FastestNoChecksum  23086   50253 ns/op  313.44 MB/s   26 B/op  0 allocs/op
        zstd_test.go:137: memory: 599.458 KiB
        zstd_test.go:138: ratio: 2.900x
    BenchmarkDecode/Checksum           70794   17082 ns/op  300.96 MB/s    4 B/op  0 allocs/op
        zstd_test.go:163: memory: 316.438 KiB
    BenchmarkDecode/NoChecksum         74935   15990 ns/op  321.51 MB/s    4 B/op  0 allocs/op
        zstd_test.go:163: memory: 316.438 KiB
    BenchmarkDecode/LowMemory          71043   16739 ns/op  307.13 MB/s    0 B/op  0 allocs/op
        zstd_test.go:163: memory: 79.347 KiB

The options are taking effect: compression ratio improves with higher
levels while compression speed diminishes. LowMemory also takes effect,
as the pooled coder object references less memory than in the other cases.
Pooling is taking effect as well, since there are 0 amortized allocations.

Additional performance:

    BenchmarkEncodeParallel/zstd-24       1857  619264 ns/op  1796 B/op  49 allocs/op
    BenchmarkEncodeParallel/zstdframe-24  1954  532023 ns/op  4293 B/op  49 allocs/op
    BenchmarkDecodeParallel/zstd-24       5288  197281 ns/op  2516 B/op  49 allocs/op
    BenchmarkDecodeParallel/zstdframe-24  6441  196254 ns/op  2513 B/op  49 allocs/op

In concurrent usage, handling the pooling in this package has a marginal
benefit over the zstd package, which relies on a Go channel as the pooling
mechanism. In particular, coders can be freed by the GC when not in use.
Coders can also be shared throughout the program if callers use this
package instead of multiple independent pools doing the same thing.
The allocations are unrelated to pooling, as they are caused by the
spawning of goroutines.

Updates #cleanup
Updates tailscale/corp#18514
Updates tailscale/corp#17653
Updates tailscale/corp#18005

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
3 months ago

// Copyright (c) Tailscale Inc & AUTHORS
// SPDX-License-Identifier: BSD-3-Clause

// Package zstdframe provides functionality for encoding and decoding
// independently compressed zstandard frames.
package zstdframe

import (
	"encoding/binary"
	"io"

	"github.com/klauspost/compress/zstd"
)

// The Go zstd API surface is not ergonomic:
//
//   - Options are set via NewReader and NewWriter and are immutable once set.
//
//   - Stateless operations like EncodeAll and DecodeAll are methods on
//     the Encoder and Decoder types, which implies that options cannot be
//     changed without allocating an entirely new Encoder or Decoder.
//
//     This is further strange as Encoder and Decoder types are either
//     stateful or stateless objects depending on semantic context.
//
//   - By default, the zstd package tries to be overly clever by spawning off
//     multiple goroutines to do work, which can lead to both excessive fanout
//     of resources and also subtle race conditions. Also, each Encoder/Decoder
//     never relinquishes resources, which makes it unsuitable for low-memory
//     environments. We work around the zstd defaults by setting concurrency=1
//     on each coder and pooling individual coders, allowing the Go GC to
//     reclaim unused coders.
//
//     See https://github.com/klauspost/compress/issues/264
//     See https://github.com/klauspost/compress/issues/479
//
//   - The EncodeAll and DecodeAll methods append to a user-provided buffer,
//     but their signature is the opposite of most append-like functions in Go:
//     the output buffer is the second argument rather than the first,
//     which leads to footguns.
//     The zstdframe package provides AppendEncode and AppendDecode functions
//     that follow the Go convention of the output buffer being the first
//     argument, similar to how the builtin append function operates.
//
//     See https://github.com/klauspost/compress/issues/648
//
//   - The zstd package is oddly inconsistent about naming. For example,
//     IgnoreChecksum vs WithEncoderCRC, or
//     WithDecoderLowmem vs WithLowerEncoderMem.
//     Most options have a WithDecoder or WithEncoder prefix, but some do not.
//
// The zstdframe package wraps the zstd package and presents a more ergonomic API
// by providing stateless functions that take in variadic options.
// Pooling of resources is handled by this package to avoid each caller
// redundantly performing the same pooling at different call sites.

// TODO: Since compression is CPU bound,
// should we use a semaphore to ensure at most one operation per CPU?

// AppendEncode appends the zstandard encoded content of src to dst.
// It emits exactly one frame as a single segment.
func AppendEncode(dst, src []byte, opts ...Option) []byte {
	enc := getEncoder(opts...)
	defer putEncoder(enc)
	return enc.EncodeAll(src, dst)
}

// AppendDecode appends the zstandard decoded content of src to dst.
// The input may consist of zero or more frames.
// Any call that handles untrusted input should specify [MaxDecodedSize].
func AppendDecode(dst, src []byte, opts ...Option) ([]byte, error) {
	dec := getDecoder(opts...)
	defer putDecoder(dec)
	return dec.DecodeAll(src, dst)
}

// NextSize parses the next frame (regardless of whether it is a
// data frame or a metadata frame) and returns the total size of the frame.
// The frame can be skipped by slicing n bytes from b (e.g., b[n:]).
// It reports [io.ErrUnexpectedEOF] if the frame is incomplete.
func NextSize(b []byte) (n int, err error) {
	// Parse the frame header (RFC 8878, section 3.1.1.).
	var frame zstd.Header
	if err := frame.Decode(b); err != nil {
		return n, err
	}
	n += frame.HeaderSize

	if frame.Skippable {
		// Handle skippable frame (RFC 8878, section 3.1.2.).
		if len(b[n:]) < int(frame.SkippableSize) {
			return n, io.ErrUnexpectedEOF
		}
		n += int(frame.SkippableSize)
	} else {
		// Handle one or more Data_Blocks (RFC 8878, section 3.1.1.2.).
		for {
			if len(b[n:]) < 3 {
				return n, io.ErrUnexpectedEOF
			}
			// Load the 3-byte little-endian Block_Header by reading 4 bytes
			// starting one byte early and discarding the low byte. This is
			// safe because at least the frame header precedes b[n].
			blockHeader := binary.LittleEndian.Uint32(b[n-1:]) >> 8 // load uint24
			lastBlock := (blockHeader >> 0) & ((1 << 1) - 1)
			blockType := (blockHeader >> 1) & ((1 << 2) - 1)
			blockSize := (blockHeader >> 3) & ((1 << 21) - 1)
			n += 3
			if blockType == 1 {
				// For RLE_Block (RFC 8878, section 3.1.1.2.2.),
				// the Block_Content is only a single byte.
				blockSize = 1
			}
			if len(b[n:]) < int(blockSize) {
				return n, io.ErrUnexpectedEOF
			}
			n += int(blockSize)
			if lastBlock != 0 {
				break
			}
		}

		// Handle optional Content_Checksum (RFC 8878, section 3.1.1.).
		if frame.HasCheckSum {
			if len(b[n:]) < 4 {
				return n, io.ErrUnexpectedEOF
			}
			n += 4
		}
	}
	return n, nil
}