Commit Graph

54 Commits (5a2e6a6f7d101a33738b3cdd7ed25327925e00ae)

Author SHA1 Message Date
Brad Fitzpatrick 4af22f3785 util/deephash: add IncludeFields, ExcludeFields HasherForType Options
Updates tailscale/corp#6198

Change-Id: Iafc18c5b947522cf07a42a56f35c0319cc7b1c94
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
1 year ago
Will Norris 71029cea2d all: update copyright and license headers
This updates all source files to use a new standard header for copyright
and license declaration.  Notably, copyright no longer includes a date,
and we now use the standard SPDX-License-Identifier header.

This commit was done almost entirely mechanically with perl, and then
some minimal manual fixes.

Updates #6865

Signed-off-by: Will Norris <will@tailscale.com>
2 years ago
Josh Soref d4811f11a0 all: fix spelling mistakes
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>
2 years ago
Andrew Dunham 4b996ad5e3
util/deephash: add AppendSum method (#5768)
This method can be used to obtain the hex-formatted deephash.Sum
instance without allocations.

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
2 years ago
Joe Tsai 7e40071571
util/deephash: handle slice edge-cases (#5471)
It is unclear whether the lack of checking nil-ness of slices
was an oversight or a deliberate feature.
Lacking a comment, the assumption is that this was an oversight.

Also, expand the logic to perform cycle detection for recursive slices.
We do this on a per-element basis since a slice is semantically
equivalent to a list of pointers.

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2 years ago
Joe Tsai 9bf13fc3d1
util/deephash: remove getTypeInfo (#5469)
Add a new lookupTypeHasher function that is just a cached front-end
around the makeTypeHasher function.
We do not need to worry about the recursive type cycle issue that
made getTypeInfo more complicated since makeTypeHasher
is not directly recursive. All calls to itself happen lazily
through a sync.Once upon first use.

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2 years ago
Joe Tsai ab7e6f3f11
util/deephash: require pointer in API (#5467)
The entry logic of Hash has extra complexity to make sure
we always have an addressable value on hand.
If not, we heap allocate the input.
For this reason we document that there are performance benefits
to always providing a pointer.
Rather than documenting this, just enforce it through generics.

Also, delete the unused HasherForType function.
It's an interesting use of generics, but not well tested.
We can resurrect it from code history if there's a need for it.

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2 years ago
Joe Tsai c5b1565337
util/deephash: move pointer and interface logic to separate function (#5465)
This helps pprof better identify which Go kinds take the most time
since the kind is always in the function name.

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2 years ago
Joe Tsai d2e2d8438b
util/deephash: move map logic to separate function (#5464)
This helps pprof better identify which Go kinds take the most time
since the kind is always in the function name.

There is a minor adjustment where we hash the length of the map
to be more on the cautious side.

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2 years ago
Joe Tsai 23c3831ff9
util/deephash: coalesce struct logic (#5466)
Rather than having two copies []fieldInfo,
just maintain one and perform merging in the same pass.

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2 years ago
Joe Tsai 296b008b9f
util/deephash: move array and slice logic to separate function (#5463)
This helps pprof better identify which Go kinds take the most time
since the kind is always in the function name.

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2 years ago
Joe Tsai 31bf3874d6
util/deephash: use unsafe.Pointer instead of reflect.Value (#5459)
Use of reflect.Value.SetXXX panics if the provided argument was
obtained from an unexported struct field.
Instead, pass an unsafe.Pointer around and convert to a
reflect.Value when necessary (i.e., for maps and interfaces).
Converting from unsafe.Pointer to reflect.Value guarantees that
none of the read-only bits will be populated.

When running in race mode, we attach type information to the pointer
so that we can type check every pointer operation.
This also type-checks that direct memory hashing is within
the valid range of a struct value.

We add test cases that previously caused deephash to panic,
but now pass.

Performance:

	name              old time/op    new time/op    delta
	Hash              14.1µs ± 1%    14.1µs ± 1%    ~     (p=0.590 n=10+9)
	HashPacketFilter  2.53µs ± 2%    2.44µs ± 1%  -3.79%  (p=0.000 n=9+10)
	TailcfgNode       1.45µs ± 1%    1.43µs ± 0%  -1.36%  (p=0.000 n=9+9)
	HashArray         318ns ± 2%     318ns ± 2%    ~      (p=0.541 n=10+10)
	HashMapAcyclic    32.9µs ± 1%    31.6µs ± 1%  -4.16%  (p=0.000 n=10+9)

There is a slight performance gain due to the use of unsafe.Pointer
over reflect.Value methods. Also, passing an unsafe.Pointer (1 word)
on the stack is cheaper than passing a reflect.Value (3 words).

Performance gains are diminishing since SHA-256 hashing now dominates the runtime.

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2 years ago
Joe Tsai 70f9fc8c7a
util/deephash: rely on direct memory hashing for primitive kinds (#5457)
Rather than separate functions to hash each kind,
just rely on the fact that these are direct memory hashable,
thus simplifying the code.

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2 years ago
Joe Tsai 531ccca648
util/deephash: delete slow path (#5423)
Every implementation of typeHasherFunc always returns true,
which implies that the slow path is no longer executed.
Delete it.

h.hashValueWithType(v, ti, ...) is deleted as it is equivalent to:
	ti.hasher()(h, v)

h.hashValue(v, ...) is deleted as it is equivalent to:
	ti := getTypeInfo(v.Type())
	ti.hasher()(h, v)

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2 years ago
Joe Tsai 3fc8683585
util/deephash: expand fast-path capabilities (#5404)
Add support for maps and interfaces to the fast path.
Add cycle-detection to the pointer handling logic.
This logic is mostly copied from the slow path.

A future commit will delete the slow path once
the fast path never falls back to the slow path.

Performance:

	name                 old time/op    new time/op    delta
	Hash-24                18.5µs ± 1%    14.9µs ± 2%  -19.52%  (p=0.000 n=10+10)
	HashPacketFilter-24    2.54µs ± 1%    2.60µs ± 1%   +2.19%  (p=0.000 n=10+10)
	HashMapAcyclic-24      31.6µs ± 1%    30.5µs ± 1%   -3.42%  (p=0.000 n=9+8)
	TailcfgNode-24         1.44µs ± 2%    1.43µs ± 1%     ~     (p=0.171 n=10+10)
	HashArray-24            324ns ± 1%     324ns ± 2%     ~     (p=0.425 n=9+9)

The additional cycle detection logic doesn't incur much slow down
since it only activates if a type is recursive, which does not apply
for any of the types that we care about.

There is a notable performance boost since we switch from the fath path
to the slow path less often. Most notably, a struct with a field that
could not be handled by the fast path would previously cause
the entire struct to go through the slow path.

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2 years ago
Joe Tsai d32700c7b2
util/deephash: specialize for netip.Addr and drop AppendTo support (#5402)
There are 5 types that we care about that implement AppendTo:

	key.DiscoPublic
	key.NodePublic
	netip.Prefix
	netipx.IPRange
	netip.Addr

The key types are thin wrappers around [32]byte and are memory hashable.
The netip.Prefix and netipx.IPRange types are thin wrappers over netip.Addr
and are hashable by default if netip.Addr is hashable.
The netip.Addr type is the only one with a complex structure where
the default behavior of deephash does not hash it correctly due to the presence
of the intern.Value type.

Drop support for AppendTo and instead add specialized hashing for netip.Addr
that would be semantically equivalent to == on the netip.Addr values.

The AppendTo support was already broken prior to this change.
It was fully removed (intentionally or not) in #4870.
It was partially restored in #4858 for the fast path,
but still broken in the slow path.
Just drop support for it altogether.

This does mean we lack any ability for types to self-hash themselves.
In the future we can add support for types that implement:

	interface { DeepHash() Sum }

Test and fuzz cases were added for the relevant types that
used to rely on the AppendTo method.
FuzzAddr has been executed on 1 billion samples without issues.

Signed-off-by: Joe Tsai joetsai@digital-static.net
2 years ago
Joe Tsai 03f7e4e577
util/hashx: move from sha256x (#5388) 2 years ago
Joe Tsai f061d20c9d
util/sha256x: rename Hash as Block512 (#5351)
Rename Hash as Block512 to indicate that this is a general-purpose
hash.Hash for any algorithm that operates on 512-bit block sizes.

While we rename the package as hashx in this commit,
a subsequent commit will move the sha256x package to hashx.
This is done separately to avoid confusing git.

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2 years ago
Joe Tsai 44d62b65d0
util/deephash: move typeIsRecursive and canMemHash to types.go (#5386)
Also, rename canMemHash to typeIsMemHashable to be consistent.
There are zero changes to the semantics.

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2 years ago
Joe Tsai d53eb6fa11
util/deephash: simplify typeIsRecursive (#5385)
Any type that is memory hashable must not be recursive since
there are definitely no pointers involved to make a cycle.

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2 years ago
Joe Tsai 23ec3c104a
util/deephash: remove unused stack slice in typeIsRecursive (#5363)
No operation ever reads from this variable.

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2 years ago
Joe Tsai c200229f9e
util/deephash: simplify canMemHash (#5384)
Put the t.Size() == 0 check first since this is applicable in all cases.
Drop the last struct field conditional since this is covered by the
sumFieldSize check at the end.

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2 years ago
Joe Tsai 32a1a3d1c0
util/deephash: avoid variadic argument for Update (#5372)
Hashing []any is slow since hashing of interfaces is slow.
Hashing of interfaces is slow since we pessimistically assume
that cycles can occur through them and start cycle tracking.

Drop the variadic signature of Update and fix callers to pass in
an anonymous struct so that we are hashing concrete types
near the root of the value tree.

Signed-off-by: Joe Tsai <joetsai@digital-static.net>

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2 years ago
Joe Tsai 548fa63e49
util/deephash: use binary encoding of time.Time (#5352)
Formatting a time.Time as RFC3339 is slow.
See https://go.dev/issue/54093

Now that we have efficient hashing of fixed-width integers,
just hash the time.Time as a binary value.

Performance:

	Hash-24                19.0µs ± 1%    18.6µs ± 1%   -2.03%  (p=0.000 n=10+9)
	TailcfgNode-24         1.79µs ± 1%    1.40µs ± 1%  -21.74%  (p=0.000 n=10+9)

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2 years ago
Joe Tsai 1f7479466e
util/deephash: use sha256x (#5339)
Switch deephash to use sha256x.Hash.

We add sha256x.HashString to efficiently hash a string.
It uses unsafe under the hood to convert a string to a []byte.
We also modify sha256x.Hash to export the underlying hash.Hash
for testing purposes so that we can intercept all hash.Hash calls.

Performance:

	name                 old time/op    new time/op    delta
	Hash-24                19.8µs ± 1%    19.2µs ± 1%  -3.01%  (p=0.000 n=10+10)
	HashPacketFilter-24    2.61µs ± 0%    2.53µs ± 1%  -3.01%  (p=0.000 n=8+10)
	HashMapAcyclic-24      31.3µs ± 1%    29.8µs ± 0%  -4.80%  (p=0.000 n=10+9)
	TailcfgNode-24         1.83µs ± 1%    1.82µs ± 2%    ~     (p=0.305 n=10+10)
	HashArray-24            344ns ± 2%     323ns ± 1%  -6.02%  (p=0.000 n=9+10)

The performance gains is not as dramatic as sha256x over sha256 due to:
1. most of the hashing already occurring through the direct memory hashing logic, and
2. what does not go through direct memory hashing is slowed down by reflect.

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2 years ago
Joe Tsai 77a92f326d
util/deephash: avoid using sync.Pool for reflect.MapIter (#5333)
In Go 1.19, the reflect.Value.MapRange method uses "function outlining"
so that the allocation of reflect.MapIter is inlinable by the caller.
If the iterator doesn't escape the caller, it can be stack allocated.
See https://go.dev/cl/400675

Performance:

	name               old time/op    new time/op    delta
	HashMapAcyclic-24    31.9µs ± 2%    32.1µs ± 1%   ~     (p=0.075 n=10+10)

	name               old alloc/op   new alloc/op   delta
	HashMapAcyclic-24     0.00B          0.00B        ~     (all equal)

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2 years ago
Joe Tsai 539c5e44c5
util/deephash: always keep values addressable (#5328)
The logic of deephash is both simpler and easier to reason about
if values are always addressable.

In Go, the composite kinds are slices, arrays, maps, structs,
interfaces, pointers, channels, and functions,
where we define "composite" as a Go value that encapsulates
some other Go value (e.g., a map is a collection of key-value entries).

In the cases of pointers and slices, the sub-values are always addressable.

In the cases of arrays and structs, the sub-values are always addressable
if and only if the parent value is addressable.

In the case of maps and interfaces, the sub-values are never addressable.
To make them addressable, we need to copy them onto the heap.

For the purposes of deephash, we do not care about channels and functions.

For all non-composite kinds (e.g., strings and ints), they are only addressable
if obtained from one of the composite kinds that produce addressable values
(i.e., pointers, slices, addressable arrays, and addressable structs).
A non-addressible, non-composite kind can be made addressable by
allocating it on the heap, obtaining a pointer to it, and dereferencing it.

Thus, if we can ensure that values are addressable at the entry points,
and shallow copy sub-values whenever we encounter an interface or map,
then we can ensure that all values are always addressable and
assume such property throughout all the logic.

Performance:

	name                 old time/op    new time/op    delta
	Hash-24                21.5µs ± 1%    19.7µs ± 1%  -8.29%  (p=0.000 n=9+9)
	HashPacketFilter-24    2.61µs ± 1%    2.62µs ± 0%  +0.29%  (p=0.037 n=10+9)
	HashMapAcyclic-24      30.8µs ± 1%    30.9µs ± 1%    ~     (p=0.400 n=9+10)
	TailcfgNode-24         1.84µs ± 1%    1.84µs ± 2%    ~     (p=0.928 n=10+10)
	HashArray-24            324ns ± 2%     332ns ± 2%  +2.45%  (p=0.000 n=10+10)

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2 years ago
Brad Fitzpatrick 116f55ff66 all: gofmt for Go 1.19
Updates #5210

Change-Id: Ib02cd5e43d0a8db60c1f09755a8ac7b140b670be
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 years ago
Brad Fitzpatrick 04cf46a762 util/deephash: fix unexported time.Time hashing
Updates tailscale/corp#6311

Change-Id: I33cd7e4040966261c2f2eb3d32f29936aeb7f632
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 years ago
Brad Fitzpatrick 2a22ea3e83 util/deephash: generate type-specific hasher funcs
name                old time/op    new time/op    delta
Hash-8                71.1µs ± 2%    71.5µs ± 1%     ~     (p=0.114 n=9+8)
HashPacketFilter-8    8.39µs ± 1%    4.83µs ± 2%  -42.38%  (p=0.000 n=8+9)
HashMapAcyclic-8      56.2µs ± 1%    56.9µs ± 2%   +1.17%  (p=0.035 n=10+9)
TailcfgNode-8         6.49µs ± 2%    3.54µs ± 1%  -45.37%  (p=0.000 n=9+9)
HashArray-8            729ns ± 2%     566ns ± 3%  -22.30%  (p=0.000 n=10+10)

name                old alloc/op   new alloc/op   delta
Hash-8                 24.0B ± 0%     24.0B ± 0%     ~     (all equal)
HashPacketFilter-8     24.0B ± 0%     24.0B ± 0%     ~     (all equal)
HashMapAcyclic-8       0.00B          0.00B          ~     (all equal)
TailcfgNode-8          0.00B          0.00B          ~     (all equal)
HashArray-8            0.00B          0.00B          ~     (all equal)

name                old allocs/op  new allocs/op  delta
Hash-8                  1.00 ± 0%      1.00 ± 0%     ~     (all equal)
HashPacketFilter-8      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
HashMapAcyclic-8        0.00           0.00          ~     (all equal)
TailcfgNode-8           0.00           0.00          ~     (all equal)
HashArray-8             0.00           0.00          ~     (all equal)

Change-Id: I34c4e786e748fe60280646d40cc63a2adb2ea6fe
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 years ago
Brad Fitzpatrick 35782f891d util/deephash: add canMemHash func + typeInfo property
Currently unused. (breaking up a bigger change)

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 years ago
Brad Fitzpatrick 8c5c87be26 util/deephash: fix collisions between different types
Updates #4883

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 years ago
Brad Fitzpatrick 757ecf7e80 util/deephash: fix map hashing when key & element have the same type
Regression from 09afb8e35b, in which the
same reflect.Value scratch value was being used as the map iterator
copy destination.

Also: make nil and empty maps hash differently, add test.

Fixes #4871

Co-authored-by: Josh Bleecher Snyder <josharian@gmail.com>
Change-Id: I67f42524bc81f694c1b7259d6682200125ea4a66
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 years ago
Brad Fitzpatrick f31588786f util/deephash: don't track cycles on non-recursive types
name              old time/op    new time/op    delta
Hash-8              67.3µs ±20%    76.5µs ±16%     ~     (p=0.143 n=10+10)
HashMapAcyclic-8    63.0µs ± 2%    56.3µs ± 1%  -10.65%  (p=0.000 n=10+8)
TailcfgNode-8       9.18µs ± 2%    6.52µs ± 3%  -28.96%  (p=0.000 n=9+10)
HashArray-8          732ns ± 3%     709ns ± 1%   -3.21%  (p=0.000 n=10+10)

name              old alloc/op   new alloc/op   delta
Hash-8               24.0B ± 0%     24.0B ± 0%     ~     (all equal)
HashMapAcyclic-8     0.00B          0.00B          ~     (all equal)
TailcfgNode-8        0.00B          0.00B          ~     (all equal)
HashArray-8          0.00B          0.00B          ~     (all equal)

name              old allocs/op  new allocs/op  delta
Hash-8                1.00 ± 0%      1.00 ± 0%     ~     (all equal)
HashMapAcyclic-8      0.00           0.00          ~     (all equal)
TailcfgNode-8         0.00           0.00          ~     (all equal)
HashArray-8           0.00           0.00          ~     (all equal)

Change-Id: I28642050d837dff66b2db54b2b0e6d272a930be8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 years ago
Brad Fitzpatrick 36ea837736 util/deephash: fix map hashing to actually hash elements
Fixes #4868

Change-Id: I574fd139cb7f7033dd93527344e6aa0e625477c7
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2 years ago
Josh Bleecher Snyder 0868329936 all: use any instead of interface{}
My favorite part of generics.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
Josh Bleecher Snyder 97a01b7b17 util/deephash: remove Tailscale toolchain compatibility shim
The future is now.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
3 years ago
slowy07 ac0353e982 fix: typo spelling grammar
Signed-off-by: slowy07 <slowy.arfy@gmail.com>
3 years ago
Joe Tsai 9d0c86b6ec
util/deephash: remove unnecessary formatting for structs and slices (#2571)
The index for every struct field or slice element and
the number of fields for the struct is unncessary.

The hashing of Go values is unambiguous because every type (except maps)
encodes in a parsable manner. So long as we know the type information,
we could theoretically decode every value (except for maps).

At a high level:
* numbers are encoded as fixed-width records according to precision.
* strings (and AppendTo output) are encoded with a fixed-width length,
followed by the contents of the buffer.
* slices are prefixed by a fixed-width length, followed by the encoding
of each value. So long as we know the type of each element, we could
theoretically decode each element.
* arrays are encoded just like slices, but elide the length
since it is determined from the Go type.
* maps are encoded first with a byte indicating whether it is a cycle.
If a cycle, it is followed by a fixed-width index for the pointer,
otherwise followed by the SHA-256 hash of its contents. The encoding of maps
is not decodeable, but a SHA-256 hash is sufficient to avoid ambiguities.
* interfaces are encoded first with a byte indicating whether it is nil.
If not nil, it is followed by a fixed-width index for the type,
and then the encoding for the underlying value. Having the type be encoded
first ensures that the value could theoretically be decoded next.
* pointers are encoded first with a byte indicating whether it is
1) nil, 2) a cycle, or 3) newly seen. If a cycle, it is followed by
a fixed-width index for the pointer. If newly seen, it is followed by
the encoding for the pointed-at value.

Removing unnecessary details speeds up hashing:

	name              old time/op    new time/op    delta
	Hash-8              76.0µs ± 1%    55.8µs ± 2%  -26.62%        (p=0.000 n=10+10)
	HashMapAcyclic-8    61.9µs ± 0%    62.0µs ± 0%     ~             (p=0.666 n=9+9)
	TailcfgNode-8       10.2µs ± 1%     7.5µs ± 1%  -26.90%         (p=0.000 n=10+9)
	HashArray-8         1.07µs ± 1%    0.70µs ± 1%  -34.67%         (p=0.000 n=10+9)

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
3 years ago
Joe Tsai d8fbce7eef
util/deephash: hash uint{8,16,32,64} explicitly (#2502)
Instead of hashing the humanly formatted forms of a number,
hash the native machine bits of the integers themselves.

There is a small performance gain for this:
	name              old time/op    new time/op    delta
	Hash-8              75.7µs ± 1%    76.0µs ± 2%    ~            (p=0.315 n=10+9)
	HashMapAcyclic-8    63.1µs ± 3%    61.3µs ± 1%  -2.77%        (p=0.000 n=10+10)
	TailcfgNode-8       10.3µs ± 1%    10.2µs ± 1%  -1.48%        (p=0.000 n=10+10)
	HashArray-8         1.07µs ± 1%    1.05µs ± 1%  -1.79%        (p=0.000 n=10+10)

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
3 years ago
Joe Tsai 01d4dd331d
util/deephash: simplify hasher.hashMap (#2503)
The swapping of bufio.Writer between hasher and mapHasher is subtle.
Just embed a hasher in mapHasher to avoid complexity here.

No notable change in performance:
	name              old time/op    new time/op    delta
	Hash-8              76.7µs ± 1%    77.0µs ± 1%    ~            (p=0.182 n=9+10)
	HashMapAcyclic-8    62.4µs ± 1%    62.5µs ± 1%    ~            (p=0.315 n=10+9)
	TailcfgNode-8       10.3µs ± 1%    10.3µs ± 1%  -0.62%         (p=0.004 n=10+9)
	HashArray-8         1.07µs ± 1%    1.06µs ± 1%  -0.98%          (p=0.001 n=8+9)

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
3 years ago
Joe Tsai d145c594ad
util/deephash: improve cycle detection (#2470)
The previous algorithm used a map of all visited pointers.
The strength of this approach is that it quickly prunes any nodes
that we have ever visited before. The detriment of the approach
is that pruning is heavily dependent on the order that pointers
were visited. This is especially relevant for hashing a map
where map entries are visited in a non-deterministic manner,
which would cause the map hash to be non-deterministic
(which defeats the point of a hash).

This new algorithm uses a stack of all visited pointers,
similar to how github.com/google/go-cmp performs cycle detection.
When we visit a pointer, we push it onto the stack, and when
we leave a pointer, we pop it from the stack.
Before visiting a pointer, we first check whether the pointer exists
anywhere in the stack. If yes, then we prune the node.
The detriment of this approach is that we may hash a node more often
than before since we do not prune as aggressively.

The set of visited pointers up until any node is only the
path of nodes up to that node and not any other pointers
that may have been visited elsewhere. This provides us
deterministic hashing regardless of visit order.
We can now delete hashMapFallback and associated complexity,
which only exists because the previous approach was non-deterministic
in the presence of cycles.

This fixes a failure of the old algorithm where obviously different
values are treated as equal because the pruning was too aggresive.
See https://github.com/tailscale/tailscale/issues/2443#issuecomment-883653534

The new algorithm is slightly slower since it prunes less aggresively:
	name              old time/op    new time/op    delta
	Hash-8              66.1µs ± 1%    68.8µs ± 1%   +4.09%        (p=0.000 n=19+19)
	HashMapAcyclic-8    63.0µs ± 1%    62.5µs ± 1%   -0.76%        (p=0.000 n=18+19)
	TailcfgNode-8       9.79µs ± 2%    9.88µs ± 1%   +0.95%        (p=0.000 n=19+17)
	HashArray-8          643ns ± 1%     653ns ± 1%   +1.64%        (p=0.000 n=19+19)
However, a slower but more correct algorithm seems
more favorable than a faster but incorrect algorithm.

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
3 years ago
Joe Tsai d666bd8533
util/deephash: disambiguate hashing of AppendTo (#2483)
Prepend size to AppendTo output.

Fixes #2443

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
3 years ago
Joe Tsai 23ad028414
util/deephash: include type as part of hash for interfaces (#2476)
A Go interface may hold any number of different concrete types.
Just because two underlying values hash to the same thing
does not mean the two values are identical if they have different
concrete types. As such, include the type in the hash.
3 years ago
Joe Tsai a5fb8e0731
util/deephash: introduce deliberate instability (#2477)
Seed the hash upon first use with the current time.
This ensures that the stability of the hash is bounded within
the lifetime of one program execution.
Hopefully, this prevents future bugs where someone assumes that
this hash is stable.

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
3 years ago
Joe Tsai 9a0c8bdd20 util/deephash: make hash type opaque
The fact that Hash returns a [sha256.Size]byte leaks details about
the underlying hash implementation. This could very well be any other
hashing algorithm with a possible different block size.

Abstract this implementation detail away by declaring an opaque type
that is comparable. While we are changing the signature of UpdateHash,
rename it to just Update to reduce stutter (e.g., deephash.Update).

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
3 years ago
Brad Fitzpatrick ddb8726c98 util/deephash: don't reflect.Copy if element type is a defined uint8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Brad Fitzpatrick 6dc38ff25c util/deephash: optimize hashing of byte arrays, reduce allocs in Hash
name              old time/op    new time/op    delta
Hash-6               173µs ± 4%     101µs ± 3%   -41.69%  (p=0.000 n=10+9)
HashMapAcyclic-6     101µs ± 5%     105µs ± 3%    +3.52%  (p=0.001 n=9+10)
TailcfgNode-6       29.4µs ± 2%    16.4µs ± 3%   -44.25%  (p=0.000 n=8+10)

name              old alloc/op   new alloc/op   delta
Hash-6              3.60kB ± 0%    1.13kB ± 0%   -68.70%  (p=0.000 n=10+10)
HashMapAcyclic-6    2.53kB ± 0%    2.53kB ± 0%      ~     (p=0.137 n=10+8)
TailcfgNode-6         528B ± 0%        0B       -100.00%  (p=0.000 n=10+10)

name              old allocs/op  new allocs/op  delta
Hash-6                84.0 ± 0%      40.0 ± 0%   -52.38%  (p=0.000 n=10+10)
HashMapAcyclic-6       202 ± 0%       202 ± 0%      ~     (all equal)
TailcfgNode-6         11.0 ± 0%       0.0       -100.00%  (p=0.000 n=10+10)

Updates tailscale/corp#2130

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Brad Fitzpatrick 3962744450 util/deephash: prevent infinite loop on map cycle
Fixes #2340

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago
Brad Fitzpatrick aceaa70b16 util/deephash: move funcs to methods
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
3 years ago