|
|
@ -3,120 +3,112 @@
|
|
|
|
[Issue #2130](https://github.com/matrix-org/matrix-doc/issues/2130) has been
|
|
|
|
[Issue #2130](https://github.com/matrix-org/matrix-doc/issues/2130) has been
|
|
|
|
recently created in response to a security issue brought up by an independent
|
|
|
|
recently created in response to a security issue brought up by an independent
|
|
|
|
party. To summarise the issue, lookups (of Matrix user IDs) are performed using
|
|
|
|
party. To summarise the issue, lookups (of Matrix user IDs) are performed using
|
|
|
|
non-hashed 3PIDs (third-party IDs) which means that the identity server can
|
|
|
|
plain-text 3PIDs (third-party IDs) which means that the identity server can
|
|
|
|
identify and record every 3PID that the user wants to check, whether that
|
|
|
|
identify and record every 3PID that the user has in their contacts, whether
|
|
|
|
address is already known by the identity server or not.
|
|
|
|
that email address or phone number is already known by the identity server or
|
|
|
|
|
|
|
|
not.
|
|
|
|
|
|
|
|
|
|
|
|
If the 3PID is hashed, the identity server could not determine the address
|
|
|
|
If the 3PID is hashed, the identity server could not determine the address
|
|
|
|
unless it has already seen that address in plain-text during a previous call of
|
|
|
|
unless it has already seen that address in plain-text during a previous call of
|
|
|
|
the /bind mechanism.
|
|
|
|
the /bind mechanism (without significant resources to reverse the hashes).
|
|
|
|
|
|
|
|
|
|
|
|
Note that in terms of privacy, this proposal does not stop an identity server
|
|
|
|
This proposal thus calls for the Identity Service API's /lookup endpoint to use
|
|
|
|
from mapping hashed 3PIDs to users, resulting in a social graph. However, the
|
|
|
|
a back-and-forth mechanism of passing partial hashed 3PIDs instead of their
|
|
|
|
identity of the 3PID will at least remain a mystery until /bind is used.
|
|
|
|
plain-text counterparts, which should leak much less data to either party.
|
|
|
|
|
|
|
|
|
|
|
|
This proposal thus calls for the Identity Service’s /lookup API to use hashed
|
|
|
|
|
|
|
|
3PIDs instead of their plain-text counterparts.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Proposal
|
|
|
|
## Proposal
|
|
|
|
|
|
|
|
|
|
|
|
This proposal suggests making changes to the Identity Service API's lookup
|
|
|
|
This proposal suggests making changes to the Identity Service API's lookup
|
|
|
|
endpoints. Due to the nature of this proposal, the new endpoints should be on a
|
|
|
|
endpoints. Instead of the `/lookup` and `/bulk_lookup` endpoints, this proposal
|
|
|
|
`v2` path (we also drop the `/api` in order to preserve consistency across
|
|
|
|
replaces them with endpoints `/lookup` and `/lookup_hashes`. Additionally, the
|
|
|
|
other endpoints):
|
|
|
|
endpoints should be on a `v2` path, to avoid confusion with the original
|
|
|
|
|
|
|
|
`/lookup`. We also drop the `/api` in order to preserve consistency across
|
|
|
|
|
|
|
|
other endpoints:
|
|
|
|
|
|
|
|
|
|
|
|
- `/_matrix/identity/v2/lookup`
|
|
|
|
- `/_matrix/identity/v2/lookup`
|
|
|
|
- `/_matrix/identity/v2/bulk_lookup`
|
|
|
|
- `/_matrix/identity/v2/lookup_hashes`
|
|
|
|
|
|
|
|
|
|
|
|
`address` MUST no longer be in a plain-text format, but rather will be a
|
|
|
|
A third endpoint is added for clients to request information about the form
|
|
|
|
peppered hash value, and the resulting digest MUST be encoded in URL-safe
|
|
|
|
the server expects hashes in.
|
|
|
|
unpadded base64 (similar to [room version 4's event
|
|
|
|
|
|
|
|
IDs](https://matrix.org/docs/spec/rooms/v4#event-ids)).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Identity servers must specify the hashing algorithms and a pepper that they
|
|
|
|
- `/_matrix/identity/v2/hash_details`
|
|
|
|
support, which will allow for rotation if a rainbow table is ever released
|
|
|
|
|
|
|
|
coinciding with their current hash and pepper. As such, it must be possible for
|
|
|
|
|
|
|
|
clients to be able to query what pepper the identity server requires before
|
|
|
|
|
|
|
|
sending it hashes. A new endpoint must be added:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
The following back-and-forth occurs between the client and server.
|
|
|
|
GET /_matrix/identity/v2/hash_details
|
|
|
|
|
|
|
|
```
|
|
|
|
Let's say the client wants to check the following 3PIDs:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
alice@example.com
|
|
|
|
|
|
|
|
bob@example.com
|
|
|
|
|
|
|
|
carl@example.com
|
|
|
|
|
|
|
|
+1 234 567 8910
|
|
|
|
|
|
|
|
denny@example.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The client will hash each 3PID as a concatenation of the medium and address,
|
|
|
|
|
|
|
|
separated by a space and a pepper appended to the end. Note that phone numbers
|
|
|
|
|
|
|
|
should be formatted as defined by
|
|
|
|
|
|
|
|
https://matrix.org/docs/spec/appendices#pstn-phone-numbers, before being
|
|
|
|
|
|
|
|
hashed.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
"alice@example.com" -> "email alice@example.com"
|
|
|
|
|
|
|
|
"bob@example.com" -> "email bob@example.com"
|
|
|
|
|
|
|
|
"carl@example.com" -> "email carl@example.com"
|
|
|
|
|
|
|
|
"+1 234 567 8910" -> "msisdn 12345678910"
|
|
|
|
|
|
|
|
"denny@example.com" -> "email denny@example.com"
|
|
|
|
|
|
|
|
|
|
|
|
This endpoint takes no parameters, and simply returns any supported hash
|
|
|
|
Hashes must be peppered in order to reduce both the information a client gains
|
|
|
|
algorithms and pepper as a JSON object:
|
|
|
|
during the process, and attacks the identity server can perform (namely sending
|
|
|
|
|
|
|
|
a rainbow table of hashes back in the response to `/lookup`). The resulting
|
|
|
|
|
|
|
|
digest MUST be encoded in URL-safe unpadded base64 (similar to [room version
|
|
|
|
|
|
|
|
4's event IDs](https://matrix.org/docs/spec/rooms/v4#event-ids)).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
In order for clients to know the pepper and hashing algorithm they should use,
|
|
|
|
|
|
|
|
Identity Servers must make the information available on the `/hash_details`
|
|
|
|
|
|
|
|
endpoint:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
GET /_matrix/identity/v2/hash_details
|
|
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
{
|
|
|
|
{
|
|
|
|
"lookup_pepper": "matrixrocks",
|
|
|
|
"lookup_pepper": "matrixrocks",
|
|
|
|
"algorithms": ["sha256"],
|
|
|
|
"algorithms": ["sha256"]
|
|
|
|
}
|
|
|
|
}
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The name `lookup_pepper` was chosen in order to account for pepper values being
|
|
|
|
The name `lookup_pepper` was chosen in order to account for pepper values being
|
|
|
|
returned for other endpoints in the future. The contents of `lookup_pepper`
|
|
|
|
returned for other endpoints in the future. The contents of `lookup_pepper`
|
|
|
|
MUST match the regular expression `[a-zA-Z0-9]*`.
|
|
|
|
MUST match the regular expression `[a-zA-Z0-9]*`.
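As a rough illustration of how a client might consume the `/hash_details` response, the sketch below validates `lookup_pepper` against the regular expression above and picks a mutually supported algorithm. The function name `parse_hash_details` and the `CLIENT_ALGORITHMS` list are hypothetical and not part of this proposal.

```python
import re
from typing import Tuple

# The pepper must consist solely of characters from [a-zA-Z0-9] (may be empty).
PEPPER_RE = re.compile(r"[a-zA-Z0-9]*")

# Assumption: the algorithms this hypothetical client implements.
CLIENT_ALGORITHMS = ["sha256"]


def parse_hash_details(details: dict) -> Tuple[str, str]:
    """Return (algorithm, pepper) from a /hash_details response body."""
    pepper = details["lookup_pepper"]
    if not PEPPER_RE.fullmatch(pepper):
        raise ValueError("lookup_pepper contains characters outside [a-zA-Z0-9]")
    # Use the first server-advertised algorithm that the client also supports.
    for algorithm in details["algorithms"]:
        if algorithm in CLIENT_ALGORITHMS:
            return algorithm, pepper
    raise ValueError("no mutually supported hashing algorithm")
```

With the example response above, this would yield the pair `("sha256", "matrixrocks")`.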
|
|
|
|
|
|
|
|
|
|
|
|
Clients should request this endpoint each time before making a `/lookup` or
|
|
|
|
The client should append the pepper to the end of the 3pid string before
|
|
|
|
`/bulk_lookup` request, to handle identity servers which may rotate their
|
|
|
|
hashing.
|
|
|
|
pepper values frequently. Clients must choose one of the given hash algorithms
|
|
|
|
|
|
|
|
to hash the 3PID during lookup.
|
|
|
|
"email alice@example.com" -> "email alice@example.commatrixrocks"
|
|
|
|
|
|
|
|
"email bob@example.com" -> "email bob@example.commatrixrocks"
|
|
|
|
Peppers are appended to the end of the 3PID before hashing. An example of
|
|
|
|
"email carl@example.com" -> "email carl@example.commatrixrocks"
|
|
|
|
generating a hash using SHA-256 and the provided pepper is as follows:
|
|
|
|
"msisdn 12345678910" -> "msisdn 12345678910matrixrocks"
|
|
|
|
|
|
|
|
"email denny@example.com" -> "email denny@example.commatrixrocks"
|
|
|
|
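The steps described so far (join medium and address with a space, append the pepper, hash, encode) can be sketched as follows. This is a minimal illustration rather than a reference implementation: the helper name `hash_3pid` is hypothetical, and it assumes SHA-256 with the URL-safe unpadded base64 encoding that this proposal requires.

```python
import base64
import hashlib


def hash_3pid(medium: str, address: str, pepper: str) -> str:
    """Hash "<medium> <address><pepper>" with SHA-256 (hypothetical helper)."""
    # Medium and address are separated by a space; the pepper is appended.
    to_hash = f"{medium} {address}{pepper}"
    digest = hashlib.sha256(to_hash.encode("utf-8")).digest()
    # URL-safe base64 uses "-" and "_" instead of "+" and "/"; padding is stripped.
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


if __name__ == "__main__":
    # The phone number is assumed to be normalised to "12345678910" already.
    print(hash_3pid("email", "alice@example.com", "matrixrocks"))
    print(hash_3pid("msisdn", "12345678910", "matrixrocks"))
```

A SHA-256 digest is 32 bytes, so the encoded result is always 43 characters long once the single padding character is removed.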
```python
|
|
|
|
|
|
|
|
address = "user@example.org"
|
|
|
|
Clients SHOULD request this endpoint each time before performing a lookup, to
|
|
|
|
pepper = "matrixrocks"
|
|
|
|
handle identity servers which may rotate their pepper values frequently.
|
|
|
|
digest = hashlib.sha256((address + pepper).encode()).digest()
|
|
|
|
Clients MUST choose one of the given hash algorithms to hash the 3PID during
|
|
|
|
result_address = unpaddedbase64.encode_base64(digest)
|
|
|
|
lookup.
|
|
|
|
print(result_address)
|
|
|
|
|
|
|
|
vNjEQuRCOmBp/KTuIpZ7RUJgPAbVAyqa0Uzh770tQaw
|
|
|
|
Note that possible hashing algorithms will be defined in the Matrix
|
|
|
|
```
|
|
|
|
specification, and an Identity Server can choose to implement one or all of
|
|
|
|
|
|
|
|
them. Later versions of the specification may deprecate algorithms when
|
|
|
|
Possible hashing algorithms will be defined in the Matrix specification, and an
|
|
|
|
necessary. Currently the only listed hashing algorithm is SHA-256 as defined by
|
|
|
|
Identity Server can choose to implement one or all of them. Later versions of
|
|
|
|
[RFC 4634](https://tools.ietf.org/html/rfc4634) and Identity Servers and
|
|
|
|
the specification may deprecate algorithms when necessary. Currently the only
|
|
|
|
clients MUST agree to its use with the string `sha256`. SHA-256 was chosen as
|
|
|
|
listed hashing algorithm is SHA-256 as defined by [RFC
|
|
|
|
it is currently used throughout the Matrix spec, as well as its properties of
|
|
|
|
4634](https://tools.ietf.org/html/rfc4634) and Identity Servers and clients
|
|
|
|
being quick to hash. While this reduces the resources necessary to generate a
|
|
|
|
MUST agree to its use with the string `sha256`. SHA-256 was chosen as it is
|
|
|
|
rainbow table for attackers, a fast hash is necessary if particularly slow
|
|
|
|
currently used throughout the Matrix spec, as well as its properties of being
|
|
|
|
mobile clients are going to be hashing thousands of contact details.
|
|
|
|
quick to hash. While this reduces the resources necessary to generate a rainbow
|
|
|
|
|
|
|
|
table for attackers, a fast hash is necessary if particularly slow mobile
|
|
|
|
|
|
|
|
clients are going to be hashing thousands of contacts.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
When performing a lookup, the pepper and hashing algorithm the client used must
|
|
|
|
When performing a lookup, the pepper and hashing algorithm the client used must
|
|
|
|
be part of the request body. If they do not match what the server has on file
|
|
|
|
be part of the request body. If they do not match what the server has on file
|
|
|
|
(which may be the case if the pepper was rotated right after the client's
|
|
|
|
(which may be the case if the pepper was changed right after the client's
|
|
|
|
request for it), then the server must inform the client that they need to query
|
|
|
|
request for it), then the server must inform the client that they need to query
|
|
|
|
the hash details again, instead of just returning an empty response, which
|
|
|
|
the hash details again, instead of just returning an empty response, which
|
|
|
|
clients would assume to mean that no contacts are registered on that identity
|
|
|
|
clients would assume to mean that no contacts are registered on that identity
|
|
|
|
server.
|
|
|
|
server.
|
|
|
|
|
|
|
|
|
|
|
|
Thus, an example client request to `/bulk_lookup` would look like the
|
|
|
|
|
|
|
|
following:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"threepids": [
|
|
|
|
|
|
|
|
[
|
|
|
|
|
|
|
|
"email",
|
|
|
|
|
|
|
|
"vNjEQuRCOmBp/KTuIpZ7RUJgPAbVAyqa0Uzh770tQaw"
|
|
|
|
|
|
|
|
],
|
|
|
|
|
|
|
|
[
|
|
|
|
|
|
|
|
"msisdn",
|
|
|
|
|
|
|
|
"0VnvYk7YZpe08fP/CGqs3f39QtRjqAA2lPd14eLZXiw"
|
|
|
|
|
|
|
|
],
|
|
|
|
|
|
|
|
[
|
|
|
|
|
|
|
|
"email",
|
|
|
|
|
|
|
|
"BJaLI0RrLFDMbsk0eEp5BMsYDYzvOzDneQP/9NTemYA"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
],
|
|
|
|
|
|
|
|
"lookup_pepper": "matrixrocks",
|
|
|
|
|
|
|
|
"algorithm": "sha256"
|
|
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
If the algorithm does not match the server's, the server should return a `400
|
|
|
|
If the algorithm does not match the server's, the server should return a `400
|
|
|
|
M_INVALID_PARAM`. If the pepper does not match the server's, the server should
|
|
|
|
M_INVALID_PARAM`. If the pepper does not match the server's, the server should
|
|
|
|
return a new error code, 400 `M_INVALID_PEPPER`. A new error code is not
|
|
|
|
return a new error code, 400 `M_INVALID_PEPPER`. A new error code is not
|
|
|
@ -127,14 +119,252 @@ Each of these error responses should contain the correct `algorithm` and
|
|
|
|
`/hash_details` again, thus saving a round-trip. An example response to an
|
|
|
|
`/hash_details` again, thus saving a round-trip. An example response to an
|
|
|
|
incorrect pepper would be:
|
|
|
|
incorrect pepper would be:
|
|
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
{
|
|
|
|
{
|
|
|
|
"error": "Incorrect value for lookup_pepper",
|
|
|
|
"error": "Incorrect value for lookup_pepper",
|
|
|
|
"errcode": "M_INVALID_PEPPER",
|
|
|
|
"errcode": "M_INVALID_PEPPER",
|
|
|
|
"algorithm": "sha256",
|
|
|
|
"algorithm": "sha256",
|
|
|
|
"lookup_pepper": "matrixrocks"
|
|
|
|
"lookup_pepper": "matrixrocks"
|
|
|
|
}
|
|
|
|
}
|
|
|
|
```
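A client could use such an error body to retry immediately instead of calling `/hash_details` again. The sketch below is one possible way to do that; the function name `params_for_retry` is hypothetical, and it assumes both error responses carry the `algorithm` and `lookup_pepper` fields as described above.

```python
from typing import Tuple

# Error codes after which the client should re-hash and retry.
RETRYABLE_ERRCODES = {"M_INVALID_PEPPER", "M_INVALID_PARAM"}


def params_for_retry(error_body: dict, algorithm: str, pepper: str) -> Tuple[str, str]:
    """Return the (algorithm, pepper) pair to use for the next attempt."""
    if error_body.get("errcode") not in RETRYABLE_ERRCODES:
        # Unrelated error: keep the current parameters unchanged.
        return algorithm, pepper
    # Prefer the values supplied by the server, falling back to the old ones.
    return (
        error_body.get("algorithm", algorithm),
        error_body.get("lookup_pepper", pepper),
    )
```

The client would then re-hash its 3PIDs with the returned values before resending the lookup.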
|
|
|
|
|
|
|
|
|
|
|
|
Now comes time for the lookup. Once hashing has been performed using the
|
|
|
|
|
|
|
|
defined hashing algorithm, the client sends the first `k` characters of each
|
|
|
|
|
|
|
|
hash in an array, deduplicating any matching entries.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
`k` is a value chosen by the client. It is a tradeoff between leaking the
|
|
|
|
|
|
|
|
hashes of 3PIDs that the Identity Server doesn't know about, and the amount of
|
|
|
|
|
|
|
|
hashing the server must perform. In addition to k, the client can also set a
|
|
|
|
|
|
|
|
`max_k` that it is comfortable with. The recommended values are `k = 4` and
|
|
|
|
|
|
|
|
`max_k = 6` (see below for the reasoning behind this). Let's say the client
|
|
|
|
|
|
|
|
chooses these values.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
NOTE: Example numbers, not real hash values.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
"email alice@example.commatrixrocks" -> "70b1b5637937ab99f6aad01f694b3665541a5b9cbdfe54880462b3f1ad35d1f4"
|
|
|
|
|
|
|
|
"email bob@example.commatrixrocks" -> "21375b56a47c2cdc41a0596549a16ec51b64d26eb47b8e915d45b18ed17b72ff"
|
|
|
|
|
|
|
|
"email carl@example.commatrixrocks" -> "758afda64cb6a86ee6d540fa7c8b803a2479863e369cbafd71ffd376beef5d5f"
|
|
|
|
|
|
|
|
"msisdn 12345678910matrixrocks" -> "21375b3f1b61c975b13c8cecd6481a82e239e6aad644c29dc815836188ae8351"
|
|
|
|
|
|
|
|
"email denny@example.commatrixrocks" -> "70b1b5637937ab9846a94a8015e12313643a2f5323ca8f5b4ed6982fc8c3619b"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Note that pairs (bob@example.com, 12345678910) and (alice@example.com, denny@example.com)
|
|
|
|
|
|
|
|
have the same leading characters in their hashed representations.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
POST /_matrix/identity/v2/lookup
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"hashes": [
|
|
|
|
|
|
|
|
"70b1",
|
|
|
|
|
|
|
|
"2137",
|
|
|
|
|
|
|
|
"758a"
|
|
|
|
|
|
|
|
],
|
|
|
|
|
|
|
|
"algorithm": "sha256",
|
|
|
|
|
|
|
|
"pepper": "matrixrocks"
|
|
|
|
|
|
|
|
}
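Truncating to `k` characters and deduplicating can be sketched as below. The function name `build_lookup_body` is hypothetical; the field names follow the example request above, and the input hashes are the placeholder values from this walkthrough.

```python
def build_lookup_body(full_hashes: list, k: int, algorithm: str, pepper: str) -> dict:
    """Build a /lookup request body from full hashes (hypothetical helper)."""
    # dict.fromkeys drops duplicate prefixes while keeping first-seen order.
    prefixes = list(dict.fromkeys(h[:k] for h in full_hashes))
    return {"hashes": prefixes, "algorithm": algorithm, "pepper": pepper}


EXAMPLE_HASHES = [
    "70b1b5637937ab99f6aad01f694b3665541a5b9cbdfe54880462b3f1ad35d1f4",  # alice
    "21375b56a47c2cdc41a0596549a16ec51b64d26eb47b8e915d45b18ed17b72ff",  # bob
    "758afda64cb6a86ee6d540fa7c8b803a2479863e369cbafd71ffd376beef5d5f",  # carl
    "21375b3f1b61c975b13c8cecd6481a82e239e6aad644c29dc815836188ae8351",  # msisdn
    "70b1b5637937ab9846a94a8015e12313643a2f5323ca8f5b4ed6982fc8c3619b",  # denny
]

if __name__ == "__main__":
    print(build_lookup_body(EXAMPLE_HASHES, 4, "sha256", "matrixrocks"))
```

Five 3PIDs collapse to three prefixes here because two pairs share their leading characters.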
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The identity server, upon receiving these partial hashes, can see that the
|
|
|
|
|
|
|
|
client chose `4` as its `k` value, which is the length of the shortest hash
|
|
|
|
|
|
|
|
prefix. The identity server has a "minimum k", which is a function of the
|
|
|
|
|
|
|
|
amount of 3PID hashes it currently holds and protects it against computing too
|
|
|
|
|
|
|
|
many per lookup. Let's say the Identity Server's `min_k = 5` (again, see below
|
|
|
|
|
|
|
|
for details).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The client's `k` value (4) is less than the Identity Server's `min_k` (5), so
|
|
|
|
|
|
|
|
it will reject the lookup with the following error:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"errcode": "M_HASH_TOO_SHORT",
|
|
|
|
|
|
|
|
"error": "Sent partial hashes are too short",
|
|
|
|
|
|
|
|
"minimum_length": "5"
|
|
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The client then knows it must send values of at least length 5. Its `max_k` is
|
|
|
|
|
|
|
|
6, so this is fine. The client sends the values again with `k = 5`:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
POST /_matrix/identity/v2/lookup
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"hashes": [
|
|
|
|
|
|
|
|
"70b1b",
|
|
|
|
|
|
|
|
"21375",
|
|
|
|
|
|
|
|
"758af"
|
|
|
|
|
|
|
|
],
|
|
|
|
|
|
|
|
"algorithm": "sha256",
|
|
|
|
|
|
|
|
"pepper": "matrixrocks"
|
|
|
|
|
|
|
|
}
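The client-side decision after an `M_HASH_TOO_SHORT` error might look like the following sketch. The function name `next_k` is hypothetical, and the code assumes `minimum_length` may arrive as a string, as in the example error above.

```python
from typing import Optional


def next_k(error_body: dict, max_k: int) -> Optional[int]:
    """Return the k to retry with, or None if the client should give up."""
    if error_body.get("errcode") != "M_HASH_TOO_SHORT":
        return None
    # The example error encodes the value as a string, so convert defensively.
    minimum = int(error_body["minimum_length"])
    # Only retry if the server's minimum is within the client's comfort zone.
    return minimum if minimum <= max_k else None
```

Returning `None` corresponds to the case where the client informs the user rather than revealing longer prefixes.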
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The Identity Server sees the hashes are within an acceptable length (5 >= 5),
|
|
|
|
|
|
|
|
then checks which hashes it knows of that match the given leading values. It
|
|
|
|
|
|
|
|
will then return the next few characters (`n`; implementation-specific; lower
|
|
|
|
|
|
|
|
means less information leaked to clients at the cost of potentially more
|
|
|
|
|
|
|
|
hashing to be done) of each that match:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The identity server found the following hashes that contain the leading
|
|
|
|
|
|
|
|
characters:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
70b1b5637937ab99f6aad01f694b3665541a5b9cbdfe54880462b3f1ad35d1f4
|
|
|
|
|
|
|
|
70b1b1b28dcfcc179a54983f46e1753c3fcdb0884d06fad741582c0180b56fc9
|
|
|
|
|
|
|
|
21375b3f1b61c975b13c8cecd6481a82e239e6aad644c29dc815836188ae8351
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
And if n = 7, the identity server will send back the following payload:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"hashes": {
|
|
|
|
|
|
|
|
"70b1b": ["5637937", "1b28dcf"],
|
|
|
|
|
|
|
|
"21375": ["b3f1b61"]
|
|
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
}
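On the server side, the prefix matching that produces this payload could be sketched as below. The function name `match_prefixes` is hypothetical, and a real identity server would presumably query an index rather than scan a list.

```python
def match_prefixes(known_hashes: list, prefixes: list, n: int) -> dict:
    """Map each matching prefix to the next n characters of every known hash."""
    result = {}
    for prefix in prefixes:
        suffixes = [
            h[len(prefix):len(prefix) + n]
            for h in known_hashes
            if h.startswith(prefix)
        ]
        # Prefixes with no known hashes are left out of the response entirely.
        if suffixes:
            result[prefix] = suffixes
    return result


KNOWN_HASHES = [
    "70b1b5637937ab99f6aad01f694b3665541a5b9cbdfe54880462b3f1ad35d1f4",
    "70b1b1b28dcfcc179a54983f46e1753c3fcdb0884d06fad741582c0180b56fc9",
    "21375b3f1b61c975b13c8cecd6481a82e239e6aad644c29dc815836188ae8351",
]
```

Calling `match_prefixes(KNOWN_HASHES, ["70b1b", "21375", "758af"], 7)` reproduces the `hashes` object shown above.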
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The client can then deduce which hashes actually lead to Matrix IDs. In this
|
|
|
|
|
|
|
|
case, 70b1b5637937 are the leading characters of "alice@example.com" and
|
|
|
|
|
|
|
|
"denny@example.com", while 21375b3f1b61 are the leading characters of
|
|
|
|
|
|
|
|
"+12345678910" whereas 70b1b1b28dcf does not match any of the hashes the client
|
|
|
|
|
|
|
|
has locally, so it is ignored. "bob@example.com" and "carl@example.com" do not
|
|
|
|
|
|
|
|
seem to have Matrix IDs associated with them.
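The deduction step can be expressed as a comparison of each local hash against every returned prefix and suffix pair. The sketch below uses the hypothetical name `likely_matrix_3pids` and the placeholder hashes from this walkthrough.

```python
def likely_matrix_3pids(local_hashes: dict, response_hashes: dict) -> list:
    """Return the local 3PIDs whose hash extends a prefix+suffix from the server.

    local_hashes maps a 3PID string to its full peppered hash.
    """
    matches = []
    for threepid, full_hash in local_hashes.items():
        for prefix, suffixes in response_hashes.items():
            if any(full_hash.startswith(prefix + s) for s in suffixes):
                matches.append(threepid)
                break
    return matches


LOCAL_HASHES = {
    "alice@example.com": "70b1b5637937ab99f6aad01f694b3665541a5b9cbdfe54880462b3f1ad35d1f4",
    "bob@example.com": "21375b56a47c2cdc41a0596549a16ec51b64d26eb47b8e915d45b18ed17b72ff",
    "carl@example.com": "758afda64cb6a86ee6d540fa7c8b803a2479863e369cbafd71ffd376beef5d5f",
    "+1 234 567 8910": "21375b3f1b61c975b13c8cecd6481a82e239e6aad644c29dc815836188ae8351",
    "denny@example.com": "70b1b5637937ab9846a94a8015e12313643a2f5323ca8f5b4ed6982fc8c3619b",
}
```

Note that a match at this stage is only a candidate: alice and denny share the same leading characters, so both are carried forward to the next step.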
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Finally, the client salts and hashes 3PID hashes that it believes are
|
|
|
|
|
|
|
|
associated with Matrix IDs and sends them to the identity server on the
|
|
|
|
|
|
|
|
`/lookup_hashes` endpoint. Instead of hashing the 3PIDs again, clients should
|
|
|
|
|
|
|
|
reuse the peppered hash that was previously sent to the server. Salting is
|
|
|
|
|
|
|
|
performed to prevent an identity server generating a rainbow table to reverse
|
|
|
|
|
|
|
|
any non-Matrix 3PIDs that slipped in. Salts MUST match the regular expression
|
|
|
|
|
|
|
|
`[a-zA-Z0-9]*`.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Computed previously:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
"email alice@example.commatrixrocks"
|
|
|
|
|
|
|
|
becomes
|
|
|
|
|
|
|
|
"70b1b5637937ab99f6aad01f694b3665541a5b9cbdfe54880462b3f1ad35d1f4"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The client should generate a salt. Let's say it generates "salt123". This
|
|
|
|
|
|
|
|
value is appended to the hash.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
"70b1b5637937ab99f6aad01f694b3665541a5b9cbdfe54880462b3f1ad35d1f4"
|
|
|
|
|
|
|
|
becomes
|
|
|
|
|
|
|
|
"70b1b5637937ab99f6aad01f694b3665541a5b9cbdfe54880462b3f1ad35d1f4salt123"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
And then hashed:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
"70b1b5637937ab99f6aad01f694b3665541a5b9cbdfe54880462b3f1ad35d1f4salt123"
|
|
|
|
|
|
|
|
becomes
|
|
|
|
|
|
|
|
"1f64ed6ac9d6da86b65bcc68a39c7c4d083f77193ec7e5adc4b09617f8d0d81a"
|
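A sketch of the salting step is given below. The names `generate_salt` and `salt_and_hash` are hypothetical, the salt length of 8 is an arbitrary assumption, and the hex output is chosen only to mirror the placeholder values in this walkthrough (the encoding actually used would need to match the rest of the proposal).

```python
import hashlib
import secrets
import string

# Salts must consist solely of characters from [a-zA-Z0-9].
SALT_ALPHABET = string.ascii_letters + string.digits


def generate_salt(length: int = 8) -> str:
    """Generate a random salt from a cryptographically secure source."""
    return "".join(secrets.choice(SALT_ALPHABET) for _ in range(length))


def salt_and_hash(peppered_hash: str, salt: str) -> str:
    """Append the salt to an already-peppered hash, then hash again."""
    return hashlib.sha256((peppered_hash + salt).encode("ascii")).hexdigest()
```

Because the input is the previously computed peppered hash, the client does not need to touch the plain-text 3PID again.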
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
A new salt is generated and applied to each hash **prefix** individually. Doing
|
|
|
|
|
|
|
|
so requires the identity server to only rehash the 3PIDs whose unsalted hashes
|
|
|
|
|
|
|
|
matched the earlier prefixes (in the case of 70b1b, hashes 5637937... and
|
|
|
|
|
|
|
|
1b28dcf...). This adds only a small multiplier of additional hashes needing to
|
|
|
|
|
|
|
|
be performed by the Identity Server (the median number of hashes that fit each
|
|
|
|
|
|
|
|
prefix, a function of the chosen `k` value).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
An attacker would now need to create a new rainbow table per hash prefix, per
|
|
|
|
|
|
|
|
lookup. This reduces the attack surface significantly to only very targeted
|
|
|
|
|
|
|
|
attacks.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
POST /_matrix/identity/v2/lookup_hashes
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"hashes": {
|
|
|
|
|
|
|
|
"70b1b": {
|
|
|
|
|
|
|
|
"1": "1f64ed6ac9d6da86b65bcc68a39c7c4d083f77193ec7e5adc4b09617f8d0d81a",
|
|
|
|
|
|
|
|
"2": "a32e1c1f3b9e118eab196b0807443871628eace587361b7a02adfb2b77b8d620"
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
"21375": {
|
|
|
|
|
|
|
|
"1": "372bf27a4e7e952d1e794f78f8cdfbff1a3ab2f59c6d44e869bfdd7dd1de3948"
|
|
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
"salts": {
|
|
|
|
|
|
|
|
"70b1b": "salt123",
|
|
|
|
|
|
|
|
"21375": "salt234"
|
|
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The server reads the prefixes and only rehashes those 3PIDs that match these
|
|
|
|
|
|
|
|
hashes (being careful to continue to enforce its `min_k` requirement), and
|
|
|
|
|
|
|
|
returns them:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"mappings": {
|
|
|
|
|
|
|
|
"70b1b": {
|
|
|
|
|
|
|
|
"2": "@alice:example.com"
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
"21375": {
|
|
|
|
|
|
|
|
"1": "@fred:example.com"
|
|
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
}
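The server's handling of `/lookup_hashes` might be sketched as follows. The function name `resolve_salted_hashes` and the in-memory `bindings` dictionary are hypothetical, hex encoding is assumed to match the placeholder values above, and enforcement of `min_k` is left out for brevity.

```python
import hashlib


def resolve_salted_hashes(bindings: dict, request: dict) -> dict:
    """Resolve salted hashes to Matrix IDs.

    bindings maps each full peppered hash the server knows to a Matrix ID.
    """
    mappings = {}
    for prefix, submitted in request["hashes"].items():
        salt = request["salts"][prefix]
        # Only rehash the records whose unsalted hash starts with this prefix.
        candidates = {
            hashlib.sha256((h + salt).encode("ascii")).hexdigest(): mxid
            for h, mxid in bindings.items()
            if h.startswith(prefix)
        }
        found = {
            index: candidates[salted]
            for index, salted in submitted.items()
            if salted in candidates
        }
        if found:
            mappings[prefix] = found
    return {"mappings": mappings}
```

Entries that do not correspond to any known record are simply omitted from the response, so the server's reply reveals nothing about them.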
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The client can now display which 3PIDs link to which Matrix IDs.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
### How to pick k
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The `k` value is a tradeoff between the privacy of the user's contacts, and the
|
|
|
|
|
|
|
|
resource-intensiveness of lookups for the identity server. Clients would rather
|
|
|
|
|
|
|
|
have a smaller `k`, while servers would prefer a larger `k`. A larger `k` also allows the
|
|
|
|
|
|
|
|
identity server to learn more about the contacts the client has that are not
|
|
|
|
|
|
|
|
Matrix users. Ideally we'd like to balance these two, and with the value also
|
|
|
|
|
|
|
|
being a factor of how many records an identity server has, there's no way to
|
|
|
|
|
|
|
|
simply give a single `k` value that should be used from the spec.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Instead, we can have the client and identity server decide it amongst
|
|
|
|
|
|
|
|
themselves. The identity server should pick a `k` value based on how many 3PID
|
|
|
|
|
|
|
|
records it has, and thus how many hashes it will need to perform. An ideal
|
|
|
|
|
|
|
|
value can be calculated from the following function:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
C >= N / (64 ^ k)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Where N is the number of 3PID records an identity server has, k is the number of
|
|
|
|
|
|
|
|
characters to truncate each hash to, and C is the median number of hashing rounds
|
|
|
|
|
|
|
|
an identity server will need to perform per hash (denoted complexity). 64 is the
|
|
|
|
|
|
|
|
number of possible values for each character of a hash, as hash digests are encoded in
|
|
|
|
|
|
|
|
url-safe base64.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Identity servers should choose a complexity value they're comfortable with.
|
|
|
|
|
|
|
|
Let's say 5 (for reference, HIBP's service has set their k value for a complexity
|
|
|
|
|
|
|
|
of 478: https://blog.cloudflare.com/validating-leaked-passwords-with-k-anonymity/)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
When C is set (implementation specific), k can then be solved for:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
k >= - log(C/N)
|
|
|
|
|
|
|
|
----------
|
|
|
|
|
|
|
|
log(64)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Taking HIBP's number of passwords (600,000,000) as N, with C = 5, and solving for k, we get:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
k >= 4.47
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
We round k up to 5 so that it is a whole number.
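The calculation can be checked with a few lines of Python. The function name `minimum_k` is hypothetical; it computes the smallest whole `k` for which `N / 64^k` does not exceed the chosen complexity `C`, using the values N = 600,000,000 and C = 5 from the text.

```python
import math


def minimum_k(n_records: int, complexity: float, alphabet_size: int = 64):
    """Return (exact, rounded_up) k such that n_records / alphabet_size**k <= complexity."""
    exact = math.log(n_records / complexity) / math.log(alphabet_size)
    # k must be a whole number of characters, and at least one.
    return exact, max(1, math.ceil(exact))


if __name__ == "__main__":
    exact, k = minimum_k(600_000_000, 5)
    print(f"k >= {exact:.2f}, rounded up to {k}")
```

Rounding up rather than to the nearest integer keeps the per-prefix work at or below the chosen complexity.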
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
As this is quite a lot of records, we advise clients to start with k = 4, and go from there.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
For reference, a very small identity server with only 600 records would produce a
|
|
|
|
|
|
|
|
minimum k of 0.628, or 1.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
From this we can see that even low k values scale to quite a lot of records.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Clients themselves should pick a reasonable default `k`, and a maximum value
|
|
|
|
|
|
|
|
that they are comfortable extending towards if the identity server requests a
|
|
|
|
|
|
|
|
higher minimum number. If the identity server requests too high of a minimum
|
|
|
|
|
|
|
|
number, clients will need to inform the user, either with an error message, or
|
|
|
|
|
|
|
|
more advanced clients could allow users to tweak their k values.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Past what they already knew, from this exchange the client and server have learned:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Client:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
* Unsalted, peppered partial 3PID hash "70b1b1b28dcf"
|
|
|
|
|
|
|
|
of some matrix user
|
|
|
|
|
|
|
|
(harder to crack, and new rainbow table needed)
|
|
|
|
|
|
|
|
* alice@example.com -> @alice:example.com (required)
|
|
|
|
|
|
|
|
* +1 234 567 8910 -> @fred:example.com (required)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Server:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
* Partial hash "758af" (likely useless)
|
|
|
|
|
|
|
|
* The server knows some salted hash
|
|
|
|
|
|
|
|
70b1b5637937ab9846a94a8015e12313643a2f5323ca8f5b4ed6982fc8c3619bf
|
|
|
|
|
|
|
|
(crackable, new rainbow table needed)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
|
|
No parameter changes will be made to /bind.
|
|
|
|
No parameter changes will be made to /bind.
|
|
|
|
|
|
|
|
|
|
|
@ -151,10 +381,10 @@ are being sent to.
|
|
|
|
|
|
|
|
|
|
|
|
## Tradeoffs
|
|
|
|
## Tradeoffs
|
|
|
|
|
|
|
|
|
|
|
|
* This approach means that the client now needs to calculate a hash by itself,
|
|
|
|
|
|
|
|
but the belief is that most languages provide a mechanism for doing so.
|
|
|
|
|
|
|
|
* There is a small cost incurred by performing hashes before requests, but this
|
|
|
|
* There is a small cost incurred by performing hashes before requests, but this
|
|
|
|
is outweighed by the privacy implications of sending plain-text addresses.
|
|
|
|
is outweighed by the privacy implications of sending plain-text addresses.
|
|
|
|
|
|
|
|
* Identity services will need to perform a lot of hashing; however, with
|
|
|
|
|
|
|
|
authentication being added in MSC 2140, effective rate-limiting is possible.
|
|
|
|
|
|
|
|
|
|
|
|
## Potential issues
|
|
|
|
## Potential issues
|
|
|
|
|
|
|
|
|
|
|
@ -186,14 +416,14 @@ for a federated network, as it requires specialized hardware.
|
|
|
|
While a bit out of scope for this MSC, there has been debate over preventing
|
|
|
|
While a bit out of scope for this MSC, there has been debate over preventing
|
|
|
|
3PIDs from being kept as plain-text on disk. The argument against this was that
|
|
|
|
3PIDs from being kept as plain-text on disk. The argument against this was that
|
|
|
|
if the hashing algorithm (in this case SHA-256) was broken, we couldn't update
|
|
|
|
if the hashing algorithm (in this case SHA-256) was broken, we couldn't update
|
|
|
|
the hashing algorithm without having the plaintext 3PIDs. @lampholder helpfully
|
|
|
|
the hashing algorithm without having the plain-text 3PIDs. @lampholder helpfully
|
|
|
|
added that we could just take the old hashes and rehash them in the more secure
|
|
|
|
added that we could just take the old hashes and rehash them in the more secure
|
|
|
|
hashing algorithm, thus transforming the hash from SHA-256 to
|
|
|
|
hashing algorithm, thus transforming the hash from SHA-256 to
|
|
|
|
SHA-256+SomeBetterAlg. However @erikjohnston then pointed out that if
|
|
|
|
SHA-256+SomeBetterAlg. However @erikjohnston then pointed out that if
|
|
|
|
`BrokenAlgo(a) == BrokenAlgo(b)` then `SuperGreatHash(BrokenAlgo(a)) ==
|
|
|
|
`BrokenAlgo(a) == BrokenAlgo(b)` then `SuperGreatHash(BrokenAlgo(a)) ==
|
|
|
|
SuperGreatHash(BrokenAlgo(b))`, so all you'd need to do is find a match in the
|
|
|
|
SuperGreatHash(BrokenAlgo(b))`, so all you'd need to do is find a match in the
|
|
|
|
broken algo, and you'd break the new algorithm as well. This means that you
|
|
|
|
broken algo, and you'd break the new algorithm as well. This means that you
|
|
|
|
would need the plaintext 3PIDs to encode a new hash, and thus storing them
|
|
|
|
would need the plain-text 3PIDs to encode a new hash, and thus storing them
|
|
|
|
hashed on disk would require a transition period where 3PIDs were reuploaded in
|
|
|
|
hashed on disk would require a transition period where 3PIDs were reuploaded in
|
|
|
|
a strong hash variant.
|
|
|
|
a strong hash variant.
|
|
|
|
|
|
|
|
|
|
|
|