Allow for changing the hashing algo and add at-rest details

pull/977/head
Andrew Morgan 5 years ago
parent f8dbf2b360
commit d2b47a585d

@ -1,57 +1,59 @@
# MSC2134: Identity Hash Lookups # MSC2134: Identity Hash Lookups
[Issue #2130](https://github.com/matrix-org/matrix-doc/issues/2130) has been recently [Issue #2130](https://github.com/matrix-org/matrix-doc/issues/2130) has been
created in response to a security issue brought up by an independant party. To summarise recently created in response to a security issue brought up by an independent
the issue, lookups (of matrix user ids) are performed using non-hashed 3pids which means party. To summarise the issue, lookups (of matrix user ids) are performed using
that the 3pid is identifiable to anyone who can see the payload (e.g. willh@matrix.org non-hashed 3pids (third-party IDs) which means that the identity server can
can be identified). identify and record every 3pid that the user wants to check, whether that
address is already known by the identity server or not.
The problem with this, is that a malicious identity service could then store the plaintext If the 3pid is hashed, the identity service could not determine the address
3pid and make an assumption that the requesting entity knows the holder of the 3pid, even unless it has already seen that address in plain-text during a previous call of
if the identity service does not know of the 3pid beforehand. the /bind mechanism.
If the 3pid is hashed, the identity service could not determine the owner of the 3pid Note that in terms of privacy, this proposal does not stop an identity service
unless the identity service has already been made aware of the 3pid by the owner from mapping hashed 3pids to users, resulting in a social graph. However, the
themselves (using the /bind mechanism). identity of the 3pid will at least remain a mystery until /bind is used.
Note that this proposal does not stop a identity service from mapping hashed 3pids to many This proposal thus calls for the Identity Services /lookup API to use hashed
users, in an attempt to form a social graph. However the identity of the 3pid will remain 3pids instead of their plain-text counterparts.
a mystery until /bind is used.
It should be clear that there is a need to hide any address from the identity service that
has not been explicitly bound to it, and this proposal aims to solve that for the lookup API.
## Proposal ## Proposal
This proposal suggests making changes to the Identity Service API's lookup endpoints. Due This proposal suggests making changes to the Identity Service API's lookup
to the nature of this proposal, the new endpoints should be on a `v2` path: endpoints. Due to the nature of this proposal, the new endpoints should be on a
`v2` path:
- `/_matrix/identity/api/v2/lookup` - `/_matrix/identity/api/v2/lookup`
- `/_matrix/identity/api/v2/bulk_lookup` - `/_matrix/identity/api/v2/bulk_lookup`
The parameters will remain the same, but `address` should no longer be in a plain-text The parameters will remain the same, but `address` should no longer be in a
format. `address` will now take a SHA-256 format hash value, and the resulting digest should plain-text format. `address` will now take a hash value, and the resulting
be encoded in base64 format. For example: digest should be encoded in unpadded base64. For example:
```python ```python
address = "willh@matrix.org" address = "user@example.org"
digest = hashlib.sha256(address.encode()).digest() digest = hashlib.sha256(address.encode()).digest()
result_address = base64.encodebytes(digest).decode() result_address = unpaddedbase64.encode_base64(digest)
print(result_address) print(result_address)
CpvOgBf0hFzdqZD4ASvWW0DAefErRRX5y8IegMBO98w= CpvOgBf0hFzdqZD4ASvWW0DAefErRRX5y8IegMBO98w
``` ```
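The example above relies on the third-party `unpaddedbase64` package. As a minimal sketch, assuming only the Python standard library is available, the same digest can be produced by stripping the `=` padding from standard base64 output. The helper name `hash_3pid` is hypothetical and not part of the proposal.

```python
import base64
import hashlib


def hash_3pid(address: str) -> str:
    """Return the unpadded base64 SHA-256 digest of a 3pid address.

    Hypothetical helper for illustration; the proposal does not name one.
    """
    # Hash the UTF-8 bytes of the plain-text address.
    digest = hashlib.sha256(address.encode("utf-8")).digest()
    # Standard base64 of a 32-byte digest is 44 characters ending in "=".
    # Stripping the padding leaves the 43-character unpadded form.
    return base64.b64encode(digest).decode("ascii").rstrip("=")


hashed = hash_3pid("user@example.org")
print(len(hashed))
```

A SHA-256 digest is always 32 bytes, so the unpadded result is always 43 characters long regardless of the address.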
### Example request ### Example request
SHA-256 has been chosen as it is [currently used elsewhere](https://matrix.org/docs/spec/server_server/r0.1.2#adding-hashes-and-signatures-to-outgoing-events) in the Matrix protocol, and the only SHA-256 has been chosen as it is [currently used
requirement for the hashing algorithm is that it cannot be used to guess the real value of the address elsewhere](https://matrix.org/docs/spec/server_server/r0.1.2#adding-hashes-and-signatures-to-outgoing-events)
in the Matrix protocol. As time goes on, this algorithm may be changed provided
a spec bump is performed. Then, clients making a request to `/lookup` must use
the hashing algorithm defined in whichever version of the CS spec they and the
IS have agreed to speaking.
No parameter changes will be made to /bind, but identity services should keep a hashed value No parameter changes will be made to /bind, but identity services should keep a
for each address it knows about in order to process lookups quicker and it is the recommendation hashed value for each address it knows about in order to process lookups
that this is done at the time of bind. quicker. It is the recommendation that this is done during the act of binding.
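One way an identity service might follow this recommendation is to compute the hash once, at bind time, and index bindings by it. The sketch below is illustrative only: `BindingStore`, its in-memory dictionaries, and the `hash_3pid` helper are all assumptions, and a real service would also key on the 3pid medium and persist its data.

```python
import base64
import hashlib
from typing import Dict, Optional


def hash_3pid(address: str) -> str:
    # Unpadded base64 of the SHA-256 digest (hypothetical helper).
    digest = hashlib.sha256(address.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


class BindingStore:
    """Hypothetical in-memory store that hashes addresses when they are bound."""

    def __init__(self) -> None:
        self._by_plain: Dict[str, str] = {}  # plain-text address -> Matrix user ID
        self._by_hash: Dict[str, str] = {}   # hashed address -> Matrix user ID

    def bind(self, address: str, mxid: str) -> None:
        # The plain-text address is already visible during a bind, so the
        # hash can be computed here once rather than on every lookup.
        self._by_plain[address] = mxid
        self._by_hash[hash_3pid(address)] = mxid

    def lookup(self, hashed_address: str) -> Optional[str]:
        # A lookup only ever sees the hash; unknown hashes reveal nothing.
        return self._by_hash.get(hashed_address)


store = BindingStore()
store.bind("user@example.org", "@user:example.org")
print(store.lookup(hash_3pid("user@example.org")))
```

Keeping the plain-text address alongside the hash means the index can be rebuilt if the hashing algorithm is later changed.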
`v1` versions of these endpoints may be disabled at the discretion of the implementation, and `v1` versions of these endpoints may be disabled at the discretion of the
should return a `M_FORBIDDEN` `errcode` if so. implementation, and should return a `M_FORBIDDEN` `errcode` if so.
## Tradeoffs ## Tradeoffs
@ -59,20 +61,27 @@ should return a `M_FORBIDDEN` `errcode` if so.
* This approach means that the client now needs to calculate a hash by itself, but the belief * This approach means that the client now needs to calculate a hash by itself, but the belief
is that most languages provide a mechanism for doing so. is that most languages provide a mechanism for doing so.
* There is a small cost incurred by doing hashes before requests, but this is outweighed by * There is a small cost incurred by doing hashes before requests, but this is outweighed by
the privacy implications of sending plaintext addresses. the privacy implications of sending plain-text addresses.
## Potential issues ## Potential issues
This proposal does not force a identity service to stop handling plaintext requests, because This proposal does not force an identity service to stop handling plain-text
a large amount of the matrix ecosystem relies upon this behavior. However, a conscious effort requests, because a large amount of the matrix ecosystem relies upon this
should be made by all users to use the privacy respecting endpoints outlined above. Identity behavior. However, a conscious effort should be made by all users to use the
services may disallow use of the v1 endpoint. privacy respecting endpoints outlined above. Identity services may disallow use
of the v1 endpoint.
Base64 has been chosen to encode the value due to it's ubiquitous support in many languages, Unpadded base64 has been chosen to encode the value due to its ubiquitous
however it does mean that special characters in the address will have to be encoded when used support in many languages, however it does mean that special characters in the
as a parameter value. address will have to be encoded when used as a parameter value.
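To illustrate the encoding concern: standard base64 output can contain `+` and `/`, both of which have special meaning in a URL. The sketch below assumes the lookup endpoint takes `medium` and `address` as query parameters, as the existing `v1` lookup does. The function name `build_lookup_query` and the sample hash value are made up for the example.

```python
from urllib.parse import parse_qs, urlencode


def build_lookup_query(medium: str, hashed_address: str) -> str:
    """Build a percent-encoded query string for a lookup request.

    Hypothetical helper; the parameter names are assumed to match v1.
    """
    # urlencode percent-encodes "+" and "/" so the hash survives transport.
    return urlencode({"medium": medium, "address": hashed_address})


# A made-up hash value containing both problematic characters.
query = build_lookup_query("email", "ab+/cd")
print(query)  # medium=email&address=ab%2B%2Fcd

# Decoding on the receiving side restores the original value.
print(parse_qs(query)["address"][0])  # ab+/cd
```

Without this step, a literal `+` in a query string is commonly decoded as a space, which would silently corrupt the hash.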
## Other considered solutions
Ideally identity servers would never receive plain-text addresses, however it
is necessary for the identity server to send an email/sms message during a
bind, as it cannot trust a homeserver to do so as the homeserver may be lying.
Additionally, only storing 3pid hashes at rest instead of the plain-text
versions is impractical if the hashing algorithm ever needs to be changed.
## Security considerations ## Security considerations
@ -80,6 +89,8 @@ None
## Conclusion ## Conclusion
This proposal outlines a quick and effective method to stop bulk collection of user's contact This proposal outlines an effective method to stop bulk collection of user's
lists and their social graphs without any disasterous side effects. All functionality which contact lists and their social graphs without any disastrous side effects. All
depends on the lookup service should continue to function unhindered by the use of hashes. functionality which depends on the lookup service should continue to function
unhindered by the use of hashes.
