From 3eff76b00aabfc34cc267dafe857cedc7bf564f6 Mon Sep 17 00:00:00 2001 From: Will Hunt Date: Sat, 15 Jun 2019 12:36:57 +0100 Subject: [PATCH 01/67] MSC 2134 --- proposals/2134-identity-hash-lookup.md | 56 ++++++++++++++++++++++++++ 1 file changed, 56 insertions(+) create mode 100644 proposals/2134-identity-hash-lookup.md diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md new file mode 100644 index 00000000..af732cfe --- /dev/null +++ b/proposals/2134-identity-hash-lookup.md @@ -0,0 +1,56 @@ +# MSC 2134: Identity Hash Lookups + +[Issue #2130](https://github.com/matrix-org/matrix-doc/issues/2130) has been recently created in response to a security issue brought up by an independant party. To summarise the issue, lookups (of matrix userids) are performed using non-hashed 3pids which means that the 3pid is identifiable to anyone who can see the payload (e.g. willh AT matrix.org can be identified by a human). + +The problem with this, is that a malicious identity service could then store the plaintext 3pid and make an assumption that the requesting entity knows the holder of the 3pid, even if the identity service does not know of the 3pid beforehand. + +If the 3pid is hashed, the identity service could not determine the owner of the 3pid unless the identity service has already been made aware of the 3pid by the owner themselves (using the /bind mechanism). + +Note that this proposal does not stop a identity service from mapping hashed 3pids to many users, in an attempt to form a social graph. However the identity of the 3pid will remain a mystery until /bind is used. + +It should be clear that there is a need to hide any address from the identity service that has not been explicitly bound to it, and this proposal aims to solve that for the lookup API. + + +## Proposal + +This proposal suggests making changes to the Identity Service API's lookup endpoints. 
Due to the nature of this proposal, the new endpoints should be +on a `v2` path: + +- `/_matrix/identity/api/v2/lookup` +- `/_matrix/identity/api/v2/bulk_lookup` + +The parameters will remain the same, but `address` should no longer be in a plain-text format. Medium will now take a SHA-256 format hash value, and the resulting digest should be encoded in base64 format. For example: + +```python +address = "willh@matrix.org" +digest = hashlib.sha256(address.encode()).digest() +result_address = base64.encodebytes(digest).decode() +print(result_address) +CpvOgBf0hFzdqZD4ASvWW0DAefErRRX5y8IegMBO98w= +``` + +SHA-256 has been chosen as it is [currently used elsewhere](https://matrix.org/docs/spec/server_server/r0.1.2#adding-hashes-and-signatures-to-outgoing-events) in the Matrix protocol, and the only requirement for the hashing algorithm is that it cannot be used to guess the real value of the address + +No parameter changes will be made to /bind, but identity services should keep a hashed value for each address it knows about in order to process lookups quicker and it is the recommendation that this is done at the time of bind. + +`v1` versions of these endpoints may be disabled at the discretion of the implementation, and should return a `M_FORBIDDEN` `errcode` if so. + + +## Tradeoffs + +* This approach means that the client now needs to calculate a hash by itself, but the belief is that most librarys provide a mechanism for doing so. +* There is a small cost incurred by doing hashes before requests, but this is outweighed by the privacy implications of sending plaintext addresses. + + +## Potential issues + +This proposal does not force a identity service to stop handling plaintext requests, because a large amount of the matrix ecosystem relies upon this behavior. However, a conscious effort should be made by all users to use the privacy respecting endpoints outlined above. Identity services may disallow use of the v1 endpoint. 
+ + +## Security considerations + +None + +## Conclusion + +This proposal outlines a quick and effective method to stop bulk collection of users contact lists and their social graphs without any disasterous side effects. All functionality which depends on the lookup service should continue to function unhindered by the use of hashes. \ No newline at end of file From a8c26d208b8ceae31b2d9d55f5a2c75c3a6ba4d4 Mon Sep 17 00:00:00 2001 From: Will Hunt Date: Sat, 15 Jun 2019 12:43:20 +0100 Subject: [PATCH 02/67] Wrap --- proposals/2134-identity-hash-lookup.md | 56 ++++++++++++++++++-------- 1 file changed, 40 insertions(+), 16 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index af732cfe..1bc48e9a 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -1,25 +1,38 @@ -# MSC 2134: Identity Hash Lookups +# MSC2134: Identity Hash Lookups -[Issue #2130](https://github.com/matrix-org/matrix-doc/issues/2130) has been recently created in response to a security issue brought up by an independant party. To summarise the issue, lookups (of matrix userids) are performed using non-hashed 3pids which means that the 3pid is identifiable to anyone who can see the payload (e.g. willh AT matrix.org can be identified by a human). +[Issue #2130](https://github.com/matrix-org/matrix-doc/issues/2130) has been recently +created in response to a security issue brought up by an independant party. To summarise +the issue, lookups (of matrix userids) are performed using non-hashed 3pids which means +that the 3pid is identifiable to anyone who can see the payload (e.g. willh AT matrix.org +can be identified by a human). -The problem with this, is that a malicious identity service could then store the plaintext 3pid and make an assumption that the requesting entity knows the holder of the 3pid, even if the identity service does not know of the 3pid beforehand. 
+The problem with this, is that a malicious identity service could then store the plaintext +3pid and make an assumption that the requesting entity knows the holder of the 3pid, even +if the identity service does not know of the 3pid beforehand. -If the 3pid is hashed, the identity service could not determine the owner of the 3pid unless the identity service has already been made aware of the 3pid by the owner themselves (using the /bind mechanism). +If the 3pid is hashed, the identity service could not determinethe owner of the 3pid +unless the identity service has already been made aware of the 3pid by the owner +themselves (using the /bind mechanism). -Note that this proposal does not stop a identity service from mapping hashed 3pids to many users, in an attempt to form a social graph. However the identity of the 3pid will remain a mystery until /bind is used. +Note that this proposal does not stop a identity service from mapping hashed 3pids to many +users, in an attempt to form a social graph. However the identity of the 3pid will remain +a mystery until /bind is used. -It should be clear that there is a need to hide any address from the identity service that has not been explicitly bound to it, and this proposal aims to solve that for the lookup API. +It should be clear that there is a need to hide any address from the identity service that +has not been explicitly bound to it, and this proposal aims to solve that for the lookup API. ## Proposal -This proposal suggests making changes to the Identity Service API's lookup endpoints. Due to the nature of this proposal, the new endpoints should be -on a `v2` path: +This proposal suggests making changes to the Identity Service API's lookup endpoints. Due +to the nature of this proposal, the new endpoints should be on a `v2` path: - `/_matrix/identity/api/v2/lookup` - `/_matrix/identity/api/v2/bulk_lookup` -The parameters will remain the same, but `address` should no longer be in a plain-text format. 
Medium will now take a SHA-256 format hash value, and the resulting digest should be encoded in base64 format. For example: +The parameters will remain the same, but `address` should no longer be in a plain-text +format. Medium will now take a SHA-256 format hash value, and the resulting digest should +be encoded in base64 format. For example: ```python address = "willh@matrix.org" @@ -29,22 +42,31 @@ print(result_address) CpvOgBf0hFzdqZD4ASvWW0DAefErRRX5y8IegMBO98w= ``` -SHA-256 has been chosen as it is [currently used elsewhere](https://matrix.org/docs/spec/server_server/r0.1.2#adding-hashes-and-signatures-to-outgoing-events) in the Matrix protocol, and the only requirement for the hashing algorithm is that it cannot be used to guess the real value of the address +SHA-256 has been chosen as it is [currently used elsewhere](https://matrix.org/docs/spec/server_server/r0.1.2#adding-hashes-and-signatures-to-outgoing-events) in the Matrix protocol, and the only +requirement for the hashing algorithm is that it cannot be used to guess the real value of the address -No parameter changes will be made to /bind, but identity services should keep a hashed value for each address it knows about in order to process lookups quicker and it is the recommendation that this is done at the time of bind. +No parameter changes will be made to /bind, but identity services should keep a hashed value +for each address it knows about in order to process lookups quicker and it is the recommendation +that this is done at the time of bind. -`v1` versions of these endpoints may be disabled at the discretion of the implementation, and should return a `M_FORBIDDEN` `errcode` if so. +`v1` versions of these endpoints may be disabled at the discretion of the implementation, and +should return a `M_FORBIDDEN` `errcode` if so. ## Tradeoffs -* This approach means that the client now needs to calculate a hash by itself, but the belief is that most librarys provide a mechanism for doing so. 
-* There is a small cost incurred by doing hashes before requests, but this is outweighed by the privacy implications of sending plaintext addresses. +* This approach means that the client now needs to calculate a hash by itself, but the belief + is that most librarys provide a mechanism for doing so. +* There is a small cost incurred by doing hashes before requests, but this is outweighed by + the privacy implications of sending plaintext addresses. ## Potential issues -This proposal does not force a identity service to stop handling plaintext requests, because a large amount of the matrix ecosystem relies upon this behavior. However, a conscious effort should be made by all users to use the privacy respecting endpoints outlined above. Identity services may disallow use of the v1 endpoint. +This proposal does not force a identity service to stop handling plaintext requests, because +a large amount of the matrix ecosystem relies upon this behavior. However, a conscious effort +should be made by all users to use the privacy respecting endpoints outlined above. Identity +services may disallow use of the v1 endpoint. ## Security considerations @@ -53,4 +75,6 @@ None ## Conclusion -This proposal outlines a quick and effective method to stop bulk collection of users contact lists and their social graphs without any disasterous side effects. All functionality which depends on the lookup service should continue to function unhindered by the use of hashes. \ No newline at end of file +This proposal outlines a quick and effective method to stop bulk collection of users contact +lists and their social graphs without any disasterous side effects. All functionality which +depends on the lookup service should continue to function unhindered by the use of hashes. 
\ No newline at end of file From 8b92df74abfd6c045bf348a05b67281328cdcb6a Mon Sep 17 00:00:00 2001 From: Will Hunt Date: Sat, 15 Jun 2019 13:25:42 +0100 Subject: [PATCH 03/67] s/medium/address --- proposals/2134-identity-hash-lookup.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 1bc48e9a..29144ec6 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -31,7 +31,7 @@ to the nature of this proposal, the new endpoints should be on a `v2` path: - `/_matrix/identity/api/v2/bulk_lookup` The parameters will remain the same, but `address` should no longer be in a plain-text -format. Medium will now take a SHA-256 format hash value, and the resulting digest should +format. `address` will now take a SHA-256 format hash value, and the resulting digest should be encoded in base64 format. For example: ```python @@ -42,6 +42,8 @@ print(result_address) CpvOgBf0hFzdqZD4ASvWW0DAefErRRX5y8IegMBO98w= ``` +### Example request + SHA-256 has been chosen as it is [currently used elsewhere](https://matrix.org/docs/spec/server_server/r0.1.2#adding-hashes-and-signatures-to-outgoing-events) in the Matrix protocol, and the only requirement for the hashing algorithm is that it cannot be used to guess the real value of the address From 12431f1a4e4f5ef0a7d61c1bcf3f1989d227c7f5 Mon Sep 17 00:00:00 2001 From: Will Hunt Date: Sat, 15 Jun 2019 13:29:59 +0100 Subject: [PATCH 04/67] Base64 potential issue --- proposals/2134-identity-hash-lookup.md | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 29144ec6..54e8bcbf 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -2,15 +2,15 @@ [Issue #2130](https://github.com/matrix-org/matrix-doc/issues/2130) has been recently created in response 
to a security issue brought up by an independant party. To summarise -the issue, lookups (of matrix userids) are performed using non-hashed 3pids which means -that the 3pid is identifiable to anyone who can see the payload (e.g. willh AT matrix.org -can be identified by a human). +the issue, lookups (of matrix user ids) are performed using non-hashed 3pids which means +that the 3pid is identifiable to anyone who can see the payload (e.g. willh@matrix.org +can be identified). The problem with this, is that a malicious identity service could then store the plaintext 3pid and make an assumption that the requesting entity knows the holder of the 3pid, even if the identity service does not know of the 3pid beforehand. -If the 3pid is hashed, the identity service could not determinethe owner of the 3pid +If the 3pid is hashed, the identity service could not determine the owner of the 3pid unless the identity service has already been made aware of the 3pid by the owner themselves (using the /bind mechanism). @@ -21,7 +21,6 @@ a mystery until /bind is used. It should be clear that there is a need to hide any address from the identity service that has not been explicitly bound to it, and this proposal aims to solve that for the lookup API. - ## Proposal This proposal suggests making changes to the Identity Service API's lookup endpoints. Due @@ -58,7 +57,7 @@ should return a `M_FORBIDDEN` `errcode` if so. ## Tradeoffs * This approach means that the client now needs to calculate a hash by itself, but the belief - is that most librarys provide a mechanism for doing so. + is that most languages provide a mechanism for doing so. * There is a small cost incurred by doing hashes before requests, but this is outweighed by the privacy implications of sending plaintext addresses. @@ -70,6 +69,10 @@ a large amount of the matrix ecosystem relies upon this behavior. However, a con should be made by all users to use the privacy respecting endpoints outlined above. 
Identity services may disallow use of the v1 endpoint. +Base64 has been chosen to encode the value due to it's ubiquitous support in many languages, +however it does mean that special characters in the address will have to be encoded when used +as a parameter value. + ## Security considerations From f8dbf2b360b8b2f220b9c43c450b2845db10a1e1 Mon Sep 17 00:00:00 2001 From: Will Hunt Date: Mon, 17 Jun 2019 13:17:57 +0100 Subject: [PATCH 05/67] Update proposals/2134-identity-hash-lookup.md Co-Authored-By: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com> --- proposals/2134-identity-hash-lookup.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 54e8bcbf..ad00c6fc 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -80,6 +80,6 @@ None ## Conclusion -This proposal outlines a quick and effective method to stop bulk collection of users contact +This proposal outlines a quick and effective method to stop bulk collection of user's contact lists and their social graphs without any disasterous side effects. All functionality which -depends on the lookup service should continue to function unhindered by the use of hashes. \ No newline at end of file +depends on the lookup service should continue to function unhindered by the use of hashes. 
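Note on the proposal as it stands through PATCH 05/67: the address is hashed with SHA-256, the digest is base64-encoded, and the "Potential issues" section warns that the result must be encoded when used as a parameter value. The following is a minimal sketch of that flow, not the proposal's reference implementation. The helper names `hash_address` and `as_query_value` are hypothetical, and `base64.b64encode` is used instead of the proposal's `base64.encodebytes` because `encodebytes` appends a trailing newline to its output.

```python
import base64
import hashlib
from urllib.parse import quote


def hash_address(address: str) -> str:
    """Return the padded base64 encoding of the SHA-256 digest of a 3pid address.

    Hypothetical helper: the proposal only specifies SHA-256 plus base64.
    """
    digest = hashlib.sha256(address.encode("utf-8")).digest()
    # b64encode, unlike encodebytes, does not append a trailing newline.
    return base64.b64encode(digest).decode("ascii")


def as_query_value(hashed: str) -> str:
    """Percent-encode the base64 characters '+', '/' and '=' for use in a URL.

    safe="" makes quote() encode '/' as well, which it leaves alone by default.
    """
    return quote(hashed, safe="")


hashed = hash_address("user@example.org")
print(hashed)                  # 44 characters, ending in one '=' of padding
print(as_query_value(hashed))  # same value, safe to place in a query string
```

A SHA-256 digest is 32 bytes, so the padded base64 form is always 44 characters with a single trailing `=`. Later patches in this series switch to unpadded base64 and add a pepper, so this sketch reflects only the revision above.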
From d2b47a585d5fbdba44b8d461f30a8e72a1e50e25 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Tue, 18 Jun 2019 16:37:02 +0100 Subject: [PATCH 06/67] Allow for changing the hashing algo and add at-rest details --- proposals/2134-identity-hash-lookup.md | 99 ++++++++++++++------------ 1 file changed, 55 insertions(+), 44 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index ad00c6fc..9b448d68 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -1,57 +1,59 @@ # MSC2134: Identity Hash Lookups -[Issue #2130](https://github.com/matrix-org/matrix-doc/issues/2130) has been recently -created in response to a security issue brought up by an independant party. To summarise -the issue, lookups (of matrix user ids) are performed using non-hashed 3pids which means -that the 3pid is identifiable to anyone who can see the payload (e.g. willh@matrix.org -can be identified). +[Issue #2130](https://github.com/matrix-org/matrix-doc/issues/2130) has been +recently created in response to a security issue brought up by an independent +party. To summarise the issue, lookups (of matrix user ids) are performed using +non-hashed 3pids (third-party IDs) which means that the identity server can +identify and record every 3pid that the user wants to check, whether that +address is already known by the identity server or not. -The problem with this, is that a malicious identity service could then store the plaintext -3pid and make an assumption that the requesting entity knows the holder of the 3pid, even -if the identity service does not know of the 3pid beforehand. +If the 3pid is hashed, the identity service could not determine the address +unless it has already seen that address in plain-text during a previous call of +the /bind mechanism. 
-If the 3pid is hashed, the identity service could not determine the owner of the 3pid -unless the identity service has already been made aware of the 3pid by the owner -themselves (using the /bind mechanism). +Note that in terms of privacy, this proposal does not stop an identity service +from mapping hashed 3pids to users, resulting in a social graph. However, the +identity of the 3pid will at least remain a mystery until /bind is used. -Note that this proposal does not stop a identity service from mapping hashed 3pids to many -users, in an attempt to form a social graph. However the identity of the 3pid will remain -a mystery until /bind is used. - -It should be clear that there is a need to hide any address from the identity service that -has not been explicitly bound to it, and this proposal aims to solve that for the lookup API. +This proposal thus calls for the Identity Service’s /lookup API to use hashed +3pids instead of their plain-text counterparts. ## Proposal -This proposal suggests making changes to the Identity Service API's lookup endpoints. Due -to the nature of this proposal, the new endpoints should be on a `v2` path: +This proposal suggests making changes to the Identity Service API's lookup +endpoints. Due to the nature of this proposal, the new endpoints should be on a +`v2` path: - `/_matrix/identity/api/v2/lookup` - `/_matrix/identity/api/v2/bulk_lookup` -The parameters will remain the same, but `address` should no longer be in a plain-text -format. `address` will now take a SHA-256 format hash value, and the resulting digest should -be encoded in base64 format. For example: +The parameters will remain the same, but `address` should no longer be in a +plain-text format. `address` will now take a hash value, and the resulting +digest should be encoded in unpadded base64. 
For example: ```python -address = "willh@matrix.org" +address = "user@example.org" digest = hashlib.sha256(address.encode()).digest() -result_address = base64.encodebytes(digest).decode() +result_address = unpaddedbase64.encode_base64(digest) print(result_address) -CpvOgBf0hFzdqZD4ASvWW0DAefErRRX5y8IegMBO98w= +CpvOgBf0hFzdqZD4ASvWW0DAefErRRX5y8IegMBO98w ``` ### Example request -SHA-256 has been chosen as it is [currently used elsewhere](https://matrix.org/docs/spec/server_server/r0.1.2#adding-hashes-and-signatures-to-outgoing-events) in the Matrix protocol, and the only -requirement for the hashing algorithm is that it cannot be used to guess the real value of the address +SHA-256 has been chosen as it is [currently used +elsewhere](https://matrix.org/docs/spec/server_server/r0.1.2#adding-hashes-and-signatures-to-outgoing-events) +in the Matrix protocol. As time goes on, this algorithm may be changed provided +a spec bump is performed. Then, clients making a request to `/lookup` must use +the hashing algorithm defined in whichever version of the CS spec they and the +IS have agreed to speaking. -No parameter changes will be made to /bind, but identity services should keep a hashed value -for each address it knows about in order to process lookups quicker and it is the recommendation -that this is done at the time of bind. +No parameter changes will be made to /bind, but identity services should keep a +hashed value for each address it knows about in order to process lookups +quicker. It is the recommendation that this is done during the act of binding. -`v1` versions of these endpoints may be disabled at the discretion of the implementation, and -should return a `M_FORBIDDEN` `errcode` if so. +`v1` versions of these endpoints may be disabled at the discretion of the +implementation, and should return a `M_FORBIDDEN` `errcode` if so. ## Tradeoffs @@ -59,20 +61,27 @@ should return a `M_FORBIDDEN` `errcode` if so. 
* This approach means that the client now needs to calculate a hash by itself, but the belief is that most languages provide a mechanism for doing so. * There is a small cost incurred by doing hashes before requests, but this is outweighed by - the privacy implications of sending plaintext addresses. - + the privacy implications of sending plain-text addresses. ## Potential issues -This proposal does not force a identity service to stop handling plaintext requests, because -a large amount of the matrix ecosystem relies upon this behavior. However, a conscious effort -should be made by all users to use the privacy respecting endpoints outlined above. Identity -services may disallow use of the v1 endpoint. +This proposal does not force an identity service to stop handling plain-text +requests, because a large amount of the matrix ecosystem relies upon this +behavior. However, a conscious effort should be made by all users to use the +privacy respecting endpoints outlined above. Identity services may disallow use +of the v1 endpoint. -Base64 has been chosen to encode the value due to it's ubiquitous support in many languages, -however it does mean that special characters in the address will have to be encoded when used -as a parameter value. +Unpadded base64 has been chosen to encode the value due to its ubiquitous +support in many languages, however it does mean that special characters in the +address will have to be encoded when used as a parameter value. +## Other considered solutions + +Ideally identity servers would never receive plain-text addresses, however it +is necessary for the identity server to send an email/sms message during a +bind, as it cannot trust a homeserver to do so as the homeserver may be lying. +Additionally, only storing 3pid hashes at rest instead of the plain-text +versions is impractical if the hashing algorithm ever needs to be changed. 
## Security considerations @@ -80,6 +89,8 @@ None ## Conclusion -This proposal outlines a quick and effective method to stop bulk collection of user's contact -lists and their social graphs without any disasterous side effects. All functionality which -depends on the lookup service should continue to function unhindered by the use of hashes. +This proposal outlines an effective method to stop bulk collection of user's +contact lists and their social graphs without any disastrous side effects. All +functionality which depends on the lookup service should continue to function +unhindered by the use of hashes. + From 063b9f60e0441f252df7cdf00fc5b5ee1774b99a Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Tue, 18 Jun 2019 16:50:47 +0100 Subject: [PATCH 07/67] Require a salt to defend against rainbow tables --- proposals/2134-identity-hash-lookup.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 9b448d68..bc25b92e 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -43,10 +43,12 @@ CpvOgBf0hFzdqZD4ASvWW0DAefErRRX5y8IegMBO98w SHA-256 has been chosen as it is [currently used elsewhere](https://matrix.org/docs/spec/server_server/r0.1.2#adding-hashes-and-signatures-to-outgoing-events) -in the Matrix protocol. As time goes on, this algorithm may be changed provided -a spec bump is performed. Then, clients making a request to `/lookup` must use -the hashing algorithm defined in whichever version of the CS spec they and the -IS have agreed to speaking. +in the Matrix protocol. Additionally a hardcoded salt (“matrix” or something) +must be prepended to the data before hashing in order to serve as a weak +defense against existing rainbow tables. As time goes on, this algorithm may be +changed provided a spec bump is performed. 
Then, clients making a request to +`/lookup` must use the hashing algorithm defined in whichever version of the CS +spec they and the IS have agreed to speaking. No parameter changes will be made to /bind, but identity services should keep a hashed value for each address it knows about in order to process lookups From bc9b6c3659e861779367de35234666523b319d2a Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Tue, 18 Jun 2019 17:03:49 +0100 Subject: [PATCH 08/67] Add salt to example and signal link --- proposals/2134-identity-hash-lookup.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index bc25b92e..a34ee767 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -33,7 +33,8 @@ digest should be encoded in unpadded base64. For example: ```python address = "user@example.org" -digest = hashlib.sha256(address.encode()).digest() +salt = "matrix" +digest = hashlib.sha256((salt + address).encode()).digest() result_address = unpaddedbase64.encode_base64(digest) print(result_address) CpvOgBf0hFzdqZD4ASvWW0DAefErRRX5y8IegMBO98w @@ -85,6 +86,8 @@ bind, as it cannot trust a homeserver to do so as the homeserver may be lying. Additionally, only storing 3pid hashes at rest instead of the plain-text versions is impractical if the hashing algorithm ever needs to be changed. +Bloom filters are an alternative method of providing private contact discovery, however does not scale well due to clients needing to download a large filter that needs updating every time a new bind is made. Further considered solutions are explored in https://signal.org/blog/contact-discovery/ Signal's eventual solution of using SGX is considered impractical for a Matrix-style setup. 
+ ## Security considerations None From 5049e552e7da9ff65ddbaa7610072baa5dfa0827 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Tue, 18 Jun 2019 17:05:46 +0100 Subject: [PATCH 09/67] Drop /api from the new endpoint --- proposals/2134-identity-hash-lookup.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index a34ee767..45f7d5f0 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -22,10 +22,11 @@ This proposal thus calls for the Identity Service’s /lookup API to use hashed This proposal suggests making changes to the Identity Service API's lookup endpoints. Due to the nature of this proposal, the new endpoints should be on a -`v2` path: +`v2` path (we also drop the `/api` in order to preserve consistency across +other endpoints): -- `/_matrix/identity/api/v2/lookup` -- `/_matrix/identity/api/v2/bulk_lookup` +- `/_matrix/identity/v2/lookup` +- `/_matrix/identity/v2/bulk_lookup` The parameters will remain the same, but `address` should no longer be in a plain-text format. `address` will now take a hash value, and the resulting From 6bb4a9e9110e58bd9a383b3957b909db7ca54222 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Tue, 18 Jun 2019 17:09:06 +0100 Subject: [PATCH 10/67] Add per-is salt consideration --- proposals/2134-identity-hash-lookup.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 45f7d5f0..f7c36e74 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -89,6 +89,10 @@ versions is impractical if the hashing algorithm ever needs to be changed. Bloom filters are an alternative method of providing private contact discovery, however does not scale well due to clients needing to download a large filter that needs updating every time a new bind is made. 
Further considered solutions are explored in https://signal.org/blog/contact-discovery/ Signal's eventual solution of using SGX is considered impractical for a Matrix-style setup. +We could let an identity server specify its own salt for the hashes, however it +would require an extra network call before uploading 3pid hashes in order for +the client to ask the server which salt it requires. + ## Security considerations None From f41ed02c9e8f1c12a307567a38a15e71295e3495 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Tue, 18 Jun 2019 17:22:28 +0100 Subject: [PATCH 11/67] remove sec concerns --- proposals/2134-identity-hash-lookup.md | 4 ---- 1 file changed, 4 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index f7c36e74..8451f72c 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -93,10 +93,6 @@ We could let an identity server specify its own salt for the hashes, however it would require an extra network call before uploading 3pid hashes in order for the client to ask the server which salt it requires. -## Security considerations - -None - ## Conclusion This proposal outlines an effective method to stop bulk collection of user's From 3ee27d38180a6142237b86499830c2d4b4c610a5 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Wed, 19 Jun 2019 15:14:30 +0100 Subject: [PATCH 12/67] salt->pepper. 1 pepper/is. add multi-hash idea --- proposals/2134-identity-hash-lookup.md | 98 ++++++++++++++++++++------ 1 file changed, 78 insertions(+), 20 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 8451f72c..cd5e3868 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -28,31 +28,81 @@ other endpoints): - `/_matrix/identity/v2/lookup` - `/_matrix/identity/v2/bulk_lookup` -The parameters will remain the same, but `address` should no longer be in a -plain-text format. 
`address` will now take a hash value, and the resulting -digest should be encoded in unpadded base64. For example: +`address` should no longer be in a plain-text format, but will now take a hash +value, and the resulting digest should be encoded in unpadded base64. For +example: ```python address = "user@example.org" -salt = "matrix" -digest = hashlib.sha256((salt + address).encode()).digest() +pepper = "matrix" +digest = hashlib.sha256((pepper + address).encode()).digest() result_address = unpaddedbase64.encode_base64(digest) print(result_address) CpvOgBf0hFzdqZD4ASvWW0DAefErRRX5y8IegMBO98w ``` -### Example request - SHA-256 has been chosen as it is [currently used elsewhere](https://matrix.org/docs/spec/server_server/r0.1.2#adding-hashes-and-signatures-to-outgoing-events) -in the Matrix protocol. Additionally a hardcoded salt (“matrix” or something) -must be prepended to the data before hashing in order to serve as a weak -defense against existing rainbow tables. As time goes on, this algorithm may be -changed provided a spec bump is performed. Then, clients making a request to -`/lookup` must use the hashing algorithm defined in whichever version of the CS -spec they and the IS have agreed to speaking. - -No parameter changes will be made to /bind, but identity services should keep a +in the Matrix protocol. Additionally a +[pepper](https://en.wikipedia.org/wiki/Pepper_(cryptography)) must be prepended +to the data before hashing in order to serve as a weak defense against existing +rainbow tables. This pepper will be specified by the identity server in order +to prevent a single rainbow table being generated for all identity servers. As +time goes on, this algorithm may be changed provided a spec bump is performed. +Then, clients making a request to `/lookup` must use the hashing algorithm +defined in whichever version of the CS spec they and the IS have agreed to +speaking. 
+ +Identity servers can specify their own peppers, which can be handy if a rainbow table is released for their current one. Identity servers could also set a timer for rotating this value to further impede rainbow table publishing. As such, it must be possible for clients to be able to query what pepper an identity server requires before sending it hashes. Thus a new endpoint must be added: + +``` +GET /_matrix/identity/v2/lookup_pepper +``` + +This endpoint takes no parameters, and simply returns the current pepper as a JSON object: + +``` +{ + "pepper": "matrixrocks" +} +``` + +In addition, the pepper the client used must be appended as a parameter to the +new `/lookup` and `/bulk_lookup` endpoints, ensuring that the client is using +the right one. If it does not match what the server has on file (which may be +the case is it rotated right after the client's request for it), then client +will know to query the pepper again instead of just getting a response saying +no contacts are registered on that identity server. + +Thus, a call to `/bulk_lookup` would look like the following: + +``` +{ + "threepids": [ + [ + "email", + "user@example.org" + ], + [ + "msisdn", + "123456789" + ], + [ + "email", + "user2@example.org" + ] + ], + "pepper": "matrixrocks" +} +``` + +If the pepper does not match the server's, the client should receive a `400 +M_INVALID_PARAM` with the error `Provided pepper value does not match +'$server_pepper'`. Clients should ensure they don't enter an infinite loop if +they receive this error more than once even after changing to the correct +pepper. + +No parameter changes will be made to /bind, but identity servers should keep a hashed value for each address it knows about in order to process lookups quicker. It is the recommendation that this is done during the act of binding. @@ -87,11 +137,19 @@ bind, as it cannot trust a homeserver to do so as the homeserver may be lying. 
Additionally, only storing 3pid hashes at rest instead of the plain-text versions is impractical if the hashing algorithm ever needs to be changed. -Bloom filters are an alternative method of providing private contact discovery, however does not scale well due to clients needing to download a large filter that needs updating every time a new bind is made. Further considered solutions are explored in https://signal.org/blog/contact-discovery/ Signal's eventual solution of using SGX is considered impractical for a Matrix-style setup. - -We could let an identity server specify its own salt for the hashes, however it -would require an extra network call before uploading 3pid hashes in order for -the client to ask the server which salt it requires. +Bloom filters are an alternative method of providing private contact discovery, +however does not scale well due to clients needing to download a large filter +that needs updating every time a new bind is made. Further considered solutions +are explored in https://signal.org/blog/contact-discovery/ Signal's eventual +solution of using SGX is considered impractical for a Matrix-style setup. + +Bit out of scope for this MSC, but there was an argument for not keeping all +IDs as hashed on disk in the identity server, that being if a hashing algorithm +was broken, we couldn't update the hashing algorithm without having the +plaintext 3PIDs. Well @toml helpfully said that we could just take the old +hashes and rehash them in the more secure hashing algorithm, thus transforming +the algo from ex. SHA256 to SHA256+SomeBetterAlg. This may spur an MSC in the +future that supports this, unless it is just an implementation detail. 
## Conclusion From f28476f0f3887535fe8c21603933c8831db9d203 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Wed, 19 Jun 2019 16:29:24 +0100 Subject: [PATCH 13/67] line wrap and fix wording --- proposals/2134-identity-hash-lookup.md | 28 ++++++++++++++++---------- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index cd5e3868..8ccfcfce 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -53,7 +53,12 @@ Then, clients making a request to `/lookup` must use the hashing algorithm defined in whichever version of the CS spec they and the IS have agreed to speaking. -Identity servers can specify their own peppers, which can be handy if a rainbow table is released for their current one. Identity servers could also set a timer for rotating this value to further impede rainbow table publishing. As such, it must be possible for clients to be able to query what pepper an identity server requires before sending it hashes. Thus a new endpoint must be added: +Identity servers can specify their own peppers, which can be handy if a rainbow +table is released for their current one. Identity servers could also set a +timer for rotating this value to further impede rainbow table publishing. As +such, it must be possible for clients to be able to query what pepper an +identity server requires before sending it hashes. Thus a new endpoint must be +added: ``` GET /_matrix/identity/v2/lookup_pepper @@ -81,15 +86,15 @@ Thus, a call to `/bulk_lookup` would look like the following: "threepids": [ [ "email", - "user@example.org" + "vNjEQuRCOmBp/KTuIpZ7RUJgPAbVAyqa0Uzh770tQaw" ], [ "msisdn", - "123456789" + "0VnvYk7YZpe08fP/CGqs3f39QtRjqAA2lPd14eLZXiw" ], [ "email", - "user2@example.org" + "BJaLI0RrLFDMbsk0eEp5BMsYDYzvOzDneQP/9NTemYA" ] ], "pepper": "matrixrocks" @@ -143,13 +148,14 @@ that needs updating every time a new bind is made. 
Further considered solutions are explored in https://signal.org/blog/contact-discovery/ Signal's eventual solution of using SGX is considered impractical for a Matrix-style setup. -Bit out of scope for this MSC, but there was an argument for not keeping all -IDs as hashed on disk in the identity server, that being if a hashing algorithm -was broken, we couldn't update the hashing algorithm without having the -plaintext 3PIDs. Well @toml helpfully said that we could just take the old -hashes and rehash them in the more secure hashing algorithm, thus transforming -the algo from ex. SHA256 to SHA256+SomeBetterAlg. This may spur an MSC in the -future that supports this, unless it is just an implementation detail. +While a bit out of scope for this MSC, there has been debate over preventing +3pids as being kept as plain-text on disk. The argument against this was that +if the hashing algorithm (in this case SHA-256) was broken, we couldn't update +the hashing algorithm without having the plaintext 3PIDs. Well @toml helpfully +added that we could just take the old hashes and rehash them in the more secure +hashing algorithm, thus transforming the hash from SHA-256 to +SHA-256+SomeBetterAlg. This may spur on an MSC in the future that supports +this, unless it is just an implementation detail. ## Conclusion From 1343e19a6d36bc624091320182c10452da046727 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Thu, 20 Jun 2019 14:36:47 +0100 Subject: [PATCH 14/67] Specify hash algorithm and fallback considerations --- proposals/2134-identity-hash-lookup.md | 44 ++++++++++++++++++-------- 1 file changed, 30 insertions(+), 14 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 8ccfcfce..43154d7c 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -61,23 +61,28 @@ identity server requires before sending it hashes. 
Thus a new endpoint must be added: ``` -GET /_matrix/identity/v2/lookup_pepper +GET /_matrix/identity/v2/hash_details ``` This endpoint takes no parameters, and simply returns the current pepper as a JSON object: ``` { - "pepper": "matrixrocks" + "pepper": "matrixrocks", + "algorithm": "sha256", } ``` -In addition, the pepper the client used must be appended as a parameter to the -new `/lookup` and `/bulk_lookup` endpoints, ensuring that the client is using -the right one. If it does not match what the server has on file (which may be -the case is it rotated right after the client's request for it), then client -will know to query the pepper again instead of just getting a response saying -no contacts are registered on that identity server. +Clients should request this endpoint every time before making a +`/(bulk_)lookup`, to handle identity servers which may rotate their pepper +values frequently. + +In addition, the pepper and hashing algorithm the client used must be a request +body field for the new `/lookup` and `/bulk_lookup` endpoints, ensuring that +the client is using the right parameters. If it does not match what the server +has on file (which may be the case is it rotated right after the client's +request for it), then the client will know to query the hash details again +instead of assuming that no contacts are registered on that identity server. Thus, a call to `/bulk_lookup` would look like the following: @@ -97,22 +102,33 @@ Thus, a call to `/bulk_lookup` would look like the following: "BJaLI0RrLFDMbsk0eEp5BMsYDYzvOzDneQP/9NTemYA" ] ], - "pepper": "matrixrocks" + "pepper": "matrixrocks", + "algorithm": "sha256" } ``` If the pepper does not match the server's, the client should receive a `400 -M_INVALID_PARAM` with the error `Provided pepper value does not match -'$server_pepper'`. Clients should ensure they don't enter an infinite loop if -they receive this error more than once even after changing to the correct -pepper. 
+M_INVALID_PARAM` with the error `Provided pepper does not match +'$server_pepper'`. If the algorithm does not match the server's, the client +should receive a `400 M_INVALID_PARAM` with the error `Provided algorithm does +not match '$server_algorithm'`. Clients should ensure they don't enter an +infinite loop if they receive these errors more than once even after changing +to the correct pepper and hash. No parameter changes will be made to /bind, but identity servers should keep a hashed value for each address it knows about in order to process lookups quicker. It is the recommendation that this is done during the act of binding. +## Fallback considerations + `v1` versions of these endpoints may be disabled at the discretion of the -implementation, and should return a `M_FORBIDDEN` `errcode` if so. +implementation, and should return a HTTP 403 with a `M_FORBIDDEN` `errcode` if +so. + +If an identity server is too old and a HTTP 404 is received when accessing the +`v2` endpoint, they should fallback to the `v1` endpoint instead. However, +clients should be aware that plain-text 3pids are required, and should ask for +user consent accordingly. ## Tradeoffs From 1fea604ba9fc79071124181523e6800666194f3c Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Fri, 21 Jun 2019 11:32:23 +0100 Subject: [PATCH 15/67] Don't define error message --- proposals/2134-identity-hash-lookup.md | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 43154d7c..8c7a0aca 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -108,12 +108,7 @@ Thus, a call to `/bulk_lookup` would look like the following: ``` If the pepper does not match the server's, the client should receive a `400 -M_INVALID_PARAM` with the error `Provided pepper does not match -'$server_pepper'`. 
If the algorithm does not match the server's, the client -should receive a `400 M_INVALID_PARAM` with the error `Provided algorithm does -not match '$server_algorithm'`. Clients should ensure they don't enter an -infinite loop if they receive these errors more than once even after changing -to the correct pepper and hash. +M_INVALID_PARAM`. No parameter changes will be made to /bind, but identity servers should keep a hashed value for each address it knows about in order to process lookups From e3b2ad38b5630c38dafad7e18178c6f0f9f98cfd Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Fri, 21 Jun 2019 12:17:01 +0100 Subject: [PATCH 16/67] pepper -> lookup_pepper --- proposals/2134-identity-hash-lookup.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 8c7a0aca..f85df971 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -68,11 +68,14 @@ This endpoint takes no parameters, and simply returns the current pepper as a JS ``` { - "pepper": "matrixrocks", + "lookup_pepper": "matrixrocks", "algorithm": "sha256", } ``` +`lookup_pepper` was chosen in order to account for pepper values being returned +for other endpoints in the future. + Clients should request this endpoint every time before making a `/(bulk_)lookup`, to handle identity servers which may rotate their pepper values frequently. 
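
The hashing step described in this revision can be sketched as follows. This is a minimal illustration, not a reference implementation: `hash_details` is a hypothetical stand-in for the parsed body of a `/hash_details` response, `hash_address` is a name introduced here, and the standard library's `base64` module (with the padding stripped) is used in place of the `unpaddedbase64` package from the proposal's own example.

```python
import base64
import hashlib

# Hypothetical stand-in for a parsed /hash_details response body.
hash_details = {"lookup_pepper": "matrixrocks", "algorithm": "sha256"}


def hash_address(address: str, details: dict) -> str:
    """Return the unpadded-base64 SHA-256 digest of pepper + address."""
    if details["algorithm"] != "sha256":
        raise ValueError("unsupported algorithm: " + details["algorithm"])
    # The pepper is prepended to the address before hashing.
    data = (details["lookup_pepper"] + address).encode("utf-8")
    digest = hashlib.sha256(data).digest()
    # Standard base64 with the trailing '=' padding stripped.
    return base64.b64encode(digest).decode("ascii").rstrip("=")


hashed = hash_address("user@example.org", hash_details)
```

A SHA-256 digest is 32 bytes, so the unpadded result is always 43 characters long. Because the pepper is an input to the hash, a value computed under one pepper is useless once the server rotates it.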
From c63edc7b97703383e0989eaea5c242124c8a9998 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Fri, 21 Jun 2019 14:12:50 +0100 Subject: [PATCH 17/67] Clean up wording around peppers and hashes --- proposals/2134-identity-hash-lookup.md | 97 +++++++++++++------------- 1 file changed, 47 insertions(+), 50 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index f85df971..686a3787 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -28,37 +28,15 @@ other endpoints): - `/_matrix/identity/v2/lookup` - `/_matrix/identity/v2/bulk_lookup` -`address` should no longer be in a plain-text format, but will now take a hash -value, and the resulting digest should be encoded in unpadded base64. For -example: +`address` MUST no longer be in a plain-text format, but rather will be a peppered hash +value, and the resulting digest MUST be encoded in unpadded base64. -```python -address = "user@example.org" -pepper = "matrix" -digest = hashlib.sha256((pepper + address).encode()).digest() -result_address = unpaddedbase64.encode_base64(digest) -print(result_address) -CpvOgBf0hFzdqZD4ASvWW0DAefErRRX5y8IegMBO98w -``` - -SHA-256 has been chosen as it is [currently used -elsewhere](https://matrix.org/docs/spec/server_server/r0.1.2#adding-hashes-and-signatures-to-outgoing-events) -in the Matrix protocol. Additionally a -[pepper](https://en.wikipedia.org/wiki/Pepper_(cryptography)) must be prepended -to the data before hashing in order to serve as a weak defense against existing -rainbow tables. This pepper will be specified by the identity server in order -to prevent a single rainbow table being generated for all identity servers. As -time goes on, this algorithm may be changed provided a spec bump is performed. -Then, clients making a request to `/lookup` must use the hashing algorithm -defined in whichever version of the CS spec they and the IS have agreed to -speaking. 
- -Identity servers can specify their own peppers, which can be handy if a rainbow -table is released for their current one. Identity servers could also set a -timer for rotating this value to further impede rainbow table publishing. As -such, it must be possible for clients to be able to query what pepper an -identity server requires before sending it hashes. Thus a new endpoint must be -added: +Identity servers must specify their own hashing algorithms (from a list of +specified values) and peppers, which will be useful if a rainbow table is +released for their current one. Identity servers could also set a timer for +rotating the pepper value to further impede rainbow table publishing. As such, +it must be possible for clients to be able to query what pepper an identity +server requires before sending it hashes. A new endpoint must be added: ``` GET /_matrix/identity/v2/hash_details @@ -73,21 +51,39 @@ This endpoint takes no parameters, and simply returns the current pepper as a JS } ``` -`lookup_pepper` was chosen in order to account for pepper values being returned -for other endpoints in the future. +The name `lookup_pepper` was chosen in order to account for pepper values being +returned for other endpoints in the future. -Clients should request this endpoint every time before making a -`/(bulk_)lookup`, to handle identity servers which may rotate their pepper -values frequently. +Clients should request this endpoint each time before making a `/lookup` or +`/(bulk_)lookup` request, to handle identity servers which may rotate their +pepper values frequently. -In addition, the pepper and hashing algorithm the client used must be a request -body field for the new `/lookup` and `/bulk_lookup` endpoints, ensuring that -the client is using the right parameters. 
If it does not match what the server -has on file (which may be the case is it rotated right after the client's -request for it), then the client will know to query the hash details again -instead of assuming that no contacts are registered on that identity server. +An example of generating a hash using the above hash and pepper is as follows: -Thus, a call to `/bulk_lookup` would look like the following: +```python +address = "user@example.org" +pepper = "matrixrocks" +digest = hashlib.sha256((pepper + address).encode()).digest() +result_address = unpaddedbase64.encode_base64(digest) +print(result_address) +vNjEQuRCOmBp/KTuIpZ7RUJgPAbVAyqa0Uzh770tQaw +``` + +SHA-256 should be the first specified hash function. It has been chosen as it +is [currently used +elsewhere](https://matrix.org/docs/spec/server_server/r0.1.2#adding-hashes-and-signatures-to-outgoing-events) +in the Matrix protocol, and is reasonably secure as of 2019. + +When performing a lookup, the pepper and hashing algorithm the client used must +be part of the request body. If they do not match what the server has on file +(which may be the case if the pepper was rotated right after the client's +request for it), then the server can inform the client that they need to query +the hash details again, instead of just returning an empty response, which +clients would assume to mean that no contacts are registered on that identity +server. + +Thus, an example client request to `/bulk_lookup` would look like the +following: ``` { @@ -110,17 +106,19 @@ Thus, a call to `/bulk_lookup` would look like the following: } ``` -If the pepper does not match the server's, the client should receive a `400 +If the pepper does not match the server's, the server should return a `400 M_INVALID_PARAM`. No parameter changes will be made to /bind, but identity servers should keep a hashed value for each address it knows about in order to process lookups quicker. 
It is the recommendation that this is done during the act of binding. +Be wary that these hashes will need to be changed whenever the server's pepper +is rotated. ## Fallback considerations `v1` versions of these endpoints may be disabled at the discretion of the -implementation, and should return a HTTP 403 with a `M_FORBIDDEN` `errcode` if +implementation, and should return a HTTP 400 with a `M_DEPRECATED` `errcode` if so. If an identity server is too old and a HTTP 404 is received when accessing the @@ -128,13 +126,12 @@ If an identity server is too old and a HTTP 404 is received when accessing the clients should be aware that plain-text 3pids are required, and should ask for user consent accordingly. - ## Tradeoffs -* This approach means that the client now needs to calculate a hash by itself, but the belief - is that most languages provide a mechanism for doing so. -* There is a small cost incurred by doing hashes before requests, but this is outweighed by - the privacy implications of sending plain-text addresses. +* This approach means that the client now needs to calculate a hash by itself, + but the belief is that most languages provide a mechanism for doing so. +* There is a small cost incurred by performing hashes before requests, but this + is outweighed by the privacy implications of sending plain-text addresses. ## Potential issues @@ -151,7 +148,7 @@ address will have to be encoded when used as a parameter value. ## Other considered solutions Ideally identity servers would never receive plain-text addresses, however it -is necessary for the identity server to send an email/sms message during a +is necessary for the identity server to send email/sms messages during a bind, as it cannot trust a homeserver to do so as the homeserver may be lying. Additionally, only storing 3pid hashes at rest instead of the plain-text versions is impractical if the hashing algorithm ever needs to be changed. 
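
The client flow in this revision (fetch the hash details, hash, look up, and refetch on a `400 M_INVALID_PARAM`) might look roughly like the sketch below. Several things here are assumptions made for illustration: the transport is injected as two callables rather than real HTTP calls, `FakeIdentityServer` is an in-memory stand-in, the shape of the successful response is invented, and the `max_attempts` bound is one possible way to avoid retrying forever. The request body uses the `pepper` field name shown in this revision's example; a later patch in this series renames it to `lookup_pepper`.

```python
import base64
import hashlib


def hash_address(address, pepper):
    """Unpadded-base64 SHA-256 of pepper + address."""
    digest = hashlib.sha256((pepper + address).encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def bulk_lookup(threepids, get_hash_details, post_bulk_lookup, max_attempts=3):
    """Hash and look up (medium, address) pairs, refetching details on a 400."""
    for _ in range(max_attempts):
        details = get_hash_details()
        body = {
            "threepids": [
                [medium, hash_address(address, details["lookup_pepper"])]
                for medium, address in threepids
            ],
            "pepper": details["lookup_pepper"],
            "algorithm": details["algorithm"],
        }
        status, response = post_bulk_lookup(body)
        if status == 200:
            return response
        if status != 400 or response.get("errcode") != "M_INVALID_PARAM":
            raise RuntimeError("lookup failed with HTTP %d" % status)
        # Pepper or algorithm was stale: loop and fetch fresh details.
    raise RuntimeError("hash details still rejected after %d attempts" % max_attempts)


class FakeIdentityServer:
    """In-memory stand-in that rotates its pepper once, right after the
    first hash_details call, to simulate a rotation racing the client."""

    def __init__(self):
        self.pepper = "pepper-1"
        self.pending_rotation = True
        self.bound = {("email", "user@example.org"): "@user:example.org"}

    def hash_details(self):
        details = {"lookup_pepper": self.pepper, "algorithm": "sha256"}
        if self.pending_rotation:
            self.pepper, self.pending_rotation = "pepper-2", False
        return details

    def bulk_lookup(self, body):
        if body["pepper"] != self.pepper or body["algorithm"] != "sha256":
            return 400, {"errcode": "M_INVALID_PARAM"}
        # Hashes must be recomputed whenever the pepper changes.
        by_hash = {
            (medium, hash_address(address, self.pepper)): mxid
            for (medium, address), mxid in self.bound.items()
        }
        matches = [
            [medium, hashed, by_hash[(medium, hashed)]]
            for medium, hashed in body["threepids"]
            if (medium, hashed) in by_hash
        ]
        return 200, {"threepids": matches}


server = FakeIdentityServer()
result = bulk_lookup(
    [("email", "user@example.org"), ("email", "nobody@example.org")],
    server.hash_details,
    server.bulk_lookup,
)
```

In this run the first attempt is rejected because the pepper rotated between the two calls, and the second attempt succeeds with the fresh pepper. The explicit error lets the client tell "stale pepper" apart from "no contacts bound", which an empty response could not.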
From 2383a55720374181d726bec475aba46943fa60a3 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Fri, 21 Jun 2019 15:40:26 +0100 Subject: [PATCH 18/67] 404 for deprecated endpoint --- proposals/2134-identity-hash-lookup.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 686a3787..0f7fca27 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -118,8 +118,7 @@ is rotated. ## Fallback considerations `v1` versions of these endpoints may be disabled at the discretion of the -implementation, and should return a HTTP 400 with a `M_DEPRECATED` `errcode` if -so. +implementation, and should return a HTTP 404 if so. If an identity server is too old and a HTTP 404 is received when accessing the `v2` endpoint, they should fallback to the `v1` endpoint instead. However, From 53f025edfc6ecd126a5ce0fe42f02edcbc7d0bc6 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Fri, 21 Jun 2019 15:42:11 +0100 Subject: [PATCH 19/67] Specify optional pepper rotation period --- proposals/2134-identity-hash-lookup.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 0f7fca27..138646ff 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -34,9 +34,12 @@ value, and the resulting digest MUST be encoded in unpadded base64. Identity servers must specify their own hashing algorithms (from a list of specified values) and peppers, which will be useful if a rainbow table is released for their current one. Identity servers could also set a timer for -rotating the pepper value to further impede rainbow table publishing. As such, -it must be possible for clients to be able to query what pepper an identity -server requires before sending it hashes. 
A new endpoint must be added: +rotating the pepper value to further impede rainbow table publishing (the +recommended period is every 30m, which should be enough for a client to +complete the hashing of all of a user's contacts, but also be nowhere near as +long enough to create a sophisticated rainbow table). As such, it must be +possible for clients to be able to query what pepper an identity server +requires before sending it hashes. A new endpoint must be added: ``` GET /_matrix/identity/v2/hash_details From 21e93a123ede06ef47fd1391fab87efc250a7f18 Mon Sep 17 00:00:00 2001 From: Travis Ralston Date: Fri, 21 Jun 2019 11:36:16 -0600 Subject: [PATCH 20/67] Naming and capitalization --- proposals/2134-identity-hash-lookup.md | 27 +++++++++++++------------- 1 file changed, 13 insertions(+), 14 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 138646ff..dd2b8cc0 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -2,21 +2,21 @@ [Issue #2130](https://github.com/matrix-org/matrix-doc/issues/2130) has been recently created in response to a security issue brought up by an independent -party. To summarise the issue, lookups (of matrix user ids) are performed using -non-hashed 3pids (third-party IDs) which means that the identity server can -identify and record every 3pid that the user wants to check, whether that +party. To summarise the issue, lookups (of Matrix user IDs) are performed using +non-hashed 3PIDs (third-party IDs) which means that the identity server can +identify and record every 3PID that the user wants to check, whether that address is already known by the identity server or not. -If the 3pid is hashed, the identity service could not determine the address +If the 3PID is hashed, the identity server could not determine the address unless it has already seen that address in plain-text during a previous call of the /bind mechanism. 
Note that in terms of privacy, this proposal does not stop an identity service -from mapping hashed 3pids to users, resulting in a social graph. However, the -identity of the 3pid will at least remain a mystery until /bind is used. +from mapping hashed 3PIDs to users, resulting in a social graph. However, the +identity of the 3PID will at least remain a mystery until /bind is used. This proposal thus calls for the Identity Service’s /lookup API to use hashed -3pids instead of their plain-text counterparts. +3PIDs instead of their plain-text counterparts. ## Proposal @@ -125,7 +125,7 @@ implementation, and should return a HTTP 404 if so. If an identity server is too old and a HTTP 404 is received when accessing the `v2` endpoint, they should fallback to the `v1` endpoint instead. However, -clients should be aware that plain-text 3pids are required, and should ask for +clients should be aware that plain-text 3PIDs are required, and should ask for user consent accordingly. ## Tradeoffs @@ -137,10 +137,10 @@ user consent accordingly. ## Potential issues -This proposal does not force an identity service to stop handling plain-text -requests, because a large amount of the matrix ecosystem relies upon this +This proposal does not force an identity server to stop handling plain-text +requests, because a large amount of the Matrix ecosystem relies upon this behavior. However, a conscious effort should be made by all users to use the -privacy respecting endpoints outlined above. Identity services may disallow use +privacy respecting endpoints outlined above. Identity servers may disallow use of the v1 endpoint. Unpadded base64 has been chosen to encode the value due to its ubiquitous @@ -152,7 +152,7 @@ address will have to be encoded when used as a parameter value. 
Ideally identity servers would never receive plain-text addresses, however it is necessary for the identity server to send email/sms messages during a bind, as it cannot trust a homeserver to do so as the homeserver may be lying. -Additionally, only storing 3pid hashes at rest instead of the plain-text +Additionally, only storing 3PID hashes at rest instead of the plain-text versions is impractical if the hashing algorithm ever needs to be changed. Bloom filters are an alternative method of providing private contact discovery, @@ -162,7 +162,7 @@ are explored in https://signal.org/blog/contact-discovery/ Signal's eventual solution of using SGX is considered impractical for a Matrix-style setup. While a bit out of scope for this MSC, there has been debate over preventing -3pids as being kept as plain-text on disk. The argument against this was that +3PIDs as being kept as plain-text on disk. The argument against this was that if the hashing algorithm (in this case SHA-256) was broken, we couldn't update the hashing algorithm without having the plaintext 3PIDs. Well @toml helpfully added that we could just take the old hashes and rehash them in the more secure @@ -176,4 +176,3 @@ This proposal outlines an effective method to stop bulk collection of user's contact lists and their social graphs without any disastrous side effects. All functionality which depends on the lookup service should continue to function unhindered by the use of hashes. 
- From e3ff80291f1607c7f2cf662da1a9f9c55c0cb429 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Mon, 24 Jun 2019 11:47:00 +0100 Subject: [PATCH 21/67] http err codes and hash wording fixes --- proposals/2134-identity-hash-lookup.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 138646ff..94e534fd 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -72,8 +72,8 @@ print(result_address) vNjEQuRCOmBp/KTuIpZ7RUJgPAbVAyqa0Uzh770tQaw ``` -SHA-256 should be the first specified hash function. It has been chosen as it -is [currently used +SHA-256 MUST be supported at a minimum. It has been chosen as it is [currently +used elsewhere](https://matrix.org/docs/spec/server_server/r0.1.2#adding-hashes-and-signatures-to-outgoing-events) in the Matrix protocol, and is reasonably secure as of 2019. @@ -123,10 +123,10 @@ is rotated. `v1` versions of these endpoints may be disabled at the discretion of the implementation, and should return a HTTP 404 if so. -If an identity server is too old and a HTTP 404 is received when accessing the -`v2` endpoint, they should fallback to the `v1` endpoint instead. However, -clients should be aware that plain-text 3pids are required, and should ask for -user consent accordingly. +If an identity server is too old and a HTTP 404, 405 or 501 is received when +accessing the `v2` endpoint, they should fallback to the `v1` endpoint instead. +However, clients should be aware that plain-text 3pids are required, and should +ask for user consent accordingly. ## Tradeoffs From 02ac0f3b339b7df4db2180a7e690431762382335 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Mon, 24 Jun 2019 11:56:04 +0100 Subject: [PATCH 22/67] Give the user control! 
--- proposals/2134-identity-hash-lookup.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 9aa5fe7c..a2b6a26f 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -58,7 +58,7 @@ The name `lookup_pepper` was chosen in order to account for pepper values being returned for other endpoints in the future. Clients should request this endpoint each time before making a `/lookup` or -`/(bulk_)lookup` request, to handle identity servers which may rotate their +`/bulk_lookup` request, to handle identity servers which may rotate their pepper values frequently. An example of generating a hash using the above hash and pepper is as follows: @@ -125,8 +125,9 @@ implementation, and should return a HTTP 404 if so. If an identity server is too old and a HTTP 404, 405 or 501 is received when accessing the `v2` endpoint, they should fallback to the `v1` endpoint instead. -However, clients should be aware that plain-text 3pids are required, and should -ask for user consent accordingly. +However, clients should be aware that plain-text 3pids are required, and MUST +ask for user consent to send 3pids in plain-text, and be clear about where they +are being sent to. ## Tradeoffs From ee10576d609081cca217282692b0c002e660bfab Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Mon, 24 Jun 2019 15:43:19 +0100 Subject: [PATCH 23/67] Update with feedback --- proposals/2134-identity-hash-lookup.md | 27 ++++++++++++++++---------- 1 file changed, 17 insertions(+), 10 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index a2b6a26f..2240b640 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -45,7 +45,8 @@ requires before sending it hashes. 
A new endpoint must be added: GET /_matrix/identity/v2/hash_details ``` -This endpoint takes no parameters, and simply returns the current pepper as a JSON object: +This endpoint takes no parameters, and simply returns supported hash algorithms +and pepper as a JSON object: ``` { @@ -72,8 +73,8 @@ print(result_address) vNjEQuRCOmBp/KTuIpZ7RUJgPAbVAyqa0Uzh770tQaw ``` -SHA-256 MUST be supported at a minimum. It has been chosen as it is [currently -used +SHA-256 MUST be supported by both servers and clients at a minimum. It has been +chosen as it is [currently used elsewhere](https://matrix.org/docs/spec/server_server/r0.1.2#adding-hashes-and-signatures-to-outgoing-events) in the Matrix protocol, and is reasonably secure as of 2019. @@ -104,7 +105,7 @@ following: "BJaLI0RrLFDMbsk0eEp5BMsYDYzvOzDneQP/9NTemYA" ] ], - "pepper": "matrixrocks", + "lookup_pepper": "matrixrocks", "algorithm": "sha256" } ``` @@ -144,8 +145,8 @@ behavior. However, a conscious effort should be made by all users to use the privacy respecting endpoints outlined above. Identity servers may disallow use of the v1 endpoint. -Unpadded base64 has been chosen to encode the value due to its ubiquitous -support in many languages, however it does mean that special characters in the +Unpadded base64 has been chosen to encode the value due to use in many other +portions of the spec. However, it does mean that special characters in the address will have to be encoded when used as a parameter value. ## Other considered solutions @@ -160,16 +161,22 @@ Bloom filters are an alternative method of providing private contact discovery, however does not scale well due to clients needing to download a large filter that needs updating every time a new bind is made. Further considered solutions are explored in https://signal.org/blog/contact-discovery/ Signal's eventual -solution of using SGX is considered impractical for a Matrix-style setup. 
+solution of using Software Guard Extensions (detailed in +https://signal.org/blog/private-contact-discovery/) is considered impractical +for a federated network, as it requires specialized hardware. While a bit out of scope for this MSC, there has been debate over preventing 3PIDs as being kept as plain-text on disk. The argument against this was that if the hashing algorithm (in this case SHA-256) was broken, we couldn't update -the hashing algorithm without having the plaintext 3PIDs. Well @toml helpfully +the hashing algorithm without having the plaintext 3PIDs. @lampholder helpfully added that we could just take the old hashes and rehash them in the more secure hashing algorithm, thus transforming the hash from SHA-256 to -SHA-256+SomeBetterAlg. This may spur on an MSC in the future that supports -this, unless it is just an implementation detail. +SHA-256+SomeBetterAlg. However @erikjohnston then pointed out that if +`BrokenAlgo(a) == BrokenAlgo(b)` then `SuperGreatHash(BrokenAlgo(a)) == +SuperGreatHash(BrokenAlgo(b))`, so all you'd need to do is find a match in the +broken algo, and you'd break the new algorithm as well. This means that you +would need the plaintext 3pids to encode a new hash, and thus storing them +hashed on disk is not possible. 
## Conclusion From 36a35a33cc975c3a594048cb8cd94d09f36d0f79 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Mon, 24 Jun 2019 16:59:58 +0100 Subject: [PATCH 24/67] Clarify how the spec defines hashing algs --- proposals/2134-identity-hash-lookup.md | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 2240b640..a2b3f157 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -46,12 +46,12 @@ GET /_matrix/identity/v2/hash_details ``` This endpoint takes no parameters, and simply returns supported hash algorithms -and pepper as a JSON object: +and peppers as a JSON object: ``` { "lookup_pepper": "matrixrocks", - "algorithm": "sha256", + "algorithms": ["sha256"], } ``` @@ -60,9 +60,11 @@ returned for other endpoints in the future. Clients should request this endpoint each time before making a `/lookup` or `/bulk_lookup` request, to handle identity servers which may rotate their -pepper values frequently. +pepper values frequently. Clients must choose one of the given hash algorithms +to encrypt the 3pid during lookup. -An example of generating a hash using the above hash and pepper is as follows: +An example of generating a hash using SHA-256 and the provided pepper is as +follows: ```python address = "user@example.org" @@ -73,10 +75,12 @@ print(result_address) vNjEQuRCOmBp/KTuIpZ7RUJgPAbVAyqa0Uzh770tQaw ``` -SHA-256 MUST be supported by both servers and clients at a minimum. It has been -chosen as it is [currently used -elsewhere](https://matrix.org/docs/spec/server_server/r0.1.2#adding-hashes-and-signatures-to-outgoing-events) -in the Matrix protocol, and is reasonably secure as of 2019. +Possible hashing algorithms will be defined in the Matrix specification, and an +Identity Server can choose to implement one or all of them. Later versions of +the specification may deprecate algorithms when necessary. 
Currently the only +listed hashing algorithm is SHA-256 as defined by [RFC +4634](https://tools.ietf.org/html/rfc4634) and Identity Servers and clients +MUST agree to its use with the string `sha256`. When performing a lookup, the pepper and hashing algorithm the client used must be part of the request body. If they do not match what the server has on file From 0a4c83ddb9107ed5f8420dfd0df09bc8b4025d19 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Mon, 24 Jun 2019 17:54:23 +0100 Subject: [PATCH 25/67] no plural. 3pid -> 3PID --- proposals/2134-identity-hash-lookup.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index a2b3f157..f8389e44 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -45,8 +45,8 @@ requires before sending it hashes. A new endpoint must be added: GET /_matrix/identity/v2/hash_details ``` -This endpoint takes no parameters, and simply returns supported hash algorithms -and peppers as a JSON object: +This endpoint takes no parameters, and simply returns any supported hash +algorithms and pepper as a JSON object: ``` { @@ -61,7 +61,7 @@ returned for other endpoints in the future. Clients should request this endpoint each time before making a `/lookup` or `/bulk_lookup` request, to handle identity servers which may rotate their pepper values frequently. Clients must choose one of the given hash algorithms -to encrypt the 3pid during lookup. +to encrypt the 3PID during lookup. An example of generating a hash using SHA-256 and the provided pepper is as follows: @@ -130,8 +130,8 @@ implementation, and should return a HTTP 404 if so. If an identity server is too old and a HTTP 404, 405 or 501 is received when accessing the `v2` endpoint, they should fallback to the `v1` endpoint instead. 
-However, clients should be aware that plain-text 3pids are required, and MUST -ask for user consent to send 3pids in plain-text, and be clear about where they +However, clients should be aware that plain-text 3PIDs are required, and MUST +ask for user consent to send 3PIDs in plain-text, and be clear about where they are being sent to. ## Tradeoffs @@ -179,7 +179,7 @@ SHA-256+SomeBetterAlg. However @erikjohnston then pointed out that if `BrokenAlgo(a) == BrokenAlgo(b)` then `SuperGreatHash(BrokenAlgo(a)) == SuperGreatHash(BrokenAlgo(b))`, so all you'd need to do is find a match in the broken algo, and you'd break the new algorithm as well. This means that you -would need the plaintext 3pids to encode a new hash, and thus storing them +would need the plaintext 3PIDs to encode a new hash, and thus storing them hashed on disk is not possible. ## Conclusion From fae6883cc03341b5eb2c417abd013201875c2279 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Tue, 25 Jun 2019 10:18:11 +0100 Subject: [PATCH 26/67] Update with review comments --- proposals/2134-identity-hash-lookup.md | 34 +++++++++++++------------- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index f8389e44..ccbc38b9 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -11,7 +11,7 @@ If the 3PID is hashed, the identity server could not determine the address unless it has already seen that address in plain-text during a previous call of the /bind mechanism. -Note that in terms of privacy, this proposal does not stop an identity service +Note that in terms of privacy, this proposal does not stop an identity server from mapping hashed 3PIDs to users, resulting in a social graph. However, the identity of the 3PID will at least remain a mystery until /bind is used. @@ -32,13 +32,13 @@ other endpoints): value, and the resulting digest MUST be encoded in unpadded base64. 
Identity servers must specify their own hashing algorithms (from a list of -specified values) and peppers, which will be useful if a rainbow table is +specified values) and pepper, which will be useful if a rainbow table is released for their current one. Identity servers could also set a timer for rotating the pepper value to further impede rainbow table publishing (the -recommended period is every 30m, which should be enough for a client to +recommended period is every 30 minutes, which should be enough for a client to complete the hashing of all of a user's contacts, but also be nowhere near as long enough to create a sophisticated rainbow table). As such, it must be -possible for clients to be able to query what pepper an identity server +possible for clients to be able to query what pepper the identity server requires before sending it hashes. A new endpoint must be added: ``` @@ -80,12 +80,16 @@ Identity Server can choose to implement one or all of them. Later versions of the specification may deprecate algorithms when necessary. Currently the only listed hashing algorithm is SHA-256 as defined by [RFC 4634](https://tools.ietf.org/html/rfc4634) and Identity Servers and clients -MUST agree to its use with the string `sha256`. +MUST agree to its use with the string `sha256`. SHA-256 was chosen as it is +currently used throughout the Matrix spec, as well as its properties of being +quick to hash. While this reduces the resources necessary to generate a rainbow +table for attackers, a fast hash is necessary if particularly slow mobile +clients are going to be hashing thousands of contacts. When performing a lookup, the pepper and hashing algorithm the client used must be part of the request body. 
If they do not match what the server has on file (which may be the case if the pepper was rotated right after the client's -request for it), then the server can inform the client that they need to query +request for it), then the server must inform the client that they need to query the hash details again, instead of just returning an empty response, which clients would assume to mean that no contacts are registered on that identity server. @@ -117,20 +121,16 @@ following: If the pepper does not match the server's, the server should return a `400 M_INVALID_PARAM`. -No parameter changes will be made to /bind, but identity servers should keep a -hashed value for each address it knows about in order to process lookups -quicker. It is the recommendation that this is done during the act of binding. -Be wary that these hashes will need to be changed whenever the server's pepper -is rotated. +No parameter changes will be made to /bind. ## Fallback considerations `v1` versions of these endpoints may be disabled at the discretion of the -implementation, and should return a HTTP 404 if so. +implementation, and should return a HTTP 403 if so. If an identity server is too old and a HTTP 404, 405 or 501 is received when accessing the `v2` endpoint, they should fallback to the `v1` endpoint instead. -However, clients should be aware that plain-text 3PIDs are required, and MUST +However, clients should be aware that plain-text 3PIDs are required, and SHOULD ask for user consent to send 3PIDs in plain-text, and be clear about where they are being sent to. @@ -147,11 +147,10 @@ This proposal does not force an identity server to stop handling plain-text requests, because a large amount of the Matrix ecosystem relies upon this behavior. However, a conscious effort should be made by all users to use the privacy respecting endpoints outlined above. Identity servers may disallow use -of the v1 endpoint. +of the v1 endpoint, as per above. 
Unpadded base64 has been chosen to encode the value due to use in many other -portions of the spec. However, it does mean that special characters in the -address will have to be encoded when used as a parameter value. +portions of the spec. ## Other considered solutions @@ -180,7 +179,8 @@ SHA-256+SomeBetterAlg. However @erikjohnston then pointed out that if SuperGreatHash(BrokenAlgo(b))`, so all you'd need to do is find a match in the broken algo, and you'd break the new algorithm as well. This means that you would need the plaintext 3PIDs to encode a new hash, and thus storing them -hashed on disk is not possible. +hashed on disk would require a transition period where 3pids were reuploaded in +a strong hash variant. ## Conclusion From f951f312e1ac610b4bcf2d3af1422633d0e13a12 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Tue, 25 Jun 2019 10:30:29 +0100 Subject: [PATCH 27/67] Fix terrible wording --- proposals/2134-identity-hash-lookup.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index ccbc38b9..b9a53cb0 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -160,11 +160,11 @@ bind, as it cannot trust a homeserver to do so as the homeserver may be lying. Additionally, only storing 3PID hashes at rest instead of the plain-text versions is impractical if the hashing algorithm ever needs to be changed. -Bloom filters are an alternative method of providing private contact discovery, -however does not scale well due to clients needing to download a large filter -that needs updating every time a new bind is made. Further considered solutions -are explored in https://signal.org/blog/contact-discovery/ Signal's eventual -solution of using Software Guard Extensions (detailed in +Bloom filters are an alternative method of providing private contact discovery. 
+However, they do not scale well due to requiring clients to download a large +filter that needs updating every time a new bind is made. Further considered +solutions are explored in https://signal.org/blog/contact-discovery/. Signal's +eventual solution of using Software Guard Extensions (detailed in https://signal.org/blog/private-contact-discovery/) is considered impractical for a federated network, as it requires specialized hardware. From 96e43aaf45469998cfa65f097b099fbac5042b14 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Tue, 25 Jun 2019 10:37:45 +0100 Subject: [PATCH 28/67] Define what characters lookup_pepper can consist of --- proposals/2134-identity-hash-lookup.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index b9a53cb0..34e9b0a6 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -56,7 +56,8 @@ algorithms and pepper as a JSON object: ``` The name `lookup_pepper` was chosen in order to account for pepper values being -returned for other endpoints in the future. +returned for other endpoints in the future. The contents of `lookup_pepper` +MUST match the regular expression `[a-zA-Z0-9]*`. 
Clients should request this endpoint each time before making a `/lookup` or `/bulk_lookup` request, to handle identity servers which may rotate their From df88b13ce13690fb46e8a587d81f4011019c0acc Mon Sep 17 00:00:00 2001 From: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com> Date: Tue, 25 Jun 2019 18:15:02 +0100 Subject: [PATCH 29/67] Update proposals/2134-identity-hash-lookup.md Co-Authored-By: Hubert Chathi --- proposals/2134-identity-hash-lookup.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 34e9b0a6..ba1d974e 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -29,7 +29,7 @@ other endpoints): - `/_matrix/identity/v2/bulk_lookup` `address` MUST no longer be in a plain-text format, but rather will be a peppered hash -value, and the resulting digest MUST be encoded in unpadded base64. +value encoded in unpadded base64. Identity servers must specify their own hashing algorithms (from a list of specified values) and pepper, which will be useful if a rainbow table is From dfb37fcce1932fed58700a21b336687bba061a1e Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Tue, 25 Jun 2019 18:55:18 +0100 Subject: [PATCH 30/67] update with feedback --- proposals/2134-identity-hash-lookup.md | 30 ++++++++++++-------------- 1 file changed, 14 insertions(+), 16 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index ba1d974e..94cd4e48 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -31,15 +31,11 @@ other endpoints): `address` MUST no longer be in a plain-text format, but rather will be a peppered hash value encoded in unpadded base64. 
-Identity servers must specify their own hashing algorithms (from a list of -specified values) and pepper, which will be useful if a rainbow table is -released for their current one. Identity servers could also set a timer for -rotating the pepper value to further impede rainbow table publishing (the -recommended period is every 30 minutes, which should be enough for a client to -complete the hashing of all of a user's contacts, but also be nowhere near as -long enough to create a sophisticated rainbow table). As such, it must be -possible for clients to be able to query what pepper the identity server -requires before sending it hashes. A new endpoint must be added: +Identity servers must specify the hashing algorithms and a pepper that they +support, which will allow for rotation if a rainbow table is ever released +coinciding with their current hash and pepper. As such, it must be possible for +clients to be able to query what pepper the identity server requires before +sending it hashes. A new endpoint must be added: ``` GET /_matrix/identity/v2/hash_details @@ -64,13 +60,13 @@ Clients should request this endpoint each time before making a `/lookup` or pepper values frequently. Clients must choose one of the given hash algorithms to encrypt the 3PID during lookup. -An example of generating a hash using SHA-256 and the provided pepper is as -follows: +Peppers are appended to the end of the 3PID before hashing. An example of +generating a hash using SHA-256 and the provided pepper is as follows: ```python address = "user@example.org" pepper = "matrixrocks" -digest = hashlib.sha256((pepper + address).encode()).digest() +digest = hashlib.sha256((address + pepper).encode()).digest() result_address = unpaddedbase64.encode_base64(digest) print(result_address) vNjEQuRCOmBp/KTuIpZ7RUJgPAbVAyqa0Uzh770tQaw @@ -119,15 +115,17 @@ following: } ``` -If the pepper does not match the server's, the server should return a `400 -M_INVALID_PARAM`. 
+If the algorithm does not match the server's, the server should return a 400 +`M_INVALID_PARAM`. If the pepper does not match the server's, the server should +return a new error code, 400 `M_INVALID_PEPPER`. A new error code is not +defined for an invalid algorithm as that is considered a client bug. No parameter changes will be made to /bind. ## Fallback considerations `v1` versions of these endpoints may be disabled at the discretion of the -implementation, and should return a HTTP 403 if so. +implementation, and should return a 403 `M_FORBIDDEN` error if so. If an identity server is too old and a HTTP 404, 405 or 501 is received when accessing the `v2` endpoint, they should fallback to the `v1` endpoint instead. @@ -180,7 +178,7 @@ SHA-256+SomeBetterAlg. However @erikjohnston then pointed out that if SuperGreatHash(BrokenAlgo(b))`, so all you'd need to do is find a match in the broken algo, and you'd break the new algorithm as well. This means that you would need the plaintext 3PIDs to encode a new hash, and thus storing them -hashed on disk would require a transition period where 3pids were reuploaded in +hashed on disk would require a transition period where 3PIDs were reuploaded in a strong hash variant. 
## Conclusion From 0fd4fe254207badb98be9eb1fc968db8913bc323 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Wed, 26 Jun 2019 10:55:44 +0100 Subject: [PATCH 31/67] Add algo/pepper to err resp --- proposals/2134-identity-hash-lookup.md | 25 +++++++++++++++++++++---- 1 file changed, 21 insertions(+), 4 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 34e9b0a6..accadc25 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -28,8 +28,10 @@ other endpoints): - `/_matrix/identity/v2/lookup` - `/_matrix/identity/v2/bulk_lookup` -`address` MUST no longer be in a plain-text format, but rather will be a peppered hash -value, and the resulting digest MUST be encoded in unpadded base64. +`address` MUST no longer be in a plain-text format, but rather will be a +peppered hash value, and the resulting digest MUST be encoded in URL-safe +unpadded base64 (similar to [room version 4's event +IDs](https://matrix.org/docs/spec/rooms/v4#event-ids)). Identity servers must specify their own hashing algorithms (from a list of specified values) and pepper, which will be useful if a rainbow table is @@ -119,8 +121,23 @@ following: } ``` -If the pepper does not match the server's, the server should return a `400 -M_INVALID_PARAM`. +If the algorithm does not match the server's, the server should return a `400 +M_INVALID_PARAM`. If the pepper does not match the server's, the server should +return a new error code, 400 `M_INVALID_PEPPER`. A new error code is not +defined for an invalid algorithm as that is considered a client bug. Each of +these error responses should contain the correct `algorithm` and +`lookup_pepper` fields. This is to prevent the client from needing to query +`/hash_details` again, thus saving a round-trip. 
An example response to an +incorrect pepper would be: + +``` +{ + "error": "Incorrect value for lookup_pepper", + "errcode": "M_INVALID_PEPPER", + "algorithm": "sha256", + "lookup_pepper": "matrixrocks" +} +``` No parameter changes will be made to /bind. From 6f81d3774b61e9482c729d1f5894d2803c2d3d35 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Mon, 1 Jul 2019 16:23:28 +0100 Subject: [PATCH 32/67] New hashing method --- proposals/2134-identity-hash-lookup.md | 426 +++++++++++++++++++------ 1 file changed, 328 insertions(+), 98 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 2892ea4e..e6593224 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -3,120 +3,112 @@ [Issue #2130](https://github.com/matrix-org/matrix-doc/issues/2130) has been recently created in response to a security issue brought up by an independent party. To summarise the issue, lookups (of Matrix user IDs) are performed using -non-hashed 3PIDs (third-party IDs) which means that the identity server can -identify and record every 3PID that the user wants to check, whether that -address is already known by the identity server or not. +plain-text 3PIDs (third-party IDs) which means that the identity server can +identify and record every 3PID that the user has in their contacts, whether +that email address or phone number is already known by the identity server or +not. If the 3PID is hashed, the identity server could not determine the address unless it has already seen that address in plain-text during a previous call of -the /bind mechanism. +the /bind mechanism (without significant resources to reverse the hashes). -Note that in terms of privacy, this proposal does not stop an identity server -from mapping hashed 3PIDs to users, resulting in a social graph. However, the -identity of the 3PID will at least remain a mystery until /bind is used. 
- -This proposal thus calls for the Identity Service’s /lookup API to use hashed -3PIDs instead of their plain-text counterparts. +This proposal thus calls for the Identity Service API's /lookup endpoint to use +a back-and-forth mechanism of passing partial hashed 3PIDs instead of their +plain-text counterparts, which should leak much less data to either party. ## Proposal This proposal suggests making changes to the Identity Service API's lookup -endpoints. Due to the nature of this proposal, the new endpoints should be on a -`v2` path (we also drop the `/api` in order to preserve consistency across -other endpoints): +endpoints. Instead of the `/lookup` and `/bulk_lookup` endpoints, this proposal +replaces them with endpoints `/lookup` and `/lookup_hashes`. Additionally, the +endpoints should be on a `v2` path, to avoid confusion with the original +`/lookup`. We also drop the `/api` in order to preserve consistency across +other endpoints: - `/_matrix/identity/v2/lookup` -- `/_matrix/identity/v2/bulk_lookup` +- `/_matrix/identity/v2/lookup_hashes` + +A third endpoint is added for clients to request information about the form +the server expects hashes in. + +- `/_matrix/identity/v2/hash_details` + +The following back-and-forth occurs between the client and server. -`address` MUST no longer be in a plain-text format, but rather will be a -peppered hash value, and the resulting digest MUST be encoded in URL-safe -unpadded base64 (similar to [room version 4's event -IDs](https://matrix.org/docs/spec/rooms/v4#event-ids)). +Let's say the client wants to check the following 3PIDs: -Identity servers must specify the hashing algorithms and a pepper that they -support, which will allow for rotation if a rainbow table is ever released -coinciding with their current hash and pepper. As such, it must be possible for -clients to be able to query what pepper the identity server requires before -sending it hashes.
A new endpoint must be added: + alice@example.com + bob@example.com + carl@example.com + +1 234 567 8910 + denny@example.com -``` -GET /_matrix/identity/v2/hash_details -``` +The client will hash each 3PID as a concatenation of the medium and address, +separated by a space and a pepper appended to the end. Note that phone numbers +should be formatted as defined by +https://matrix.org/docs/spec/appendices#pstn-phone-numbers, before being +hashed. -This endpoint takes no parameters, and simply returns any supported hash -algorithms and pepper as a JSON object: + "alice@example.com" -> "email alice@example.com" + "bob@example.com" -> "email bob@example.com" + "carl@example.com" -> "email carl@example.com" + "+1 234 567 8910" -> "msisdn 12345678910" + "denny@example.com" -> "email denny@example.com" -``` -{ - "lookup_pepper": "matrixrocks", - "algorithms": ["sha256"], -} -``` +Hashes must be peppered in order to reduce both the information a client gains +during the process, and attacks the identity server can perform (namely sending +a rainbow table of hashes back in the response to `/lookup`). The resulting +digest MUST be encoded in URL-safe unpadded base64 (similar to [room version +4's event IDs](https://matrix.org/docs/spec/rooms/v4#event-ids)). + +In order for clients to know the pepper and hashing algorithm they should use, +Identity Servers must make the information available on the `/hash_details` +endpoint: + + GET /_matrix/identity/v2/hash_details + + { + "lookup_pepper": "matrixrocks", + "algorithms": ["sha256"] + } The name `lookup_pepper` was chosen in order to account for pepper values being returned for other endpoints in the future. The contents of `lookup_pepper` MUST match the regular expression `[a-zA-Z0-9]*`. -Clients should request this endpoint each time before making a `/lookup` or -`/bulk_lookup` request, to handle identity servers which may rotate their -pepper values frequently.
Clients must choose one of the given hash algorithms -to encrypt the 3PID during lookup. - -Peppers are appended to the end of the 3PID before hashing. An example of -generating a hash using SHA-256 and the provided pepper is as follows: - -```python -address = "user@example.org" -pepper = "matrixrocks" -digest = hashlib.sha256((address + pepper).encode()).digest() -result_address = unpaddedbase64.encode_base64(digest) -print(result_address) -vNjEQuRCOmBp/KTuIpZ7RUJgPAbVAyqa0Uzh770tQaw -``` - -Possible hashing algorithms will be defined in the Matrix specification, and an -Identity Server can choose to implement one or all of them. Later versions of -the specification may deprecate algorithms when necessary. Currently the only -listed hashing algorithm is SHA-256 as defined by [RFC -4634](https://tools.ietf.org/html/rfc4634) and Identity Servers and clients -MUST agree to its use with the string `sha256`. SHA-256 was chosen as it is -currently used throughout the Matrix spec, as well as its properties of being -quick to hash. While this reduces the resources necessary to generate a rainbow -table for attackers, a fast hash is necessary if particularly slow mobile -clients are going to be hashing thousands of contacts. + The client should append the pepper to the end of the 3pid string before + hashing. + + "email alice@example.com" -> "email alice@example.commatrixrocks" + "email bob@example.com" -> "email bob@example.commatrixrocks" + "email carl@example.com" -> "email carl@example.commatrixrocks" + "msisdn 12345678910" -> "msisdn 12345678910matrixrocks" + "email denny@example.com" -> "email denny@example.commatrixrocks" + +Clients SHOULD request this endpoint each time before performing a lookup, to +handle identity servers which may rotate their pepper values frequently. +Clients MUST choose one of the given hash algorithms to encrypt the 3PID during +lookup. 
+ +Note that possible hashing algorithms will be defined in the Matrix +specification, and an Identity Server can choose to implement one or all of +them. Later versions of the specification may deprecate algorithms when +necessary. Currently the only listed hashing algorithm is SHA-256 as defined by +[RFC 4634](https://tools.ietf.org/html/rfc4634) and Identity Servers and +clients MUST agree to its use with the string `sha256`. SHA-256 was chosen as +it is currently used throughout the Matrix spec, as well as its properties of +being quick to hash. While this reduces the resources necessary to generate a +rainbow table for attackers, a fast hash is necessary if particularly slow +mobile clients are going to be hashing thousands of contact details. When performing a lookup, the pepper and hashing algorithm the client used must be part of the request body. If they do not match what the server has on file -(which may be the case if the pepper was rotated right after the client's +(which may be the case if the pepper was changed right after the client's request for it), then the server must inform the client that they need to query the hash details again, instead of just returning an empty response, which clients would assume to mean that no contacts are registered on that identity server. -Thus, an example client request to `/bulk_lookup` would look like the -following: - -``` -{ - "threepids": [ - [ - "email", - "vNjEQuRCOmBp/KTuIpZ7RUJgPAbVAyqa0Uzh770tQaw" - ], - [ - "msisdn", - "0VnvYk7YZpe08fP/CGqs3f39QtRjqAA2lPd14eLZXiw" - ], - [ - "email", - "BJaLI0RrLFDMbsk0eEp5BMsYDYzvOzDneQP/9NTemYA" - ] - ], - "lookup_pepper": "matrixrocks", - "algorithm": "sha256" -} -``` - If the algorithm does not match the server's, the server should return a `400 M_INVALID_PARAM`. If the pepper does not match the server's, the server should return a new error code, 400 `M_INVALID_PEPPER`. 
A new error code is not @@ -127,14 +119,252 @@ Each of these error responses should contain the correct `algorithm` and `/hash_details` again, thus saving a round-trip. An example response to an incorrect pepper would be: -``` -{ - "error": "Incorrect value for lookup_pepper", - "errcode": "M_INVALID_PEPPER", - "algorithm": "sha256", - "lookup_pepper": "matrixrocks" -} -``` + { + "error": "Incorrect value for lookup_pepper", + "errcode": "M_INVALID_PEPPER", + "algorithm": "sha256", + "lookup_pepper": "matrixrocks" + } + +Now comes time for the lookup. Once hashing has been performed using the +defined hashing algorithm, the client sends the first `k` characters of each +hash in an array, deduplicating any matching entries. + +`k` is a value chosen by the client. It is a tradeoff between leaking the +hashes of 3PIDs that the Identity Server doesn't know about, and the amount of +hashing the server must perform. In addition to k, the client can also set a +`max_k` that it is comfortable with. The recommended values are `k = 4` and +`max_k = 6` (see below for the reasoning behind this). Let's say the client +chooses these values. + + NOTE: Example numbers, not real hash values. + + "email alice@example.commatrixrocks" -> "70b1b5637937ab99f6aad01f694b3665541a5b9cbdfe54880462b3f1ad35d1f4" + "email bob@example.commatrixrocks" -> "21375b56a47c2cdc41a0596549a16ec51b64d26eb47b8e915d45b18ed17b72ff" + "email carl@example.commatrixrocks" -> "758afda64cb6a86ee6d540fa7c8b803a2479863e369cbafd71ffd376beef5d5f" + "msisdn 12345678910matrixrocks" -> "21375b3f1b61c975b13c8cecd6481a82e239e6aad644c29dc815836188ae8351" + "email denny@example.commatrixrocks" -> "70b1b5637937ab9846a94a8015e12313643a2f5323ca8f5b4ed6982fc8c3619b" + + Note that pairs (bob@example.com, 12345678910) and (alice@example.com, denny@example.com) + have the same leading characters in their hashed representations. 
+ + POST /_matrix/identity/v2/lookup + + { + "hashes": [ + "70b1", + "2137", + "758a" + ], + "algorithm": "sha256", + "pepper": "matrixrocks" + } + +The identity server, upon receiving these partial hashes, can see that the +client chose `4` as its `k` value, which is the length of the shortest hash +prefix. The identity server has a "minimum k", which is a function of the +amount of 3PID hashes it currently holds and protects it against computing too +many per lookup. Let's say the Identity Server's `min_k = 5` (again, see below +for details). + +The client's `k` value (4) is less than the Identity Server's `min_k` (5), so +it will reject the lookup with the following error: + + { + "errcode": "M_HASH_TOO_SHORT", + "error": "Sent partial hashes are too short", + "minimum_length": "5" + } + +The client then knows it must send values of at least length 5. It's `max_k` is +6, so this is fine. The client sends the values again with `k = 5`: + + POST /_matrix/identity/v2/lookup + + { + "hashes": [ + "70b1b", + "21375", + "758af" + ], + "algorithm": "sha256", + "pepper": "matrixrocks" + } + +The Identity Server sees the hashes are within an acceptable length (5 >= 5), +then checks which hashes it knows of that match the given leading values. It +will then return the next few characters (`n`; implementation-specific; lower +means less information leaked to clients at the result of potentially more +hashing to be done) of each that match: + + The identity server found the following hashes that contain the leading + characters: + + 70b1b5637937ab99f6aad01f694b3665541a5b9cbdfe54880462b3f1ad35d1f4 + 70b1b1b28dcfcc179a54983f46e1753c3fcdb0884d06fad741582c0180b56fc9 + 21375b3f1b61c975b13c8cecd6481a82e239e6aad644c29dc815836188ae8351 + + And if n = 7, the identity server will send back the following payload: + + { + "hashes": { + "70b1b": ["5637937", "1b28dcf"], + "21375": ["b3f1b61"] + } + } + +The client can then deduce which hashes actually lead to Matrix IDs. 
In this +case, 70b1b5637937 are the leading characters of "alice@example.com" and +"denny@example.com", while 21375b3f1b61 are the leading characters of +"+12345678910" whereas 70b1b1b28dcf does not match any of the hashes the client +has locally, so it is ignored. "bob@example.com" and "carl@example.com" do not +seem to have Matrix IDs associated with them. + +Finally, the client salts and hashes 3PID hashes that it believes are +associated with Matrix IDs and sends them to the identity server on the +`/lookup_hashes` endpoint. Instead of hashing the 3PIDs again, clients should +reuse the peppered hash that was previously sent to the server. Salting is +performed to prevent an identity server generating a rainbow table to reverse +any non-Matrix 3PIDs that slipped in. Salts MUST match the regular expression +`[a-zA-Z0-9]*`. + + Computed previously: + + "email alice@example.commatrixrocks" + becomes + "70b1b5637937ab99f6aad01f694b3665541a5b9cbdfe54880462b3f1ad35d1f4" + + The client should generate a salt. Let's say it generates "salt123". This + value is appended to the hash. + + "70b1b5637937ab99f6aad01f694b3665541a5b9cbdfe54880462b3f1ad35d1f4" + becomes + "70b1b5637937ab99f6aad01f694b3665541a5b9cbdfe54880462b3f1ad35d1f4salt123" + + And then hashed: + + "70b1b5637937ab99f6aad01f694b3665541a5b9cbdfe54880462b3f1ad35d1f4salt123" + becomes + "1f64ed6ac9d6da86b65bcc68a39c7c4d083f77193ec7e5adc4b09617f8d0d81a" + +A new salt is generated and applied to each hash **prefix** individually. Doing +so requires the identity server to only rehash the 3PIDs whose unsalted hashes +matched the earlier prefixes (in the case of 70b1b, hashes 5637937... and +1b28dcf...). This adds only a small multiplier of additional hashes needing to +be performed by the Identity Server (the median number of hashes that fit each +prefix, a function of the chosen `k` value). + +An attacker would now need to create a new rainbow table per hash prefix, per +lookup. 
This reduces the attack surface significantly to only very targeted +attacks. + + POST /_matrix/identity/v2/lookup_hashes + + { + "hashes": { + "70b1b": { + "1": "1f64ed6ac9d6da86b65bcc68a39c7c4d083f77193ec7e5adc4b09617f8d0d81a", + "2": "a32e1c1f3b9e118eab196b0807443871628eace587361b7a02adfb2b77b8d620" + }, + "21375": { + "1": "372bf27a4e7e952d1e794f78f8cdfbff1a3ab2f59c6d44e869bfdd7dd1de3948" + } + }, + "salts": { + "70b1b": "salt123", + "21375": "salt234" + } + } + +The server reads the prefixes and only rehashes those 3PIDs that match these +hashes (being careful to continue to enforce its `min_k` requirement), and +returns them: + + { + "mappings": { + "70b1b": { + "2": "@alice:example.com" + }, + "21375": { + "1": "@fred:example.com" + } + } + } + +The client can now display which 3PIDs link to which Matrix IDs. + +### How to pick k + +The `k` value is a tradeoff between the privacy of the user's contacts, and the +resource-intensiveness of lookups for the identity server. Clients would rather +have a smaller `k`, while servers a larger `k`. A larger `k` also allows the +identity server to learn more about the contacts the client has that are not +Matrix users. Ideally we'd like to balance these two, and with the value also +being a factor of how many records an identity server has, there's no way to +simply give a single `k` value that should be used from the spec. + +Instead, we can have the client and identity server decide it amongst +themselves. The identity server should pick a `k` value based on how many 3PIDs +records they have, and thus how much hashes they will need to perform. An ideal +value can be calculated from the following function: + + C <= N / (64 ^ k) + + Where N is the number of 3PID records an identity server has, k is the number of + characters to truncate each hash to, and C is the median number of hashing rounds + an identity server will need to perform per hash (denoted complexity). 
64 is the + number of possible characters per byte in a hash, as hash digests are encoded in + url-safe base64. + + Identity servers should choose a complexity value they're comfortable with. + Let's say 5 (for reference, HIBP's service has set their k value for a complexity + of 478: https://blog.cloudflare.com/validating-leaked-passwords-with-k-anonymity/) + + When C is set (implementation specific), k can then be solved for: + + k >= - log(C/N) + ---------- + - log(64) + + Taking HIBP's amount of passwords as an example, 600,000,000, as N and solving for k, we get: + + k >= 4.47 + + We round k to 5 for it to be a whole number. + + As this is quite a lot of records, we advise clients to start with k = 4, and go from there. + + For reference, a very small identity server with only 600 records would produce a + minimum k of 0.628, or 1. + + From this we can see that even low k values scale to quite a lot of records. + +Clients themselves should pick a reasonable default `k`, and a maximum value +that they are comfortable extending towards if the identity server requests a +higher minimum number. If the identity server requests too high of a minimum +number, clients will need to inform the user, either with an error message, or +more advanced clients could allow users to tweak their k values. + +--- + +Past what they already knew, from this exchange the client and server have learned: + +Client: + +* Unsalted, peppered partial 3PID hash "70b1b1b28dcf" + of some matrix user + (harder to crack, and new rainbow table needed) +* alice@example.com -> @alice:example.com (required) +* +1 234 567 8910 -> @fred:example.com (required) + +Server: + +* Partial hash "758af" (likely useless) +* The server knows some salted hash + 70b1b5637937ab9846a94a8015e12313643a2f5323ca8f5b4ed6982fc8c3619bf + (crackable, new rainbow table needed) + +--- No parameter changes will be made to /bind. @@ -151,10 +381,10 @@ are being sent to. 
## Tradeoffs -* This approach means that the client now needs to calculate a hash by itself, - but the belief is that most languages provide a mechanism for doing so. * There is a small cost incurred by performing hashes before requests, but this is outweighed by the privacy implications of sending plain-text addresses. +* Identity services will need to perform a lot of hashing, however with + authentication being added in MSC 2140, effective rate-limiting is possible. ## Potential issues @@ -186,14 +416,14 @@ for a federated network, as it requires specialized hardware. While a bit out of scope for this MSC, there has been debate over preventing 3PIDs as being kept as plain-text on disk. The argument against this was that if the hashing algorithm (in this case SHA-256) was broken, we couldn't update -the hashing algorithm without having the plaintext 3PIDs. @lampholder helpfully +the hashing algorithm without having the plain-text 3PIDs. @lampholder helpfully added that we could just take the old hashes and rehash them in the more secure hashing algorithm, thus transforming the hash from SHA-256 to SHA-256+SomeBetterAlg. However @erikjohnston then pointed out that if `BrokenAlgo(a) == BrokenAlgo(b)` then `SuperGreatHash(BrokenAlgo(a)) == SuperGreatHash(BrokenAlgo(b))`, so all you'd need to do is find a match in the broken algo, and you'd break the new algorithm as well. This means that you -would need the plaintext 3PIDs to encode a new hash, and thus storing them +would need the plain-text 3PIDs to encode a new hash, and thus storing them hashed on disk would require a transition period where 3PIDs were reuploaded in a strong hash variant. 
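The minimum-`k` reasoning in the "How to pick k" hunk of this patch can be checked numerically. The sketch below is an assumption-laden reading, not the author's implementation: it takes the complexity C as an upper bound on the median number of records per prefix (N / 64^k <= C), solves for k as log(N / C) / log(64), and rounds up. The function name `minimum_k` and its parameters are hypothetical.

```python
import math


def minimum_k(num_records: int, complexity: float, alphabet_size: int = 64) -> int:
    """Smallest whole prefix length k such that N / alphabet_size**k <= C."""
    if num_records <= 0 or complexity <= 0:
        raise ValueError("num_records and complexity must be positive")
    # Solve N / alphabet_size**k <= C for k:  k >= log(N / C) / log(alphabet_size)
    exact = math.log(num_records / complexity, alphabet_size)
    # A prefix must contain at least one character, so never go below 1.
    return max(1, math.ceil(exact))


if __name__ == "__main__":
    print(minimum_k(600_000_000, 5))  # log64(1.2e8) is about 4.47
    print(minimum_k(600, 5))          # log64(120) is about 1.15
```

Under these assumptions and C = 5, a server with 600,000,000 records needs k = 5, and a server with 600 records needs k = 2.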
From 922a20ba2625cb7205b01817ad383e471b27ebfa Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Mon, 1 Jul 2019 16:30:07 +0100 Subject: [PATCH 33/67] small fixes --- proposals/2134-identity-hash-lookup.md | 39 +++++++++++++------------- 1 file changed, 20 insertions(+), 19 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index e6593224..ec2beb47 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -47,7 +47,7 @@ The client will hash each 3PID as a concatenation of the medium and address, separated by a space and a pepper appended to the end. Note that phone numbers should be formatted as defined by https://matrix.org/docs/spec/appendices#pstn-phone-numbers, before being -hashed). +hashed). First the client must prepend the medium to the address: "alice@example.com" -> "email alice@example.com" "bob@example.com" -> "email bob@example.com" @@ -57,9 +57,7 @@ hashed). Hashes must be peppered in order to reduce both the information a client gains during the process, and attacks the identity server can perform (namely sending -a rainbow table of hashes back in the response to `/lookup`). The resulting -digest MUST be encoded in URL-safe unpadded base64 (similar to [room version -4's event IDs](https://matrix.org/docs/spec/rooms/v4#event-ids)). +a rainbow table of hashes back in the response to `/lookup`). In order for clients to know the pepper and hashing algorithm they should use, Identity Servers must make the information available on the `/hash_details` @@ -126,9 +124,11 @@ incorrect pepper would be: "lookup_pepper": "matrixrocks" } -Now comes time for the lookup. Once hashing has been performed using the -defined hashing algorithm, the client sends the first `k` characters of each -hash in an array, deduplicating any matching entries. +Now comes time for the lookup. 
Note that the resulting hash digest MUST be +encoded in URL-safe unpadded base64 (similar to [room version 4's event +IDs](https://matrix.org/docs/spec/rooms/v4#event-ids)). Once hashing has been +performed using the defined hashing algorithm, the client sends the first `k` +characters of each hash in an array, deduplicating any matching entries. `k` is a value chosen by the client. It is a tradeoff between leaking the hashes of 3PIDs that the Identity Server doesn't know about, and the amount of @@ -137,7 +137,7 @@ hashing the server must perform. In addition to k, the client can also set a `max_k = 6` (see below for the reasoning behind this). Let's say the client chooses these values. - NOTE: Example numbers, not real hash values. + NOTE: Example digests, not real hash values. "email alice@example.commatrixrocks" -> "70b1b5637937ab99f6aad01f694b3665541a5b9cbdfe54880462b3f1ad35d1f4" "email bob@example.commatrixrocks" -> "21375b56a47c2cdc41a0596549a16ec51b64d26eb47b8e915d45b18ed17b72ff" @@ -145,8 +145,9 @@ chooses these values. "msisdn 12345678910matrixrocks" -> "21375b3f1b61c975b13c8cecd6481a82e239e6aad644c29dc815836188ae8351" "email denny@example.commatrixrocks" -> "70b1b5637937ab9846a94a8015e12313643a2f5323ca8f5b4ed6982fc8c3619b" - Note that pairs (bob@example.com, 12345678910) and (alice@example.com, denny@example.com) - have the same leading characters in their hashed representations. + Also note that pairs (bob@example.com, 12345678910) and (alice@example.com, + denny@example.com) have the same leading characters in their hashed + representations. POST /_matrix/identity/v2/lookup @@ -214,9 +215,9 @@ hashing to be done) of each that match: } The client can then deduce which hashes actually lead to Matrix IDs. 
In this -case, 70b1b5637937 are the leading characters of "alice@example.com" and -"denny@example.com", while 21375b3f1b61 are the leading characters of -"+12345678910" whereas 70b1b1b28dcf does not match any of the hashes the client +case, `70b1b5637937` are the leading characters of "alice@example.com" and +"denny@example.com", while `21375b3f1b61` are the leading characters of +"+12345678910" and `70b1b1b28dcf` does not match any of the hashes the client has locally, so it is ignored. "bob@example.com" and "carl@example.com" do not seem to have Matrix IDs associated with them. @@ -247,12 +248,12 @@ any non-Matrix 3PIDs that slipped in. Salts MUST match the regular expression becomes "1f64ed6ac9d6da86b65bcc68a39c7c4d083f77193ec7e5adc4b09617f8d0d81a" -A new salt is generated and applied to each hash **prefix** individually. Doing -so requires the identity server to only rehash the 3PIDs whose unsalted hashes -matched the earlier prefixes (in the case of 70b1b, hashes 5637937... and -1b28dcf...). This adds only a small multiplier of additional hashes needing to -be performed by the Identity Server (the median number of hashes that fit each -prefix, a function of the chosen `k` value). +A new salt is generated per **hash prefix** and applied to each hash +individually. Doing so requires the identity server to only rehash the 3PIDs +whose unsalted hashes matched the earlier prefixes (in the case of `70b1b`, +hashes `5637937...` and `1b28dcf...`). This adds only a small multiplier of +additional hashes needing to be performed by the Identity Server (the median +number of hashes that fit each prefix, a function of the chosen `k` value). An attacker would now need to create a new rainbow table per hash prefix, per lookup. 
This reduces the attack surface significantly to only very targeted From 53bd384f2ec87f2bde7c8f3bc1da3ee785cef96e Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Wed, 3 Jul 2019 09:59:38 +0100 Subject: [PATCH 34/67] Clarify salting --- proposals/2134-identity-hash-lookup.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index ec2beb47..3711d9c8 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -236,13 +236,14 @@ any non-Matrix 3PIDs that slipped in. Salts MUST match the regular expression "70b1b5637937ab99f6aad01f694b3665541a5b9cbdfe54880462b3f1ad35d1f4" The client should generate a salt. Let's say it generates "salt123". This - value is appended to the hash. + value is appended to the base64-representation of the hash digest of the + initial 3pid and pepper. "70b1b5637937ab99f6aad01f694b3665541a5b9cbdfe54880462b3f1ad35d1f4" becomes "70b1b5637937ab99f6aad01f694b3665541a5b9cbdfe54880462b3f1ad35d1f4salt123" - And then hashed: + Which is then hashed: "70b1b5637937ab99f6aad01f694b3665541a5b9cbdfe54880462b3f1ad35d1f4salt123" becomes From f4a1e0288419f16500ed047df91cb4ddfc63445e Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Thu, 4 Jul 2019 16:28:15 +0100 Subject: [PATCH 35/67] simple method once more --- proposals/2134-identity-hash-lookup.md | 319 +++++-------------------- 1 file changed, 59 insertions(+), 260 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 3711d9c8..76f527cb 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -13,22 +13,20 @@ unless it has already seen that address in plain-text during a previous call of the /bind mechanism (without significant resources to reverse the hashes). 
This proposal thus calls for the Identity Service API's /lookup endpoint to use -a back-and-forth mechanism of passing partial hashed 3PIDs instead of their -plain-text counterparts, which should leak mess less data to either party. +hashed 3PIDs instead of their plain-text counterparts, which will leak less +data to identity servers. ## Proposal This proposal suggests making changes to the Identity Service API's lookup -endpoints. Instead of the `/lookup` and `/bulk_lookup` endpoints, this proposal -replaces them with endpoints `/lookup` and `/lookup_hashes`. Additionally, the -endpoints should be on a `v2` path, to avoid confusion with the original -`/lookup`. We also drop the `/api` in order to preserve consistency across -other endpoints: +endpoints. Instead, this proposal consolidates them into a single `/lookup` +endpoint. Additionally, the endpoint should be on a `v2` path, to avoid +confusion with the original `/lookup`. We also drop the `/api` in order to +preserve consistency across other endpoints: - `/_matrix/identity/v2/lookup` -- `/_matrix/identity/v2/lookup_hashes` -A third endpoint is added for clients to request information about the form +A second endpoint is added for clients to request information about the form the server expects hashes in. - `/_matrix/identity/v2/hash_details` @@ -127,248 +125,43 @@ incorrect pepper would be: Now comes time for the lookup. Note that the resulting hash digest MUST be encoded in URL-safe unpadded base64 (similar to [room version 4's event IDs](https://matrix.org/docs/spec/rooms/v4#event-ids)). Once hashing has been -performed using the defined hashing algorithm, the client sends the first `k` -characters of each hash in an array, deduplicating any matching entries. +performed using the defined hashing algorithm, the client sends each hash in an +array. -`k` is a value chosen by the client. 
It is a tradeoff between leaking the -hashes of 3PIDs that the Identity Server doesn't know about, and the amount of -hashing the server must perform. In addition to k, the client can also set a -`max_k` that it is comfortable with. The recommended values are `k = 4` and -`max_k = 6` (see below for the reasoning behind this). Let's say the client -chooses these values. - - NOTE: Example digests, not real hash values. - - "email alice@example.commatrixrocks" -> "70b1b5637937ab99f6aad01f694b3665541a5b9cbdfe54880462b3f1ad35d1f4" - "email bob@example.commatrixrocks" -> "21375b56a47c2cdc41a0596549a16ec51b64d26eb47b8e915d45b18ed17b72ff" - "email carl@example.commatrixrocks" -> "758afda64cb6a86ee6d540fa7c8b803a2479863e369cbafd71ffd376beef5d5f" - "msisdn 12345678910matrixrocks" -> "21375b3f1b61c975b13c8cecd6481a82e239e6aad644c29dc815836188ae8351" - "email denny@example.commatrixrocks" -> "70b1b5637937ab9846a94a8015e12313643a2f5323ca8f5b4ed6982fc8c3619b" - - Also note that pairs (bob@example.com, 12345678910) and (alice@example.com, - denny@example.com) have the same leading characters in their hashed - representations. 
+ "email alice@example.commatrixrocks" -> "y_TvXLKxFT9CURPXI1wvfjvfvsXe8FPgYj-mkQrnszs" + "email bob@example.commatrixrocks" -> "r0-6x3rp9zIWS2suIque-wXTnlv9sc41fatbRMEOwQE" + "email carl@example.commatrixrocks" -> "ryr10d1K8fcFVxALb3egiSquqvFAxQEwegXtlHoQFBw" + "msisdn 12345678910matrixrocks" -> "c_30UaSZhl5tyanIjFoE1IXTmuU3vmptEwVOc3P2Ens" + "email denny@example.commatrixrocks" -> "bxt8rtRaOzMkSk49zIKE_NfqTndHvGbWHchZskW3xmY" POST /_matrix/identity/v2/lookup { "hashes": [ - "70b1", - "2137", - "758a" + "y_TvXLKxFT9CURPXI1wvfjvfvsXe8FPgYj-mkQrnszs", + "r0-6x3rp9zIWS2suIque-wXTnlv9sc41fatbRMEOwQE", + "ryr10d1K8fcFVxALb3egiSquqvFAxQEwegXtlHoQFBw", + "c_30UaSZhl5tyanIjFoE1IXTmuU3vmptEwVOc3P2Ens", + "bxt8rtRaOzMkSk49zIKE_NfqTndHvGbWHchZskW3xmY" ], "algorithm": "sha256", "pepper": "matrixrocks" } -The identity server, upon receiving these partial hashes, can see that the -client chose `4` as its `k` value, which is the length of the shortest hash -prefix. The identity server has a "minimum k", which is a function of the -amount of 3PID hashes it currently holds and protects it against computing too -many per lookup. Let's say the Identity Server's `min_k = 5` (again, see below -for details). - -The client's `k` value (4) is less than the Identity Server's `min_k` (5), so -it will reject the lookup with the following error: - - { - "errcode": "M_HASH_TOO_SHORT", - "error": "Sent partial hashes are too short", - "minimum_length": "5" - } - -The client then knows it must send values of at least length 5. It's `max_k` is -6, so this is fine. The client sends the values again with `k = 5`: - - POST /_matrix/identity/v2/lookup - - { - "hashes": [ - "70b1b", - "21375", - "758af" - ], - "algorithm": "sha256", - "pepper": "matrixrocks" - } - -The Identity Server sees the hashes are within an acceptable length (5 >= 5), -then checks which hashes it knows of that match the given leading values. 
It -will then return the next few characters (`n`; implementation-specific; lower -means less information leaked to clients at the result of potentially more -hashing to be done) of each that match: - - The identity server found the following hashes that contain the leading - characters: - - 70b1b5637937ab99f6aad01f694b3665541a5b9cbdfe54880462b3f1ad35d1f4 - 70b1b1b28dcfcc179a54983f46e1753c3fcdb0884d06fad741582c0180b56fc9 - 21375b3f1b61c975b13c8cecd6481a82e239e6aad644c29dc815836188ae8351 - - And if n = 7, the identity server will send back the following payload: - - { - "hashes": { - "70b1b": ["5637937", "1b28dcf"], - "21375": ["b3f1b61"] - } - } - -The client can then deduce which hashes actually lead to Matrix IDs. In this -case, `70b1b5637937` are the leading characters of "alice@example.com" and -"denny@example.com", while `21375b3f1b61` are the leading characters of -"+12345678910" and `70b1b1b28dcf` does not match any of the hashes the client -has locally, so it is ignored. "bob@example.com" and "carl@example.com" do not -seem to have Matrix IDs associated with them. - -Finally, the client salts and hashes 3PID hashes that it believes are -associated with Matrix IDs and sends them to the identity server on the -`/lookup_hashes` endpoint. Instead of hashing the 3PIDs again, clients should -reuse the peppered hash that was previously sent to the server. Salting is -performed to prevent an identity server generating a rainbow table to reverse -any non-Matrix 3PIDs that slipped in. Salts MUST match the regular expression -`[a-zA-Z0-9]*`. - - Computed previously: - - "email alice@example.commatrixrocks" - becomes - "70b1b5637937ab99f6aad01f694b3665541a5b9cbdfe54880462b3f1ad35d1f4" - - The client should generate a salt. Let's say it generates "salt123". This - value is appended to the base64-representation of the hash digest of the - initial 3pid and pepper. 
- - "70b1b5637937ab99f6aad01f694b3665541a5b9cbdfe54880462b3f1ad35d1f4" - becomes - "70b1b5637937ab99f6aad01f694b3665541a5b9cbdfe54880462b3f1ad35d1f4salt123" - - Which is then hashed: - - "70b1b5637937ab99f6aad01f694b3665541a5b9cbdfe54880462b3f1ad35d1f4salt123" - becomes - "1f64ed6ac9d6da86b65bcc68a39c7c4d083f77193ec7e5adc4b09617f8d0d81a" - -A new salt is generated per **hash prefix** and applied to each hash -individually. Doing so requires the identity server to only rehash the 3PIDs -whose unsalted hashes matched the earlier prefixes (in the case of `70b1b`, -hashes `5637937...` and `1b28dcf...`). This adds only a small multiplier of -additional hashes needing to be performed by the Identity Server (the median -number of hashes that fit each prefix, a function of the chosen `k` value). - -An attacker would now need to create a new rainbow table per hash prefix, per -lookup. This reduces the attack surface significantly to only very targeted -attacks. - - POST /_matrix/identity/v2/lookup_hashes - - { - "hashes": { - "70b1b": { - "1": "1f64ed6ac9d6da86b65bcc68a39c7c4d083f77193ec7e5adc4b09617f8d0d81a", - "2": "a32e1c1f3b9e118eab196b0807443871628eace587361b7a02adfb2b77b8d620" - }, - "21375": { - "1": "372bf27a4e7e952d1e794f78f8cdfbff1a3ab2f59c6d44e869bfdd7dd1de3948" - } - }, - "salts": { - "70b1b": "salt123", - "21375": "salt234" - } - } - -The server reads the prefixes and only rehashes those 3PIDs that match these -hashes (being careful to continue to enforce its `min_k` requirement), and -returns them: +The identity server, upon receiving these hashes, can simply compare against +the hashes of the 3PIDs it stores. 
The server then responds with the Matrix +IDs of those that match: { "mappings": { - "70b1b": { - "2": "@alice:example.com" - }, - "21375": { - "1": "@fred:example.com" - } + "y_TvXLKxFT9CURPXI1wvfjvfvsXe8FPgYj-mkQrnszs": "@alice:example.com", + "c_30UaSZhl5tyanIjFoE1IXTmuU3vmptEwVOc3P2Ens": "@fred:example.com" } } The client can now display which 3PIDs link to which Matrix IDs. -### How to pick k - -The `k` value is a tradeoff between the privacy of the user's contacts, and the -resource-intensiveness of lookups for the identity server. Clients would rather -have a smaller `k`, while servers a larger `k`. A larger `k` also allows the -identity server to learn more about the contacts the client has that are not -Matrix users. Ideally we'd like to balance these two, and with the value also -being a factor of how many records an identity server has, there's no way to -simply give a single `k` value that should be used from the spec. - -Instead, we can have the client and identity server decide it amongst -themselves. The identity server should pick a `k` value based on how many 3PIDs -records they have, and thus how much hashes they will need to perform. An ideal -value can be calculated from the following function: - - C <= N / (64 ^ k) - - Where N is the number of 3PID records an identity server has, k is the number of - characters to truncate each hash to, and C is the median number of hashing rounds - an identity server will need to perform per hash (denoted complexity). 64 is the - number of possible characters per byte in a hash, as hash digests are encoded in - url-safe base64. - - Identity servers should choose a complexity value they're comfortable with. 
- Let's say 5 (for reference, HIBP's service has set their k value for a complexity - of 478: https://blog.cloudflare.com/validating-leaked-passwords-with-k-anonymity/) - - When C is set (implementation specific), k can then be solved for: - - k >= - log(C/N) - ---------- - - log(64) - - Taking HIBP's amount of passwords as an example, 600,000,000, as N and solving for k, we get: - - k >= 4.47 - - We round k to 5 for it to be a whole number. - - As this is quite a lot of records, we advise clients to start with k = 4, and go from there. - - For reference, a very small identity server with only 600 records would produce a - minimum k of 0.628, or 1. - - From this we can see that even low k values scale to quite a lot of records. - -Clients themselves should pick a reasonable default `k`, and a maximum value -that they are comfortable extending towards if the identity server requests a -higher minimum number. If the identity server requests too high of a minimum -number, clients will need to inform the user, either with an error message, or -more advanced clients could allow users to tweak their k values. - ---- - -Past what they already knew, from this exchange the client and server have learned: - -Client: - -* Unsalted, peppered partial 3PID hash "70b1b1b28dcf" - of some matrix user - (harder to crack, and new rainbow table needed) -* alice@example.com -> @alice:example.com (required) -* +1 234 567 8910 -> @fred:example.com (required) - -Server: - -* Partial hash "758af" (likely useless) -* The server knows some salted hash - 70b1b5637937ab9846a94a8015e12313643a2f5323ca8f5b4ed6982fc8c3619bf - (crackable, new rainbow table needed) - ---- - -No parameter changes will be made to /bind. +No parameter changes will be made to /bind as part of this proposal. ## Fallback considerations @@ -377,35 +170,34 @@ implementation, and should return a 403 `M_FORBIDDEN` error if so. 
If an identity server is too old and a HTTP 404, 405 or 501 is received when accessing the `v2` endpoint, they should fallback to the `v1` endpoint instead. -However, clients should be aware that plain-text 3PIDs are required, and SHOULD -ask for user consent to send 3PIDs in plain-text, and be clear about where they -are being sent to. +However, clients should be aware that plain-text 3PIDs are required for the +`v1` endpoint, and SHOULD ask for user consent to send 3PIDs in plain-text, and +be clear about where they are being sent to. ## Tradeoffs * There is a small cost incurred by performing hashes before requests, but this is outweighed by the privacy implications of sending plain-text addresses. -* Identity services will need to perform a lot of hashing, however with - authentication being added in MSC 2140, effective rate-limiting is possible. ## Potential issues -This proposal does not force an identity server to stop handling plain-text -requests, because a large amount of the Matrix ecosystem relies upon this -behavior. However, a conscious effort should be made by all users to use the -privacy respecting endpoints outlined above. Identity servers may disallow use -of the v1 endpoint, as per above. +Hashes are still reversible with a rainbow table, but hopefully the provided +pepper, which can be rotated by identity servers at will, should help mitigate +this to some extent. -Unpadded base64 has been chosen to encode the value due to use in many other -portions of the spec. +Additionally, this proposal does not stop an identity server from storing +plain-text 3PIDs. There is a GDPR argument in keeping email addresses, such +that if a breach happens, users must be notified of such. Ideally this would be +done over Matrix, but people may've stuck their email in an identity server and +then left Matrix forever. Perhaps if only hashes were being stored on the +identity server then that isn't considered personal information? 
In any case, a +discussion for another MSC. ## Other considered solutions Ideally identity servers would never receive plain-text addresses, however it is necessary for the identity server to send email/sms messages during a bind, as it cannot trust a homeserver to do so as the homeserver may be lying. -Additionally, only storing 3PID hashes at rest instead of the plain-text -versions is impractical if the hashing algorithm ever needs to be changed. Bloom filters are an alternative method of providing private contact discovery. However, they do not scale well due to requiring clients to download a large @@ -415,23 +207,30 @@ eventual solution of using Software Guard Extensions (detailed in https://signal.org/blog/private-contact-discovery/) is considered impractical for a federated network, as it requires specialized hardware. -While a bit out of scope for this MSC, there has been debate over preventing -3PIDs as being kept as plain-text on disk. The argument against this was that -if the hashing algorithm (in this case SHA-256) was broken, we couldn't update -the hashing algorithm without having the plain-text 3PIDs. @lampholder helpfully -added that we could just take the old hashes and rehash them in the more secure -hashing algorithm, thus transforming the hash from SHA-256 to -SHA-256+SomeBetterAlg. However @erikjohnston then pointed out that if -`BrokenAlgo(a) == BrokenAlgo(b)` then `SuperGreatHash(BrokenAlgo(a)) == -SuperGreatHash(BrokenAlgo(b))`, so all you'd need to do is find a match in the -broken algo, and you'd break the new algorithm as well. This means that you -would need the plain-text 3PIDs to encode a new hash, and thus storing them -hashed on disk would require a transition period where 3PIDs were reuploaded in -a strong hash variant. +k-anonymity was considered as an alternative, in which the identity server +would never receive a full hash of a 3PID that it did not already know about. 
+While this has been considered plausible, it comes with heightened resource +requirements (much more hashing by the identity server). The conclusion was +that it may not provide more privacy if an identity server decided to be evil, +however it would significantly raise the resource requirements to run an evil +identity server. + +Discussion and a walk-through of what a client/identity-server interaction would +look like are documented [in this Github +comment](https://github.com/matrix-org/matrix-doc/pull/2134#discussion_r298691748). + +Additionally, a radical model was also considered where the first portion of +the above scheme was done with an identity server, and the second would be done +with various homeservers who originally reported the 3PID to the identity +server. While interesting and a more decentralised model, some attacks are +still possible if the identity server is running an evil homeserver which it +can direct the client to send its hashes to. Discussion on this matter has +taken place in the MSC-specific room [starting at this +message](https://matrix.to/#/!LlraCeVuFgMaxvRySN:amorgan.xyz/$4wzTSsspbLVa6Lx5cBq6toh6P3TY3YnoxALZuO8n9gk?via=amorgan.xyz&via=matrix.org&via=matrix.vgorcum.com). ## Conclusion -This proposal outlines an effective method to stop bulk collection of user's +This proposal outlines a simple method to stop bulk collection of user's contact lists and their social graphs without any disastrous side effects. All functionality which depends on the lookup service should continue to function unhindered by the use of hashes. 
From 370266942488f56f1fb4aab5d4bab1e3b0989d9d Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Fri, 5 Jul 2019 15:59:29 +0100 Subject: [PATCH 36/67] update from comments --- proposals/2134-identity-hash-lookup.md | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 76f527cb..33bda297 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -10,11 +10,16 @@ not. If the 3PID is hashed, the identity server could not determine the address unless it has already seen that address in plain-text during a previous call of -the /bind mechanism (without significant resources to reverse the hashes). +the [/bind +mechanism](https://matrix.org/docs/spec/identity_service/r0.2.1#post-matrix-identity-api-v1-3pid-bind) +(without significant resources to reverse the hashes). -This proposal thus calls for the Identity Service API's /lookup endpoint to use -hashed 3PIDs instead of their plain-text counterparts, which will leak less -data to identity servers. +This proposal thus calls for the Identity Service API's +[/lookup](https://matrix.org/docs/spec/identity_service/r0.2.1#get-matrix-identity-api-v1-lookup) +endpoint to use hashed 3PIDs instead of their plain-text counterparts (and to +deprecate both it and +[/bulk_lookup](https://matrix.org/docs/spec/identity_service/r0.2.1#post-matrix-identity-api-v1-bulk-lookup)), +which will leak less data to identity servers. ## Proposal @@ -161,14 +166,16 @@ IDs of those that match: The client can now display which 3PIDs link to which Matrix IDs. -No parameter changes will be made to /bind as part of this proposal. +No parameter changes will be made to +[/bind](https://matrix.org/docs/spec/identity_service/r0.2.1#post-matrix-identity-api-v1-3pid-bind) +as part of this proposal. 
## Fallback considerations `v1` versions of these endpoints may be disabled at the discretion of the implementation, and should return a 403 `M_FORBIDDEN` error if so. -If an identity server is too old and a HTTP 404, 405 or 501 is received when +If an identity server is too old and a HTTP 400 or 404 is received when accessing the `v2` endpoint, they should fallback to the `v1` endpoint instead. However, clients should be aware that plain-text 3PIDs are required for the `v1` endpoint, and SHOULD ask for user consent to send 3PIDs in plain-text, and From dd8a6549c9bfb149c40ef8e83d978c624590a003 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Mon, 8 Jul 2019 11:55:37 +0100 Subject: [PATCH 37/67] Address review comments --- proposals/2134-identity-hash-lookup.md | 175 ++++++++++++++----------- 1 file changed, 96 insertions(+), 79 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 33bda297..f1df605f 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -9,10 +9,12 @@ that email address or phone number is already known by the identity server or not. If the 3PID is hashed, the identity server could not determine the address -unless it has already seen that address in plain-text during a previous call of -the [/bind +unless it has already seen that address in plain-text during a previous call +of the [/bind mechanism](https://matrix.org/docs/spec/identity_service/r0.2.1#post-matrix-identity-api-v1-3pid-bind) -(without significant resources to reverse the hashes). +(without significant resources to reverse the hashes). This helps prevent +bulk collection of user's contact lists by the identity server and reduces +its ability to build social graphs. This proposal thus calls for the Identity Service API's [/lookup](https://matrix.org/docs/spec/identity_service/r0.2.1#get-matrix-identity-api-v1-lookup) @@ -25,7 +27,7 @@ which will leak less data to identity servers. 
This proposal suggests making changes to the Identity Service API's lookup endpoints. Instead, this proposal consolidates them into a single `/lookup` -endpoint. Additionally, the endpoint should be on a `v2` path, to avoid +endpoint. Additionally, the endpoint is to be on a `v2` path, to avoid confusion with the original `/lookup`. We also drop the `/api` in order to preserve consistency across other endpoints: @@ -40,11 +42,13 @@ The following back-and-forth occurs between the client and server. Let's say the client wants to check the following 3PIDs: - alice@example.com - bob@example.com - carl@example.com - +1 234 567 8910 - denny@example.com +``` +alice@example.com +bob@example.com +carl@example.com ++1 234 567 8910 +denny@example.com +``` The client will hash each 3PID as a concatenation of the medium and address, separated by a space and a pepper appended to the end. Note that phone numbers @@ -52,55 +56,59 @@ should be formatted as defined by https://matrix.org/docs/spec/appendices#pstn-phone-numbers, before being hashed). First the client must prepend the medium to the address: - "alice@example.com" -> "email alice@example.com" - "bob@example.com" -> "email bob@example.com" - "carl@example.com" -> "email carl@example.com" - "+1 234 567 8910" -> "msisdn 12345678910" - "denny@example.com" -> "email denny@example.com" +``` +"alice@example.com" -> "email alice@example.com" +"bob@example.com" -> "email bob@example.com" +"carl@example.com" -> "email carl@example.com" +"+1 234 567 8910" -> "msisdn 12345678910" +"denny@example.com" -> "email denny@example.com" +``` Hashes must be peppered in order to reduce both the information a client gains during the process, and attacks the identity server can perform (namely sending a rainbow table of hashes back in the response to `/lookup`). 
In order for clients to know the pepper and hashing algorithm they should use, -Identity Servers must make the information available on the `/hash_details` +Identity servers must make the information available on the `/hash_details` endpoint: - GET /_matrix/identity/v2/hash_details +``` +GET /_matrix/identity/v2/hash_details - { - "lookup_pepper": "matrixrocks", - "algorithms": ["sha256"] - } +{ + "lookup_pepper": "matrixrocks", + "algorithms": ["sha256"] +} +``` The name `lookup_pepper` was chosen in order to account for pepper values being returned for other endpoints in the future. The contents of `lookup_pepper` MUST match the regular expression `[a-zA-Z0-9]*`. - The client should append the pepper to the end of the 3pid string before - hashing. +``` +The client should append the pepper to the end of the 3PID string before +hashing. - "email alice@example.com" -> "email alice@example.commatrixrocks" - "email bob@example.com" -> "email bob@example.commatrixrocks" - "email carl@example.com" -> "email carl@example.commatrixrocks" - "msisdn 12345678910" -> "msisdn 12345678910matrixrocks" - "email denny@example.com" -> "email denny@example.commatrixrocks" +"email alice@example.com" -> "email alice@example.commatrixrocks" +"email bob@example.com" -> "email bob@example.commatrixrocks" +"email carl@example.com" -> "email carl@example.commatrixrocks" +"msisdn 12345678910" -> "msisdn 12345678910matrixrocks" +"email denny@example.com" -> "email denny@example.commatrixrocks" +``` Clients SHOULD request this endpoint each time before performing a lookup, to handle identity servers which may rotate their pepper values frequently. Clients MUST choose one of the given hash algorithms to encrypt the 3PID during lookup. -Note that possible hashing algorithms will be defined in the Matrix -specification, and an Identity Server can choose to implement one or all of -them. Later versions of the specification may deprecate algorithms when -necessary. 
Currently the only listed hashing algorithm is SHA-256 as defined by -[RFC 4634](https://tools.ietf.org/html/rfc4634) and Identity Servers and -clients MUST agree to its use with the string `sha256`. SHA-256 was chosen as -it is currently used throughout the Matrix spec, as well as its properties of -being quick to hash. While this reduces the resources necessary to generate a -rainbow table for attackers, a fast hash is necessary if particularly slow -mobile clients are going to be hashing thousands of contact details. +At a minimum, clients and identity servers MUST support SHA-256 as defined by +[RFC 4634](https://tools.ietf.org/html/rfc4634), identified by the +`algorithm` value `"sha256"`. SHA-256 was chosen as it is currently used +throughout the Matrix spec, as well as its properties of being quick to hash. +While this reduces the resources necessary to generate a rainbow table for +attackers, a fast hash is necessary if particularly slow mobile clients are +going to be hashing thousands of contact details. Other algorithms can be +negotiated by the client and server at their discretion. When performing a lookup, the pepper and hashing algorithm the client used must be part of the request body. If they do not match what the server has on file @@ -112,20 +120,23 @@ server. If the algorithm does not match the server's, the server should return a `400 M_INVALID_PARAM`. If the pepper does not match the server's, the server should -return a new error code, 400 `M_INVALID_PEPPER`. A new error code is not +return a new error code, `400 M_INVALID_PEPPER`. A new error code is not defined for an invalid algorithm as that is considered a client bug. -Each of these error responses should contain the correct `algorithm` and -`lookup_pepper` fields. This is to prevent the client from needing to query -`/hash_details` again, thus saving a round-trip. 
An example response to an -incorrect pepper would be: - - { - "error": "Incorrect value for lookup_pepper", - "errcode": "M_INVALID_PEPPER", - "algorithm": "sha256", - "lookup_pepper": "matrixrocks" - } +The `M_INVALID_PEPPER` error response should contain the correct `algorithm` +and `lookup_pepper` fields. This is to prevent the client from needing to +query `/hash_details` again, thus saving a round-trip. `M_INVALID_PARAM` does +not include these fields. An example response to an incorrect pepper would +be: + +``` +{ + "error": "Incorrect value for lookup_pepper", + "errcode": "M_INVALID_PEPPER", + "algorithm": "sha256", + "lookup_pepper": "matrixrocks" +} +``` Now comes time for the lookup. Note that the resulting hash digest MUST be encoded in URL-safe unpadded base64 (similar to [room version 4's event @@ -133,36 +144,40 @@ IDs](https://matrix.org/docs/spec/rooms/v4#event-ids)). Once hashing has been performed using the defined hashing algorithm, the client sends each hash in an array. 
- "email alice@example.commatrixrocks" -> "y_TvXLKxFT9CURPXI1wvfjvfvsXe8FPgYj-mkQrnszs" - "email bob@example.commatrixrocks" -> "r0-6x3rp9zIWS2suIque-wXTnlv9sc41fatbRMEOwQE" - "email carl@example.commatrixrocks" -> "ryr10d1K8fcFVxALb3egiSquqvFAxQEwegXtlHoQFBw" - "msisdn 12345678910matrixrocks" -> "c_30UaSZhl5tyanIjFoE1IXTmuU3vmptEwVOc3P2Ens" - "email denny@example.commatrixrocks" -> "bxt8rtRaOzMkSk49zIKE_NfqTndHvGbWHchZskW3xmY" - - POST /_matrix/identity/v2/lookup - - { - "hashes": [ - "y_TvXLKxFT9CURPXI1wvfjvfvsXe8FPgYj-mkQrnszs", - "r0-6x3rp9zIWS2suIque-wXTnlv9sc41fatbRMEOwQE", - "ryr10d1K8fcFVxALb3egiSquqvFAxQEwegXtlHoQFBw", - "c_30UaSZhl5tyanIjFoE1IXTmuU3vmptEwVOc3P2Ens", - "bxt8rtRaOzMkSk49zIKE_NfqTndHvGbWHchZskW3xmY" - ], - "algorithm": "sha256", - "pepper": "matrixrocks" - } +``` +"email alice@example.commatrixrocks" -> "y_TvXLKxFT9CURPXI1wvfjvfvsXe8FPgYj-mkQrnszs" +"email bob@example.commatrixrocks" -> "r0-6x3rp9zIWS2suIque-wXTnlv9sc41fatbRMEOwQE" +"email carl@example.commatrixrocks" -> "ryr10d1K8fcFVxALb3egiSquqvFAxQEwegXtlHoQFBw" +"msisdn 12345678910matrixrocks" -> "c_30UaSZhl5tyanIjFoE1IXTmuU3vmptEwVOc3P2Ens" +"email denny@example.commatrixrocks" -> "bxt8rtRaOzMkSk49zIKE_NfqTndHvGbWHchZskW3xmY" + +POST /_matrix/identity/v2/lookup + +{ + "hashes": [ + "y_TvXLKxFT9CURPXI1wvfjvfvsXe8FPgYj-mkQrnszs", + "r0-6x3rp9zIWS2suIque-wXTnlv9sc41fatbRMEOwQE", + "ryr10d1K8fcFVxALb3egiSquqvFAxQEwegXtlHoQFBw", + "c_30UaSZhl5tyanIjFoE1IXTmuU3vmptEwVOc3P2Ens", + "bxt8rtRaOzMkSk49zIKE_NfqTndHvGbWHchZskW3xmY" + ], + "algorithm": "sha256", + "pepper": "matrixrocks" +} +``` The identity server, upon receiving these hashes, can simply compare against the hashes of the 3PIDs it stores. 
The server then responds with the Matrix IDs of those that match: - { - "mappings": { - "y_TvXLKxFT9CURPXI1wvfjvfvsXe8FPgYj-mkQrnszs": "@alice:example.com", - "c_30UaSZhl5tyanIjFoE1IXTmuU3vmptEwVOc3P2Ens": "@fred:example.com" - } - } +``` +{ + "mappings": { + "y_TvXLKxFT9CURPXI1wvfjvfvsXe8FPgYj-mkQrnszs": "@alice:example.com", + "c_30UaSZhl5tyanIjFoE1IXTmuU3vmptEwVOc3P2Ens": "@fred:example.com" + } +} +``` The client can now display which 3PIDs link to which Matrix IDs. @@ -173,7 +188,7 @@ as part of this proposal. ## Fallback considerations `v1` versions of these endpoints may be disabled at the discretion of the -implementation, and should return a 403 `M_FORBIDDEN` error if so. +implementation, and should return a `403 M_FORBIDDEN` error if so. If an identity server is too old and a HTTP 400 or 404 is received when accessing the `v2` endpoint, they should fallback to the `v1` endpoint instead. @@ -186,11 +201,13 @@ be clear about where they are being sent to. * There is a small cost incurred by performing hashes before requests, but this is outweighed by the privacy implications of sending plain-text addresses. -## Potential issues +## Security Considerations -Hashes are still reversible with a rainbow table, but hopefully the provided -pepper, which can be rotated by identity servers at will, should help mitigate -this to some extent. +Hashes are still reversible with a rainbow table, but the provided pepper, +which can be rotated by identity servers at will, should help mitigate this. +Phone numbers (with their relatively short possible address space of 12 +numbers), short email addresses, and addresses of both type that have been +leaked in database dumps are more susceptible to hash reversal. Additionally, this proposal does not stop an identity server from storing plain-text 3PIDs. 
There is a GDPR argument in keeping email addresses, such From 1963a24832eeb4539fbcdc7449cc0bded4bfff13 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Mon, 8 Jul 2019 13:27:38 +0100 Subject: [PATCH 38/67] fix attacks paragraph --- proposals/2134-identity-hash-lookup.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index f1df605f..2ac074af 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -64,9 +64,12 @@ hashed). First the client must prepend the medium to the address: "denny@example.com" -> "email denny@example.com" ``` -Hashes must be peppered in order to reduce both the information a client gains -during the process, and attacks the identity server can perform (namely sending -a rainbow table of hashes back in the response to `/lookup`). +Hashes must be peppered in order to reduce both the information an identity +server gains during the process, and attacks the client can perform. Clients +will have to generate a full rainbow table specific to the set pepper to +obtain all registered MXIDs, while the server has to generate a full rainbow +table with the specific pepper to get the plaintext 3pids for non-matrix +users. 
In order for clients to know the pepper and hashing algorithm they should use, Identity servers must make the information available on the `/hash_details` From ed67e26037650b2781661bbe78f3593209778c5c Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Mon, 8 Jul 2019 17:02:33 +0100 Subject: [PATCH 39/67] pepper must not be an empty string, append medium --- proposals/2134-identity-hash-lookup.md | 45 +++++++++++++++----------- 1 file changed, 26 insertions(+), 19 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 2ac074af..18ecece7 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -54,14 +54,14 @@ The client will hash each 3PID as a concatenation of the medium and address, separated by a space and a pepper appended to the end. Note that phone numbers should be formatted as defined by https://matrix.org/docs/spec/appendices#pstn-phone-numbers, before being -hashed). First the client must prepend the medium to the address: +hashed). First the client must append the medium to the address: ``` -"alice@example.com" -> "email alice@example.com" -"bob@example.com" -> "email bob@example.com" -"carl@example.com" -> "email carl@example.com" -"+1 234 567 8910" -> "msisdn 12345678910" -"denny@example.com" -> "email denny@example.com" +"alice@example.com" -> "alice@example.com email" +"bob@example.com" -> "bob@example.com email" +"carl@example.com" -> "carl@example.com email" +"+1 234 567 8910" -> "12345678910 msisdn" +"denny@example.com" -> "denny@example.com email" ``` Hashes must be peppered in order to reduce both the information an identity @@ -84,19 +84,20 @@ GET /_matrix/identity/v2/hash_details } ``` -The name `lookup_pepper` was chosen in order to account for pepper values being -returned for other endpoints in the future. The contents of `lookup_pepper` -MUST match the regular expression `[a-zA-Z0-9]*`. 
+The name `lookup_pepper` was chosen in order to account for pepper values +being returned for other endpoints in the future. The contents of +`lookup_pepper` MUST match the regular expression `[a-zA-Z0-9]+`. If +`lookup_pepper` is an empty string, clients MUST cease the lookup operation. ``` The client should append the pepper to the end of the 3PID string before hashing. -"email alice@example.com" -> "email alice@example.commatrixrocks" -"email bob@example.com" -> "email bob@example.commatrixrocks" -"email carl@example.com" -> "email carl@example.commatrixrocks" -"msisdn 12345678910" -> "msisdn 12345678910matrixrocks" -"email denny@example.com" -> "email denny@example.commatrixrocks" +"alice@example.com email" -> "alice@example.com emailmatrixrocks" +"bob@example.com email" -> "bob@example.com emailmatrixrocks" +"carl@example.com email" -> "carl@example.com emailmatrixrocks" +"12345678910 msdisn" -> "12345678910 msisdnmatrixrocks" +"denny@example.com email" -> "denny@example.com emailmatrixrocks" ``` Clients SHOULD request this endpoint each time before performing a lookup, to @@ -148,11 +149,13 @@ performed using the defined hashing algorithm, the client sends each hash in an array. 
``` -"email alice@example.commatrixrocks" -> "y_TvXLKxFT9CURPXI1wvfjvfvsXe8FPgYj-mkQrnszs" -"email bob@example.commatrixrocks" -> "r0-6x3rp9zIWS2suIque-wXTnlv9sc41fatbRMEOwQE" -"email carl@example.commatrixrocks" -> "ryr10d1K8fcFVxALb3egiSquqvFAxQEwegXtlHoQFBw" -"msisdn 12345678910matrixrocks" -> "c_30UaSZhl5tyanIjFoE1IXTmuU3vmptEwVOc3P2Ens" -"email denny@example.commatrixrocks" -> "bxt8rtRaOzMkSk49zIKE_NfqTndHvGbWHchZskW3xmY" +NOTE: Hashes are not real values + +"alice@example.com emailmatrixrocks" -> "y_TvXLKxFT9CURPXI1wvfjvfvsXe8FPgYj-mkQrnszs" +"bob@example.com emailmatrixrocks" -> "r0-6x3rp9zIWS2suIque-wXTnlv9sc41fatbRMEOwQE" +"carl@example.com emailmatrixrocks" -> "ryr10d1K8fcFVxALb3egiSquqvFAxQEwegXtlHoQFBw" +"12345678910 msisdnmatrixrocks" -> "c_30UaSZhl5tyanIjFoE1IXTmuU3vmptEwVOc3P2Ens" +"denny@example.com emailmatrixrocks" -> "bxt8rtRaOzMkSk49zIKE_NfqTndHvGbWHchZskW3xmY" POST /_matrix/identity/v2/lookup @@ -212,6 +215,10 @@ Phone numbers (with their relatively short possible address space of 12 numbers), short email addresses, and addresses of both type that have been leaked in database dumps are more susceptible to hash reversal. +Mediums and peppers are appended to the address as to prevent a common prefix +for each plain-text string, which prevents attackers from pre-computing bits +of a stream cipher. + Additionally, this proposal does not stop an identity server from storing plain-text 3PIDs. There is a GDPR argument in keeping email addresses, such that if a breach happens, users must be notified of such. 
Ideally this would be From 3514437d24399462fef62b9c32e15a57eefe13fd Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Fri, 12 Jul 2019 11:37:38 +0100 Subject: [PATCH 40/67] Ability for client/server to decide on no hashing --- proposals/2134-identity-hash-lookup.md | 31 +++++++++++++++++--------- 1 file changed, 20 insertions(+), 11 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 18ecece7..3f869f0d 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -86,8 +86,9 @@ GET /_matrix/identity/v2/hash_details The name `lookup_pepper` was chosen in order to account for pepper values being returned for other endpoints in the future. The contents of -`lookup_pepper` MUST match the regular expression `[a-zA-Z0-9]+`. If -`lookup_pepper` is an empty string, clients MUST cease the lookup operation. +`lookup_pepper` MUST match the regular expression `[a-zA-Z0-9]+` (unless no +hashing is being performed, as described below). If `lookup_pepper` is an +empty string, clients MUST cease the lookup operation. ``` The client should append the pepper to the end of the 3PID string before @@ -102,8 +103,8 @@ hashing. Clients SHOULD request this endpoint each time before performing a lookup, to handle identity servers which may rotate their pepper values frequently. -Clients MUST choose one of the given hash algorithms to encrypt the 3PID during -lookup. +Clients MUST choose one of the given hash algorithms to encrypt the 3PID +during lookup. At a minimum, clients and identity servers MUST support SHA-256 as defined by [RFC 4634](https://tools.ietf.org/html/rfc4634), identified by the @@ -114,13 +115,21 @@ attackers, a fast hash is necessary if particularly slow mobile clients are going to be hashing thousands of contact details. Other algorithms can be negotiated by the client and server at their discretion. 
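The client-side checks on the `/hash_details` response described at this revision could be sketched roughly as follows. This is an illustration only: `choose_lookup_params` is a hypothetical name, the input is assumed to be the already-decoded JSON body of `/hash_details`, and reading "match the regular expression" as a match of the whole string is an assumption. The plain-text (no hashing) case introduced by this patch is deliberately left out.

```python
import re
from typing import Tuple

# Pepper format quoted in the proposal; whole-string matching is an assumption.
PEPPER_RE = re.compile(r"[a-zA-Z0-9]+")


def choose_lookup_params(hash_details: dict) -> Tuple[str, str]:
    """Return (algorithm, pepper) for a hashed lookup, or raise ValueError.

    Hypothetical helper: only the hashed case is handled here.
    """
    algorithms = hash_details.get("algorithms", [])
    pepper = hash_details.get("lookup_pepper", "")

    # SHA-256 is the one algorithm both sides are required to support.
    if "sha256" not in algorithms:
        raise ValueError("server does not offer sha256")

    # An empty or malformed pepper means the client must not continue.
    if not isinstance(pepper, str) or not PEPPER_RE.fullmatch(pepper):
        raise ValueError("invalid lookup_pepper; ceasing lookup")

    return "sha256", pepper
```

Calling this immediately before each lookup, rather than caching the result, would follow the recommendation that clients re-request the details in case the pepper has been rotated.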
-When performing a lookup, the pepper and hashing algorithm the client used must -be part of the request body. If they do not match what the server has on file -(which may be the case if the pepper was changed right after the client's -request for it), then the server must inform the client that they need to query -the hash details again, instead of just returning an empty response, which -clients would assume to mean that no contacts are registered on that identity -server. +There are certain situations when an identity server cannot be expected to +compare hashed 3PID values; When a server is connected to a backend provider +such as LDAP, there is no way for the identity server to efficiently pull all +of the addresses and hash them. For this case, the `algorithm` field of `GET +/hash_details` may be set to `"none"`, and `lookup_pepper` will be an empty +string. No hashing will be performed if the client and server decide on this, +and 3PIDs will be sent in plain-text, similar to the v1 `/lookup` API. + +When performing a lookup, the pepper and hashing algorithm the client used +must be part of the request body (even when using the `"none"` algorithm +value). If they do not match what the server has on file (which may be the +case if the pepper was changed right after the client's request for it), then +the server must inform the client that they need to query the hash details +again, instead of just returning an empty response, which clients would +assume to mean that no contacts are registered on that identity server. If the algorithm does not match the server's, the server should return a `400 M_INVALID_PARAM`. 
If the pepper does not match the server's, the server should From 36cb8ed894895a6bd1e7ab3755f6adde6ec77d21 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Tue, 16 Jul 2019 10:44:02 +0100 Subject: [PATCH 41/67] none -> m.none --- proposals/2134-identity-hash-lookup.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 3f869f0d..09e4748a 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -119,12 +119,12 @@ There are certain situations when an identity server cannot be expected to compare hashed 3PID values; When a server is connected to a backend provider such as LDAP, there is no way for the identity server to efficiently pull all of the addresses and hash them. For this case, the `algorithm` field of `GET -/hash_details` may be set to `"none"`, and `lookup_pepper` will be an empty +/hash_details` may be set to `"m.none"`, and `lookup_pepper` will be an empty string. No hashing will be performed if the client and server decide on this, and 3PIDs will be sent in plain-text, similar to the v1 `/lookup` API. When performing a lookup, the pepper and hashing algorithm the client used -must be part of the request body (even when using the `"none"` algorithm +must be part of the request body (even when using the `"m.none"` algorithm value). 
If they do not match what the server has on file (which may be the case if the pepper was changed right after the client's request for it), then the server must inform the client that they need to query the hash details From 0444c8016b469540717ec2054799f3d80eb6a700 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Mon, 22 Jul 2019 15:33:49 +0100 Subject: [PATCH 42/67] review comments --- proposals/2134-identity-hash-lookup.md | 52 +++++++++++++++----------- 1 file changed, 30 insertions(+), 22 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 09e4748a..3fc92b53 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -65,11 +65,7 @@ hashed). First the client must append the medium to the address: ``` Hashes must be peppered in order to reduce both the information an identity -server gains during the process, and attacks the client can perform. Clients -will have to generate a full rainbow table specific to the set pepper to -obtain all registered MXIDs, while the server has to generate a full rainbow -table with the specific pepper to get the plaintext 3pids for non-matrix -users. +server gains during the process, and attacks the client can perform. [0] In order for clients to know the pepper and hashing algorithm they should use, Identity servers must make the information available on the `/hash_details` @@ -87,13 +83,14 @@ GET /_matrix/identity/v2/hash_details The name `lookup_pepper` was chosen in order to account for pepper values being returned for other endpoints in the future. The contents of `lookup_pepper` MUST match the regular expression `[a-zA-Z0-9]+` (unless no -hashing is being performed, as described below). If `lookup_pepper` is an -empty string, clients MUST cease the lookup operation. +hashing is being performed, as described below). 
If hashing is being +performed, and `lookup_pepper` is an empty string, clients MUST cease the +lookup operation. -``` The client should append the pepper to the end of the 3PID string before hashing. +``` "alice@example.com email" -> "alice@example.com emailmatrixrocks" "bob@example.com email" -> "bob@example.com emailmatrixrocks" "carl@example.com email" -> "carl@example.com emailmatrixrocks" @@ -106,22 +103,26 @@ handle identity servers which may rotate their pepper values frequently. Clients MUST choose one of the given hash algorithms to encrypt the 3PID during lookup. -At a minimum, clients and identity servers MUST support SHA-256 as defined by -[RFC 4634](https://tools.ietf.org/html/rfc4634), identified by the -`algorithm` value `"sha256"`. SHA-256 was chosen as it is currently used -throughout the Matrix spec, as well as its properties of being quick to hash. -While this reduces the resources necessary to generate a rainbow table for -attackers, a fast hash is necessary if particularly slow mobile clients are -going to be hashing thousands of contact details. Other algorithms can be -negotiated by the client and server at their discretion. +Clients and identity servers MUST support SHA-256 as defined by [RFC +4634](https://tools.ietf.org/html/rfc4634), identified by the `algorithm` +value `"sha256"`. SHA-256 was chosen as it is currently used throughout the +Matrix spec, as well as its properties of being quick to hash. While this +reduces the resources necessary to generate a rainbow table for attackers, a +fast hash is necessary if particularly slow mobile clients are going to be +hashing thousands of contact details. Other algorithms can be negotiated by +the client and server at their discretion. 
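As an illustration of the hashing step as it stands at this revision (medium appended to the address after a space, then the pepper, then SHA-256 encoded as URL-safe unpadded base64), a minimal Python sketch might look like this. The function name `hash_3pid` is hypothetical, UTF-8 encoding of the input string is an assumption the proposal does not state, and `matrixrocks` is the example pepper from the text. No expected digests are shown, since the hashes printed in the proposal are themselves marked as not real values.

```python
import base64
import hashlib


def hash_3pid(address: str, medium: str, pepper: str) -> str:
    """Hash one 3PID as "<address> <medium><pepper>" (hypothetical helper)."""
    to_hash = f"{address} {medium}{pepper}"
    # Assumption: the string is encoded as UTF-8 before hashing.
    digest = hashlib.sha256(to_hash.encode("utf-8")).digest()
    # URL-safe base64 alphabet, with the trailing "=" padding stripped.
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


if __name__ == "__main__":
    contacts = [("alice@example.com", "email"), ("12345678910", "msisdn")]
    for address, medium in contacts:
        print(hash_3pid(address, medium, "matrixrocks"))
```

Because a SHA-256 digest is 32 bytes, each encoded value is 43 characters long once the single padding character is removed.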
There are certain situations when an identity server cannot be expected to -compare hashed 3PID values; When a server is connected to a backend provider -such as LDAP, there is no way for the identity server to efficiently pull all -of the addresses and hash them. For this case, the `algorithm` field of `GET -/hash_details` may be set to `"m.none"`, and `lookup_pepper` will be an empty -string. No hashing will be performed if the client and server decide on this, -and 3PIDs will be sent in plain-text, similar to the v1 `/lookup` API. +compare hashed 3PID values; for example, when a server is connected to a +backend provider such as LDAP, there is no way for the identity server to +efficiently pull all of the addresses and hash them. For this case, clients +and server MUST also support sending plain-text 3PID values. To agree upon +this, the `algorithm` field of `GET /hash_details` MUST be set to `"m.none"`, +whereas `lookup_pepper` will be an empty string. No hashing will be performed +if the client and server decide on this, and 3PIDs will be sent in +plain-text, similar to the v1 `/lookup` API. When this occurs, it is STRONGLY +RECOMMENDED for the client to prompt the user before continuing, and receive +consent for sending 3PID details in plain-text to the identity server. When performing a lookup, the pepper and hashing algorithm the client used must be part of the request body (even when using the `"m.none"` algorithm @@ -277,3 +278,10 @@ This proposal outlines a simple method to stop bulk collection of user's contact lists and their social graphs without any disastrous side effects. All functionality which depends on the lookup service should continue to function unhindered by the use of hashes. 
+ +## Footnotes + +[0] Clients would have to generate a full rainbow table specific to the set +pepper to obtain all registered MXIDs, while the server would have to +generate a full rainbow table with the specific pepper to get the plaintext +3pids for non-matrix users. From 887cd5e7d056000bc05cee6b28d7d1bf595edfaa Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Mon, 22 Jul 2019 16:00:29 +0100 Subject: [PATCH 43/67] I really hope someone doesn't invest none-hash --- proposals/2134-identity-hash-lookup.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 3fc92b53..9a5cee11 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -117,7 +117,7 @@ compare hashed 3PID values; for example, when a server is connected to a backend provider such as LDAP, there is no way for the identity server to efficiently pull all of the addresses and hash them. For this case, clients and server MUST also support sending plain-text 3PID values. To agree upon -this, the `algorithm` field of `GET /hash_details` MUST be set to `"m.none"`, +this, the `algorithm` field of `GET /hash_details` MUST be set to `"none"`, whereas `lookup_pepper` will be an empty string. No hashing will be performed if the client and server decide on this, and 3PIDs will be sent in plain-text, similar to the v1 `/lookup` API. When this occurs, it is STRONGLY @@ -125,7 +125,7 @@ RECOMMENDED for the client to prompt the user before continuing, and receive consent for sending 3PID details in plain-text to the identity server. When performing a lookup, the pepper and hashing algorithm the client used -must be part of the request body (even when using the `"m.none"` algorithm +must be part of the request body (even when using the `"none"` algorithm value). 
If they do not match what the server has on file (which may be the case if the pepper was changed right after the client's request for it), then the server must inform the client that they need to query the hash details From 577021f12b9edfd3ca0b428fa3774aa540a8b632 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Tue, 23 Jul 2019 11:48:01 +0100 Subject: [PATCH 44/67] resolve some comments --- proposals/2134-identity-hash-lookup.md | 90 ++++++++++++-------------- 1 file changed, 42 insertions(+), 48 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 9a5cee11..8db758dc 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -1,8 +1,8 @@ # MSC2134: Identity Hash Lookups [Issue #2130](https://github.com/matrix-org/matrix-doc/issues/2130) has been -recently created in response to a security issue brought up by an independent -party. To summarise the issue, lookups (of Matrix user IDs) are performed using +created in response to a security issue brought up by an independent party. +To summarise the issue, lookups (of Matrix user IDs) are performed using plain-text 3PIDs (third-party IDs) which means that the identity server can identify and record every 3PID that the user has in their contacts, whether that email address or phone number is already known by the identity server or @@ -26,10 +26,10 @@ which will leak less data to identity servers. ## Proposal This proposal suggests making changes to the Identity Service API's lookup -endpoints. Instead, this proposal consolidates them into a single `/lookup` -endpoint. Additionally, the endpoint is to be on a `v2` path, to avoid -confusion with the original `/lookup`. We also drop the `/api` in order to -preserve consistency across other endpoints: +endpoints, consolidating them into a single `/lookup` endpoint. The endpoint +is to be on a `v2` path, to avoid confusion with the original `v1` `/lookup`. 
+The `/api` part is also dropped in order to preserve consistency across other +endpoints: - `/_matrix/identity/v2/lookup` @@ -68,7 +68,7 @@ Hashes must be peppered in order to reduce both the information an identity server gains during the process, and attacks the client can perform. [0] In order for clients to know the pepper and hashing algorithm they should use, -Identity servers must make the information available on the `/hash_details` +identity servers must make the information available on the `/hash_details` endpoint: ``` @@ -104,25 +104,30 @@ Clients MUST choose one of the given hash algorithms to encrypt the 3PID during lookup. Clients and identity servers MUST support SHA-256 as defined by [RFC -4634](https://tools.ietf.org/html/rfc4634), identified by the `algorithm` -value `"sha256"`. SHA-256 was chosen as it is currently used throughout the -Matrix spec, as well as its properties of being quick to hash. While this -reduces the resources necessary to generate a rainbow table for attackers, a -fast hash is necessary if particularly slow mobile clients are going to be -hashing thousands of contact details. Other algorithms can be negotiated by -the client and server at their discretion. +4634](https://tools.ietf.org/html/rfc4634), identified by the value +`"sha256"` in the `algorithms` array. SHA-256 was chosen as it is currently +used throughout the Matrix spec, as well as its properties of being quick to +hash. While this reduces the resources necessary to generate a rainbow table +for attackers, a fast hash is necessary if particularly slow mobile clients +are going to be hashing thousands of contact details. Other algorithms are +negotiated by the client and server at their discretion. 
There are certain situations when an identity server cannot be expected to compare hashed 3PID values; for example, when a server is connected to a backend provider such as LDAP, there is no way for the identity server to efficiently pull all of the addresses and hash them. For this case, clients and server MUST also support sending plain-text 3PID values. To agree upon -this, the `algorithm` field of `GET /hash_details` MUST be set to `"none"`, -whereas `lookup_pepper` will be an empty string. No hashing will be performed -if the client and server decide on this, and 3PIDs will be sent in -plain-text, similar to the v1 `/lookup` API. When this occurs, it is STRONGLY -RECOMMENDED for the client to prompt the user before continuing, and receive -consent for sending 3PID details in plain-text to the identity server. +this, the `"algorithms"` field of `GET /hash_details` MUST contain the value +`"none"`, and `lookup_pepper` will be an empty string. For this case, the +identity server could only send `"none"` as part of the `"algorithms"` array. +The client can then decide whether it wants to accept this. The identity +server could also send `["none", "sha256"]` and cease from looking up +contacts in LDAP unless `"none"` is decided upon. + +No hashing will be performed if the client and server decide on `"none"`, and +3PIDs will be sent in plain-text, similar to the v1 `/lookup` API. When this +occurs, it is STRONGLY RECOMMENDED for the client to prompt the user before +continuing. When performing a lookup, the pepper and hashing algorithm the client used must be part of the request body (even when using the `"none"` algorithm @@ -132,16 +137,15 @@ the server must inform the client that they need to query the hash details again, instead of just returning an empty response, which clients would assume to mean that no contacts are registered on that identity server. 
-If the algorithm does not match the server's, the server should return a `400 +If the algorithm is not supported by the server, the server should return a `400 M_INVALID_PARAM`. If the pepper does not match the server's, the server should return a new error code, `400 M_INVALID_PEPPER`. A new error code is not defined for an invalid algorithm as that is considered a client bug. -The `M_INVALID_PEPPER` error response should contain the correct `algorithm` -and `lookup_pepper` fields. This is to prevent the client from needing to -query `/hash_details` again, thus saving a round-trip. `M_INVALID_PARAM` does -not include these fields. An example response to an incorrect pepper would -be: +The `M_INVALID_PEPPER` error response contain the correct `algorithm` and +`lookup_pepper` fields. This is to prevent the client from needing to query +`/hash_details` again, thus saving a request. `M_INVALID_PARAM` does not +include these fields. An example response to an incorrect pepper would be: ``` { @@ -207,10 +211,9 @@ as part of this proposal. implementation, and should return a `403 M_FORBIDDEN` error if so. If an identity server is too old and a HTTP 400 or 404 is received when -accessing the `v2` endpoint, they should fallback to the `v1` endpoint instead. -However, clients should be aware that plain-text 3PIDs are required for the -`v1` endpoint, and SHOULD ask for user consent to send 3PIDs in plain-text, and -be clear about where they are being sent to. +accessing the `v2` endpoint, clients should fallback to the `v1` endpoint +instead. However, clients should be aware that plain-text 3PIDs are required +for the `v1` endpoints, and are strongly encouraged to warn the user of this. ## Tradeoffs @@ -229,14 +232,6 @@ Mediums and peppers are appended to the address as to prevent a common prefix for each plain-text string, which prevents attackers from pre-computing bits of a stream cipher. 
-Additionally, this proposal does not stop an identity server from storing -plain-text 3PIDs. There is a GDPR argument in keeping email addresses, such -that if a breach happens, users must be notified of such. Ideally this would be -done over Matrix, but people may've stuck their email in an identity server and -then left Matrix forever. Perhaps if only hashes were being stored on the -identity server then that isn't considered personal information? In any case, a -discussion for another MSC. - ## Other considered solutions Ideally identity servers would never receive plain-text addresses, however it @@ -251,16 +246,15 @@ eventual solution of using Software Guard Extensions (detailed in https://signal.org/blog/private-contact-discovery/) is considered impractical for a federated network, as it requires specialized hardware. -k-anonymity was considered as an alternative, in which the identity server -would never receive a full hash of a 3PID that it did not already know about. -While this has been considered plausible, it comes with heightened resource -requirements (much more hashing by the identity server). The conclusion was -that it may not provide more privacy if an identity server decided to be evil, -however it would significantly raise the resource requirements to run an evil -identity server. - -Discussion and a walk-through of what a client/identity-server interaction would -look like are documented [in this Github +k-anonymity was considered as an alternative approach, in which the identity +server would never receive a full hash of a 3PID that it did not already know +about. While this has been considered plausible, it comes with heightened +resource requirements (much more hashing by the identity server). The +conclusion was that it may not provide more privacy if an identity server +decided to be evil, however it would significantly raise the resource +requirements to run an evil identity server. 
Discussion and a walk-through of +what a client/identity-server interaction would look like are documented [in +this Github comment](https://github.com/matrix-org/matrix-doc/pull/2134#discussion_r298691748). Additionally, a radical model was also considered where the first portion of From b26a9ed1fd7f847a701028efcba55f6aff82d1c3 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Tue, 23 Jul 2019 13:28:42 +0100 Subject: [PATCH 45/67] Expand on why we can't trust dirty homeservers --- proposals/2134-identity-hash-lookup.md | 30 ++++++++++++++++---------- 1 file changed, 19 insertions(+), 11 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 8db758dc..5bd4889a 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -234,10 +234,6 @@ of a stream cipher. ## Other considered solutions -Ideally identity servers would never receive plain-text addresses, however it -is necessary for the identity server to send email/sms messages during a -bind, as it cannot trust a homeserver to do so as the homeserver may be lying. - Bloom filters are an alternative method of providing private contact discovery. However, they do not scale well due to requiring clients to download a large filter that needs updating every time a new bind is made. Further considered @@ -257,15 +253,27 @@ what a client/identity-server interaction would look like are documented [in this Github comment](https://github.com/matrix-org/matrix-doc/pull/2134#discussion_r298691748). -Additionally, a radical model was also considered where the first portion of -the above scheme was done with an identity server, and the second would be done -with various homeservers who originally reported the 3PID to the identity -server. While interesting and a more decentralised model, some attacks are -still possible if the identity server is running an evil homeserver which it -can direct the client to send its hashes to. 
Discussion on this matter has -taken place in the MSC-specific room [starting at this +A radical model was also considered where the first portion of the +k-anonyminity scheme was done with an identity server, and the second would +be done with various homeservers who originally reported the 3PID to the +identity server. While interesting and a more decentralised model, some +attacks are still possible if the identity server is running an evil +homeserver which it can direct the client to send its hashes to. Discussion +on this matter has taken place in the MSC-specific room [starting at this message](https://matrix.to/#/!LlraCeVuFgMaxvRySN:amorgan.xyz/$4wzTSsspbLVa6Lx5cBq6toh6P3TY3YnoxALZuO8n9gk?via=amorgan.xyz&via=matrix.org&via=matrix.vgorcum.com). +Ideally identity servers would never receive plain-text addresses, just +storing and receiving hash values instead. However, it is necessary for the +identity server to have plain-text addresses during a +[bind](https://matrix.org/docs/spec/identity_service/r0.2.1#post-matrix-identity-api-v1-3pid-bind) +call, in order to send a verification email or sms message. It is not +feasible to defer this job to a homeserver, as the identity server cannot +trust that the homeserver has actually performed verification. Thus it may +not be possible to prevent plain-text 3PIDs of registered Matrix users from +being sent to the identity server at least once. Yet, we can still do our +best by coming up with creative ways to prevent non-matrix user 3PIDs from +leaking to the identity server, when they're sent in a lookup. 
+ ## Conclusion This proposal outlines a simple method to stop bulk collection of user's From 9fd6bd318461cba6226fffdbc00f2fac6b839036 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Tue, 23 Jul 2019 15:16:27 +0100 Subject: [PATCH 46/67] Add details about why this proposal should exist --- proposals/2134-identity-hash-lookup.md | 51 ++++++++++++++++++-------- 1 file changed, 35 insertions(+), 16 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 5bd4889a..23c155f4 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -6,22 +6,41 @@ To summarise the issue, lookups (of Matrix user IDs) are performed using plain-text 3PIDs (third-party IDs) which means that the identity server can identify and record every 3PID that the user has in their contacts, whether that email address or phone number is already known by the identity server or -not. - -If the 3PID is hashed, the identity server could not determine the address -unless it has already seen that address in plain-text during a previous call -of the [/bind -mechanism](https://matrix.org/docs/spec/identity_service/r0.2.1#post-matrix-identity-api-v1-3pid-bind) -(without significant resources to reverse the hashes). This helps prevent -bulk collection of user's contact lists by the identity server and reduces -its ability to build social graphs. - -This proposal thus calls for the Identity Service API's -[/lookup](https://matrix.org/docs/spec/identity_service/r0.2.1#get-matrix-identity-api-v1-lookup) -endpoint to use hashed 3PIDs instead of their plain-text counterparts (and to -deprecate both it and -[/bulk_lookup](https://matrix.org/docs/spec/identity_service/r0.2.1#post-matrix-identity-api-v1-bulk-lookup)), -which will leak less data to identity servers. +not. 
In the latter case, an identity server is able to collect email +addresses and phone numbers that have a high probability of being connected +to a real person. It could then use this data for marketing or other +purposes. + +However, if the email addresses and phone numbers are hashed before they are +sent to the identity server, the server would have a more difficult time of +being able to recover the original addresses. This prevents contact +information of non-Matrix users being exposed by the lookup service. + +However, hashing is not perfect. While reversing a hash is not possible, it +is possible to build a [rainbow +table](https://en.wikipedia.org/wiki/Rainbow_table), which could map many +known email addresses and phone numbers to their hash equivalents. When the +identity server receives a hash, it would then be able to look it up in this +table, and find the email address or phone number associated with it. In an +ideal world, one would use a hashing algorithm such as +[bcrypt](https://en.wikipedia.org/wiki/Bcrypt), with many rounds, which would +make building such a rainbow table an extraordinarily expensive process. +Unfortunately, this is impractical for our use case, as it would require +clients to perform many, many rounds of hashing, linearly dependent on their +address book size, which would likely result in lower-end mobile phones +becoming overwhelmed. Thus, we must use a fast hashing algorithm, at the cost +of making rainbow tables easy to build. + +The rainbow table attack is not perfect. While there are only so many +possible phone numbers, and thus it is simple to generate the hash value for +each one, the address space of email addresses is much, much wider. Therefore +if your email address is decently long and is not publicly known to +attackers, it is unlikely that it would be included in a rainbow table. 
+ +Thus the approach of hashing, while adding complexity to implementation and +minor resource consumption of the client and identity server, does provide +added difficultly for the identity server to carry out contact detail +harvesting, which should be considered worthwhile. ## Proposal From 3031df79cc67419c40851afe6eea6985cc18c843 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Tue, 23 Jul 2019 16:33:24 +0100 Subject: [PATCH 47/67] Add example for none algo --- proposals/2134-identity-hash-lookup.md | 163 +++++++++++++++++-------- 1 file changed, 110 insertions(+), 53 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 23c155f4..d6cb0506 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -2,45 +2,46 @@ [Issue #2130](https://github.com/matrix-org/matrix-doc/issues/2130) has been created in response to a security issue brought up by an independent party. -To summarise the issue, lookups (of Matrix user IDs) are performed using -plain-text 3PIDs (third-party IDs) which means that the identity server can -identify and record every 3PID that the user has in their contacts, whether -that email address or phone number is already known by the identity server or -not. In the latter case, an identity server is able to collect email -addresses and phone numbers that have a high probability of being connected -to a real person. It could then use this data for marketing or other -purposes. - -However, if the email addresses and phone numbers are hashed before they are +To summarise the issue, when a user wants to ask an identity server which of +its contacts have registered a Matrix account, it performs a lookup against +an identity server. The client currently sends all of its contact details in +the form of plain-text addresses, meaning that the identity server can +identify and record every third-party ID (3PID) of the user's contacts. 
This +allows the identity server is able to collect email addresses and phone +numbers that have a high probability of being connected to a real person. +This data could then be used for marketing, political campaigns, etc. + +However, if these email addresses and phone numbers are hashed before they are sent to the identity server, the server would have a more difficult time of being able to recover the original addresses. This prevents contact -information of non-Matrix users being exposed by the lookup service. - -However, hashing is not perfect. While reversing a hash is not possible, it -is possible to build a [rainbow -table](https://en.wikipedia.org/wiki/Rainbow_table), which could map many -known email addresses and phone numbers to their hash equivalents. When the -identity server receives a hash, it would then be able to look it up in this -table, and find the email address or phone number associated with it. In an -ideal world, one would use a hashing algorithm such as -[bcrypt](https://en.wikipedia.org/wiki/Bcrypt), with many rounds, which would -make building such a rainbow table an extraordinarily expensive process. -Unfortunately, this is impractical for our use case, as it would require -clients to perform many, many rounds of hashing, linearly dependent on their -address book size, which would likely result in lower-end mobile phones -becoming overwhelmed. Thus, we must use a fast hashing algorithm, at the cost -of making rainbow tables easy to build. - -The rainbow table attack is not perfect. While there are only so many -possible phone numbers, and thus it is simple to generate the hash value for -each one, the address space of email addresses is much, much wider. Therefore -if your email address is decently long and is not publicly known to -attackers, it is unlikely that it would be included in a rainbow table. +information of non-Matrix users being exposed to the lookup service. + +Yet, hashing is not perfect. 
While reversing a hash is not possible, it is +possible to build a [rainbow +table](https://en.wikipedia.org/wiki/Rainbow_table), which maps known email +addresses and phone numbers to their hash equivalents. When the identity +server receives a hash, it is then be able to look it up in its rainbow table +and find the corresponding 3PID. To prevent this, one would use a hashing +algorithm such as [bcrypt](https://en.wikipedia.org/wiki/Bcrypt) with many +rounds, making the construction of a large rainbow table an infeasibly +expensive process. Unfortunately, this is impractical for our use case, as it +would require clients to also perform many, many rounds of hashing, linearly +dependent on the size of their address book, which would likely result in +lower-end mobile phones becoming overwhelmed. We are then forced to use a +fast hashing algorithm, at the cost of making rainbow tables easy to build. + +The rainbow table attack is not perfect, because one does need to know email +addresses and phone numbers to build it. While there are only so many +possible phone numbers, and thus it is relatively inexpensive to generate the +hash value for each one, the address space of email addresses is much, much +wider. If your email address is decently long and is not publicly +known to attackers, it is unlikely that it would be included in a rainbow +table. Thus the approach of hashing, while adding complexity to implementation and -minor resource consumption of the client and identity server, does provide -added difficultly for the identity server to carry out contact detail -harvesting, which should be considered worthwhile. +resource consumption of the client and identity server, does provide added +difficulty for the identity server to carry out contact detail harvesting, +which should be considered worthwhile. ## Proposal @@ -106,8 +107,7 @@ hashing is being performed, as described below). 
If hashing is being performed, and `lookup_pepper` is an empty string, clients MUST cease the lookup operation. -The client should append the pepper to the end of the 3PID string before -hashing. +If hashing, the client should append the pepper to the end of the 3PID string. ``` "alice@example.com email" -> "alice@example.com emailmatrixrocks" @@ -119,8 +119,8 @@ hashing. Clients SHOULD request this endpoint each time before performing a lookup, to handle identity servers which may rotate their pepper values frequently. -Clients MUST choose one of the given hash algorithms to encrypt the 3PID -during lookup. +Clients MUST choose one of the given `algorithms` values to encrypt the +3PID during lookup. Clients and identity servers MUST support SHA-256 as defined by [RFC 4634](https://tools.ietf.org/html/rfc4634), identified by the value @@ -133,15 +133,11 @@ negotiated by the client and server at their discretion. There are certain situations when an identity server cannot be expected to compare hashed 3PID values; for example, when a server is connected to a -backend provider such as LDAP, there is no way for the identity server to -efficiently pull all of the addresses and hash them. For this case, clients +backend provider such as LDAP, it is not efficient for the identity server to +pull all of the addresses and hash them on lookup. For this case, clients and server MUST also support sending plain-text 3PID values. To agree upon this, the `"algorithms"` field of `GET /hash_details` MUST contain the value -`"none"`, and `lookup_pepper` will be an empty string. For this case, the -identity server could only send `"none"` as part of the `"algorithms"` array. -The client can then decide whether it wants to accept this. The identity -server could also send `["none", "sha256"]` and cease from looking up -contacts in LDAP unless `"none"` is decided upon. +`"none"`. 
No hashing will be performed if the client and server decide on `"none"`, and 3PIDs will be sent in plain-text, similar to the v1 `/lookup` API. When this @@ -153,7 +149,7 @@ must be part of the request body (even when using the `"none"` algorithm value). If they do not match what the server has on file (which may be the case if the pepper was changed right after the client's request for it), then the server must inform the client that they need to query the hash details -again, instead of just returning an empty response, which clients would +again, as opposed to just returning an empty response, which clients would assume to mean that no contacts are registered on that identity server. If the algorithm is not supported by the server, the server should return a `400 @@ -175,11 +171,11 @@ include these fields. An example response to an incorrect pepper would be: } ``` -Now comes time for the lookup. Note that the resulting hash digest MUST be -encoded in URL-safe unpadded base64 (similar to [room version 4's event +Now comes time for the lookup. We'll first cover an example of the client +choosing the `"sha256"` algorithm. Note that the resulting hash digest MUST +be encoded in URL-safe unpadded base64 (similar to [room version 4's event IDs](https://matrix.org/docs/spec/rooms/v4#event-ids)). Once hashing has been -performed using the defined hashing algorithm, the client sends each hash in an -array. +performed, the client sends each hash in an array. ``` NOTE: Hashes are not real values @@ -193,7 +189,7 @@ NOTE: Hashes are not real values POST /_matrix/identity/v2/lookup { - "hashes": [ + "addresses": [ "y_TvXLKxFT9CURPXI1wvfjvfvsXe8FPgYj-mkQrnszs", "r0-6x3rp9zIWS2suIque-wXTnlv9sc41fatbRMEOwQE", "ryr10d1K8fcFVxALb3egiSquqvFAxQEwegXtlHoQFBw", @@ -206,7 +202,7 @@ POST /_matrix/identity/v2/lookup ``` The identity server, upon receiving these hashes, can simply compare against -the hashes of the 3PIDs it stores. 
The server then responds with the Matrix +the hashes of the 3PIDs it stores. The server then responds with the Matrix IDs of those that match: ``` @@ -220,6 +216,67 @@ IDs of those that match: The client can now display which 3PIDs link to which Matrix IDs. +For the case of the identity server sending, and the client choosing, +`"none"` as the algorithm, we would do the following. + +The client would first make `GET` a request to `/hash_details`, perhaps +receiving the response: + +``` +{ + "lookup_pepper": "matrixrocks", + "algorithms": ["none", "sha256"] +} +``` + +The client decides that it would like to use `"none"`, and thus ignores the +lookup pepper, as no hashing will occur. Appending a space and the 3PID +medium to each address is still necessary: + +``` +"alice@example.com" -> "alice@example.com email" +"bob@example.com" -> "bob@example.com email" +"carl@example.com" -> "carl@example.com email" +"12345678910" -> "12345678910 msisdn" +"denny@example.com" -> "denny@example.com email" +``` + +The client then sends these off to the identity server in a `POST` request to +`/lookup`: + +``` +POST /_matrix/identity/v2/lookup + +{ + "addresses": [ + "alice@example.com email", + "bob@example.com email", + "carl@example.com email", + "12345678910 msisdn", + "denny@example.com email" + ], + "algorithm": "none", + "pepper": "matrixrocks" +} +``` + +Note that even though we haven't used the `lookup_pepper` value, we still +include the same one sent to us by the identity server in `/hash_details`. +The identity server should still return `400 M_INVALID_PEPPER` if the pepper +is incorrect. This is intended to make implementation simpler. 
+ +Finally, the identity server will check its database for the Matrix user IDs +it has that correspond to these 3PID addresses, and returns them: + +``` +{ + "mappings": { + "alice@example.com email": "@alice:example.com", + "12345678910 msisdn": "@fred:example.com" + } +} +``` + No parameter changes will be made to [/bind](https://matrix.org/docs/spec/identity_service/r0.2.1#post-matrix-identity-api-v1-3pid-bind) as part of this proposal. From 3b8c57e06ca961c6842ab409b64f825b8884c573 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Tue, 23 Jul 2019 16:43:55 +0100 Subject: [PATCH 48/67] Don't require servers/clients to support "none" --- proposals/2134-identity-hash-lookup.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index d6cb0506..b40f9f28 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -134,10 +134,11 @@ negotiated by the client and server at their discretion. There are certain situations when an identity server cannot be expected to compare hashed 3PID values; for example, when a server is connected to a backend provider such as LDAP, it is not efficient for the identity server to -pull all of the addresses and hash them on lookup. For this case, clients -and server MUST also support sending plain-text 3PID values. To agree upon -this, the `"algorithms"` field of `GET /hash_details` MUST contain the value -`"none"`. +pull all of the addresses and hash them upon lookup. For this case, can also +support receiving plain-text 3PID addresses from clients. To agree upon this, +the value `"none"` can be added to the `"algorithms"` array of `GET +/hash_details`. The client can then choose to send plain-text values by +setting the `"algorithm"` value in `POST /lookup` to `"none"`. 
No hashing will be performed if the client and server decide on `"none"`, and 3PIDs will be sent in plain-text, similar to the v1 `/lookup` API. When this From 8f3e58870830569693afae86c42b44dc8cb30f0b Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Wed, 24 Jul 2019 15:27:48 +0100 Subject: [PATCH 49/67] pepper is not a secret val. Still needs to be around. --- proposals/2134-identity-hash-lookup.md | 28 +++++++++++++++----------- 1 file changed, 16 insertions(+), 12 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index b40f9f28..72bc4e53 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -7,9 +7,9 @@ its contacts have registered a Matrix account, it performs a lookup against an identity server. The client currently sends all of its contact details in the form of plain-text addresses, meaning that the identity server can identify and record every third-party ID (3PID) of the user's contacts. This -allows the identity server is able to collect email addresses and phone -numbers that have a high probability of being connected to a real person. -This data could then be used for marketing, political campaigns, etc. +allows the identity server to collect email addresses and phone numbers that +have a high probability of being connected to a real person. This data could +then be used for marketing, political campaigns, etc. However, if these email addresses and phone numbers are hashed before they are sent to the identity server, the server would have a more difficult time of @@ -71,10 +71,14 @@ denny@example.com ``` The client will hash each 3PID as a concatenation of the medium and address, -separated by a space and a pepper appended to the end. Note that phone numbers -should be formatted as defined by +separated by a space and a pepper appended to the end. 
Note that phone +numbers should be formatted as defined by https://matrix.org/docs/spec/appendices#pstn-phone-numbers, before being -hashed). First the client must append the medium to the address: +hashed). Note that "pepper" in this proposal simply refers to a public, +opaque string that is used to produce different hash results between identity +servers. Its value is not secret. + +First the client must append the medium to the address: ``` "alice@example.com" -> "alice@example.com email" @@ -102,12 +106,11 @@ GET /_matrix/identity/v2/hash_details The name `lookup_pepper` was chosen in order to account for pepper values being returned for other endpoints in the future. The contents of -`lookup_pepper` MUST match the regular expression `[a-zA-Z0-9]+` (unless no -hashing is being performed, as described below). If hashing is being -performed, and `lookup_pepper` is an empty string, clients MUST cease the -lookup operation. +`lookup_pepper` MUST match the regular expression `[a-zA-Z0-9]+`, whether +hashing is being performed or not. When no hashing is occuring, a pepper +value of at least length 1 is still required. -If hashing, the client should append the pepper to the end of the 3PID string. +If hashing, the client appends the pepper to the end of the 3PID string. ``` "alice@example.com email" -> "alice@example.com emailmatrixrocks" @@ -264,7 +267,8 @@ POST /_matrix/identity/v2/lookup Note that even though we haven't used the `lookup_pepper` value, we still include the same one sent to us by the identity server in `/hash_details`. The identity server should still return `400 M_INVALID_PEPPER` if the pepper -is incorrect. This is intended to make implementation simpler. +is incorrect. This simplifies things and can help ensure the client is +requesting `/hash_details` properly before each lookup request. 
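As a rough illustration of the rule above, the following Python sketch shows how an identity server might check the pepper on a lookup request regardless of the chosen algorithm. The function names, the `error` message text and the exact response layout are assumptions made for this example; only the `[a-zA-Z0-9]+` pattern, the `M_INVALID_PARAM` and `M_INVALID_PEPPER` codes, and the `algorithm` and `lookup_pepper` fields are taken from the proposal.

```python
import re
from typing import Sequence, Tuple

# Pattern the proposal requires `lookup_pepper` to match in full.
PEPPER_PATTERN = re.compile(r"[a-zA-Z0-9]+")


def is_valid_pepper(pepper: str) -> bool:
    """Return True if the pepper is a non-empty alphanumeric string."""
    return PEPPER_PATTERN.fullmatch(pepper) is not None


def check_lookup_request(
    body: dict, server_pepper: str, server_algorithms: Sequence[str]
) -> Tuple[int, dict]:
    """Hypothetical pre-check run before any 3PID matching takes place."""
    algorithm = body.get("algorithm")
    if algorithm not in server_algorithms:
        # Assumed error text; the proposal only names the error code.
        return 400, {"errcode": "M_INVALID_PARAM", "error": "Unsupported algorithm"}
    # The pepper is compared even for "none", where it is never hashed.
    if body.get("pepper") != server_pepper:
        return 400, {
            "errcode": "M_INVALID_PEPPER",
            "error": "Incorrect value for lookup_pepper",  # assumed wording
            "algorithm": algorithm,
            "lookup_pepper": server_pepper,
        }
    return 200, {}
```

Running the same check for every algorithm keeps a single code path on the server, which is the simplification the paragraph above refers to.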
Finally, the identity server will check its database for the Matrix user IDs it has that correspond to these 3PID addresses, and returns them: From c6dd5951a15cf1be34c8794ffd6ca2657f34af27 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Thu, 25 Jul 2019 18:53:32 +0100 Subject: [PATCH 50/67] Clients can cache the hash details if they want to --- proposals/2134-identity-hash-lookup.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 72bc4e53..83b0dceb 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -120,10 +120,10 @@ If hashing, the client appends the pepper to the end of the 3PID string. "denny@example.com email" -> "denny@example.com emailmatrixrocks" ``` -Clients SHOULD request this endpoint each time before performing a lookup, to -handle identity servers which may rotate their pepper values frequently. -Clients MUST choose one of the given `algorithms` values to encrypt the -3PID during lookup. +Clients can cache the result of this endpoint, but should re-request it +during an error on `/lookup`, to handle identity servers which may rotate +their pepper values frequently. Clients MUST choose one of the given +`algorithms` values to encrypt the 3PID during lookup. 
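The caching behaviour described above could look roughly like the following on the client side. This is a minimal sketch: `HashDetailsCache`, `lookup_with_retry` and the two injected callables are hypothetical names standing in for real HTTP requests to `/hash_details` and `/lookup`, and retrying exactly once is a design choice of the example rather than something the proposal mandates.

```python
from typing import Callable, Optional, Tuple


class HashDetailsCache:
    """Caches the /hash_details response until it is explicitly invalidated."""

    def __init__(self, fetch_details: Callable[[], dict]) -> None:
        self._fetch_details = fetch_details
        self._cached: Optional[dict] = None

    def get(self) -> dict:
        # Only hit the identity server when nothing is cached.
        if self._cached is None:
            self._cached = self._fetch_details()
        return self._cached

    def invalidate(self) -> None:
        self._cached = None


def lookup_with_retry(
    cache: HashDetailsCache,
    do_lookup: Callable[[dict], Tuple[int, dict]],
) -> dict:
    """Perform a lookup, refreshing the cached details once if it fails."""
    status, body = do_lookup(cache.get())
    if status != 200:
        # The pepper may have been rotated: drop the cache and try again.
        cache.invalidate()
        status, body = do_lookup(cache.get())
    if status != 200:
        raise RuntimeError(f"lookup failed with status {status}: {body}")
    return body
```

Here `do_lookup` receives the current hash details (pepper and algorithms) and returns the HTTP status together with the decoded JSON body.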
Clients and identity servers MUST support SHA-256 as defined by [RFC 4634](https://tools.ietf.org/html/rfc4634), identified by the value From da876bb340ddd1130bba69afc5b09a3ac40000da Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Thu, 25 Jul 2019 18:54:02 +0100 Subject: [PATCH 51/67] missing word --- proposals/2134-identity-hash-lookup.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 83b0dceb..0151808b 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -137,11 +137,11 @@ negotiated by the client and server at their discretion. There are certain situations when an identity server cannot be expected to compare hashed 3PID values; for example, when a server is connected to a backend provider such as LDAP, it is not efficient for the identity server to -pull all of the addresses and hash them upon lookup. For this case, can also -support receiving plain-text 3PID addresses from clients. To agree upon this, -the value `"none"` can be added to the `"algorithms"` array of `GET -/hash_details`. The client can then choose to send plain-text values by -setting the `"algorithm"` value in `POST /lookup` to `"none"`. +pull all of the addresses and hash them upon lookup. For this case, identity +servers can also support receiving plain-text 3PID addresses from clients. To +agree upon this, the value `"none"` can be added to the `"algorithms"` array +of `GET /hash_details`. The client can then choose to send plain-text values +by setting the `"algorithm"` value in `POST /lookup` to `"none"`. No hashing will be performed if the client and server decide on `"none"`, and 3PIDs will be sent in plain-text, similar to the v1 `/lookup` API. 
When this From 0ac70b268accded4d7f1c563eca7048ff95d1658 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Thu, 25 Jul 2019 18:55:57 +0100 Subject: [PATCH 52/67] Clarify peppering should not happen on none algo --- proposals/2134-identity-hash-lookup.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 0151808b..0eb996be 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -143,10 +143,10 @@ agree upon this, the value `"none"` can be added to the `"algorithms"` array of `GET /hash_details`. The client can then choose to send plain-text values by setting the `"algorithm"` value in `POST /lookup` to `"none"`. -No hashing will be performed if the client and server decide on `"none"`, and -3PIDs will be sent in plain-text, similar to the v1 `/lookup` API. When this -occurs, it is STRONGLY RECOMMENDED for the client to prompt the user before -continuing. +No hashing nor peppering will be performed if the client and server decide on +`"none"`, and 3PIDs will be sent in plain-text, similar to the v1 `/lookup` +API. When this occurs, it is STRONGLY RECOMMENDED for the client to prompt +the user before continuing. 
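One way a client might apply this guidance when picking from the advertised `algorithms` is sketched below. Preferring `"sha256"` whenever it is offered is an assumption of this example, as are the function name and the `ask_user` callback standing in for whatever prompt the client shows.

```python
from typing import Callable, Optional, Sequence


def choose_algorithm(
    offered: Sequence[str], ask_user: Callable[[], bool]
) -> Optional[str]:
    """Pick a lookup algorithm, falling back to plain text only with consent."""
    if "sha256" in offered:
        return "sha256"
    # Plain-text lookups expose the contact list, so ask before continuing.
    if "none" in offered and ask_user():
        return "none"
    # Nothing acceptable was offered, or the user declined.
    return None
```

Returning `None` lets the caller abort the lookup instead of silently sending plain-text addresses.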
When performing a lookup, the pepper and hashing algorithm the client used must be part of the request body (even when using the `"none"` algorithm From 20c72a3649139889d2f86f872972e4a94182ea38 Mon Sep 17 00:00:00 2001 From: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com> Date: Thu, 25 Jul 2019 18:56:17 +0100 Subject: [PATCH 53/67] Update proposals/2134-identity-hash-lookup.md Co-Authored-By: David Baker --- proposals/2134-identity-hash-lookup.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 72bc4e53..1d797086 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -161,7 +161,7 @@ M_INVALID_PARAM`. If the pepper does not match the server's, the server should return a new error code, `400 M_INVALID_PEPPER`. A new error code is not defined for an invalid algorithm as that is considered a client bug. -The `M_INVALID_PEPPER` error response contain the correct `algorithm` and +The `M_INVALID_PEPPER` error response contains the correct `algorithm` and `lookup_pepper` fields. This is to prevent the client from needing to query `/hash_details` again, thus saving a request. `M_INVALID_PARAM` does not include these fields. 
An example response to an incorrect pepper would be: From 6119b9a50dea51c67790b7ac1916ad9d2f1df472 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Thu, 25 Jul 2019 19:03:43 +0100 Subject: [PATCH 54/67] *@hobnobbob.com is unlikely to be guessed --- proposals/2134-identity-hash-lookup.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 0eb996be..d6512c5a 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -34,9 +34,9 @@ The rainbow table attack is not perfect, because one does need to know email addresses and phone numbers to build it. While there are only so many possible phone numbers, and thus it is relatively inexpensive to generate the hash value for each one, the address space of email addresses is much, much -wider. If your email address is decently long and is not publicly -known to attackers, it is unlikely that it would be included in a rainbow -table. +wider. If your email address is not share a common mailserver, decently long +or is not publicly known to attackers, it is unlikely that it would be +included in a rainbow table. Thus the approach of hashing, while adding complexity to implementation and resource consumption of the client and identity server, does provide added @@ -306,8 +306,9 @@ for the `v1` endpoints, and are strongly encouraged to warn the user of this. Hashes are still reversible with a rainbow table, but the provided pepper, which can be rotated by identity servers at will, should help mitigate this. Phone numbers (with their relatively short possible address space of 12 -numbers), short email addresses, and addresses of both type that have been -leaked in database dumps are more susceptible to hash reversal. +numbers), short email addresses at popular domains, and addresses of both +type that have been leaked in database dumps are more susceptible to hash +reversal. 
Mediums and peppers are appended to the address as to prevent a common prefix for each plain-text string, which prevents attackers from pre-computing bits From ffbfde8a09cad00fa178e2dd1817b7cac1abf9e5 Mon Sep 17 00:00:00 2001 From: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com> Date: Fri, 26 Jul 2019 11:40:20 +0100 Subject: [PATCH 55/67] Update proposals/2134-identity-hash-lookup.md Co-Authored-By: Hubert Chathi --- proposals/2134-identity-hash-lookup.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 2abbc5a9..1872af69 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -34,7 +34,7 @@ The rainbow table attack is not perfect, because one does need to know email addresses and phone numbers to build it. While there are only so many possible phone numbers, and thus it is relatively inexpensive to generate the hash value for each one, the address space of email addresses is much, much -wider. If your email address is not share a common mailserver, decently long +wider. If your email address does not use a common mail server, is decently long or is not publicly known to attackers, it is unlikely that it would be included in a rainbow table. 
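To put rough numbers on the difference in address space, the following sketch estimates how long an attacker would need to hash every candidate once. The hash rate and the email alphabet are purely hypothetical figures chosen for illustration; the 12-digit bound for phone numbers is the one quoted in the security considerations of this proposal.

```python
def seconds_to_enumerate(candidates: int, hashes_per_second: float) -> float:
    """Time needed to hash every candidate once at a fixed rate."""
    return candidates / hashes_per_second


def local_part_space(length: int, alphabet_size: int = 36) -> int:
    """Number of local parts of exactly `length` characters (assumed a-z, 0-9)."""
    return alphabet_size ** length


ASSUMED_RATE = 1e9  # hypothetical: one billion SHA-256 hashes per second

phone_space = 10 ** 12                     # at most 12 decimal digits
short_email_space = local_part_space(8)    # per mail domain
long_email_space = local_part_space(16)    # per mail domain

for label, space in [
    ("phone numbers", phone_space),
    ("8-character local parts", short_email_space),
    ("16-character local parts", long_email_space),
]:
    print(f"{label}: {seconds_to_enumerate(space, ASSUMED_RATE):.3g} s")
```

Under these assumptions the phone-number space and short local parts at a single domain are of similar size, while longer local parts are many orders of magnitude larger, which matches the qualitative claim in the text.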
From 5580a2a1a9796d68eab2c71dbee3f67d31ebe1ad Mon Sep 17 00:00:00 2001 From: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com> Date: Fri, 26 Jul 2019 11:40:38 +0100 Subject: [PATCH 56/67] Update proposals/2134-identity-hash-lookup.md Co-Authored-By: Hubert Chathi --- proposals/2134-identity-hash-lookup.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 1872af69..1df58df1 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -70,7 +70,7 @@ carl@example.com denny@example.com ``` -The client will hash each 3PID as a concatenation of the medium and address, +The client will hash each 3PID as a concatenation of the address and medium, separated by a space and a pepper appended to the end. Note that phone numbers should be formatted as defined by https://matrix.org/docs/spec/appendices#pstn-phone-numbers, before being From a17c74f592bafebb7d4d4c8b49318e37ec0d8c92 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Fri, 26 Jul 2019 12:00:53 +0100 Subject: [PATCH 57/67] switch medium and address around, space between address and pepper --- proposals/2134-identity-hash-lookup.md | 63 +++++++++++++------------- 1 file changed, 32 insertions(+), 31 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 2abbc5a9..8e20d250 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -71,21 +71,21 @@ denny@example.com ``` The client will hash each 3PID as a concatenation of the medium and address, -separated by a space and a pepper appended to the end. Note that phone -numbers should be formatted as defined by +separated by a space and a pepper, also separated by a space, appended to the +end. 
Note that phone numbers should be formatted as defined by https://matrix.org/docs/spec/appendices#pstn-phone-numbers, before being hashed). Note that "pepper" in this proposal simply refers to a public, opaque string that is used to produce different hash results between identity servers. Its value is not secret. -First the client must append the medium to the address: +First the client must prepend the medium (plus a space) to the address: ``` -"alice@example.com" -> "alice@example.com email" -"bob@example.com" -> "bob@example.com email" -"carl@example.com" -> "carl@example.com email" -"+1 234 567 8910" -> "12345678910 msisdn" -"denny@example.com" -> "denny@example.com email" +"alice@example.com" -> "email alice@example.com" +"bob@example.com" -> "email bob@example.com" +"carl@example.com" -> "email carl@example.com" +"+1 234 567 8910" -> "msisdn 12345678910" +"denny@example.com" -> "email denny@example.com" ``` Hashes must be peppered in order to reduce both the information an identity @@ -110,14 +110,15 @@ being returned for other endpoints in the future. The contents of hashing is being performed or not. When no hashing is occuring, a pepper value of at least length 1 is still required. -If hashing, the client appends the pepper to the end of the 3PID string. +If hashing, the client appends the pepper to the end of the 3PID string, +after a space. 
``` -"alice@example.com email" -> "alice@example.com emailmatrixrocks" -"bob@example.com email" -> "bob@example.com emailmatrixrocks" -"carl@example.com email" -> "carl@example.com emailmatrixrocks" -"12345678910 msdisn" -> "12345678910 msisdnmatrixrocks" -"denny@example.com email" -> "denny@example.com emailmatrixrocks" +"alice@example.com email" -> "email alice@example.com matrixrocks" +"bob@example.com email" -> "email bob@example.com matrixrocks" +"carl@example.com email" -> "email carl@example.com matrixrocks" +"12345678910 msdisn" -> "msisdn 12345678910 matrixrocks" +"denny@example.com email" -> "email denny@example.com matrixrocks" ``` Clients can cache the result of this endpoint, but should re-request it @@ -184,11 +185,11 @@ performed, the client sends each hash in an array. ``` NOTE: Hashes are not real values -"alice@example.com emailmatrixrocks" -> "y_TvXLKxFT9CURPXI1wvfjvfvsXe8FPgYj-mkQrnszs" -"bob@example.com emailmatrixrocks" -> "r0-6x3rp9zIWS2suIque-wXTnlv9sc41fatbRMEOwQE" -"carl@example.com emailmatrixrocks" -> "ryr10d1K8fcFVxALb3egiSquqvFAxQEwegXtlHoQFBw" -"12345678910 msisdnmatrixrocks" -> "c_30UaSZhl5tyanIjFoE1IXTmuU3vmptEwVOc3P2Ens" -"denny@example.com emailmatrixrocks" -> "bxt8rtRaOzMkSk49zIKE_NfqTndHvGbWHchZskW3xmY" +"email alice@example.com matrixrocks" -> "y_TvXLKxFT9CURPXI1wvfjvfvsXe8FPgYj-mkQrnszs" +"email bob@example.com matrixrocks" -> "r0-6x3rp9zIWS2suIque-wXTnlv9sc41fatbRMEOwQE" +"email carl@example.com matrixrocks" -> "ryr10d1K8fcFVxALb3egiSquqvFAxQEwegXtlHoQFBw" +"msisdn 12345678910 matrixrocks" -> "c_30UaSZhl5tyanIjFoE1IXTmuU3vmptEwVOc3P2Ens" +"email denny@example.com matrixrocks" -> "bxt8rtRaOzMkSk49zIKE_NfqTndHvGbWHchZskW3xmY" POST /_matrix/identity/v2/lookup @@ -238,11 +239,11 @@ lookup pepper, as no hashing will occur. 
Appending a space and the 3PID medium to each address is still necessary: ``` -"alice@example.com" -> "alice@example.com email" -"bob@example.com" -> "bob@example.com email" -"carl@example.com" -> "carl@example.com email" -"12345678910" -> "12345678910 msisdn" -"denny@example.com" -> "denny@example.com email" +"alice@example.com" -> "email alice@example.com" +"bob@example.com" -> "email bob@example.com" +"carl@example.com" -> "email carl@example.com" +"+1 234 567 8910" -> "msisdn 12345678910" +"denny@example.com" -> "email denny@example.com" ``` The client then sends these off to the identity server in a `POST` request to @@ -253,11 +254,11 @@ POST /_matrix/identity/v2/lookup { "addresses": [ - "alice@example.com email", - "bob@example.com email", - "carl@example.com email", - "12345678910 msisdn", - "denny@example.com email" + "email alice@example.com", + "email bob@example.com", + "email carl@example.com", + "msisdn 12345678910", + "email denny@example.com" ], "algorithm": "none", "pepper": "matrixrocks" @@ -276,8 +277,8 @@ it has that correspond to these 3PID addresses, and returns them: ``` { "mappings": { - "alice@example.com email": "@alice:example.com", - "12345678910 msisdn": "@fred:example.com" + "email alice@example.com": "@alice:example.com", + "msisdn 12345678910": "@fred:example.com" } } ``` From 6660768d85f2f8eac2ec9efa2613884e159d6576 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Fri, 26 Jul 2019 12:04:17 +0100 Subject: [PATCH 58/67] Don't repeat fast hash bit --- proposals/2134-identity-hash-lookup.md | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index de11221a..7b3edbc5 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -130,10 +130,7 @@ Clients and identity servers MUST support SHA-256 as defined by [RFC 4634](https://tools.ietf.org/html/rfc4634), identified by the value `"sha256"` in the 
`algorithms` array. SHA-256 was chosen as it is currently used throughout the Matrix spec, as well as its properties of being quick to -hash. While this reduces the resources necessary to generate a rainbow table -for attackers, a fast hash is necessary if particularly slow mobile clients -are going to be hashing thousands of contact details. Other algorithms are -negotiated by the client and server at their discretion. +hash. There are certain situations when an identity server cannot be expected to compare hashed 3PID values; for example, when a server is connected to a From 4d1f2ea4f40d3fedce718f42c4151fb970fe9aa4 Mon Sep 17 00:00:00 2001 From: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com> Date: Fri, 26 Jul 2019 12:05:41 +0100 Subject: [PATCH 59/67] Apply suggestions from code review Co-Authored-By: Hubert Chathi --- proposals/2134-identity-hash-lookup.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index de11221a..13cea82e 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -124,7 +124,7 @@ after a space. Clients can cache the result of this endpoint, but should re-request it during an error on `/lookup`, to handle identity servers which may rotate their pepper values frequently. Clients MUST choose one of the given -`algorithms` values to encrypt the 3PID during lookup. +`algorithms` values to hash the 3PID during lookup. Clients and identity servers MUST support SHA-256 as defined by [RFC 4634](https://tools.ietf.org/html/rfc4634), identified by the value @@ -160,7 +160,7 @@ assume to mean that no contacts are registered on that identity server. If the algorithm is not supported by the server, the server should return a `400 M_INVALID_PARAM`. If the pepper does not match the server's, the server should return a new error code, `400 M_INVALID_PEPPER`. 
A new error code is not -defined for an invalid algorithm as that is considered a client bug. +defined for an unsupported algorithm as that is considered a client bug. The `M_INVALID_PEPPER` error response contains the correct `algorithm` and `lookup_pepper` fields. This is to prevent the client from needing to query @@ -308,7 +308,7 @@ Hashes are still reversible with a rainbow table, but the provided pepper, which can be rotated by identity servers at will, should help mitigate this. Phone numbers (with their relatively short possible address space of 12 numbers), short email addresses at popular domains, and addresses of both -type that have been leaked in database dumps are more susceptible to hash +types that have been leaked in database dumps are more susceptible to hash reversal. Mediums and peppers are appended to the address as to prevent a common prefix From 57de107ea914924fb02fc98b607b214b7a1bc3b6 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Wed, 31 Jul 2019 11:07:22 +0100 Subject: [PATCH 60/67] Move medium back behind the address --- proposals/2134-identity-hash-lookup.md | 56 +++++++++++++------------- 1 file changed, 28 insertions(+), 28 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 5f500b6c..06a8486a 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -78,14 +78,14 @@ hashed). Note that "pepper" in this proposal simply refers to a public, opaque string that is used to produce different hash results between identity servers. Its value is not secret. 
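Putting the steps of this section together, a client-side hash for a single 3PID might be computed as in the sketch below. The function name is hypothetical, and the URL-safe, unpadded base64 encoding is an assumption inferred from the shape of the example hashes in this proposal rather than something stated in this passage. Phone numbers are assumed to be normalised before they are passed in.

```python
import base64
import hashlib


def hash_3pid(address: str, medium: str, pepper: str) -> str:
    """Hash "address medium pepper" with SHA-256 and base64-encode the digest."""
    # Address first, then medium, then pepper, each separated by one space.
    to_hash = f"{address} {medium} {pepper}"
    digest = hashlib.sha256(to_hash.encode("utf-8")).digest()
    # Assumed encoding: URL-safe base64 with the trailing "=" padding removed.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


contacts = [
    ("alice@example.com", "email"),
    ("12345678910", "msisdn"),
]
# The pepper would normally come from GET /hash_details.
hashes = [hash_3pid(address, medium, "matrixrocks") for address, medium in contacts]
```

Because the pepper is part of the hashed string, the same contact list yields different hashes as soon as the identity server rotates its pepper.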
-First the client must prepend the medium (plus a space) to the address: +First the client must append the medium (plus a space) to the address: ``` -"alice@example.com" -> "email alice@example.com" -"bob@example.com" -> "email bob@example.com" -"carl@example.com" -> "email carl@example.com" -"+1 234 567 8910" -> "msisdn 12345678910" -"denny@example.com" -> "email denny@example.com" +"alice@example.com" -> "alice@example.com email" +"bob@example.com" -> "bob@example.com email" +"carl@example.com" -> "carl@example.com email" +"+1 234 567 8910" -> "12345678910 msisdn" +"denny@example.com" -> "denny@example.com email" ``` Hashes must be peppered in order to reduce both the information an identity @@ -114,11 +114,11 @@ If hashing, the client appends the pepper to the end of the 3PID string, after a space. ``` -"alice@example.com email" -> "email alice@example.com matrixrocks" -"bob@example.com email" -> "email bob@example.com matrixrocks" -"carl@example.com email" -> "email carl@example.com matrixrocks" -"12345678910 msdisn" -> "msisdn 12345678910 matrixrocks" -"denny@example.com email" -> "email denny@example.com matrixrocks" +"alice@example.com email" -> "alice@example.com email matrixrocks" +"bob@example.com email" -> "bob@example.com email matrixrocks" +"carl@example.com email" -> "carl@example.com email matrixrocks" +"12345678910 msdisn" -> "12345678910 msisdn matrixrocks" +"denny@example.com email" -> "denny@example.com email matrixrocks" ``` Clients can cache the result of this endpoint, but should re-request it @@ -182,11 +182,11 @@ performed, the client sends each hash in an array. 
``` NOTE: Hashes are not real values -"email alice@example.com matrixrocks" -> "y_TvXLKxFT9CURPXI1wvfjvfvsXe8FPgYj-mkQrnszs" -"email bob@example.com matrixrocks" -> "r0-6x3rp9zIWS2suIque-wXTnlv9sc41fatbRMEOwQE" -"email carl@example.com matrixrocks" -> "ryr10d1K8fcFVxALb3egiSquqvFAxQEwegXtlHoQFBw" -"msisdn 12345678910 matrixrocks" -> "c_30UaSZhl5tyanIjFoE1IXTmuU3vmptEwVOc3P2Ens" -"email denny@example.com matrixrocks" -> "bxt8rtRaOzMkSk49zIKE_NfqTndHvGbWHchZskW3xmY" +"alice@example.com email matrixrocks" -> "y_TvXLKxFT9CURPXI1wvfjvfvsXe8FPgYj-mkQrnszs" +"bob@example.com email matrixrocks" -> "r0-6x3rp9zIWS2suIque-wXTnlv9sc41fatbRMEOwQE" +"carl@example.com email matrixrocks" -> "ryr10d1K8fcFVxALb3egiSquqvFAxQEwegXtlHoQFBw" +"12345678910 msisdn matrixrocks" -> "c_30UaSZhl5tyanIjFoE1IXTmuU3vmptEwVOc3P2Ens" +"denny@example.com email matrixrocks" -> "bxt8rtRaOzMkSk49zIKE_NfqTndHvGbWHchZskW3xmY" POST /_matrix/identity/v2/lookup @@ -236,11 +236,11 @@ lookup pepper, as no hashing will occur. Appending a space and the 3PID medium to each address is still necessary: ``` -"alice@example.com" -> "email alice@example.com" -"bob@example.com" -> "email bob@example.com" -"carl@example.com" -> "email carl@example.com" -"+1 234 567 8910" -> "msisdn 12345678910" -"denny@example.com" -> "email denny@example.com" +"alice@example.com" -> "alice@example.com email" +"bob@example.com" -> "bob@example.com email" +"carl@example.com" -> "carl@example.com email" +"+1 234 567 8910" -> "12345678910 msisdn" +"denny@example.com" -> "denny@example.com email" ``` The client then sends these off to the identity server in a `POST` request to @@ -251,11 +251,11 @@ POST /_matrix/identity/v2/lookup { "addresses": [ - "email alice@example.com", - "email bob@example.com", - "email carl@example.com", - "msisdn 12345678910", - "email denny@example.com" + "alice@example.com email", + "bob@example.com email", + "carl@example.com email", + "12345678910 msisdn", + "denny@example.com email" ], "algorithm": "none", 
"pepper": "matrixrocks" @@ -274,8 +274,8 @@ it has that correspond to these 3PID addresses, and returns them: ``` { "mappings": { - "email alice@example.com": "@alice:example.com", - "msisdn 12345678910": "@fred:example.com" + "alice@example.com email": "@alice:example.com", + "12345678910 msisdn": "@fred:example.com" } } ``` From 9913f5bc298328f0daff702bdbd9ec892fe8c582 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Wed, 31 Jul 2019 11:16:58 +0100 Subject: [PATCH 61/67] Slightly clarify pepper value --- proposals/2134-identity-hash-lookup.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 06a8486a..28c78918 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -107,8 +107,8 @@ GET /_matrix/identity/v2/hash_details The name `lookup_pepper` was chosen in order to account for pepper values being returned for other endpoints in the future. The contents of `lookup_pepper` MUST match the regular expression `[a-zA-Z0-9]+`, whether -hashing is being performed or not. When no hashing is occuring, a pepper -value of at least length 1 is still required. +hashing is being performed or not. When no hashing is occuring, a valid +pepper value of at least length 1 is still required. If hashing, the client appends the pepper to the end of the 3PID string, after a space. 
From 33d22c3320892c57e1e83724a67a4d6b716edf8f Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Wed, 31 Jul 2019 11:47:03 +0100 Subject: [PATCH 62/67] hashes are not stream ciphers --- proposals/2134-identity-hash-lookup.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 28c78918..e156d158 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -309,8 +309,8 @@ types that have been leaked in database dumps are more susceptible to hash reversal. Mediums and peppers are appended to the address as to prevent a common prefix -for each plain-text string, which prevents attackers from pre-computing bits -of a stream cipher. +for each plain-text string, which prevents attackers from pre-computing the +internal state of the hash function ## Other considered solutions From 3789d828fd0d767d93a6c801fe8b1c95c32c6de2 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Thu, 1 Aug 2019 14:51:26 +0100 Subject: [PATCH 63/67] Incorporate solution analysis from the context of attacks --- proposals/2134-identity-hash-lookup.md | 125 +++++++++++++++++++++---- 1 file changed, 105 insertions(+), 20 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 5f500b6c..0881d1cb 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -316,43 +316,128 @@ of a stream cipher. Bloom filters are an alternative method of providing private contact discovery. However, they do not scale well due to requiring clients to download a large -filter that needs updating every time a new bind is made. Further considered -solutions are explored in https://signal.org/blog/contact-discovery/. Signal's -eventual solution of using Software Guard Extensions (detailed in +filter that needs updating every time a new bind is made. 
+ +Further considered solutions are explored in +https://signal.org/blog/contact-discovery/. Signal's eventual solution of +using Software Guard Extensions (detailed in https://signal.org/blog/private-contact-discovery/) is considered impractical for a federated network, as it requires specialized hardware. k-anonymity was considered as an alternative approach, in which the identity server would never receive a full hash of a 3PID that it did not already know -about. While this has been considered plausible, it comes with heightened -resource requirements (much more hashing by the identity server). The -conclusion was that it may not provide more privacy if an identity server -decided to be evil, however it would significantly raise the resource -requirements to run an evil identity server. Discussion and a walk-through of -what a client/identity-server interaction would look like are documented [in -this Github +about. Discussion and a walk-through of what a client/identity-server +interaction would look like are documented [in this Github comment](https://github.com/matrix-org/matrix-doc/pull/2134#discussion_r298691748). +While this solution seems like a win for privacy, its actual benefits are a +lot more naunced. Let's explore them by performing threat-model analysis: + +We consider three attackers: + + 1. A malicious third party trying to discover the identity server mappings + in the homeserver. + + The malicious third party scenario can only be protected against by rate + limiting lookups, given otherwise it looks identical to legitimate traffic. + + 1. An attacker who has stolen an IS db + + In theory the 3PIDs could be stored hashed with a static salt to protect + a stolen DB. This has been descoped from this MSC, and is largely an + orthogonal problem. + + 1. 
A compromised or malicious identity server, who may be trying to + determine the contents of a user's addressbook (including non-Matrix users) + +Our approaches for protecting against a malicious identity server are: + + * We resign ourselves to the IS knowing the 3PIDs at point of bind, as + otherwise it can't validate them. + + * To protect the 3PIDs of non-Matrix users: + + 1. We could hash the uploaded 3PIDs with a static pepper; however, a + malicious IS could pre-generate a rainbow table to reverse these hashes. + + 1. We could hash the uploaded 3PIDs with a slowly rotating pepper; a + malicious IS could generate a rainbow table in retrospect to reverse these + hashes (but wouldn't be able to reuse the table) + + 1. We could send partial hashes of the uploaded 3PIDs (with full salted + hashes to disambiguate the 3PIDs), have the IS respond with anonymised + partial results, to allow the IS to avoid reversing the 3PIDs (a + k-anonymity approach). However, the IS could still claim to have mappings + for all 3PIDs, and so receive all the salted hashes, and be able to + reverse them via rainbow tables for that salt. + +So, in terms of computational complexity for the attacker, respectively: + + 1. The attacker has to generate a rainbow table over all possible IDs once, + which can then be reused for subsequent attacks. + + 1. The attacker has to generate a rainbow table over all possible IDs for a + given lookup timeframe, which cannot be reused for subsequent attacks. + + 1. The attacker has to generate multiple but partial rainbow tables, one + per group of 3PIDs that share similar hash prefixes, which cannot then be + reused for any other attack. + +For making life hardest for an attacker, option 3 (k-anon) wins. However, it +also makes things harder for the client and server: + + * The client has to calculate new salted hashes for all 3PIDs every time it + uploads. 
+ + * The server has to calculate new salted hashes for all partially-matching + 3PIDs hashes as it looks them up. + +It's worth noting that one could always just go and load up a malicious IS DB +with a huge pre-image set of mappings and thus see what uploaded 3PIDs match, +no matter what algorithm is used. + +For k-anon this would put the most computational onus on the server (as it +would effectively be creating a partial rainbow table for every lookup), but +this is probably not infeasible - so we've gone and added a lot of complexity +and computational cost for not much benefit, given the system can still be +trivially attacked. + +Finally, as more and more users come onto Matrix, their contact lists will +get more and more exposed anyway given the IS server has to be able to +identity Matrix-enabled 3PIDs to perform the lookup. + +Thus the conclusion is that while k-anon is harder to attack, it's unclear +that this is actually enough of an obstacle to meaningfully stop a malicious +IS. Therefore we should KISS and go for a simple hash lookup with a rotating +pepper (which is not much harder than a static pepper, especially if our +initial implementation doesn't bother rotating the pepper). Rather than trying +to make the k-anon approach work, we'd be better off spending that time +figuring out how to store 3pids as hashes in the DB (and in 3pid bindings +etc), or how to decentralise ISes in general. + A radical model was also considered where the first portion of the k-anonyminity scheme was done with an identity server, and the second would be done with various homeservers who originally reported the 3PID to the -identity server. While interesting and a more decentralised model, some -attacks are still possible if the identity server is running an evil -homeserver which it can direct the client to send its hashes to. Discussion -on this matter has taken place in the MSC-specific room [starting at this +identity server. 
While interesting and more decentralised, some attacks are +still possible if the identity server is running an evil homeserver which it +can direct the client to send its hashes to. Discussion on this matter has +taken place in the MSC-specific room [starting at this message](https://matrix.to/#/!LlraCeVuFgMaxvRySN:amorgan.xyz/$4wzTSsspbLVa6Lx5cBq6toh6P3TY3YnoxALZuO8n9gk?via=amorgan.xyz&via=matrix.org&via=matrix.vgorcum.com). -Ideally identity servers would never receive plain-text addresses, just -storing and receiving hash values instead. However, it is necessary for the -identity server to have plain-text addresses during a +Tangentially, identity servers would ideally just never receive plain-text +addresses, just storing and receiving hash values instead. However, it is +necessary for the identity server to have plain-text addresses during a [bind](https://matrix.org/docs/spec/identity_service/r0.2.1#post-matrix-identity-api-v1-3pid-bind) call, in order to send a verification email or sms message. It is not feasible to defer this job to a homeserver, as the identity server cannot trust that the homeserver has actually performed verification. Thus it may not be possible to prevent plain-text 3PIDs of registered Matrix users from -being sent to the identity server at least once. Yet, we can still do our -best by coming up with creative ways to prevent non-matrix user 3PIDs from -leaking to the identity server, when they're sent in a lookup. +being sent to the identity server at least once. Yet, it is possible that with +a few changes to other Identity Service endpoints, as described in [this +review +comment](https://github.com/matrix-org/matrix-doc/pull/2134#discussion_r309617900), +identity servers could refrain from storing any plaintext 3PIDs at rest. This +however, is a topic for a future MSC. 
## Conclusion From c401a4d47b7f1edf63ed3aa5c2341f8554b824b1 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Thu, 1 Aug 2019 14:53:41 +0100 Subject: [PATCH 64/67] punctuation --- proposals/2134-identity-hash-lookup.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index f4a5e36c..fd17475b 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -310,7 +310,7 @@ reversal. Mediums and peppers are appended to the address as to prevent a common prefix for each plain-text string, which prevents attackers from pre-computing the -internal state of the hash function +internal state of the hash function. ## Other considered solutions From 387772477497362f6b2714ee8ce1a6c2a6a9132b Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Thu, 1 Aug 2019 15:01:05 +0100 Subject: [PATCH 65/67] fix speeling --- proposals/2134-identity-hash-lookup.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index fd17475b..11ea3258 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -331,7 +331,7 @@ interaction would look like are documented [in this Github comment](https://github.com/matrix-org/matrix-doc/pull/2134#discussion_r298691748). While this solution seems like a win for privacy, its actual benefits are a -lot more naunced. Let's explore them by performing threat-model analysis: +lot more nuanced. 
Let's explore them by performing a threat-model analysis: We consider three attackers: From 96e06b6f5f94c6d2f13ec1c50bca4e16469ed797 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Thu, 1 Aug 2019 15:04:38 +0100 Subject: [PATCH 66/67] Add line, britishise --- proposals/2134-identity-hash-lookup.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 11ea3258..6c3bbe63 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -322,7 +322,7 @@ Further considered solutions are explored in https://signal.org/blog/contact-discovery/. Signal's eventual solution of using Software Guard Extensions (detailed in https://signal.org/blog/private-contact-discovery/) is considered impractical -for a federated network, as it requires specialized hardware. +for a federated network, as it requires specialised hardware. k-anonymity was considered as an alternative approach, in which the identity server would never receive a full hash of a 3PID that it did not already know @@ -410,10 +410,12 @@ Thus the conclusion is that while k-anon is harder to attack, it's unclear that this is actually enough of an obstacle to meaningfully stop a malicious IS. Therefore we should KISS and go for a simple hash lookup with a rotating pepper (which is not much harder than a static pepper, especially if our -initial implementation doesn't bother rotating the pepper). Rather than trying -to make the k-anon approach work, we'd be better off spending that time -figuring out how to store 3pids as hashes in the DB (and in 3pid bindings -etc), or how to decentralise ISes in general. +initial implementation doesn't bother rotating the pepper). Rather than +trying to make the k-anon approach work, we'd be better off spending that +time figuring out how to store 3pids as hashes in the DB (and in 3pid +bindings etc), or how to decentralise ISes in general. 
It's also worth noting +that a malicious server may fail to rotate the pepper, making the rotation +logic of questionable benefit. A radical model was also considered where the first portion of the k-anonyminity scheme was done with an identity server, and the second would From 3edf5e3c16989a603d9f9e7996acdac5f6596ab3 Mon Sep 17 00:00:00 2001 From: Andrew Morgan Date: Fri, 2 Aug 2019 11:25:28 +0100 Subject: [PATCH 67/67] Make hashes real values --- proposals/2134-identity-hash-lookup.md | 26 ++++++++++++-------------- 1 file changed, 12 insertions(+), 14 deletions(-) diff --git a/proposals/2134-identity-hash-lookup.md b/proposals/2134-identity-hash-lookup.md index 6c3bbe63..2fdda034 100644 --- a/proposals/2134-identity-hash-lookup.md +++ b/proposals/2134-identity-hash-lookup.md @@ -180,23 +180,21 @@ IDs](https://matrix.org/docs/spec/rooms/v4#event-ids)). Once hashing has been performed, the client sends each hash in an array. ``` -NOTE: Hashes are not real values - -"alice@example.com email matrixrocks" -> "y_TvXLKxFT9CURPXI1wvfjvfvsXe8FPgYj-mkQrnszs" -"bob@example.com email matrixrocks" -> "r0-6x3rp9zIWS2suIque-wXTnlv9sc41fatbRMEOwQE" -"carl@example.com email matrixrocks" -> "ryr10d1K8fcFVxALb3egiSquqvFAxQEwegXtlHoQFBw" -"12345678910 msisdn matrixrocks" -> "c_30UaSZhl5tyanIjFoE1IXTmuU3vmptEwVOc3P2Ens" -"denny@example.com email matrixrocks" -> "bxt8rtRaOzMkSk49zIKE_NfqTndHvGbWHchZskW3xmY" +"alice@example.com email matrixrocks" -> "4kenr7N9drpCJ4AfalmlGQVsOn3o2RHjkADUpXJWZUc" +"bob@example.com email matrixrocks" -> "LJwSazmv46n0hlMlsb_iYxI0_HXEqy_yj6Jm636cdT8" +"carl@example.com email matrixrocks" -> "jDh2YLwYJg3vg9pEn3kaaXAP9jx-LlcotoH51Zgb9MA" +"12345678910 msisdn matrixrocks" -> "S11EvvwnUWBDZtI4MTRKgVuiRx76Z9HnkbyRlWkBqJs" +"denny@example.com email matrixrocks" -> "2tZto1arl2fUYtF6tQPJND69il3xke9OBlgFgnUt2ww" POST /_matrix/identity/v2/lookup { "addresses": [ - "y_TvXLKxFT9CURPXI1wvfjvfvsXe8FPgYj-mkQrnszs", - "r0-6x3rp9zIWS2suIque-wXTnlv9sc41fatbRMEOwQE", - 
"ryr10d1K8fcFVxALb3egiSquqvFAxQEwegXtlHoQFBw", - "c_30UaSZhl5tyanIjFoE1IXTmuU3vmptEwVOc3P2Ens", - "bxt8rtRaOzMkSk49zIKE_NfqTndHvGbWHchZskW3xmY" + "4kenr7N9drpCJ4AfalmlGQVsOn3o2RHjkADUpXJWZUc", + "LJwSazmv46n0hlMlsb_iYxI0_HXEqy_yj6Jm636cdT8", + "jDh2YLwYJg3vg9pEn3kaaXAP9jx-LlcotoH51Zgb9MA", + "S11EvvwnUWBDZtI4MTRKgVuiRx76Z9HnkbyRlWkBqJs", + "2tZto1arl2fUYtF6tQPJND69il3xke9OBlgFgnUt2ww" ], "algorithm": "sha256", "pepper": "matrixrocks" @@ -210,8 +208,8 @@ IDs of those that match: ``` { "mappings": { - "y_TvXLKxFT9CURPXI1wvfjvfvsXe8FPgYj-mkQrnszs": "@alice:example.com", - "c_30UaSZhl5tyanIjFoE1IXTmuU3vmptEwVOc3P2Ens": "@fred:example.com" + "4kenr7N9drpCJ4AfalmlGQVsOn3o2RHjkADUpXJWZUc": "@alice:example.com", + "S11EvvwnUWBDZtI4MTRKgVuiRx76Z9HnkbyRlWkBqJs": "@fred:example.com" } } ```