Incorporate solution analysis from the context of attacks

6 years ago · 3789d828fd
parent c8527b7af8
commit 3789d828fd
1 changed files with 105 additions and 20 deletions
--- a/proposals/2134-identity-hash-lookup.md
+++ b/proposals/2134-identity-hash-lookup.md
@@ -316,43 +316,128 @@ of a stream cipher.
 Bloom filters are an alternative method of providing private contact discovery.
 However, they do not scale well due to requiring clients to download a large
-filter that needs updating every time a new bind is made. Further considered
+filter that needs updating every time a new bind is made.
-solutions are explored in https://signal.org/blog/contact-discovery/. Signal's
+
-eventual solution of using Software Guard Extensions (detailed in
+Further considered solutions are explored in
 https://signal.org/blog/contact-discovery/. Signal's eventual solution of
 using Software Guard Extensions (detailed in
 https://signal.org/blog/private-contact-discovery/) is considered impractical
 for a federated network, as it requires specialized hardware.
 k-anonymity was considered as an alternative approach, in which the identity
 server would never receive a full hash of a 3PID that it did not already know
-about. While this has been considered plausible, it comes with heightened
+about. Discussion and a walk-through of what a client/identity-server
-resource requirements (much more hashing by the identity server). The
+interaction would look like are documented [in this GitHub
 conclusion was that it may not provide more privacy if an identity server
 decided to be evil, however it would significantly raise the resource
 requirements to run an evil identity server. Discussion and a walk-through of
 what a client/identity-server interaction would look like are documented [in
 this GitHub
 comment](https://github.com/matrix-org/matrix-doc/pull/2134#discussion_r298691748).
 While this solution seems like a win for privacy, its actual benefits are a
 lot more nuanced. Let's explore them by performing a threat-model analysis:
 We consider three attackers:
 1. A malicious third party trying to discover the identity server mappings
    in the homeserver.
    The malicious third party scenario can only be protected against by rate
    limiting lookups, given otherwise it looks identical to legitimate traffic.
 1. An attacker who has stolen an identity server (IS) database.
    In theory the 3PIDs could be stored hashed with a static salt to protect
    a stolen DB. This has been descoped from this MSC, and is largely an
    orthogonal problem.
 1. A compromised or malicious identity server, who may be trying to
    determine the contents of a user's addressbook (including non-Matrix users)
 Our approaches for protecting against a malicious identity server are:
 * We resign ourselves to the IS knowing the 3PIDs at point of bind, as
   otherwise it can't validate them.
 * To protect the 3PIDs of non-Matrix users:
   1. We could hash the uploaded 3PIDs with a static pepper; however, a
      malicious IS could pre-generate a rainbow table to reverse these hashes.
   1. We could hash the uploaded 3PIDs with a slowly rotating pepper; a
      malicious IS could generate a rainbow table in retrospect to reverse these
      hashes (but wouldn't be able to reuse the table once the pepper rotates).
   1. We could send partial hashes of the uploaded 3PIDs (with full salted
      hashes to disambiguate the 3PIDs) and have the IS respond with
      anonymised partial results, so that the IS cannot simply reverse every
      uploaded 3PID (a k-anonymity approach). However, the IS could still
      claim to have mappings for all 3PIDs, and so receive all the salted
      hashes, and be able to reverse them via rainbow tables for that salt.
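To make the first two options concrete, here is a minimal sketch of a peppered lookup hash and of how a malicious IS could reverse it. The encoding (space-joined address, medium and pepper, SHA-256, unpadded URL-safe base64), the helper names and the pepper values are assumptions for illustration only, and a plain precomputed dictionary stands in for a real rainbow table.

```python
import base64
import hashlib


def lookup_hash(address: str, medium: str, pepper: str) -> str:
    """Hash a 3PID together with a pepper (hypothetical encoding)."""
    digest = hashlib.sha256(f"{address} {medium} {pepper}".encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def build_reverse_table(candidates, medium, pepper):
    """What a malicious IS could precompute: hash -> plaintext for one pepper."""
    return {lookup_hash(addr, medium, pepper): addr for addr in candidates}


# Toy search space: 1000 phone numbers. Real number spaces are larger,
# but still small enough to enumerate.
candidates = [f"447700900{n:03d}" for n in range(1000)]

# The client uploads a hash of a non-Matrix contact under the current pepper.
uploaded = lookup_hash("447700900123", "msisdn", "pepper-week-1")

# The IS enumerates the space for that pepper and reverses the upload.
table = build_reverse_table(candidates, "msisdn", "pepper-week-1")
recovered = table.get(uploaded)

# After the pepper rotates, the old table no longer matches anything,
# so the IS has to rebuild it from scratch.
uploaded_after_rotation = lookup_hash("447700900123", "msisdn", "pepper-week-2")
still_reversible = uploaded_after_rotation in table
```

With a static pepper the table is built once and reused indefinitely; rotation only forces the attacker to repeat the enumeration per pepper.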
 So, in terms of computational complexity for the attacker, respectively:
  1. The attacker has to generate a rainbow table over all possible IDs once,
     which can then be reused for subsequent attacks.
  1. The attacker has to generate a rainbow table over all possible IDs for a
     given lookup timeframe, which cannot be reused for subsequent attacks.
  1. The attacker has to generate multiple but partial rainbow tables, one
     per group of 3PIDs that share similar hash prefixes, which cannot then be
     reused for any other attack.
 For making life hardest for an attacker, option 3 (k-anon) wins. However, it
 also makes things harder for the client and server:
 * The client has to calculate new salted hashes for all 3PIDs every time it
   uploads.
 * The server has to calculate new salted hashes for all partially-matching
   3PIDs hashes as it looks them up.
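The extra work on both sides can be sketched as follows. This is one possible reading of the k-anon exchange, not the scheme from the linked comment: the prefix length, the salt format, and the `ToyIdentityServer`, `has_prefix`, `resolve` and `client_lookup` names are all hypothetical.

```python
import hashlib
import secrets
from typing import Dict, List, Optional, Tuple

PREFIX_LEN = 4  # hex characters of the unsalted hash sent first (assumed value)


def plain_hash(threepid: str) -> str:
    return hashlib.sha256(threepid.encode("utf-8")).hexdigest()


def salted_hash(threepid: str, salt: str) -> str:
    return hashlib.sha256(f"{salt}:{threepid}".encode("utf-8")).hexdigest()


class ToyIdentityServer:
    def __init__(self, bindings: Dict[str, str]) -> None:
        # Group known 3PIDs by hash prefix so partial lookups are cheap.
        self.buckets: Dict[str, List[Tuple[str, str]]] = {}
        for threepid, mxid in bindings.items():
            prefix = plain_hash(threepid)[:PREFIX_LEN]
            self.buckets.setdefault(prefix, []).append((threepid, mxid))
        self.salted_hashes_computed = 0

    def has_prefix(self, prefix: str) -> bool:
        # Step 1: the server only learns a short prefix of the hash.
        return prefix in self.buckets

    def resolve(self, prefix: str, salt: str, full_hash: str) -> Optional[str]:
        # Step 2: the server re-hashes every partially-matching 3PID
        # with the client's salt to find the exact match.
        for threepid, mxid in self.buckets.get(prefix, []):
            self.salted_hashes_computed += 1
            if salted_hash(threepid, salt) == full_hash:
                return mxid
        return None


def client_lookup(server: ToyIdentityServer, contacts: List[str]) -> Dict[str, str]:
    salt = secrets.token_hex(8)  # fresh salt per upload, so hashes are recomputed
    found: Dict[str, str] = {}
    for threepid in contacts:
        prefix = plain_hash(threepid)[:PREFIX_LEN]
        if not server.has_prefix(prefix):
            continue  # the full salted hash is never sent for this contact
        mxid = server.resolve(prefix, salt, salted_hash(threepid, salt))
        if mxid is not None:
            found[threepid] = mxid
    return found


server = ToyIdentityServer({
    "alice@example.com": "@alice:example.org",
    "bob@example.com": "@bob:example.org",
})
found = client_lookup(server, ["alice@example.com", "carol@example.com"])
```

A malicious server could make `has_prefix` always return `True`, which is exactly the "claim to have mappings for all 3PIDs" attack described above.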
 It's worth noting that one could always just go and load up a malicious IS DB
 with a huge pre-image set of mappings and thus see what uploaded 3PIDs match,
 no matter what algorithm is used.
 For k-anon this would put the most computational onus on the server (as it
 would effectively be creating a partial rainbow table for every lookup), but
 this is probably not infeasible - so we've gone and added a lot of complexity
 and computational cost for not much benefit, given the system can still be
 trivially attacked.
 Finally, as more and more users come onto Matrix, their contact lists will
 get more and more exposed anyway, given the IS has to be able to
 identify Matrix-enabled 3PIDs to perform the lookup.
 Thus the conclusion is that while k-anon is harder to attack, it's unclear
 that this is actually enough of an obstacle to meaningfully stop a malicious
 IS. Therefore we should KISS and go for a simple hash lookup with a rotating
 pepper (which is not much harder than a static pepper, especially if our
 initial implementation doesn't bother rotating the pepper). Rather than trying
 to make the k-anon approach work, we'd be better off spending that time
 figuring out how to store 3PIDs as hashes in the DB (and in 3PID bindings
 etc), or how to decentralise ISes in general.
 A radical model was also considered where the first portion of the
 k-anonymity scheme was done with an identity server, and the second would
 be done with various homeservers who originally reported the 3PID to the
-identity server. While interesting and a more decentralised model, some
+identity server. While interesting and more decentralised, some attacks are
-attacks are still possible if the identity server is running an evil
+still possible if the identity server is running an evil homeserver which it
-homeserver which it can direct the client to send its hashes to. Discussion
+can direct the client to send its hashes to. Discussion on this matter has
-on this matter has taken place in the MSC-specific room [starting at this
+taken place in the MSC-specific room [starting at this
 message](https://matrix.to/#/!LlraCeVuFgMaxvRySN:amorgan.xyz/$4wzTSsspbLVa6Lx5cBq6toh6P3TY3YnoxALZuO8n9gk?via=amorgan.xyz&via=matrix.org&via=matrix.vgorcum.com).
-Ideally identity servers would never receive plain-text addresses, just
+Tangentially, identity servers would ideally just never receive plain-text
-storing and receiving hash values instead. However, it is necessary for the
+addresses, just storing and receiving hash values instead. However, it is
-identity server to have plain-text addresses during a
+necessary for the identity server to have plain-text addresses during a
 [bind](https://matrix.org/docs/spec/identity_service/r0.2.1#post-matrix-identity-api-v1-3pid-bind)
 call, in order to send a verification email or SMS message. It is not
 feasible to defer this job to a homeserver, as the identity server cannot
 trust that the homeserver has actually performed verification. Thus it may
 not be possible to prevent plain-text 3PIDs of registered Matrix users from
-being sent to the identity server at least once. Yet, we can still do our
+being sent to the identity server at least once. Yet, it is possible that with
-best by coming up with creative ways to prevent non-matrix user 3PIDs from
+a few changes to other Identity Service endpoints, as described in [this
-leaking to the identity server, when they're sent in a lookup.
+review
 comment](https://github.com/matrix-org/matrix-doc/pull/2134#discussion_r309617900),
 identity servers could refrain from storing any plaintext 3PIDs at rest. This,
 however, is a topic for a future MSC.
 ## Conclusion