Merge pull request #2265 from matrix-org/babolivier/msc_email_case
MSC2265: Proposal for mandating case folding when processing e-mail address localparts
commit
34f2d482be
@ -0,0 +1,101 @@

# Proposal for mandating case folding when processing e-mail addresses

[RFC822](https://tools.ietf.org/html/rfc822#section-3.4.7) mandates that
localparts in e-mail addresses must be processed with the original case
preserved. [The Matrix spec](https://matrix.org/docs/spec/appendices#pid-types)
doesn't mandate anything about processing e-mail addresses, other than the fact
that the domain part must be converted to lowercase, as domain names are case
insensitive.

On the other hand, most major e-mail providers nowadays process the localparts
of e-mail addresses as case insensitive. Therefore, most users expect localparts
to be treated case insensitively, and get confused when they are not. Some
users, for example, get confused over the fact that registering a 3PID
association for `john.doe@example.com` doesn't mean that the association is
valid for `John.Doe@example.com`, and don't expect to have to remember the exact
case they used to initially register the association (and sometimes get locked
out of their account because of that). So far we've seen that confusion occur
and lead to problems of varying severity over several deployments of Synapse and
Sydent.

## Proposal

This proposal suggests changing the specification of the e-mail 3PID type in
[the Matrix spec appendices](https://matrix.org/docs/spec/appendices#pid-types)
to mandate that, before any processing, e-mail addresses must go through a full
case folding based on [the Unicode case folding mapping
file](https://www.unicode.org/Public/8.0.0/ucd/CaseFolding.txt), on top of
having their domain lowercased.

This means that `Strauß@Example.com` must be considered as being the same e-mail
address as `strauss@example.com`.
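As an illustration, the normalisation step could be sketched in Python 3 as
follows. The function name `normalise_email` is hypothetical and not part of
the proposal. Note that `str.casefold()` uses the Unicode data bundled with the
interpreter, which may be a later version than the 8.0.0 mapping file linked
above.

```python
def normalise_email(address: str) -> str:
    """Return the e-mail address with full Unicode case folding applied.

    Hypothetical helper: full case folding also lowercases the domain part,
    so no separate handling of the domain is done here.
    """
    return address.casefold()


# "ß" folds to "ss" and upper-case letters fold to lower-case ones.
print(normalise_email("Strauß@Example.com"))  # strauss@example.com
```

Applying the same function on both sides of a comparison (at registration and
at lookup) is what makes the two addresses above compare equal.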

## Other considered solutions

A first look at this issue concluded that there was no need to add such a
mention to the spec, and that it can be considered an implementation detail.
However, [MSC2134](https://github.com/matrix-org/matrix-doc/pull/2134) changes
this: because hashing functions are case sensitive, we need both clients and
identity servers to follow the same policy regarding case sensitivity.

An initial version of this proposal mandated lowercasing e-mail addresses
instead of case folding them. However, it was pointed out that this solution
might not be the best and most future-proof one.

Unicode normalisation was also looked at but judged unnecessary.
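To illustrate the hashing concern raised above in relation to MSC2134, the
following sketch shows that two case variants of the same address produce
different digests unless both sides fold the address first. The hash
construction used here (SHA-256 over `address medium pepper`, encoded as
unpadded URL-safe base64), the helper name `lookup_hash` and the pepper value
are assumptions made for the example; refer to MSC2134 for the actual
algorithm.

```python
import base64
import hashlib


def lookup_hash(address: str, medium: str, pepper: str) -> str:
    """Hypothetical lookup hash, loosely modelled on MSC2134."""
    digest = hashlib.sha256(f"{address} {medium} {pepper}".encode("utf-8")).digest()
    # Unpadded URL-safe base64 encoding of the raw digest.
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


pepper = "examplepepper"  # placeholder value, for illustration only

stored = lookup_hash("john.doe@example.com", "email", pepper)
queried = lookup_hash("John.Doe@example.com", "email", pepper)
folded = lookup_hash("John.Doe@example.com".casefold(), "email", pepper)

print(stored == queried)  # False: the hash is case sensitive
print(stored == folded)   # True: folding before hashing makes them match
```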

## Tradeoffs

Implementing this MSC in identity servers and homeservers might require large
parts of the databases of existing instances to be updated, in order to case
fold the email addresses of existing associations and avoid conflicts. However,
most of this update can usually be done by a background job running at startup,
so the UX improvement outweighs this trouble.

## Potential issues

### Conflicts with existing associations

Some users might already have two different accounts associated with the same
e-mail address but with different cases. This appears to happen in a small
number of cases, however, and can be dealt with by the identity server's or the
homeserver's maintainer.

For example, with Sydent, the process of dealing with such cases could look
like:

1. list all MXIDs associated with a variant of the email address, and the
   timestamp of that association
2. delete all associations except for the most recent one [0]
3. inform the user of the deletion by sending them an email notice to the email
   address
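The steps above could be sketched roughly as follows. This is an in-memory
illustration, not Sydent's actual code: the `Association` type and the
`resolve_conflicts` function are hypothetical, and sending the email notice is
left to the caller, which receives the list of deleted associations.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Association(NamedTuple):
    address: str  # e-mail address as originally stored
    mxid: str     # Matrix user ID bound to the address
    ts: int       # timestamp of the association


def resolve_conflicts(
    associations: List[Association],
) -> Tuple[List[Association], List[Association]]:
    """Keep the most recent association per case folded address.

    Returns (kept, deleted); the caller is expected to notify the owners
    of the deleted associations by email.
    """
    # Step 1: group all associations by the case folded address.
    groups: Dict[str, List[Association]] = defaultdict(list)
    for assoc in associations:
        groups[assoc.address.casefold()].append(assoc)

    kept: List[Association] = []
    deleted: List[Association] = []
    for folded, group in groups.items():
        # Step 2: keep only the most recent association, stored case folded.
        group.sort(key=lambda a: a.ts)
        kept.append(group[-1]._replace(address=folded))
        deleted.extend(group[:-1])
    return kept, deleted
```

For instance, given associations for `john.doe@example.com` (older) and
`John.Doe@example.com` (newer), only the newer one is kept, under the folded
address, and the older one is returned in the deleted list.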

### Storing and querying

Most database engines don't support case folding, therefore querying all
e-mail addresses matching a case folded e-mail address might not be trivial.
For example, an identity server querying all associations for
`strauss@example.com` when processing a `/lookup` request would be expected to
also get associations for `Strauß@Example.com`.

To address this issue, implementation maintainers are strongly encouraged to
make e-mail addresses go through a full case folding before storing them.
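A minimal sketch of this approach, using Python's built-in `sqlite3` module, is
shown below. The table layout and the helper names `store_association` and
`lookup` are assumptions made for the example and do not reflect any existing
implementation's schema.

```python
import sqlite3


def store_association(db: sqlite3.Connection, address: str, mxid: str) -> None:
    """Store the association under the case folded address."""
    db.execute(
        "INSERT INTO associations (address, mxid) VALUES (?, ?)",
        (address.casefold(), mxid),
    )


def lookup(db: sqlite3.Connection, address: str) -> list:
    """Return all MXIDs bound to the address, whatever case was used."""
    rows = db.execute(
        "SELECT mxid FROM associations WHERE address = ?",
        (address.casefold(),),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical schema, created in memory for the example.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE associations (address TEXT NOT NULL, mxid TEXT NOT NULL)")

store_association(db, "Strauß@Example.com", "@strauss:example.com")
print(lookup(db, "strauss@example.com"))  # ['@strauss:example.com']
```

Because the folding happens in application code on both the write and the read
path, the database only ever needs a plain equality comparison.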

### Implementing case folding

The need for case folding in services on the Internet doesn't seem to be very
large currently (probably due to its young age), therefore there seem to be only
a few third-party implementation libraries out there. However,
[Go](https://godoc.org/golang.org/x/text/cases#Fold), [Python
2](https://docs.python.org/2/library/stringprep.html#stringprep.map_table_b3)
and [Python 3](https://docs.python.org/3/library/stdtypes.html#str.casefold)
all support it natively, and [a third-party JavaScript
implementation](https://github.com/ar-nelson/foldcase) exists which, although
young, seems to be working.
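As a brief illustration of the native Python 3 support, the snippet below
compares `str.lower()` with `str.casefold()` on the example address used in
this proposal. It only shows the behaviour of these two built-in methods and
makes no claim about the other implementations listed above.

```python
address = "Strauß@Example.com"

# Lowercasing leaves "ß" untouched, since it is already a lower-case letter.
lowered = address.lower()

# Full case folding maps "ß" to "ss", as described in CaseFolding.txt.
folded = address.casefold()

print(lowered)  # strauß@example.com
print(folded)   # strauss@example.com
```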

## Footnotes

[0]: This is specific to Sydent because of a bug it has where v1 lookups are
already processed case insensitively. This means it will return the most recent
association for any case of the given email address, therefore keeping only this
association won't change the result of v1 lookups.