|
|
|
# Proposal for mandating case folding when processing e-mail addresses
|
|
|
|
|
|
|
|
[RFC822](https://tools.ietf.org/html/rfc822#section-3.4.7) mandates that
|
|
|
|
localparts in e-mail addresses must be processed with the original case
|
|
|
|
preserved. [The Matrix spec](https://matrix.org/docs/spec/appendices#pid-types)
|
|
|
|
doesn't mandate anything about processing e-mail addresses, other than the fact
|
|
|
|
that the domain part must be converted to lowercase, as domain names are case
|
|
|
|
insensitive.
|
|
|
|
|
|
|
|
On the other hand, most major e-mail providers nowadays process the localparts
|
|
|
|
of e-mail addresses as case insensitive. Therefore, most users expect localparts
|
|
|
|
to be treated case insensitively, and get confused when it's not. Some users,
|
|
|
|
for example, get confused over the fact that registering a 3PID association for
|
|
|
|
`john.doe@example.com` doesn't mean that the association is valid for
|
|
|
|
`John.Doe@example.com`, and don't expect to have to remember the exact
|
|
|
|
case they used to initially register the association (and sometimes get locked
|
|
|
|
out of their account because of that). So far we've seen that confusion occur
|
|
|
|
and lead to troubles of various degrees over several deployments of Synapse and
|
|
|
|
Sydent.
|
|
|
|
|
|
|
|
## Proposal
|
|
|
|
|
|
|
|
This proposal suggests changing the specification of the e-mail 3PID type in
|
|
|
|
[the Matrix spec appendices](https://matrix.org/docs/spec/appendices#pid-types)
|
|
|
|
to mandate that, before any processing, e-mail addresses must go through a full
|
|
|
|
case folding as described under "Caseless Matching" in
|
|
|
|
[chapter 5 of the unicode standard](https://www.unicode.org/versions/Unicode13.0.0/ch05.pdf#G21790), on top of
|
|
|
|
having their domain lowercased.
|
|
|
|
|
|
|
|
This means that `Strauß@Example.com` must be considered as being the same e-mail
|
|
|
|
address as `strauss@example.com`.
|
|
|
|
|
|
|
|
## Other considered solutions
|
|
|
|
|
|
|
|
A first look at this issue concluded that there was no need to add such a
|
|
|
|
mention to the spec, and that it can be considered an implementation detail.
|
|
|
|
However, [MSC2134](https://github.com/matrix-org/matrix-doc/pull/2134) changes
|
|
|
|
this: because hashing functions are case sensitive, we need both clients and
|
|
|
|
identity servers to follow the same policy regarding case sensitivity.
|
|
|
|
|
|
|
|
An initial version of this proposal proposed to mandate lowercasing e-mail
|
|
|
|
addresses instead of case folding them, however it was pointed out that this
|
|
|
|
solution might not be the best and most future-proof one.
|
|
|
|
|
|
|
|
Unicode normalisation was also looked at but judged unnecessary.
|
|
|
|
|
|
|
|
## Tradeoffs
|
|
|
|
|
|
|
|
Implementing this MSC in identity servers and homeservers might require the
|
|
|
|
databases of existing instances to be updated in a large part to case fold the
|
|
|
|
email addresses of existing associations, in order to avoid conflicts. However,
|
|
|
|
most of this update can usually be done by a background job running at startup,
|
|
|
|
so the UX improvement outweighs this trouble.
|
|
|
|
|
|
|
|
## Potential issues
|
|
|
|
|
|
|
|
### Conflicts with existing associations
|
|
|
|
|
|
|
|
Some users might already have two different accounts associated with the same
|
|
|
|
e-mail address but with different cases. This appears to happen in a small
|
|
|
|
number of cases, however, and can be dealt with by the identity server's or the
|
|
|
|
homeserver's maintainer.
|
|
|
|
|
|
|
|
For example, with Sydent, the process of dealing with such cases could look
|
|
|
|
like:
|
|
|
|
|
|
|
|
1. list all MXIDs associated with a variant of the email address, and the
|
|
|
|
timestamp of that association
|
|
|
|
2. delete all associations except for the most recent one [0]
|
|
|
|
3. inform the user of the deletion by sending them an email notice to the email
|
|
|
|
address
|
|
|
|
|
|
|
|
### Storing and querying
|
|
|
|
|
|
|
|
Most database engines don't support case folding, therefore querying all
|
|
|
|
e-mail addresses matching a case folded e-mail address might not be trivial,
|
|
|
|
e.g. an identity server querying all associations for `strauss@example.com` when
|
|
|
|
processing a `/lookup` request would be expected to also get associations for
|
|
|
|
`Strauß@Example.com`.
|
|
|
|
|
|
|
|
To address this issue, implementation maintainers are strongly encouraged to
|
|
|
|
make e-mail addresses go through a full case folding before storing them.
|
|
|
|
|
|
|
|
### Implementing case folding
|
|
|
|
|
|
|
|
The need for case folding in services on the Internet doesn't seem to be very
|
|
|
|
large currently (probably due to its young age), therefore there seem to be only
|
|
|
|
a few third-party implementation libraries out there. However, both
|
|
|
|
[Go](https://godoc.org/golang.org/x/text/cases#Fold), [Python
|
|
|
|
2](https://docs.python.org/2/library/stringprep.html#stringprep.map_table_b3)
|
|
|
|
and [Python 3](https://docs.python.org/3/library/stdtypes.html#str.casefold)
|
|
|
|
support it natively, and [a third-party JavaScript
|
|
|
|
implementation](https://github.com/ar-nelson/foldcase) exists which, although
|
|
|
|
young, seems to be working.
|
|
|
|
|
|
|
|
## Footnotes
|
|
|
|
|
|
|
|
[0]: This is specific to Sydent because of a bug it has where v1 lookups are
|
|
|
|
already processed case insensitively, which means it will return the most recent
|
|
|
|
association for any case of the given email address, therefore keeping only this
|
|
|
|
association won't change the result of v1 lookups.
|