On why you should never trust an event.
parent
561043663f
commit
46440b0e75
@ -0,0 +1,185 @@
|
||||
# MSC2801: Make it explicit that event bodies are untrusted data
|
||||
|
||||
As the Matrix Specification stands today, it is easy for client developers to
|
||||
assume that all events will be structured as documented. For example, the
|
||||
`displayname` key of an [`m.room.member`
|
||||
event](https://matrix.org/docs/spec/client_server/r0.6.1#m-room-member) is
|
||||
specified to be "`string` or `null`"; and the `info` key of an [`m.image`
|
||||
message event](https://matrix.org/docs/spec/client_server/r0.6.1#m-image) must
|
||||
be a Javascript object.
|
||||
|
||||
In reality, these are not safe assumptions. This MSC proposes that the
|
||||
specification be updated to make it clear that developers should treat all
|
||||
event data as untrusted.
|
||||
|
||||
## Reasons why events may not match the specification
|
||||
|
||||
Firstly, let's examine the reasons why such malformed events can exist. Some of
|
||||
these reasons may appear to have trivial solutions; these are discussed below.
|
||||
|
||||
1. Most obviously, Synapse as it currently stands does very little validation
|
||||
of event bodies sent over the Client-Server API. There is nothing stopping
|
||||
any user sending a malformed `m.room.message` event with a simple `curl`
|
||||
command or Element-Web's `/devtools`.
|
||||
|
||||
Any event sent in this way will be returned to the clients of all the other
|
||||
users in the room.
|
||||
|
||||
2. In order to allow the Matrix protocol to be extensible, server
|
||||
implementations must tolerate unknown event types, and allow them to be
|
||||
passed between clients. It is obvious that a pair of custom clients
|
||||
implementing a `com.example.special.event` event type cannot rely on a
|
||||
standard server implementation to do any validation for them.
|
||||
|
||||
However, this problem extends to event types and keys which have been added
|
||||
to the specification. For example, the [`m.room.encrypted` event
|
||||
type](https://matrix.org/docs/spec/client_server/r0.6.1#m-room-encrypted)
|
||||
was added in Client-Server API r0.4.0. It therefore follows that a server
|
||||
implementing CS-API r0.3.0 would have no way to validate an
|
||||
`m.room.encrypted` event, so if a client is connected to such a server, it
|
||||
could receive malformed events.
|
||||
|
||||
3. To extend from point 2, the problem is not even resolved by upgrading the
|
||||
server. There may now be rooms which contain historical events which would
|
||||
no longer be accepted, but these will still be served by the server.
|
||||
|
||||
This problem also applies to non-room data such as account data. For
|
||||
example, Client-Server API r0.6.0 added the [`m.identity_server` account
|
||||
data event
|
||||
type](https://matrix.org/docs/spec/client_server/r0.6.1#m-identity-server).
|
||||
It is possible, if unlikely, that a client could have uploaded an
|
||||
`m.identity_server` event before the administrator upgraded the server.
|
||||
|
||||
3. Event redaction removes certain keys from an event. This is a bit of a
|
||||
trivial case, though it is worth noting that the rules for event redaction
|
||||
vary between room versions, so it's possible to see a variety of "partial"
|
||||
events.
|
||||
|
||||
4. All the cases above can occur without federation. Federation adds
|
||||
additional complexities due to the structure of Matrix rooms. In
|
||||
particular, a server implementation cannot simply ignore any malformed
|
||||
events since such events may either be critical to the structure of the
|
||||
room (for example, they may be `m.room.membership` events), or at the very
|
||||
least ignoring them would leave "holes" in the event graph which would
|
||||
prevent correct back-pagination.
|
||||
|
||||
## Ideas for mitigating the problem
|
||||
|
||||
The problems above appear to have some easy solutions. Let's describe some of
|
||||
those ideas, before considering the fundamental problem that none of them can
|
||||
solve.
|
||||
|
||||
### Validate all events when they are submitted
|
||||
|
||||
In this idea, we would require all server implementations to strictly validate
|
||||
any data which is sent over the Client-Server API, to ensure that it complied
|
||||
with the specified formats.
|
||||
|
||||
This evidently solves problem 1 above, in that it would prevent local users from
|
||||
creating malformed events of any event types that the server supports; however,
|
||||
it would do nothing to address any of the other problems.
|
||||
|
||||
### Validate events at the point they are served to a client
|
||||
|
||||
We could require that server implementations validate any data that they are
|
||||
about to serve to a client; we might recommend that malformed data be stripped
|
||||
out of the response, or redacted, or similar.
|
||||
|
||||
It is worth mentioning that this would be tricky to implement efficiently on
|
||||
the server side, but it at least helps address most of the problems above, such
|
||||
as historical data, or malformed events received over federation.
|
||||
|
||||
### Have servers re-validate data on upgrade
|
||||
|
||||
Similar to the previous idea, but rather than validating data each time it is
|
||||
served to a client, any stored data could be re-validated to check that it
|
||||
complies with new validation requirements.
|
||||
|
||||
This could be more efficient in the case that changes to validation rules are
|
||||
rare, but it could still be a huge amount of data processing on a large server.
|
||||
|
||||
### Create new room versions which require events to be well-formed
|
||||
|
||||
As outlined above, one of the big problems in this area is how we deal with
|
||||
events sent over federation; in particular, if subsets of the servers in a room
|
||||
have different ideas as to which events are "valid", then their concepts of the
|
||||
room state can begin to drift, and the room can eventually become
|
||||
"split-brained". This makes it hard to simply say, for example,
|
||||
"`m.room.member` events with a non-string `displayname` are invald and should
|
||||
not form part of the room state": we have a risk that some servers will accept
|
||||
the event, and some will not.
|
||||
|
||||
One approach to solving this is via [room
|
||||
versions](https://matrix.org/docs/spec/index#room-versions). By specifying that
|
||||
a change of rules only applies for a future room version, we can eliminate this
|
||||
potential disagreement.
|
||||
|
||||
The process of changing a room from one version to another is intrusive, not
|
||||
least because it requires that all servers in a room support the new room
|
||||
version (or risk being locked out). For that reason, it is extremely
|
||||
undesirable that any new feature require a new room version: whenever possible,
|
||||
it should be possible to use new features in existing rooms. It therefore
|
||||
follows that we cannot rely on room versions to provide validation of event
|
||||
data.
|
||||
|
||||
### Create a single new room version which applies all event validation rules
|
||||
|
||||
This idea is included for completeness, though it is unclear how it would work
|
||||
in practice.
|
||||
|
||||
It has been suggested that we create a new room version which explicitly states
|
||||
that events which fail the current event schema, whatever that is at that
|
||||
moment in time, should be rejected.
|
||||
|
||||
Let's imagine that in future, the `m.room.member` event schema is extended to
|
||||
include an optional `location` key, which, if given, must be a string. The
|
||||
implication of this idea is that servers should reject any `m.room.member`
|
||||
event whose `location` is not a string. We now have a problem: any servers in
|
||||
the room which are updated to the latest spec will reject such malformed
|
||||
events, but any other servers yet to be upgraded will allow it. So is that user
|
||||
in the room or not?
|
||||
|
||||
Even if all the servers in the room can be upgraded at once, what about any
|
||||
`m.room.member` events which were sent before the rule change?
|
||||
|
||||
## The fundamental problem
|
||||
|
||||
The ideas above all mitigate the problems discussed earlier to a greater or
|
||||
lesser extent, and may indeed be worth doing on their own merits. However, none
|
||||
of them can address the problem of outdated server implementations.
|
||||
|
||||
For example, consider the case of a new key being added to an event body, say
|
||||
`m.relates_to`. Now, we may have decided as above that all new specced keys
|
||||
must be validated by the server, so `vN+1` of Synapse dutifully implements such
|
||||
validation and refuses to accept events with a malformed `m.relates_to`.
|
||||
|
||||
The problem comes for users whose server is still Synapse `vN`. It knows
|
||||
nothing of `m.relates_to`, so accepts and passes it through even if
|
||||
malformed. The only potential solution is for clients seeking to implement
|
||||
`m.relates_to` to refuse to talk to servers which do not declare support for
|
||||
it.
|
||||
|
||||
However, this is an entirely client-side feature: it is illogical to require
|
||||
that servers must be upgraded before it can be used. Consider that the hosted
|
||||
element-web at `https://app.element.io` is upgraded to support the new feature;
|
||||
in this scenario, that would lock out any user whose homeserver had not yet
|
||||
been upgraded. This is not an acceptable user experience.
|
||||
|
||||
In short, we are left with the reality that clients must still handle the
|
||||
unvalidated data.
|
||||
|
||||
## Conclusions
|
||||
|
||||
Short of closely coupling server and client versions - which violates the
|
||||
fundamental ethos of the Matrix project - there is nothing that can completely
|
||||
prevent clients from having to handle untrusted data.
|
||||
|
||||
With that in mind, the advantages of the ideas above are diminished. If clients
|
||||
must handle untrusted data in some circumstances, why not in all? "You can
|
||||
trust the content of this data structure, provided you have checked that the
|
||||
server knows how to validate it, in which case you need to treat it as
|
||||
untrusted" is not a useful message for a client developer.
|
||||
|
||||
It may be possible to assert that specific, known cases can be treated as
|
||||
trusted data, but these should be called out as specific cases. The default
|
||||
message should be that clients must treat all event data as untrusted.
|
Loading…
Reference in New Issue