MSC1767: Extensible event types & fallback in Matrix (v2) (#1767)

* first cut of MSC1767 for extensible events (replaces MSC1225) * tabs->spaces * fix markdown * fix markdown * GFM needs two spaces after ##? * delta from msc1225 * 2021 refresh * Refactor to be just text messaging + schema * Update MSC numbers * Touchups for FCP * *ahem* * Rewrite MSC to be understandable? * Update proposals/1767-extensible-events.md * Rewrite MSC, again * Clarify Andy's blog state * Relax m.markup's requirements * Misc clarifications * Revert "Relax m.markup's requirements" This reverts commit 5f15b9f552. * Clarify push rules handling * Fix example * Update extensible events MSCs list * Update Andy's blog post reference * Update per review feedback * Move emotes, notices, and encryption out to dedicated MSCs * Clarify text throughout * Cover all the bases * Clarify mixins are exactly what they're assumed to be * Clarify fallback and number of datums per event * Fix description of plain text baseline * Clarify SCT's role in feature scope & cut "mixed" events (not mixins) * Fix description of mixins * Update proposals/1767-extensible-events.md Co-authored-by: David Baker <dbkr@users.noreply.github.com> * Expand on how unknown events are handled * Apply suggestions from code review Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com> * Trim line length post-suggestions * Mention that extensible events might appear in other room versions early * Remove duplicated unknown parse order being an implementation detail * Note difference between optional content blocks and mixins * Apply suggestions from code review Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com> * Update proposals/1767-extensible-events.md Co-authored-by: Alexey Rusakov <Kitsune-Ral@users.sf.net> * Clarify that clients are still strongly encouraged to validate HTML * Fix mention of mixing legacy event types * Update proposals/1767-extensible-events.md Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com> * Link to MSC3765 * Rename `m.markup` to `m.text` --------- Co-authored-by: Travis Ralston <travpc@gmail.com> Co-authored-by: Travis Ralston <travisr@matrix.org> Co-authored-by: David Baker <dbkr@users.noreply.github.com> Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com> Co-authored-by: Alexey Rusakov <Kitsune-Ral@users.sf.net> Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
3 years ago · 01654eb2de
parent 13935ec33b
commit 01654eb2de
1 changed files with 410 additions and 0 deletions
--- a/proposals/1767-extensible-events.md
+++ b/proposals/1767-extensible-events.md
@ -0,0 +1,410 @@
 # MSC1767: Extensible events in Matrix
 While events are currently JSON blobs which accept additional metadata appended to them,
 there is no formal structure for how to represent this information or interpret it on the
 client side, particularly in the case of unknown event types.
 When specifying new events, the proposals often reinvent the same wheel instead of reusing
 existing blocks or types, such as in cases where captions, thumbnails, etc need to be
 considered for an event. This has further issues of clients not knowing how to render these
 newly-specified events, leading to mixed compatibility within the ecosystem.
 The above seriously hinders the uptake of new event types (and therefore features) within
 the Matrix ecosystem. In the current system, a new event type would be introduced and
 all implementations slowly gain support for it - if we instead had reusable types then
 clients could automatically support a "good enough" version of that new event type while
 "proper" support is written in over time. Such an example could be polls: not every
 client will want polls right away, but it would be quite limiting as a user experience
 if some users can't even see the question being posed.
 This proposal introduces a structure for how extensible events are represented, using
 the existing extensible nature of events today, laying the groundwork for more reusable
 blocks of content in future events.
 With text being the simplest form of representation for events today, this MSC also
 specifies a relatively basic text schema for room messages that can be reused in other
 events. Other building block types are specified by other MSCs:
 * [MSC3954 - Emotes](https://github.com/matrix-org/matrix-doc/pull/3954)
 * [MSC3955 - Notices / automated events](https://github.com/matrix-org/matrix-doc/pull/3955)
 * [MSC3956 - Encryption](https://github.com/matrix-org/matrix-doc/pull/3956)
 * [MSC3927 - Audio](https://github.com/matrix-org/matrix-doc/pull/3927)
 * [MSC3551 - Files](https://github.com/matrix-org/matrix-doc/pull/3551)
 * [MSC3552 - Images and Stickers](https://github.com/matrix-org/matrix-doc/pull/3552)
 * [MSC3553 - Videos](https://github.com/matrix-org/matrix-doc/pull/3553)
 * [MSC3554 - Translatable text](https://github.com/matrix-org/matrix-doc/pull/3554)
 Some examples of new features/events using extensible events are:
 * [MSC3488 - Location data](https://github.com/matrix-org/matrix-doc/pull/3488)
 * [MSC3381 - Polls](https://github.com/matrix-org/matrix-doc/pull/3381)
 * [MSC3245 - Voice messages](https://github.com/matrix-org/matrix-doc/pull/3245)
 * [MSC2192 - Inline widgets](https://github.com/matrix-org/matrix-doc/pull/2192)
 * [MSC3765 - Rich text topics](https://github.com/matrix-org/matrix-doc/pull/3765)
 **Note**: Readers might find [Andy's blog](https://www.artificialworlds.net/blog/2022/03/08/comparison-of-matrix-events-before-and-after-extensible-events/)
 useful for understanding the problem space. Unfortunately, for those who need to
 understand the changes to the protocol/specification, the best option is to read
 this proposal.
 ## Proposal
 In a new room version (why is described later in this proposal), events are declared
 to be represented by their extensible form, as described by this MSC. `m.room.message`
 is formally deprecated by this MSC, with removal from the specification happening as
 part of a room version adopting the feature. Clients are expected to use extensible
 events only in rooms versions which explicitly declare such support (in both unstable
 and stable settings), except where noted later in this proposal.
 An extensible event is made up of two critical parts: an event type and zero or more
 content blocks. The event type defines which content blocks a receiver can expect,
 and the content blocks carry the information needed to render the event (whether the
 client understands the event type or not).
 Content blocks are simply any top-level key in `content` on the event. They can have
 any value type (that is also legal in an event generally: string, integer, etc), and
 are namespaced using the
 [Matrix conventions for namespacing](https://spec.matrix.org/v1.4/appendices/#common-namespaced-identifier-grammar).
 Content blocks can be invented independent of event types and *should* be reusable
 in nature. For example, this proposal introduces an `m.text` content block which
 can be reused by other event types to represent textual fallback.
 When a client encounters an extensible event (any event sent in a supported room
 version) that it does *not* understand, the client begins searching for a best match
 based on event type schemas it *does* know. This may mean combining multiple different
 content blocks to match a suitable schema, such as in the case of
 [MSC3553](https://github.com/matrix-org/matrix-doc/pull/3553) video events.
 Which schemas to try, and in what order, is left as a deliberate implementation detail.
 A client might decide to try parsing the event as a video, then image, then file, then
 text message, for example.
 It is generally not expected that a single content block will describe an entire event,
 except in the exceedingly trivial cases (like text messages in this proposal). Multiple
 content blocks will usually fully describe the information in the event, and mixins
 (described later) can further change how an event is represented or processed.
 Note that a "client" in an extensible events sense will typically mean an application
 using the Client-Server API, however in reality a client will be anything which needs
 to parse and understand event contents (servers for some functions like push rules,
 application services, etc).
 Per the introduction, text is the baseline format that most/all Matrix clients support
 today, often through use of HTML and `m.room.message`. Instead of using `m.room.message`
 to represent this content, clients would instead use an `m.message` event with, at
 a minimum, a `m.text` content block:
 ```json5
 {
    // irrelevant fields not shown
    "type": "m.message",
    "content": {
        "m.text": [
            { "body": "<i>Hello world</i>", "mimetype": "text/html" },
            { "body": "Hello world" }
        ]
    }
 }
 ```
 `m.text` has the following definitions associated with it:
 * An ordered array of mimetypes and applicable string content to represent a single
  marked-up blob of text. Each element is known as a representation.
 * `body` in a representation is required, and must be a string.
 * `mimetype` is optional in a representation, and defaults to `text/plain`.
 * Zero representations are permitted, however senders should aim to always specify
  at least one.
 * Invalid representations are skipped by clients (missing `body`, not an object, etc).
 * The first representation a renderer understands should be used.
 * Senders are strongly encouraged to always include a plaintext representation.
 * The `mimetype` of a representation determines its `body` - no effort is made to
  limit what is allowed in the `body`, however clients are still strongly encouraged
  to validate/sanitize the content further, like in the
  [existing spec](https://spec.matrix.org/v1.4/client-server-api/#mroommessage-msgtypes)
  for HTML.
 * Custom text formats in a representation are specified by a suitably custom `mimetype`.
  For example, a representation might use a text format extending HTML or XML, or an
  all-new markup. This can be used to create bridge-compatible clients where the
  destination network's markup is first in the array, followed by more common HTML
  and text formats.
 Like with the event described above, all event types now describe which content blocks
 they expect to see on their events. These content blocks could be required, as is the
 case of `m.text` in `m.message`, or they could be optional depending on the situation.
 Of course, senders are welcome to send even more blocks which aren't specified in the
 schema for an event type, however clients which understand that event type might not
 consider them at all.
 In `m.message`'s case, `m.text` is the only required content block. The `m.text`
 block can be reused by other events to include a text-like format for the event, such
 as a text fallback for clients which do not understand how to render a custom event
 type.
 To reiterate, when a client encounters an unknown event type it first tries to see
 if there's a set of content blocks present that it can associate with a known event
 type. If it finds suitable content blocks, it parses the event as though the event
 were of the known type. If it doesn't find anything useful, the event is left as
 unrenderable, just as it likely would today.
 To avoid a situation where events end up being unrenderable, it is strongly
 recommended that all event types support at least an `m.text` content block in
 their schema, thus allowing all events to theoretically be rendered as message
 events (in a worst case scenario).
 For clarity, events are not able to specify *how* they are handled when the receiver
 doesn't know how to render the event type: the sender simply includes all possible or
 feasible representations for the data, hoping the receiver will pick the richest form
 for the user. As an example, a special medical imaging event type might also be
 represented as a video, static image, or text (URL to some healthcare platform): the
 sender includes all 3 fallbacks by specifying the needed content blocks, and the
 receiver may pick the video, image, or text depending on its own rules.
 Events must still only represent a single logical piece of information, thus encouraging
 sensible fallback options in the form of content blocks. The information being represented
 is described by the event type, as it always has been before this MSC. It is explicitly
 not permitted to represent two or more pieces of information in a single event, such
 as a livestream reference and poll: senders should look into
 [relationships](https://spec.matrix.org/v1.5/client-server-api/#forming-relationships-between-events)
 instead.
 ### Worked example: Custom temperature event
 In a hypothetical scenario, a temperature event might look as such:
 ```json5
 {
    // irrelevant fields not shown
    "type": "org.example.temperature",
    "content": {
        "m.text": [{"body": "It is 22 degrees at Home"}],
        "org.example.probe_value": {
            "label": "Home",
            "units": "org.example.celsius",
            "value": 22
        }
    }
 }
 ```
 In this scenario, clients which understand how to render an `org.example.temperature`
 event might use the information in `org.example.probe_value` exclusively, leaving the
 `m.text` block for clients which *don't* understand the temperature event type.
 Another event type might find inspiration and use the probe value block for their
 event as well. Such an example might be in a more industrial control application:
 ```json5
 {
    // irrelevant fields not shown
    "type": "org.example.tank.level",
    "content": {
        "m.text": [{"body": "[Danger] The water tank is 90% full."}],
        "org.example.probe_value": {
            "label": "Tank 3",
            "units": "org.example.litres",
            "value": 9037
        },
        "org.example.danger_level": "alert"
    }
 }
 ```
 This event also demonstrates a `org.example.danger_level` block, which uses a string
 value type instead of the previously demonstrated objects and values - this is a legal
 content block, as blocks can be of any type.
 Clients should be cautious and avoid reusing too many unspecified types as it can create
 opportunities for confusion and inconsistency. There should always be an effort to get
 useful event types into the Matrix spec for others to benefit from.
 ### Room version
 This MSC requires a room version to make the transition process clear and coordinated.
 Normally for a feature such as this, an effort would be made to attempt to support
 backwards compatibility for a duration of time, however for a feature that requires
 significant overhaul of clients, servers, and Matrix as a whole it feels more important
 to bias towards a clear switch between legacy and modern (extensible) events.
 **Note**: A previous draft of this proposal (codenamed "v1 extensible events") did attempt
 to describe a timeline-based approach, allowing for event types to mix concepts of content
 blocks and legacy fields, however that approach did not give sufficient reason for clients
 to fully adopt the extensible events changes.
 In room versions supporting extensible events, clients MUST only send extensible events.
 Deprecated event types (to be enumerated at the time of making the room version) MUST NOT
 be sent into extensible event-supporting room versions, and clients MUST treat deprecated
 event types as unrenderable by force. For example, if a client sees an `m.room.message` in
 an extensible event-supporting room version, it must not render it, even if it knows how
 to render that type.
 While full enforcement of this restriction is not feasible, servers are encouraged to block
 Client-Server API requests for sending known-banned event types into applicable rooms. This
 obviously does not help when the room is encrypted, or the client is sending custom events
 in a non-extensible form, hence the requirement that clients treat the events as invalid too.
 Using the usual MSC process, the Spec Core Team (SCT) will be responsible for determining
 the minimum scope of extensible events in a published (stable) room version.
 Meanwhile, clients are welcome to use the unstable implementations of extensible event-supporting
 features, provided they are in an appropriate room version. Some event type MSCs declare
 explicit support for what would normally be an unsupported room version - client authors
 should check the applicable MSC or specification for the feature to determine if they are
 allowed to do this. Such examples include MSC3381 Polls and MSC3245 Voice Messages.
 ### State events
 Unknown state event types generally should not be parsed by clients. This is to prevent situations
 where the sender masks a state change as some other, non-state, event. For example, even
 if a state event has an `m.text` content block, it should not be treated as a room message.
 Note that state events MUST still make use of content blocks in applicable room versions, and that
 any top-level key in `content` is defined as a content block under this proposal. As such, this
 MSC implicitly promotes all existing content fields of `m.*` state events to independent content
 blocks as needed. Other MSCs may override this decision on a per-event type basis (ie: redeclaring
 how room topics work to support content blocks, deprecating the existing `m.room.topic` event in
 the process, like in [MSC3765](https://github.com/matrix-org/matrix-spec-proposals/pull/3765)).
 Unlike most content blocks, these promoted-to-content-blocks are not realistically meant to be
 reused: it is simply a formality given this MSC's scope.
 ### Notifications
 Currently [push notifications](https://spec.matrix.org/v1.5/client-server-api/#push-notifications)
 describe how an event can cause a notification to the user, though it makes the assumption
 that there are `m.room.message` events flying around to denote "messages" which can trigger
 keyword/mention-style alerts. With extensible events, the same might not be possible as it
 relies on understanding how/when the client will render the event to cause notifications.
 For simplicity, when `content.body` is used in an `event_match` condition, it now looks for
 an `m.text` block's `text/plain` representation (implied or explicit) in room versions
 supporting extensible events. This is not an easy rule to represent in the existing push
 rules schema, and this MSC has no interest in designing a better schema. Note that other
 conditions applied to push notifications, such as an event type check, are not affected by
 this: clients/servers will have to alter applicable push rules to handle the new event types
 (see also: [MSC3933](https://github.com/matrix-org/matrix-spec-proposals/pull/3933) and friends).
 ### Power levels
 This MSC proposes no changes to how power levels interact with events: they are still
 capable of restricting which users can send an event type. Though events might be rendered
 as a different logical type (ie: unknown event being rendered as a message), this does not
 materially impact the room's ability to function. Thus, considerations for how to handle
 power levels more intelligently are details left for a future MSC.
 As of writing, most rooms fit into two categories: any event type is possible to send, or
 specific cherry-picked event types are allowed (announcement rooms: reactions & redactions).
 Extensible events don't materially change the situation implied by this power levels structure.
 ### Mixins specifically allowed
 A **mixin** is a specific type of content block which can be added to any type of event to
 change how that event is processed. Content blocks which are
 mixins will be called out as such in the spec. Mixins are meant to be purely additive,
 thus all event types MUST support being rendered/processed *without* the use of mixins.
 See also the [Wikipedia entry on mixins](https://en.wikipedia.org/wiki/Mixin).
 Note that mixins differ from optional content blocks in an event type's schema: a mixin
 is able to be applied to *any* event type sensibly while optional content blocks are
 generally only valuable to the applicable event types.
 Though this MSC does not describe any such mixins itself,
 [MSC3955](https://github.com/matrix-org/matrix-spec-proposals/pull/3955) does by allowing any
 event to be flagged as "automated" - a strictly additive annotation on events.
 Another possible mixin would be `m.relates_to` (not described by this MSC). Currently,
 some features like the [key verification framework](https://spec.matrix.org/v1.5/client-server-api/#key-verification-framework)
 rely on relationships as part of making the feature work. The expectation is that
 these features would be adapted to meet the "purely additive" condition (assuming
 `m.relates_to` does actually end up being a mixin).
 ### Uses of HTML & text throughout the spec
 For an abundance of clarity, all functionality not explicitly called out in this MSC which
 relies on the `formatted_body` of an `m.room.message` is expected to transition to using
 an appropriate `m.text` representation instead. For example, the HTML representation of
 a [mention](https://spec.matrix.org/v1.5/client-server-api/#user-and-room-mentions) will
 now appear under `m.text`'s `text/html` representation (adding one if required).
 A similar condition is applied to `body` in `m.room.message`: all existing functionality
 will instead use the `text/plain` representation within `m.text`, if not explicitly
 called out by this MSC.
 ## Potential issues
 It's a bit ugly to not know whether a given key in `content` will take a string, object,
 boolean, integer, or array.
 It's a bit ugly to not know at a glance if a content block is a mixin or not.
 It's a bit ugly that you have to look over the keys of contents to see what blocks
 are present, but better than duplicating this into an explicit `blocks` list within the
 event content (on balance).
 We're skipping over defining rules for which fallback combinations to display
 (i.e. "display hints") for now; these can be added in a future MSC if needed.
 [MSC1225](https://github.com/matrix-org/matrix-doc/issues/1225) contains a proposal for this.
 Placing content blocks at the top level of `content` is a bit unfortunate, though mixes
 nicely thanks to namespacing. Potentially conflicting cases in the wild would be
 namespaced fields, which would get translated as unrenderable events if the value type
 doesn't meet the client's known schema.
 This MSC does not rewrite or redefine all possible events in the specification: this is
 deliberately left as an exercise for several future MSCs.
 ## Security considerations
 Like today, it's possible to have the different representations of an event not match,
 thus introducing a potential for malicious payloads (text-only clients seeing something
 different to HTML-friendly ones). Clients could try to do similarity comparisons, though
 this is complicated with features like HTML and arbitrary custom markup (markdown, etc)
 showing up in the plaintext or in tertiary formats on the events. Historically, room
 moderators have been pretty good about removing these malicious senders from their rooms
 when other users point out (quite quickly) that the event is appearing funky to them.
 ## Note about spec process
 Extensible events as a spec feature requires dozens of different MSCs, with this MSC being
 the structure definition and text baseline. It is *not* expected that this MSC will be
 written into spec once it has passed FCP. Instead, it is expected that all of the "core"
 extensible events MSCs will pass FCP and extensible events be assigned a stable room version
 before any spec authoring begins. Thus, this particular MSC should be anticipated to sit
 in accepted-but-not-merged (stable, not formal spec yet) for a while, and that's okay.
 The Spec Core Team (SCT) has decision making power over what is considered core for extensible
 events, though the recommendation is to ensure replacements for all non-state `m.room.*` types
 have accepted (successful FCP) MSCs to replace them.
 ## Unstable prefix
 While this MSC is not considered stable by the specification, implementations *must* use
 `org.matrix.msc1767` as a prefix to denote the unstable functionality. For example, sending
 an `m.message` event would mean sending an `org.matrix.msc1767.message` event instead.
 For purposes of testing, implementations can use a dynamically-assigned unstable room version
 `org.matrix.msc1767.<version>` to use extensible events within. For example, `org.matrix.msc1767.10`
 for room version 10 or `org.matrix.msc1767.org.example.cool_ver` for a hypothetical
 `org.example.cool_ver` room version. Any events sent in these room versions *can* use stable
 identifiers given the entire room version itself is unstable, however senders *must* take care
 to ensure stable identifiers do not leak out to other room versions - it may be simpler to not
 send stable identifiers at all.
 ## Changes from MSC1225
 * converted from googledoc to MD, and to be a single PR rather than split PR/Issue.
 * simplifies it by removing displayhints (for now - deferred to a future MSC).
 * replaces the clunky m.text.1 idea with lists for types which support fallbacks.
 * removes the concept of optional compact form for m.text by instead having m.text always in expanded form.
 * tries to accomodate most of the feedback on GH and Google Docs from MSC1225.
 ## Historical changes
 * Anything that wasn't simple text rendering was broken out to dedicated MSCs in an effort to get the
  structure approved and adopted while the more complex types get implemented independently.
 * Renamed subtypes/reusable types to just "content blocks".
 * Allow content blocks to be nested.
 * Fix push rules in the most basic sense, deferring to a future MSC on better support.
 * Explicitly make no changes to power levels, deferring to a future MSC on better support.
 * Drop timeline for transition in favour of an explicit room version.
 * Move most push rule changes and such into their own/future MSCs.
 * Move emotes, notices, and encryption out to their own dedicated MSCs.