MSC4033: Explicit ordering of events for receipts

The spec states that receipts are "read-up-to" without explaining what order the events are in, so it is difficult to decide whether an event is before or after a receipt.

We propose adding an explicit order number to all events, so that it is clear which events are read.

This proposal covers receipts, and not fully-read markers. Fully-read markers have the same issue in terms of ordering, and should probably be fixed in a similar way, but they are not addressed here.

Motivation

To decide whether a room is unread, a Matrix client must decide whether it contains any unread messages.

Similarly, to decide whether a room has notifications, we must decide whether any of its potentially-notifying messages is unread.

Both of these tasks require us to decide whether a message is read or unread.

To make this decision we have receipts. We use the following rule:

An event is read if the room contains either an unthreaded receipt pointing at an event that is after, or the same as, the event; or a threaded receipt pointing at an event that is in the same thread as the event and is after, or the same as, the event.

Otherwise, it is unread.

(In both cases we only consider receipts sent by the current user, obviously. We consider either private or public read receipts.)

To perform this calculation we need a clear definition of after.

Current definition of after

The current spec (see 11.6 Receipts) is not clear about what "read up to" means.

Clients like Element Web assume that after means "after in Sync Order", where "Sync Order" means "the order in which I (the client) received the events from the server via sync". So if a client has received, via sync, both an event and another event for which it has a receipt, then whichever of the two came later in the same sync, or arrived in a later sync, is after the other.
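
As a rough illustration of this assumption, the sketch below numbers events in the order a client receives them from sync. The class and method names are hypothetical and are not taken from Element Web or any SDK. Note that an event obtained through any other API never gets a position, so the comparison has no answer for it.

```python
from typing import Dict, Iterable, Optional


class SyncOrderTracker:
    """Hypothetical client-side helper: remembers the position at which
    each event arrived via sync."""

    def __init__(self) -> None:
        self._position: Dict[str, int] = {}
        self._next = 0

    def on_sync(self, event_ids: Iterable[str]) -> None:
        # Events later in a sync response, or in a later sync, get a
        # larger position.
        for event_id in event_ids:
            if event_id not in self._position:
                self._position[event_id] = self._next
                self._next += 1

    def is_after(self, a: str, b: str) -> Optional[bool]:
        """True if a arrived after b; None if either never came via sync."""
        pa, pb = self._position.get(a), self._position.get(b)
        if pa is None or pb is None:
            return None
        return pa > pb


tracker = SyncOrderTracker()
tracker.on_sync(["$a", "$b"])  # first sync response
tracker.on_sync(["$c"])        # a later sync response
```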

See room-dag-concepts for some Synapse-specific information on Stream Order. In Synapse, Sync Order is expected to be identical to its concept of Stream Order.

See also Spec Issue #1167, which calls out this ambiguity about the meaning of "read up to".

Problems with the current definition

The current definition of after is ambiguous, and difficult for clients to calculate. It depends on receiving events only via sync, which is not always possible: clients sometimes need messages that did not arrive via sync, and must fetch them through other APIs such as messages or relations.

The current definition also makes it needlessly complex for clients to determine whether an event is read because the receipt itself does not hold enough information: the referenced event must be fetched and correctly ordered.

Note: these problems actually apply to all receipts, not just those of the current user. The symptoms are much more visible and impactful when the current user's receipts are misinterpreted than for other users, but this proposal covers both cases.

Proposal

We propose to add an explicit order number to events and receipts, so we can easily compare whether an event is before or after a receipt.

This order should be a number that is attached to an event by the server before it sends it to any client, and it should never change. It should, loosely-speaking, increase for "newer" messages within the same room.

The order of an event may be negative, and if so it is understood that this event is always read. The order included with a receipt should never be negative.

The ordering must be consistent between a user's homeserver and all of that user's connected clients. There are no guarantees it is consistent across different users or rooms. It will be inconsistent across federation as there is no mechanism to sync order between homeservers. For this reason, we propose that order be included in an event's unsigned property.

This proposal attaches no particular meaning to the rate at which the ordering increments. (Although we can imagine that some future proposal might want to expand this idea to include some meaning.)

Examples

Example event (the change is the order property inside unsigned):

{
  "type": "m.room.message",
  "content": {
    "body": "This is an example text message",
    "format": "org.matrix.custom.html",
    "formatted_body": "<b>This is an example text message</b>",
    "msgtype": "m.text"
  },
  "event_id": "$143273582443PhrSn:example.org",
  "origin_server_ts": 1432735824653,
  "room_id": "!jEsUZKDJdhlrceRyVU:example.org",
  "sender": "@example:example.org",
  "unsigned": {
    "age": 1234,
    "order": 56764334543
  }
}

Example encrypted event (the change is the order property inside unsigned):

{
  "type": "m.room.encrypted",
  "content": {
    "algorithm": "m.megolm.v1.aes-sha2",
    "sender_key": "",
    "device_id": "",
    "session_id": "",
    "ciphertext": ""
  },
  "event_id": "$143273582443PhrSn:example.org",
  "origin_server_ts": 1432735824653,
  "room_id": "!jEsUZKDJdhlrceRyVU:example.org",
  "sender": "@example:example.org",
  "unsigned": {
    "age": 1234,
    "order": 56764334543
  }
}
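
A minimal sketch of how a client might read the proposed property from either kind of event. The function name event_order is hypothetical. The sketch only looks at unsigned, so it treats plain and encrypted events identically and never touches content.

```python
from typing import Any, Dict, Optional, Union

Number = Union[int, float]


def event_order(event: Dict[str, Any]) -> Optional[Number]:
    """Return unsigned.order, or None if the server did not supply one."""
    unsigned = event.get("unsigned")
    if not isinstance(unsigned, dict):
        return None
    order = unsigned.get("order")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(order, bool) or not isinstance(order, (int, float)):
        return None
    return order


encrypted_event = {
    "type": "m.room.encrypted",
    "content": {"algorithm": "m.megolm.v1.aes-sha2", "ciphertext": ""},
    "unsigned": {"age": 1234, "order": 56764334543},
}
order = event_order(encrypted_event)
```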

Example receipt (the change is the order property alongside ts):

{
  "content": {
    "$1435641916114394fHBLK:matrix.org": {
      "m.read": {
        "@erikj:jki.re": {
          "ts": 1436451550453,
          "order": 56764334544
        }
      }
    }
  },
  "type": "m.receipt"
}

We propose:

  • all events should contain an order property inside unsigned.
  • all receipts should contain an order property alongside ts inside the information about an event, which is a cache of the order property within the referred-to event.

The order property in receipts should be inserted by servers when they are creating the aggregated receipt event.
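
The following sketch shows one way a client could pull the cached order out of an aggregated receipt event with the layout proposed above. The function name receipt_orders is an assumption made for illustration. Receipts without an order are skipped here, which is a choice of this sketch and not something the proposal specifies.

```python
from typing import Any, Dict, Iterator, Tuple


def receipt_orders(
    receipt_event: Dict[str, Any],
) -> Iterator[Tuple[str, str, str, int]]:
    """Yield (event_id, receipt_type, user_id, order) for every receipt
    in the event that carries an order."""
    for event_id, by_type in receipt_event.get("content", {}).items():
        for receipt_type, by_user in by_type.items():
            for user_id, info in by_user.items():
                if "order" in info:
                    yield event_id, receipt_type, user_id, info["order"]


example = {
    "content": {
        "$1435641916114394fHBLK:matrix.org": {
            "m.read": {
                "@erikj:jki.re": {"ts": 1436451550453, "order": 56764334544}
            }
        }
    },
    "type": "m.receipt",
}
found = list(receipt_orders(example))
```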

If the server is not able to provide the order of a receipt (e.g. because it does not have the relevant event) it should not send the receipt. If a server later receives an event, allowing it to provide an order for this receipt, it should send the receipt at that time. Rationale: without the order, a receipt is not useful to the client since it is not able to use it to determine which events are read. If a receipt points at an unknown event, the safest assumption is that other events in the room are unread i.e. there is no receipt.

If a receipt is received for an event with negative order, the server should set the order in the receipt to zero. All events with negative order are understood to be read.
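
The two server-side rules above (hold a receipt until the order is known, and clamp negative orders to zero) can be sketched as follows. ReceiptSender and its attributes are hypothetical names, and the in-memory dictionaries stand in for whatever storage a real homeserver uses.

```python
from typing import Dict, List, Tuple


class ReceiptSender:
    """Hypothetical server-side helper that attaches order to receipts."""

    def __init__(self) -> None:
        self.known_orders: Dict[str, int] = {}
        # event_id -> users whose receipts are waiting for that event
        self.held: Dict[str, List[str]] = {}
        # (user_id, event_id, order) for each receipt sent to clients
        self.sent: List[Tuple[str, str, int]] = []

    def on_receipt(self, user_id: str, event_id: str) -> None:
        order = self.known_orders.get(event_id)
        if order is None:
            # No order available: do not send the receipt yet.
            self.held.setdefault(event_id, []).append(user_id)
            return
        # Receipts never carry a negative order.
        self.sent.append((user_id, event_id, max(order, 0)))

    def on_event(self, event_id: str, order: int) -> None:
        self.known_orders[event_id] = order
        # Release any receipts that were waiting for this event.
        for user_id in self.held.pop(event_id, []):
            self.on_receipt(user_id, event_id)


sender = ReceiptSender()
sender.on_receipt("@a:example.org", "$missing")  # held, event unknown
sender.on_event("$old", -5)
sender.on_receipt("@a:example.org", "$old")      # sent with order 0
sender.on_event("$missing", 42)                  # releases held receipt
```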

Note that the order property for a particular event will probably be the same for every user, so will be repeated multiple times in an aggregated receipt event. This structure was chosen to reduce the chance of breaking existing clients by introducing order at a higher level.

Proposed definition of after

We propose that the definition of after should be:

  • Event A is after event B if its order is larger.

We propose updating the spec around receipts (11.6 Receipts) to be explicit about what "read up to" means, using the above definition.

Definition of read and unread events

We propose that the definition of whether an event is read should include the original definition plus the above definition of after, and also include this clarification:

(Because the receipt itself contains the order of the pointed-to event, there is no need to examine the pointed-to event: it is sufficient to compare the order of the event in question with the order in the receipt.)

Further, it should be stated that events with negative order are always read, even if no receipt exists.
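
Put together, the client-side check might look like the sketch below. It assumes the caller has already selected the current user's receipts that apply to the event (unthreaded ones, or threaded ones for the event's own thread), since thread handling is outside this proposal. The function names are illustrative only.

```python
from typing import Iterable, Union

Number = Union[int, float]


def is_read(event_order: Number, receipt_orders: Iterable[Number]) -> bool:
    """An event is read if its order is negative, or if some applicable
    receipt has an order that is the same as or larger than its own."""
    if event_order < 0:
        return True
    return any(r >= event_order for r in receipt_orders)


def room_is_unread(
    event_orders: Iterable[Number], receipt_orders: Iterable[Number]
) -> bool:
    """A room is unread if any of its events is unread."""
    receipts = list(receipt_orders)
    return any(not is_read(o, receipts) for o in event_orders)
```

Only the order values are compared, so the event that the receipt points at never needs to be fetched.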

Order does not have to be unique

If this proposal required the order property to be unique within a room, it might inadvertently put constraints on the implementation of servers since some linearised process would need to be involved.

So, we do not require that order should be unique within a room. Instead, if two events have the same order, they are both marked as read by a receipt with that order.

Events with identical order introduce some imprecision into the process of marking events as read, so they should be minimised where possible, but some overlap is tolerable where the server implementation requires it.

For example, a server might choose to use the epoch millisecond at which it received a message as its order. However, if a server receives a large batch of messages in the same millisecond, this might cause undesirable behaviour. A refinement might be to use the millisecond as the integer part and add a fractional part that increases as the batch is processed, preserving the order in which the server receives the messages in the batch.

If a server were processing multiple batches in parallel, it could implement this in each process separately, and accept that some events would receive identical orders, but this would be rare in practice and have little effect on end users' experience of unread markers.
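
One possible reading of the "millisecond plus fraction" refinement is sketched below. The function name and the choice of step size are assumptions. The sketch uses Python floats, and a 64-bit float leaves only a few thousand distinct fractional steps per millisecond at present-day epoch values, so a real implementation would need to consider precision.

```python
from typing import List


def assign_batch_orders(received_ms: int, batch_size: int) -> List[float]:
    """Return one order per event for a batch received in the same
    millisecond, increasing in processing order and staying below
    received_ms + 1."""
    if batch_size < 0:
        raise ValueError("batch_size must not be negative")
    step = 1.0 / (batch_size + 1)
    return [received_ms + i * step for i in range(batch_size)]


# A small made-up timestamp keeps the arithmetic easy to follow.
orders = assign_batch_orders(1000, 3)
```

Two processes running this independently for the same millisecond would hand out some identical orders, which is the overlap the paragraph above accepts.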

Redacted events

Existing servers already include an unsigned section with redacted events, despite unsigned not being mentioned in the redaction rules.

Therefore we propose that redacted events should include order in exactly the same way as all room events.

Discussion

What order to display events in the UI?

It is desirable that the order property should match the order of events displayed in the client as closely as possible, so that receipts behave consistently with the displayed timeline. However, clients may have different ideas about where to display late-arriving messages, so it is impossible to define an order that works for all clients. Instead we agree that a consistent answer is the best we can do, and rely on clients to provide the best UX they can for late-arriving messages.

Stream order or Topological Order?

The two orders that we might choose to populate the order property are "Stream Order", where late-arriving messages tend to receive higher numbers, and "Topological Order", where late-arriving messages tend to receive lower numbers.

We believe that it is better to consider late-arriving messages as unread, meaning the client has the information that these newly arrived messages have not been read and can choose how to display it (or not). This is what leads us to suggest Stream Order as the correct choice.

However, if servers choose Topological Order, this proposal still works - we just have what the authors consider undesirable behaviour regarding late-arriving events (they are seen as read even though they are not).
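
To make the difference concrete, the sketch below uses made-up order values for a single late-arriving event. The user has a receipt with order 3. Under Stream Order the late event is assumed to receive 4, and under Topological Order it is assumed to receive 1.5 because it slots in between older events.

```python
def is_read(event_order: float, receipt_order: float) -> bool:
    """Read if the order is negative or not larger than the receipt's."""
    return event_order < 0 or event_order <= receipt_order


receipt_order = 3             # the user has read up to here
late_stream_order = 4         # hypothetical value under Stream Order
late_topological_order = 1.5  # hypothetical value under Topological Order

unread_under_stream = not is_read(late_stream_order, receipt_order)
read_under_topological = is_read(late_topological_order, receipt_order)
```

With these numbers the late event is unread under Stream Order and silently read under Topological Order.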

Inconsistency across federation

Because order may be inconsistent across federation[1], one user may occasionally see a different unread status for another user from what that user themselves sees. We regard this as impossible to avoid, and expect that in most cases it will be unnoticeable, since homeservers with good connectivity will normally see events in similar orders. When servers have long network splits, there will be a noticeable difference at first, but once messages start flowing normally and users start reading them, the differences will disappear as new events will have higher Stream Order than the older ones on both servers.
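
A small worked example of this effect, under the assumption that each homeserver simply numbers events in the order it receives them. The server names, event IDs and helper functions are all hypothetical.

```python
from typing import Dict, List, Set


def arrival_orders(arrivals: List[str]) -> Dict[str, int]:
    """Assign each event an order equal to its arrival position."""
    return {event_id: i for i, event_id in enumerate(arrivals)}


def read_set(orders: Dict[str, int], receipt_event_id: str) -> Set[str]:
    """Events considered read given a receipt pointing at one event."""
    receipt_order = orders[receipt_event_id]
    return {e for e, o in orders.items() if o <= receipt_order}


# The same three events reach two homeservers in different orders.
server_x = arrival_orders(["$1", "$2", "$3"])
server_y = arrival_orders(["$1", "$3", "$2"])

# A user on server_y sends a receipt pointing at $3.
seen_on_x = read_set(server_x, "$3")  # server_x thinks $2 is read too
seen_on_y = read_set(server_y, "$3")  # server_y thinks $2 is unread
```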

The focus of this proposal is that a single user sees consistent behaviour around their own read receipts, and we consider that much more important than the edge case of inconsistent behaviour across federation after a network split.

Implementation Notes

Some homeservers such as Synapse already have a concept of Stream Order. We expect that the order defined here could be implemented using Stream Order.

Potential issues

This explicitly allows receipts to be inconsistent across federation. In practice this is already the case in the wild, and is impossible to solve using Stream Order. The problems with using Topological Order (and Sync Order) have already been outlined.

Alternatives

Solves the same problem MSC3981 Relations Recursion tried to solve

This proposal would not replace MSC3981: /relations recursion but would make it less important, because we would no longer depend on the server providing messages in Sync Order, so we could happily fetch messages recursively and still be able to slot them into the right thread and ordering.

Note that the expectation (from some client devs e.g. me @andybalaam) was that MSC3981 would solve many problems for clients because the events in a thread would be returned in Sync Order, but this is not true: the proposal will return events in Topological Order, which is useless for determining which events are read.

The server could report which rooms are unread

We could use the definitions within this proposal but avoid calculating what was unread on the client. Instead we could ask the server to figure out which rooms are unread.

The client will still need to know which events are unread in order to process notifications that are encrypted when they pass through the server, so this proposal would probably be unaltered even if we added the capability for servers to surface which rooms are unread.

Location of order property in receipts

Initially, we included order as a sibling of m.read inside the content of a receipt:

{
  "content": {
    "$1435641916114394fHBLK:matrix.org": {
      "order": 56764334544,
      "m.read": { "@rikj:jki.re": { "ts": 1436451550453, "thread_id": "$x" } },
      "m.read.private": { "@self:example.org": { "ts": 1661384801651 } }
    }
  },
  "type": "m.receipt"
}

We moved it deeper, to sit alongside ts inside the per-user receipt information, because multiple existing clients (mautrix-go, mautrix-python and matrix-rust-sdk) would have failed to parse the above JSON if they encountered it without first being updated.
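
The sketch below illustrates one way such a failure could arise. It is a hypothetical strict parser written for this example, not code from any of the clients named above: it assumes every value under an event ID is a map from user ID to receipt information, so a bare number at that level breaks it, while an extra key next to ts is simply ignored.

```python
from typing import Any, Dict, List, Tuple


def parse_receipts_strict(
    content: Dict[str, Any],
) -> List[Tuple[str, str, str, Any]]:
    """Hypothetical parser that expects content[event_id][receipt_type]
    to always be a map of user_id -> info."""
    parsed = []
    for event_id, by_type in content.items():
        for receipt_type, by_user in by_type.items():
            # Fails with AttributeError if by_user is a number.
            for user_id, info in by_user.items():
                parsed.append((event_id, receipt_type, user_id, info.get("ts")))
    return parsed


# order as a sibling of m.read (the earlier layout)
sibling_layout = {"$e": {"order": 5, "m.read": {"@u:example.org": {"ts": 1}}}}
# order as a sibling of ts (the layout proposed here)
nested_layout = {"$e": {"m.read": {"@u:example.org": {"ts": 1, "order": 5}}}}

parsed_nested = parse_receipts_strict(nested_layout)
```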

Drop receipts with missing order information

In the case where a server has a receipt to send to the client, but does not have the event to which it refers, and therefore cannot find its order, we proposed above that the server should hold the receipt until it has the relevant event, and send it then.

Alternatively, we could simply never send the receipt under these circumstances. We believe that this is reasonable because it is not expected to happen for the user's own events, for which accurate read receipts matter most, and implementing the "hold and send later" strategy may cause extra work for the server for little practical gain.

Security considerations

None highlighted so far.

Unstable prefix

TODO

Dependencies

None at this time.

Acknowledgements

Formed from a discussion with @ara4n, with early review from @clokep. Built on ideas from @t3chguy, @justjanne, @germain-gg and @weeman1337.

Changelog

  • 2023-07-04 Initial draft by @andybalaam after conversation with @ara4n.
  • 2023-07-05 Remove thread roots from their thread after conversation with @clokep.
  • 2023-07-05 Make redactions never unread after conversation with @t3chguy
  • 2023-07-05 Give a definition of Stream Order
  • 2023-07-05 Be explicit about Stream Order not going over federation
  • 2023-07-05 Mention disagreeing about what another user has read
  • 2023-07-05 Move thread_id into content after talking to @deepbluev7
  • 2023-07-06 Reduced to just order. Thread IDs will be a separate MSC
  • 2023-07-06 Moved order deeper within receipts to reduce existing client impact
  • 2023-07-13 Include order with redacted events after comments from @clokep

  1. In fact, order could also be inconsistent across different users on the same homeserver, although we expect in practice this will not happen.