Merge 39903141e7 into d6edcbd946

2 weeks ago · f1a5714c1d
parent d6edcbd946 39903141e7
commit f1a5714c1d
1 changed files with 438 additions and 0 deletions
--- a/proposals/3871-gappy-timelines.md
+++ b/proposals/3871-gappy-timelines.md
@ -0,0 +1,438 @@
+# MSC3871: Gappy timeline
+
+`/messages` returns a linearized version of the event DAG. From any given
+homeservers perspective of the room, the DAG can have gaps where they're missing
+events. This could be because the homeserver hasn't fetched them yet or because
+it failed to fetch the events because those homeservers are unreachable and no
+one else knows about the event.
+
+Currently, there is an unwritten rule between the server and client that the
+server will always return all contiguous events in that part of the timeline.
+But the server has to break this rule sometimes when it doesn't have the event
+and is unable to get the event from anyone else. This MSC aims to change the
+dynamic so the server can give the client feedback and an indication of where
+the gaps are.
+
+This way, clients know where they are missing events and can even retry fetching
+by perhaps adding some UI to the timeline like "We failed to get some messages
+in this gap, try again."
+
+This can also make servers faster to respond to `/messages`. For example,
+currently, Synapse always tries to backfill and fill in the gap (even when it
+has enough messages locally to respond). In big rooms like `#matrix:matrix.org`
+(Matrix HQ), almost every place you ask for has gaps in it (thousands of
+backwards extremities) and lots of those events are unreachable so we try the
+same thing over and over hoping the response will be different this time but
+instead, we just make the `/messages` response time slow. With this MSC, we can
+instead be more intelligent about backfilling in the background and just tell
+the client about the gap that they can retry fetching a little later.
+
+
+## Proposal
+
+Add a `gaps` field to the response of [`GET
+/_matrix/client/v3/rooms/{roomId}/messages`](https://spec.matrix.org/v1.1/client-server-api/#get_matrixclientv3roomsroomidmessages).
+This field is an array of `GapEntry` indicating where missing events in the
+timeline are as defined below.
+
+
+### 200 response
+
+This describes the new `gaps` response field being added to the `200 response`
+of `/messages`:
+
+Name | Type | Description | required
+--- | --- | --- | ---
+`gaps` | `[GapEntry]` | A list of gaps indicating where events are missing in the `chunk` | no
+
+
+#### `GapEntry`
+
+key | type | value | description | required
+--- | --- | --- | --- | ---
+`event_id` | string | Event ID | The event ID indicating the position in the `/messages` `chunk` response  | yes
+`prev_pagination_token` | string | Pagination token | A pagination token that represents the spot in the DAG before the given `event_id` in the `chunk`. Omitting this field just means there is no gap on this side. | no
+`next_pagination_token` | string | Pagination token | A pagination token that represents the spot in the DAG after the given `event_id` in the `chunk`. Omitting this field just means there is no gap on this side. | no
+
+
+### `/messages` response examples
+
+The following mermaid diagram represents the room DAG snapshot used for the following
+`/messages` responses. The slightly transparent events with no background are events
+that the homeserver does not have and are in the gap.
+
+Pagination tokens are positions between events. This already an established concept but
+to illustrate this better, see the following `tX` pagination tokens in the following
+diagram.
+
+```mermaid
+flowchart RL
+    after[newest events...]:::gap-event -->|t10| fred -->|t9| waldo:::gap-event -->|t8| garply -->|t7| grault:::gap-event -->|t6| corge -->|t5| qux:::gap-event -->|t4| baz -->|t3| bar:::gap-event -->|t2| foo -->|t1| before[oldest events...]:::gap-event
+
+    classDef gap-event opacity:0.8,fill:transparent;
+```
+
+The idea is to be able to keep paginating from
+`prev_pagination_token`/`next_pagination_token` in the respective direction to fill in
+the gap.
+
+
+#### `/messages?dir=b`
+
+`/messages?dir=b` response example with gaps (`chunk` has events in
+reverse-chronoligcal order since we're paginating backwards):
+
+`/messages?dir=b&from=t6`
+```json5
+{
+  "chunk": [
+    {
+      "event_id": "$corge",
+      "type": "m.room.message",
+      "content": {
+        "body": "corge",
+      }
+    },
+    {
+      "event_id": "$baz",
+      "type": "m.room.message",
+      "content": {
+        "body": "baz",
+      }
+    },
+    {
+      "event_id": "$foo",
+      "type": "m.room.message",
+      "content": {
+        "body": "foo",
+      }
+    }
+  ]
+  "gaps": [
+        {
+          "prev_pagination_token": "t6",
+          "event_id": "$corge",
+          "next_pagination_token": "t5",
+        },
+        {
+          "prev_pagination_token": "t4",
+          "event_id": "$baz",
+          "next_pagination_token": "t3",
+        },
+        {
+          "prev_pagination_token": "t2",
+          "event_id": "$foo",
+          "next_pagination_token": "t1",
+        }
+  ]
+}
+```
+
+
+#### `/messages?dir=f`
+
+`/messages?dir=f` response example with gaps (`chunk` has events in
+chronoligcal order since we're paginating forwards):
+
+`/messages?dir=f&from=t6`
+```json5
+{
+  "chunk": [
+    {
+      "event_id": "$garply",
+      "type": "m.room.message",
+      "content": {
+        "body": "garply",
+      }
+    },
+    {
+      "event_id": "$fred",
+      "type": "m.room.message",
+      "content": {
+        "body": "fred",
+      }
+    },
+  ]
+  "gaps": [
+        {
+          "prev_pagination_token": "t7",
+          "event_id": "$garply",
+          "next_pagination_token": "t8",
+        },
+        {
+          "prev_pagination_token": "t9",
+          "event_id": "$fred",
+          "next_pagination_token": "t10",
+        }
+  ]
+}
+```
+
+
+
+## Potential issues
+
+Lots of gaps/extremities are generated when a spam attack occurs and federation
+falls behind. If clients start showing gaps with retry links, we might just be
+exposing the spam more.
+
+
+## Alternatives
+
+As an alternative, we can continue to do nothing as we do today and not worry
+about the occasional missing events. People seem not to notice any missing
+messages anyway but they do probably see our slow `/messages` pagination.
+
+
+### Expose `prev_events` to the client
+
+One alternative is including the `prev_events` in the events that the client sees so
+they can figure out the DAG chain themselves and see if there is an missing event in the
+middle.
+
+There is an [unspecced `/messages?raw=true` query parameter in
+Synapse](https://github.com/matrix-org/synapse/blob/20c76cecb9eb84dadfa7b2d25b436d3ab9218a1a/synapse/rest/client/room.py#L653)
+that returns the full raw event as seen over federation which means it will include the
+`prev_events`.
+
+You can also specify `event_format: federation` directly in that JSON `filter` parameter
+of `/messages` ->
+`/_matrix/client/v3/rooms/{room_id}}/messages?dir=b&filter=%7B%22event_format%22%3A%20%22federation%22%7D`
+
+Related to:
+
+ - https://github.com/matrix-org/matrix-spec/issues/859
+ - https://github.com/matrix-org/matrix-spec/issues/1047
+
+
+### Synthetic `m.timeline.gap` event alternative
+
+Another alternative is using synthetic events (thing that looks like an event
+without an `event_id`) which the server inserts alongside other events in the
+`chunk` to indicate where the gap is. But this has detractors since it's harder
+to implement in strongly typed SDK's and easy for a client to naively display
+every "event" in the `chunk`.
+
+`/messages` response example with a gap:
+
+```json
+{
+  "chunk": [
+    {
+      "type": "m.room.message",
+      "content": {
+        "body": "foo",
+      }
+    },
+    {
+      "type": "m.timeline.gap",
+      "content": {
+        "gap_start_event_id": "$12345",
+        "pagination_token": "t47409-4357353_219380_26003_2265",
+      }
+    },
+    {
+      "type": "m.room.message",
+      "content": {
+        "body": "baz",
+      }
+    },
+  ]
+}
+```
+
+
+### `GapEntry` alternative only indicating a gap `next_to_event_id` (only one side)
+
+Same concept as the existing `GapEntry` proposal but we only indicate the gap on one
+side of an event `next_to_event_id` according to the direction that `/messages` is going
+already.
+
+The problem with this alternative is that clients store events differently and it's
+valid to want to paginate in either direction from a given event. This alternative works
+fine in the Element Web case where you always paginate backwards in the scrollback and
+store events as a whole timeline list but another client like the [Trixinity
+SDK](https://github.com/benkuly/trixnity), where events are stored individually in a
+linked list, where each event could have a gap before and after, and where a gap could
+be 100's, 1000's of events wide, it would be useful to paginate from both ends to fill
+the gap faster.
+
+<details>
+<summary>
+  Details for the <code>GapEntry</code> alternative only indicating a gap <code>next_to_event_id</code>
+</summary>
+
+#### `GapEntry`
+
+key | type | value | description | required
+--- | --- | --- | --- | ---
+`next_to_event_id` | string | Event ID | The event ID indicating the position in the `/messages` `"chunk"` response where the gap starts after that position. This field can be `null` or completely omitted to indicate that the gap is at the start of the `/messages` `"chunk"` | no
+`pagination_token` | string | Pagination token | A pagination token that represents the spot in the DAG to be able to continue paginating in the same direction as the request and fill in the gap from `next_to_event_id` to the next known event. | yes
+
+
+### `/messages` response examples
+
+The following mermaid diagram represents the room DAG snapshot used for the following
+`/messages` responses. The slightly transparent events with no background are events
+that the homeserver does not have and are in the gap.
+
+Pagination tokens are positions between events. This already an established concept but
+to illustrate this better, see the following `tX` pagination tokens in the following
+diagram.
+
+```mermaid
+flowchart RL
+    after[newest events...]:::gap-event -->|t10| fred -->|t9| waldo:::gap-event -->|t8| garply -->|t7| grault:::gap-event -->|t6| corge -->|t5| qux:::gap-event -->|t4| baz -->|t3| bar:::gap-event -->|t2| foo -->|t1| before[oldest events...]:::gap-event
+
+    classDef gap-event opacity:0.8,fill:transparent;
+```
+
+The idea is to be able to keep paginating from `pagination_token` in the same
+direction of the request to fill in the gap.
+
+
+#### `/messages?dir=b`
+
+`/messages?dir=b` response example with gaps (`chunk` has events in
+reverse-chronoligcal order since we're paginating backwards):
+
+`/messages?dir=b&from=t6`
+```json5
+{
+  "chunk": [
+    // there is no gap from `t6` to `$corge` as expected
+    {
+      "event_id": "$corge",
+      "type": "m.room.message",
+      "content": {
+        "body": "corge",
+      }
+    },
+    // <the first `GapEntry` indicates a gap here>
+    {
+      "event_id": "$baz",
+      "type": "m.room.message",
+      "content": {
+        "body": "baz",
+      }
+    },
+    // <the second `GapEntry` indicates a gap here>
+    {
+      "event_id": "$foo",
+      "type": "m.room.message",
+      "content": {
+        "body": "foo",
+      }
+    }
+    // <the third `GapEntry` indicates a gap here>
+  ]
+  "gaps": [
+        {
+          "next_to_event_id": "$corge",
+          "pagination_token": "t5",
+        },
+        {
+          "next_to_event_id": "$baz",
+          "pagination_token": "t3",
+        },
+        {
+          "next_to_event_id": "$foo",
+          "pagination_token": "t1",
+        }
+  ]
+}
+```
+
+
+#### `/messages?dir=f`
+
+`/messages?dir=f` response example with gaps (`chunk` has events in
+chronoligcal order since we're paginating forwards):
+
+`/messages?dir=f&from=t6`
+```json5
+{
+  "chunk": [
+    // <the first `GapEntry` indicates a gap here>
+    {
+      "event_id": "$garply",
+      "type": "m.room.message",
+      "content": {
+        "body": "garply",
+      }
+    },
+    // <the second `GapEntry` indicates a gap here>
+    {
+      "event_id": "$fred",
+      "type": "m.room.message",
+      "content": {
+        "body": "fred",
+      }
+    },
+    // <the third`GapEntry` indicates a gap here>
+  ]
+  "gaps": [
+        {
+          "next_to_event_id": null,
+          "pagination_token": "t6",
+        },
+        {
+          "next_to_event_id": "$garply",
+          "pagination_token": "t8",
+        },
+        {
+          "next_to_event_id": "$fred",
+          "pagination_token": "t10",
+        }
+  ]
+}
+```
+
+</details>
+
+
+## Future considerations
+
+In the future, we should consider adding the same `gaps` field to `/context` because
+it's another endpoint that returns a linearized version of the DAG.
+
+It could make sense to roll this into this MSC but it might make the proposal less clear
+if we have to bulk it up by specifying the same details for `/context`. Leaving it to be
+follow-up MSC for now.
+
+
+## Security considerations
+
+Only your own homeserver controls whether a gap is added to the `/messages`
+response so there shouldn't be any weird edge case where someone else can
+control whether you to fetch something.
+
+
+## Unstable prefix
+
+While this feature is in development, the `gaps` field can be used as
+`org.matrix.msc3871.gaps`
+
+### While the MSC is unstable
+
+During this period, to detect server support clients should check for the
+presence of the `org.matrix.msc3871` flag in `unstable_features` on `/versions`.
+Clients are also required to use the unstable prefixes (see [unstable
+prefix](#unstable-prefix)) during this time.
+
+### Once the MSC is merged but not in a spec version
+
+Once this MSC is merged, but is not yet part of the spec, clients should rely on
+the presence of the `org.matrix.msc3871.stable` flag in `unstable_features` to
+determine server support. If the flag is present, clients are required to use
+stable prefixes (see [unstable prefix](#unstable-prefix)).
+
+### Once the MSC is in a spec version
+
+Once this MSC becomes a part of a spec version, clients should rely on the
+presence of the spec version, that supports the MSC, in `versions` on
+`/versions`, to determine support. Servers are encouraged to keep the
+`org.matrix.msc3871.stable` flag around for a reasonable amount of time
+to help smooth over the transition for clients. "Reasonable" is intentionally
+left as an implementation detail, however the MSC process currently recommends
+*at most* 2 months from the date of spec release.