MSC3575: Sliding Sync (aka Sync v3)

This MSC outlines a replacement for the CS API endpoint /sync.

The current /sync endpoint scales badly as the number of rooms on an account increases. It scales badly because all rooms are returned to the client, and clients cannot opt out of a large amount of extraneous data such as receipts. On large accounts with thousands of rooms, the initial sync operation can take minutes to perform. This significantly delays the initial login to Matrix clients, and also makes incremental sync very heavy when resuming after any significant pause in usage.

Goals

Any improved /sync mechanism needed to meet a number of goals:

  • Sync time should be independent of the number of rooms you are in.
  • Time from launch to confident usability should be as low as possible.
  • Time from login on existing accounts to usability should be as low as possible.
  • Bandwidth should be minimised.
  • Support lazy-loading of things like read receipts (and avoid sending unnecessary data to the client).
  • Support informing the client when room state changes from under it, due to state resolution.
  • Clients should be able to work correctly without ever syncing in the full set of rooms they're in.
  • Don't incrementally sync rooms you don't care about.
  • Support combining uploaded filters with ad-hoc filter parameters (which isn't possible with sync v2 today).
  • Servers should not need to store all past since tokens. If a since token has been discarded we should gracefully degrade to initial sync.
  • Ability to filter by space.
  • Ability to filter by room name.

These goals shaped the design of this proposal.

Proposal

At a high level, the proposal introduces a way for clients to filter and sort the rooms they are joined to and then request a subset of the resulting list of rooms rather than the entire room list.

         All joined rooms on user's account
Q W E R T Y U I O P L K J H G F D S A Z X C V B N M
\                                                 /
 \                                               /
  \      Subset of rooms matched by filters     /
   Q W E R T Y U I O P L K J H G F D S A Z X C V
                       |
   A C D E F G H I J K L O P Q R S T U V W X Y Z     Rooms sorted by name (or by recency, etc)
   |_______|
       |

   A C D E F                                         first 5 rooms requested
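The filter, sort and slice steps in the diagram can be sketched in Python as follows. This is a minimal illustration, not a server implementation: the Room type and the sliding_window function are hypothetical, and real servers evaluate the filters and sort orders described later in this proposal. Ranges are treated as inclusive at both ends, matching the request example where [0,99] yields 100 room IDs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Room:
    room_id: str
    name: str


def sliding_window(
    rooms: List[Room],
    predicate: Callable[[Room], bool],
    sort_key: Callable[[Room], object],
    ranges: List[Tuple[int, int]],
) -> Tuple[int, List[List[str]]]:
    """Filter, then sort, then slice the room list.

    Returns the number of rooms that matched the filters (the list "count")
    and, for each requested range, the room IDs in that range.
    """
    matched = [room for room in rooms if predicate(room)]
    ordered = sorted(matched, key=sort_key)
    # Ranges are inclusive at both ends: [0, 4] selects the first 5 rooms.
    windows = [[room.room_id for room in ordered[start:end + 1]] for start, end in ranges]
    return len(matched), windows


# Mirror the diagram: 26 rooms named A-Z, a filter that drops B, M and N,
# sorting by name, and a request for the first 5 rooms.
rooms = [Room(f"!{c}:example.org", c) for c in "QWERTYUIOPLKJHGFDSAZXCVBNM"]
count, windows = sliding_window(rooms, lambda r: r.name not in {"B", "M", "N"},
                                lambda r: r.name, [(0, 4)])
# count == 23 and windows[0] holds the rooms named A, C, D, E and F.
```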

It also introduces a number of new concepts which are explained in more detail later on:

  • Core API: The minimal API to be sliding sync compatible.
  • Extensions: Additional APIs which expose more data from the server, e.g. presence, device messages or additional sort/filter operations.

Core

A complete sync request looks like: POST /_matrix/client/unstable/org.matrix.msc3575/sync?pos=4&timeout=30000:

{
  // Optional: allows clients to know what request params reached the server,
  // functionally similar to txn IDs on /send for events.
  "txn_id": "client-chosen-string",

  // Optional: a delta token to remember information between connections.
  // See "Bandwidth optimisations for persistent clients" for more information.
  "delta_token": "opaque-server-provided-string",

  // Optional: a unique string to identify this connection to the server. If this
  // is missing, only 1 sliding sync connection can be made to the server at any one time.
  // Clients need to set this to allow >1 connection concurrently, so the server can distinguish
  // between connections. This is NOT STICKY and must be provided with every request, if your client
  // needs >1 concurrent connection. Max: 16 chars, due to it being required with every request.
  "conn_id": "client-chosen",

  // Sliding Window API
  "lists": {
    "client_chosen_key": {
      "ranges": [ [0,99] ],
      "sort": [ "by_notification_level", "by_recency", "by_name" ],
      "required_state": [
        ["m.room.join_rules", ""],
        ["m.room.history_visibility", ""],
        ["m.space.child", "*"]
      ],
      "timeline_limit": 10,
      "filters": {
        "is_dm": true
      },
      "bump_event_types": [ "m.room.message", "m.room.encrypted" ]
    }
  },

  // Room Subscriptions API
  "room_subscriptions": {
      "!sub1:bar": {
          "required_state": [ ["*","*"] ],
          "timeline_limit": 50,
          "include_old_rooms": {
              "timeline_limit": 1,
              "required_state": [ ["m.room.tombstone", ""], ["m.room.create", ""] ]
          }
      }
  },
  "unsubscribe_rooms": [ "!sub3:bar" ],

  // Extensions API
  "extensions": {}
}

An entire response looks like: HTTP 200 OK

{
  // Connection and Streaming API
  "pos": "5",
  "txn_id": "client-chosen-string", // echo of the txn ID

  // Sliding Window API
  "lists": {
    "client_chosen_key": {
      "count": 1337,
      "ops": [
        {
          "op": "SYNC",
          "range": [0, 99],
          "room_ids": [
            "!foo:bar", // ... 99 more room IDs
          ]
        }
      ]
    }
  },

  // Aggregated rooms from lists and room subscriptions
  "rooms": {
    // Room from room subscription
    "!sub1:bar": {
        "name": "Alice and Bob",
        "avatar": "mxc://...",
        "initial": true,
        "required_state": [
          {"sender":"@alice:example.com","type":"m.room.create", "state_key":"", "content":{"creator":"@alice:example.com"}},
          {"sender":"@alice:example.com","type":"m.room.join_rules", "state_key":"", "content":{"join_rule":"invite"}},
          {"sender":"@alice:example.com","type":"m.room.history_visibility", "state_key":"", "content":{"history_visibility":"joined"}},
          {"sender":"@alice:example.com","type":"m.room.member", "state_key":"@alice:example.com", "content":{"membership":"join"}}
        ],
        "timeline": [
          {"sender":"@alice:example.com","type":"m.room.create", "state_key":"", "content":{"creator":"@alice:example.com"}},
          {"sender":"@alice:example.com","type":"m.room.join_rules", "state_key":"", "content":{"join_rule":"invite"}},
          {"sender":"@alice:example.com","type":"m.room.history_visibility", "state_key":"", "content":{"history_visibility":"joined"}},
          {"sender":"@alice:example.com","type":"m.room.member", "state_key":"@alice:example.com", "content":{"membership":"join"}},
          {"sender":"@alice:example.com","type":"m.room.message", "content":{"body":"A"}},
          {"sender":"@alice:example.com","type":"m.room.message", "content":{"body":"B"}}
        ],
        "prev_batch": "t111_222_333",
        "joined_count": 41,
        "invited_count": 1,
        "notification_count": 1,
        "highlight_count": 0
    },
    // rooms from list
    "!foo:bar": {
      "name": "The calculated room name",
      "avatar": "mxc://...",
      "initial": true,
      "required_state": [
        {"sender":"@alice:example.com","type":"m.room.join_rules", "state_key":"", "content":{"join_rule":"invite"}},
        {"sender":"@alice:example.com","type":"m.room.history_visibility", "state_key":"", "content":{"history_visibility":"joined"}},
        {"sender":"@alice:example.com","type":"m.space.child", "state_key":"!foo:example.com", "content":{"via":["example.com"]}},
        {"sender":"@alice:example.com","type":"m.space.child", "state_key":"!bar:example.com", "content":{"via":["example.com"]}},
        {"sender":"@alice:example.com","type":"m.space.child", "state_key":"!baz:example.com", "content":{"via":["example.com"]}}
      ],
      "timeline": [
        {"sender":"@alice:example.com","type":"m.room.join_rules", "state_key":"", "content":{"join_rule":"invite"}},
        {"sender":"@alice:example.com","type":"m.room.message", "content":{"body":"A"}},
        {"sender":"@alice:example.com","type":"m.room.message", "content":{"body":"B"}},
        {"sender":"@alice:example.com","type":"m.room.message", "content":{"body":"C"}},
        {"sender":"@alice:example.com","type":"m.room.message", "content":{"body":"D"}}
      ],
      "prev_batch": "t111_222_333",
      "joined_count": 4,
      "invited_count": 0,
      "notification_count": 54,
      "highlight_count": 3
    },
    // ... 99 more items
  },

  // Extensions API
  "extensions": {},

  // Bandwidth optimisations, see "Bandwidth optimisations for persistent clients"
  "delta_token": "server-generated-string" // the delta token to use
}

These fields and their interactions are explained in the next few sections. This forms the core of the API. Additional data can be returned via "extensions".
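As a rough illustration of how a client might consume the lists part of the response, the sketch below applies SYNC operations to a locally cached list. It is a hypothetical helper (apply_sync_ops is not part of any SDK), it assumes count is the total number of rooms matching the list's filters, and it deliberately rejects every other operation, since those belong to the Sliding Window API.

```python
from typing import Dict, List, Optional


def apply_sync_ops(local: List[Optional[str]], list_response: Dict) -> List[Optional[str]]:
    """Apply one entry of the response "lists" map to a locally cached room list.

    Slot i holds the room ID at sorted position i, or None if the client has
    not yet been told which room occupies that position.
    """
    count = list_response["count"]
    # Resize the cache to one slot per room that matched the list's filters.
    rooms = (local + [None] * count)[:count]
    for op in list_response.get("ops", []):
        if op["op"] != "SYNC":
            # Other operations belong to the Sliding Window API; not modelled here.
            raise NotImplementedError(op["op"])
        start, end = op["range"]
        # Clear the inclusive range first, then fill it with the room IDs sent.
        for i in range(start, min(end, count - 1) + 1):
            rooms[i] = None
        for offset, room_id in enumerate(op["room_ids"]):
            rooms[start + offset] = room_id
    return rooms
```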

Connections

At a high level, the syncing mechanism creates a "connection" to the server to allow the bi-directional exchange of JSON objects. This mechanism is ideally suited to WebSockets but is harder to achieve with HTTP long-polling. This design was chosen to allow a seamless transition to a stream-orientated protocol like WebSockets in the future.

The existing /sync implementation in Matrix also creates a stream, but it has limitations. It uses a since token to tell clients where in the stream they are and to tell servers which messages the client has received (in other words, it serves as an ACK). Critically, the stream is not stateful. The request must contain the entire set of input parameters, either via a filter ID or an in-line filter. This results in clients using the same set of input parameters most of the time. In order for sliding sync to provide only the data needed to render the UI and nothing more, the set of input parameters needs to be greatly expanded and they need to be dynamic: adding and removing parameters on-the-fly, without additional round trips. In order to achieve this, Sliding Sync creates stateful connections to the server, so clients can simply send the deltas. This means clients and servers need a mechanism to agree on what that stored state is, which imposes additional rules on client implementations.

In a WebSockets implementation, this is easy: the request parameters are sent initially when a connection is established and then remain active for the lifetime of the connection. Any changes to these parameters are reliably sent to the server in the order they were submitted. In order for clients to know when these parameters have been applied, most application-level WebSocket protocols use a "message ID" chosen by the client which is then echoed back in the ACK message. This is very similar to transaction IDs on the /send endpoint in Matrix.

However, this proposal does not use WebSockets; it uses HTTP long-polling. As with /sync, this proposal uses a token to allow servers to know which messages the client has received. Emulating WebSockets over HTTP long-polling is difficult and has limitations. Servers cannot push new data to the client and must instead wait for the client to make an HTTP request. In addition, individual HTTP requests can fail, resulting in ordering problems which simply do not exist in a WebSockets implementation. This can lead to some counter-intuitive responses from a Sliding Sync enabled server unless certain rules are followed.

Long-polling Rule 1: do not send multiple concurrent sliding sync requests to the server with the same connection ID. If a request is lost in transit, it can be impossible to know whether it has been applied on the server or not. This is not an issue for /sync because the request is stateless; there's nothing to lose in the event of packet loss. In this example, A is applied on both sides, B is not applied on either side, and C is applied on the server only. The server detects this from the position in the client's next request and returns C in the next successful response. The numbers reflect the position in the stream (similar to a since token):

State  Client               Server   State   position
         | --------A,0------> |                 0
         |                    |        A        1
         | <------OK,1------- |        A        1
  A,1    |                    |        A        1
  A,1    | --------B,1--/     |        A        1
  A,1    | --------C,1------> |        A        1
         |                    |        A,C      2
  A,1    |         /--------- |        A,C      2
         |                    |        A,C      2
  A,1    | --------D,1------> | 1 != 2 -> missed a response
         |                    |        A,C,D    3
A,C,D    | <------C,D,3------ |        A,C,D    3

At this point, the client knows that B never made it to the server, because C was sent after B, and the server has ACKed C. If requests were sent in parallel (B and C at the same time), it would be impossible for clients to know if B was still processing or if B had failed entirely.

Long-polling Rule 2: use transaction IDs if you need to know when a request has been applied. The above example used A,B,C,D as transaction IDs, but in reality requests and responses are not always obviously tied together. For example, requesting the first 10 rooms on a user's account may return 0 results or 10: it's not possible to know ahead of time. Clients need this information to know, for example, when to stop showing a spinner. For these reasons, clients SHOULD send a transaction ID when they need to know when the response has been calculated.
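A client following Rules 1 and 2 can work out which transaction IDs never reached the server by remembering the order in which they were sent. The sketch below is a hypothetical client-side helper, assuming one request in flight per connection: when the server echoes a txn_id, every earlier txn_id that was not echoed is treated as lost and must be resent, as with B in the example above.

```python
from typing import List


class TxnTracker:
    """Tracks txn_ids sent on one connection, one request at a time (Rule 1)."""

    def __init__(self) -> None:
        self.pending: List[str] = []  # sent but not yet echoed, oldest first

    def sent(self, txn_id: str) -> None:
        self.pending.append(txn_id)

    def acked(self, echoed_txn_id: str) -> List[str]:
        """Record a txn_id echoed by the server and return the txn_ids that were lost.

        Requests are sequential, so any txn_id sent before the echoed one that
        was never echoed itself cannot have been applied by the server.
        """
        if echoed_txn_id not in self.pending:
            return []
        idx = self.pending.index(echoed_txn_id)
        lost = self.pending[:idx]
        self.pending = self.pending[idx + 1:]
        return lost
```

Replaying the diagram: after sending A, B, C and D, receiving the echo for C reports B as lost and leaves D pending.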

Long-polling Rule 3: the HTTP response you receive may not match the HTTP request you sent. The example above shows C and D in the same response for simplicity, but in practice the server does not combine multiple responses into a single response. Instead, it will send the most recent unacknowledged response, in this case C, even though the HTTP request was for D.
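On the server side, Rule 3 implies buffering the last response until the client acknowledges it by sending its pos. The sketch below assumes integer positions (as a simple server might use; clients still treat pos as opaque), reduces responses to their pos and txn_id, and returns only the body of the M_UNKNOWN_POS error rather than a full HTTP 400. It follows Rule 3 literally: a request from the previous position gets the buffered response back and its own parameters are not processed, so the client must resend them. Whether a real server also applies the new request at that point, as the diagram shows for D, is an implementation choice this sketch does not settle.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ServerConnection:
    pos: int = 0
    buffered: Optional[Dict[str, Any]] = None  # last response, kept until acknowledged

    def handle(self, client_pos: Optional[int], txn_id: Optional[str]) -> Dict[str, Any]:
        if client_pos is None:
            # Initial request (no pos): start a fresh stream.
            self.pos, self.buffered = 0, None
        elif client_pos == self.pos - 1 and self.buffered is not None:
            # The client never saw the buffered response: send it again
            # instead of the response to this request (Rule 3).
            return self.buffered
        elif client_pos != self.pos:
            return {"errcode": "M_UNKNOWN_POS", "error": "Unknown position"}
        # The client acknowledged everything so far: compute the next response.
        self.pos += 1
        self.buffered = {"pos": self.pos, "txn_id": txn_id}
        return self.buffered
```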

Expiry

Connections can be "expired" by the server at any time and for any reason. When a connection is expired, the server will send back an HTTP 400 containing the response body:

{
  "error": "Unknown position",
  "errcode": "M_UNKNOWN_POS"
}

Common reasons for expiring a connection include:

  • The last request was sent too long ago.
  • The server has reached a memory limit for your connection and has expired it to reclaim memory.
  • The server which handled your last request is no longer running (e.g. it was restarted) and it cannot calculate a response.

To handle expired connections, clients should send an initial request (with all sticky request parameters) without a pos value to restart the connection.
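The restart logic is small enough to sketch. The function below is hypothetical; its handling of errors other than M_UNKNOWN_POS (retrying from the same position) is an assumption rather than something this proposal specifies.

```python
from typing import Any, Dict, Optional, Tuple


def next_position(status: int, body: Dict[str, Any],
                  current_pos: Optional[str]) -> Tuple[Optional[str], bool]:
    """Return (pos to send on the next request, whether to resend all sticky params)."""
    if status == 400 and body.get("errcode") == "M_UNKNOWN_POS":
        # The connection expired: omit pos and send every sticky parameter again.
        return None, True
    if status == 200:
        # Success: continue from the position the server returned (opaque string).
        return body["pos"], False
    # Assumption: any other failure is retried from the same position.
    return current_pos, False
```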

Concurrent connections

There are three main reasons why a client may want to have more than one connection to the server open concurrently:

  • The client is a browser, and it should be possible to open the same client in multiple tabs without causing problems. Without concurrent connections, each tab would reset the other tabs' connections by sending different ?pos= values. The number of concurrent connections is technically unbounded.
  • The client is a mobile application, and it should be possible to have a "push process" connection in addition to the "app connection". Without concurrent connections, it isn't possible to obtain to-device messages in the push process whilst also obtaining them in the main app. The number of concurrent connections is fixed, e.g. 2.
  • The client wants to do a one-shot request for some data, without incurring latency/bandwidth penalties with all the activity on the user's account. Without concurrent connections, it isn't possible to get the response without also potentially getting large amounts of extraneous data. The number of concurrent connections is N+1, where N is the number of active concurrent connections.

Each distinct connection MUST specify a unique conn_id at the top-level of every sync request, consistent for that connection for that device. For example:

  • Each browser tab needs a distinct connection. Each tab uses the Unix timestamp at which the page was loaded and keeps it for the tab's lifetime, e.g. conn_id: "1683726382973".
  • Process A and B each need a distinct connection. Process A uses conn_id: "A" and Process B uses conn_id: "B".
  • For unbounded one-shot connections controlled by a single process, a simple monotonically increasing integer can be used as the connection ID, e.g. conn_id: "4". It is also possible to reuse one-shot connections by omitting the ?pos= value, as that will trigger an initial sync.
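The connection ID strategies above can be sketched as follows. The helper names are hypothetical; the 16-character limit comes from the conn_id description in the request example.

```python
import itertools
import time
from typing import Iterator, Optional

MAX_CONN_ID_LEN = 16  # "Max: 16 chars" per the conn_id field description


def check_conn_id(conn_id: str) -> str:
    if len(conn_id) > MAX_CONN_ID_LEN:
        raise ValueError(f"conn_id longer than {MAX_CONN_ID_LEN} characters: {conn_id!r}")
    return conn_id


def tab_conn_id(page_loaded_ms: Optional[int] = None) -> str:
    """Browser tab: the page-load timestamp in milliseconds, kept for the tab's lifetime."""
    if page_loaded_ms is None:
        page_loaded_ms = int(time.time() * 1000)
    return check_conn_id(str(page_loaded_ms))


def one_shot_conn_ids() -> Iterator[str]:
    """One-shot connections owned by a single process: "0", "1", "2", ..."""
    return (check_conn_id(str(n)) for n in itertools.count())
```

A fixed pair of processes needs no helper at all: each simply hard-codes its own ID, such as "A" and "B".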

Using concurrent connections may result in data loss if used inappropriately. This can happen when one connection sees some data and then performs some action to delete that data on the server before other concurrent connections have seen this data. Where this is a risk, it will be outlined clearly under a "concurrent connections" subheading. This is particularly important for certain extensions like the to-device and E2EE extensions, which delete data when the client has acknowledged the previous response.

Message IDs for clients and servers

For the long-polling use case, this proposal includes an opaque token that is very similar to /sync v2's since query parameter. This is called pos and represents the position in the stream the client is currently at. Unlike /sync v2, this token is ephemeral and can be invalidated at any time. When a client first connects to the server, no pos is specified. Also unlike /sync v2, this token cannot be used with other APIs such as /messages or /keys/changes. Note that the "connection" formed to the server is not a long-lived TCP connection; it is just an application-level concept of a connection.

In simple servers, the pos may be an incrementing integer, but more complex servers may use vector clocks or contain node identifying information in the token. Clients MUST treat pos as an opaque value and not introspect it.

The timeout query parameter serves the same purpose as in sync v2: it tells the server how many milliseconds to hold the connection open before returning.

In addition, clients may send a txn_id field in the top-level JSON object of the request to serve as a client message ID. Servers MUST echo this back to the client via the txn_id field in the top-level JSON object of the response once this request has been processed.
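Putting pos, timeout and txn_id together, a client request could be assembled as below. This is a sketch using only the Python standard library: build_sync_request is a hypothetical helper, a random uuid4 hex string is just one way to pick a txn_id, and a txn_id is only attached when the request parameters changed, in line with "Sticky request parameters".

```python
import json
import uuid
from typing import Any, Dict, Optional, Tuple
from urllib.parse import urlencode

SYNC_PATH = "/_matrix/client/unstable/org.matrix.msc3575/sync"


def build_sync_request(base_url: str, pos: Optional[str], timeout_ms: int,
                       body: Dict[str, Any], params_changed: bool) -> Tuple[str, str]:
    """Return (URL, JSON body) for one sliding sync request."""
    query = {"timeout": str(timeout_ms)}
    if pos is not None:
        # pos is opaque: send back exactly what the server returned, never parse it.
        query["pos"] = pos
    url = f"{base_url}{SYNC_PATH}?{urlencode(query)}"
    payload = dict(body)
    if params_changed:
        # A fresh txn_id lets the client spot when the server has applied this change.
        payload["txn_id"] = uuid.uuid4().hex
    return url, json.dumps(payload)
```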

Sticky request parameters

Request parameters can be "sticky". This means that their value is remembered across multiple requests. Clients cannot choose which parameters are sticky; the API defines which parameters are sticky. The lifetime of sticky request parameters is tied to a sync connection. When the connection is lost, the request parameters are lost with it. This feature exists to allow clients to configure the sync stream in a bandwidth-efficient way. For example, if all keys were sticky:

Client                         Server
  | ------{ "foo": "bar" }------> |  {"foo":"bar"}
  | <-------HTTP 200 OK---------- |
  | ------{ "baz": "quuz" }-----> | {"foo":"bar","baz":"quuz"}
  | <-------HTTP 200 OK---------- |

For complex nested data, APIs which include sticky parameters MUST indicate every sticky field to avoid ambiguity. For example, an ambiguous API may state the following:

{
    "foo": { // sticky
        "bar": 1,
        "baz": 2
    }
}

When this object is combined with the additional object:

{
    "foo": {
        "bar": 3
    }
}

What is the value of baz? Both unset and 2 are valid answers. For this reason, baz MUST be marked as sticky if the desired result is 2, else it will be unset.
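One way to make the stickiness rules concrete is a merge function driven by an explicit set of sticky field paths. The sketch below is hypothetical: paths such as "foo.baz" are an illustrative notation, not part of the API. A sticky field keeps its stored value when the client omits it, a non-sticky field is dropped, nested objects are merged field by field, and arrays are replaced wholesale, which is how the non-sticky elements of required_state behave.

```python
from typing import Any, Dict, Set


def merge_sticky(stored: Dict[str, Any], incoming: Dict[str, Any],
                 sticky: Set[str], prefix: str = "") -> Dict[str, Any]:
    """Combine the remembered parameters with a new request body.

    `sticky` holds dotted paths of sticky fields, e.g. {"foo", "foo.baz"}.
    """
    result: Dict[str, Any] = {}
    for key in set(stored) | set(incoming):
        path = prefix + key
        if key in incoming:
            new, old = incoming[key], stored.get(key)
            if isinstance(new, dict) and isinstance(old, dict):
                # Nested objects are merged field by field.
                result[key] = merge_sticky(old, new, sticky, path + ".")
            else:
                # Scalars and arrays (e.g. required_state) are replaced wholesale.
                result[key] = new
        elif path in sticky:
            # Omitted by the client: a sticky field keeps its remembered value.
            result[key] = stored[key]
        # Otherwise the field is non-sticky and omitted, so it becomes unset.
    return result
```

With only foo marked sticky, merging {"foo": {"bar": 3}} into {"foo": {"bar": 1, "baz": 2}} leaves baz unset; adding "foo.baz" to the sticky set keeps baz at 2.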

In order for servers and clients to agree on the set of sticky parameters, clients MUST send a transaction ID with each change to their request parameters and servers MUST buffer responses. This transaction ID will be echoed back to the client so it knows that those parameters have been applied. If the request parameters have not been modified, then the txn_id does not need to be sent.

Room List parameters

One or more room lists can be requested in sliding sync like so:

{
  // A map of list key to list information. Max lists: 100.
  "lists": {
    // an arbitrary string which the client is using to refer to this list for this connection. Keep
    // this small as it needs to be sent a lot. Max length: 64 bytes.
    "client_chosen_key": {
      // Sliding window ranges, see the Sliding Window API for more information.
      // If this field is missing, no sliding window is used and all rooms are returned in this list.
      "ranges": [ [0,99] ],
      // Sticky. List sort order. See Sliding Window API for more information.
      // These fields may be expanded through use of extensions.
      "sort": [ "by_notification_level", "by_recency" ],
      // Sticky. Required state for each room returned. An array of event type and state key tuples.
      // Note that elements of this array are NOT sticky so they must be specified in full when they
      // are changed. Elements in this array are OR'd together to produce the final set of state events
      // to return. One unique exception is when you request all state events via ["*", "*"]. When used,
      // all state events are returned by default, and additional entries FILTER OUT the returned set
      // of state events. These additional entries cannot use '*' themselves.
      // For example, ["*", "*"], ["m.room.member", "@alice:example.com"] will _exclude_ every m.room.member
      // event _except_ for @alice:example.com, and include every other state event.
      // In addition, ["*", "*"], ["m.space.child", "*"] is an error: the m.space.child filter is not
      // required as it would have been returned anyway.
      "required_state": [
        // Request the join rules event. Note that the empty string is required here to match
        // the event's blank state_key.
        ["m.room.join_rules", ""],
        ["m.room.history_visibility", ""],
        // Request all `m.space.child` state events.
        // The * is a special sentinel value meaning 'all keys'.
        // Note that `*` is NOT a generic glob function. You cannot specify `foo*` to pull in keys
        // like `food` and `foobar`. In this case, the * is treated as a literal *.
        ["m.space.child", "*"],
        // Request only the m.room.member events required to render events in the timeline.
        // The "$LAZY" value is a special sentinel value meaning "lazy loading" and is only valid for
        // the "m.room.member" event type. For more information on the semantics, see "Lazy-Loading Room Members".
        ["m.room.member", "$LAZY"],
        // Request your own m.room.member event.
        // The "$ME" value is a special sentinel value meaning "my user id". It is valid for use on
        // any state event, but is typically most useful on the m.room.member event.
        ["m.room.member", "$ME"],
        // Request all state events.
        ["*", "*"]
      ],
      // Sticky. The maximum number of timeline events to return per response.
      "timeline_limit": 10,

      // See the "Tombstones" section for more information.
      "include_old_rooms": { // Sticky.
        "timeline_limit": 1,
        "required_state": [ ["m.room.tombstone", ""] ]
      },

      // Sticky. Return a stripped variant of membership events (containing `user_id` and optionally `avatar_url` and `displayname`)
      // for the users used to calculate the room name.
      "include_heroes": true,

      // Sticky. Filters to apply to the list before sorting.
      "filters": {
        // All fields below are Sticky.
        // All fields are applied with AND operators, hence if is_dm:true and is_encrypted:true
        // then only Encrypted DM rooms will be returned. The absence of fields implies no filter
        // on that criteria: it does NOT imply 'false'.
        // These fields may be expanded through use of extensions.

        // Flag which only returns rooms present (or not) in the DM section of account data.
        // If unset, both DM rooms and non-DM rooms are returned. If false, only non-DM rooms
        // are returned. If true, only DM rooms are returned.
        "is_dm": true,
        // A list of spaces which target rooms must be a part of, as m.space.child state events.
        // The server will inspect the m.space.child state events for the JOINED space room IDs given,
        // and filter the room list based on the INVITED/JOINED children room IDs.
        // If the child room has a m.room.tombstone event, then the search should recursively navigate
        // the room ID in that event to find the latest room and use that room ID instead of the initial
        // room ID in the m.space.child event.
        // If unset, all rooms are included. Servers MUST NOT navigate subspaces. It is up to the client to
        // give a complete list of spaces to navigate. Only rooms directly mentioned as m.space.child
        // events in these spaces will be returned. Unknown spaces or spaces the user is not joined to
        // will be ignored.
        "spaces": ["!foo:bar", "!bar:baz"],
        // Flag which only returns rooms which have an `m.room.encryption` state event. If unset,
        // both encrypted and unencrypted rooms are returned. If false, only unencrypted rooms
        // are returned. If true, only encrypted rooms are returned.
        "is_encrypted": true,
        // Flag which only returns rooms the user is currently invited to. If unset, both invited
        // and joined rooms are returned. If false, no invited rooms are returned. If true, only
        // invited rooms are returned.
        "is_invite": true,
        // If specified, only rooms where the `m.room.create` event has a `type` matching one
        // of the strings in this array will be returned. If this field is unset, all rooms are
        // returned regardless of type. This can be used to get the initial set of spaces for an account.
        // For rooms which do not have a room type, use 'null' to include them.
        "room_types": [ "m.space", null ],
        // Same as "room_types" but inverted. This can be used to filter out spaces from the room list.
        // If a type is in both room_types and not_room_types, then not_room_types wins and they are
        // not included in the result.
        "not_room_types": [ "m.space" ],
        // Filter the room name. Case-insensitive partial matching e.g. 'foo' matches 'abFooab'.
        // The term 'like' is inspired by SQL 'LIKE', and the text here is similar to '%foo%'.
        "room_name_like": "foo",
        // Filter the room based on its room tags. If multiple tags are present, a room can have
        // any one of the listed tags (OR'd).
        "tags": ["m.favourite"],
        // Exclude rooms carrying any of the listed room tags. Takes priority over `tags`. For example,
        // a room with tags A and B with filters tags:[A] not_tags:[B] would NOT be included because
        // not_tags takes priority over `tags`. This filter is useful if your Rooms list should NOT
        // repeat the rooms already shown in the list of favourite rooms.
        "not_tags": ["m.lowpriority"]
      },
      // Sticky. Allowlist of event types which should be considered recent activity
      // when sorting `by_recency`. By omitting event types from this field, clients
      // can ensure that uninteresting events (e.g. a profile rename) do not cause a
      // room to jump to the top of its list(s). Empty or omitted `bump_event_types`
      // have no effect: all events in a room will be considered recent activity.
      //
      // NB. Changes to bump_event_types will NOT cause the room list to be reordered;
      // it will only affect the ordering of rooms due to future updates.
      "bump_event_types": [ "m.room.message", "m.room.encrypted" ]
    }
  }
}
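The required_state matching rules above can be sketched as a predicate over a single state event. This is a hypothetical helper written under stated assumptions: "$LAZY" entries are skipped because lazy-loading depends on the timeline (see "Lazy-Loading Room Members"), and "*" is only handled as a whole state key or as part of the ["*", "*"] pair; other uses, such as "*" as an event type on its own, are not modelled.

```python
from typing import List, Sequence, Tuple

WILDCARD = ("*", "*")


def state_wanted(required_state: Sequence[Sequence[str]], event_type: str,
                 state_key: str, my_user_id: str) -> bool:
    """Decide whether one current-state event should be returned for required_state."""
    entries: List[Tuple[str, str]] = []
    for ev_type, key in required_state:
        if key == "$LAZY":
            continue  # lazy-loaded members depend on the timeline; not modelled here
        entries.append((ev_type, my_user_id if key == "$ME" else key))

    if WILDCARD in entries:
        # ["*", "*"] returns everything; the remaining entries only filter it down.
        extra = [e for e in entries if e != WILDCARD]
        if any("*" in e for e in extra):
            raise ValueError("entries alongside ['*', '*'] cannot use '*'")
        restricted_types = {ev_type for ev_type, _ in extra}
        if event_type not in restricted_types:
            return True
        return (event_type, state_key) in extra

    # Normal mode: entries are OR'd. "*" is only a whole-key wildcard, never a glob.
    return any(ev_type == event_type and key in ("*", state_key) for ev_type, key in entries)
```

For example, with [["*", "*"], ["m.room.member", "@alice:example.com"]] the predicate keeps Alice's membership and every non-member event, but drops Bob's membership.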

Rationale: There are use cases for clients requesting multiple lists. Many clients have DMs and Invites in dedicated sections separate from the joined room list. API support for this is important to ensure that the initial UI can load quickly. This is why the API allows multiple lists and there are filters for things like DMs, Invites and Spaces. The timeline limit is very similar to Sync v2's room.timeline.limit filter field and is required to ensure that busy rooms don't send vast amounts of events. Wildcard matching on required_state fields is purposefully restricted to avoid clients sending complex matching criteria (e.g. pathological regular expressions), and in practice there seems to be very little in-the-wild use of partial key matching like foo*, as new state events tend to be namespaced by their event type. Fields in required_state are not sticky mainly due to semantics: expressing deletions becomes hard. The dedicated is_encrypted filter exists for the benefit of complex clients: see the E2EE section for more information. The room_name_like field enables searching by room name, which most clients support and which is crucial for large accounts. The room_types filters exist primarily to include or exclude spaces.

A previous version of this MSC expressed multiple lists as an array rather than an object with client-chosen keys. This was changed because using arrays had a few undesirable consequences. You couldn't just edit list #3: you had to add stub lists in index positions 0, 1 and 2 first, and likewise the response demanded stub responses, which always included the count field, to pad out earlier lists up to the list index that was modified. In addition, it was unclear how to delete a list. Also, some clients would race on startup to create lists, which resulted in different index positions being allocated and made it hard for client code to refer deterministically to specific lists. The workaround was essentially to assign static names to each list, which then mapped to an index position. By using an object, these issues disappear.
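The filters are ANDed together and each one applies only when present. A minimal sketch of that evaluation follows; RoomInfo and passes_filters are hypothetical, the spaces filter is left out because it depends on m.space.child state and tombstone navigation, and casefold() is one reasonable choice for the case-insensitive room_name_like match.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Set


@dataclass
class RoomInfo:
    name: str
    is_dm: bool = False
    is_encrypted: bool = False  # has an m.room.encryption state event
    is_invite: bool = False
    room_type: Optional[str] = None  # `type` from m.room.create, None if absent
    tags: Set[str] = field(default_factory=set)


def passes_filters(room: RoomInfo, filters: Dict[str, Any]) -> bool:
    """Return True if the room matches every filter present (absent filters match all)."""
    for flag in ("is_dm", "is_encrypted", "is_invite"):
        if flag in filters and getattr(room, flag) != filters[flag]:
            return False
    if "room_types" in filters and room.room_type not in filters["room_types"]:
        return False
    # not_room_types wins over room_types, so it is checked independently.
    if room.room_type in filters.get("not_room_types", []):
        return False
    like = filters.get("room_name_like")
    if like is not None and like.casefold() not in room.name.casefold():
        return False
    if "tags" in filters and not room.tags & set(filters["tags"]):
        return False  # tags are OR'd: any one listed tag is enough
    if room.tags & set(filters.get("not_tags", [])):
        return False  # not_tags takes priority over tags
    return True
```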

The server will then return a rooms object, keyed by room ID, whose entries have the following fields:

{
  "rooms": {
    // the room ID
    "!foo:bar": {
      "name": "The calculated room name",
      // Optional, nullable string: the MXC URL of the room's avatar. If omitted,
      // there is no change to the avatar. If null, the room now has no avatar.
      "avatar": "mxc://...",
      // Optional. If omitted there is no change to the heroes or the `name` was not
      // calculated using room heroes. `avatar_url` and `displayname` are optional.
      "heroes": [
        {"user_id":"@alice:example.com","displayname":"Alice","avatar_url":"mxc://..."}
      ],
      // Flag which is set when this is the first time the server is sending this data on this connection.
      // Clients can use this flag to replace or update their local state. When there is an update, servers
      // MUST omit this flag entirely and NOT send "initial":false as this is wasteful on bandwidth. The
      // absence of this flag means 'false'.
      "initial": true,
      // this is the CURRENT STATE, unlike sync v2
      "required_state": [
        {"sender":"@alice:example.com","type":"m.room.join_rules", "state_key":"", "content":{"join_rule":"invite"}},
        {"sender":"@alice:example.com","type":"m.room.history_visibility", "state_key":"", "content":{"history_visibility":"joined"}},
        {"sender":"@alice:example.com","type":"m.space.child", "state_key":"!foo:example.com", "content":{"via":["example.com"]}},
        {"sender":"@alice:example.com","type":"m.space.child", "state_key":"!bar:example.com", "content":{"via":["example.com"]}},
        {"sender":"@alice:example.com","type":"m.space.child", "state_key":"!baz:example.com", "content":{"via":["example.com"]}}
      ],
      // Last event is most recent. Max timeline_limit events.
      "timeline": [
        {"sender":"@alice:example.com","type":"m.room.join_rules", "state_key":"", "content":{"join_rule":"invite"}},
        {"sender":"@alice:example.com","type":"m.room.message", "content":{"body":"A"}},
        {"sender":"@alice:example.com","type":"m.room.message", "content":{"body":"B"}},
        {"sender":"@alice:example.com","type":"m.room.message", "content":{"body":"C"}},
        {"sender":"@alice:example.com","type":"m.room.message", "content":{"body":"D"}}
      ],
      "is_dm": true, // field is absent on non-DM rooms
      "invite_state": [ { "type": "m.room.member" } ], // stripped state events, same as rooms.invite.$room_id.invite_state in sync v2, absent on joined/left rooms
      "prev_batch": "t111_222_333", // same as sync v2
      "limited": true,              // same as sync v2
      "joined_count": 41,           // same as sync v2 m.joined_member_count
      "invited_count": 1,           // same as sync v2 m.invited_member_count
      "notification_count": 54,     // same as sync v2
      "highlight_count": 3,         // same as sync v2
      // The number of timeline events which have just occurred and are not historical.
      // The last N events are 'live' and should be treated as such.
      // This is mostly useful to determine whether a given @mention event should make a noise or not.
      // Clients cannot rely solely on the absence of 'initial: true' to determine live events because
      // if a room not in the sliding window bumps into the window because of an @mention it will have
      // 'initial: true' yet contain a single live event (with potentially other old events in the timeline)
      "num_live": 1
    }
  }
}

Rationale: The room name and counts are required for display in the UI. They are calculated server-side because they are required for sort operations on lists. The joined and invited member counts are included for the client-side calculation of push rules, specifically {"kind":"room_member_count","is":"2"}, which would be impossible to calculate without knowing the total number of users in the room. Push rules must be evaluated client-side in E2EE rooms, so failing to include these counts could cause rooms to notify incorrectly. Controversially, required_state is the current state. This breaks from sync v2, where the state is "the state before the start of the timeline". Sync v2's rationale was to avoid event duplication (state events can appear in both the state section and the timeline section if they are the current state) and to spare clients from rewinding state to work out historical display names. However, clients who show historical display names already need to rewind state by inspecting the prev_content of an event to display text like "@alice changed their name from Alice to Alice2". If event duplication is a concern, it may be reduced using Event ID -> Event maps in the response. The benefit of returning the current state is that servers can cache the latest state and return the response more quickly. If, instead, servers returned the state at the start of a timeline block, they would be forced either to rewind this state (as clients will need to do) or, worse, to perform an expensive database access to request the state before an event. As clients can be at different points in the stream for a given room, this would force servers to cache every possible earlier state for each room, which is not practical.

Sliding Window API

At a high level, the sliding window API provides a way to synchronise a subslice of a list in a bandwidth-efficient way. It does this by referring to "operations" which must be performed on the stored client list, such as INSERT, DELETE and SYNC. Each operation has an index position OR a range of index positions which tells the client where the operation should be performed. The possible operations are:

  • SYNC: Sets a range of entries. Clients SHOULD discard what they previously knew about entries in this range.
  • INSERT: Sets a single entry. If the position is not empty then clients MUST move entries to the left or the right depending on where the closest empty space is.
  • DELETE: Remove a single entry. Often comes before an INSERT to allow entries to move places.
  • INVALIDATE: Remove a range of entries. Clients MAY persist the invalidated range for offline support, but they should be treated as empty when additional operations which concern indexes in the range arrive from the server.

For example:

            Client                         Server
          []  |                              |  0,1,2,3,4,5,6,7,8   index
              |                              | [A,B,C,D,E,F,G,H,I]
              | -------- range[0,4] -------> |
 [A,B,C,D,E]  | <--- SYNC[0,4]=A,B,C,D,E --- |
              |                              |  0,1,2,3,4,5,6,7,8
              |                              | [H,A,B,C,D,E,F,G,I]  H moves to the front
              | ----- wait for updates ----> |
 [H,A,B,C,D]  | <- DELETE[4], INSERT[0]=H--- |
              |                              |  0,1,2,3,4,5,6,7,8
              |                              | [J,K,L,M,N,O,P,Q,R]  Entire list is replaced
              | ----- wait for updates ----> |
 [J,K,L,M,N]  | <----INVALIDATE[0,4]-------- |
              |      SYNC[0,4]=J,K,L,M,N     |
              |                              | [J,K,L,N,O,P,Q,R]    M is deleted
              | ----- wait for updates ----> |
 [J,K,L,N,O]  | <- DELETE[3], INSERT[4]=O--- |
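The client side of these operations can be sketched in Python. This is a minimal, non-authoritative sketch: the ClientSlidingList class and its methods are hypothetical, ops use the JSON shapes listed later in this section, and it assumes the search for the "closest empty space" on INSERT is bounded to the window the client requested, with the left gap winning a tie (the MSC does not pin either detail down).

```python
from typing import Optional


class ClientSlidingList:
    """Hypothetical client-side copy of one sliding window (e.g. indexes 0..4)."""

    def __init__(self, lo: int, hi: int):
        self.lo, self.hi = lo, hi  # inclusive window bounds
        self.slots: dict[int, Optional[str]] = {i: None for i in range(lo, hi + 1)}

    def apply(self, op: dict) -> None:
        kind = op["op"]
        if kind == "SYNC":
            # Replace whatever was previously known about this range.
            start, end = op["range"]
            for offset, index in enumerate(range(start, end + 1)):
                self.slots[index] = op["room_ids"][offset]
        elif kind == "INVALIDATE":
            start, end = op["range"]
            for index in range(start, end + 1):
                self.slots[index] = None
        elif kind == "DELETE":
            self.slots[op["index"]] = None
        elif kind == "INSERT":
            self._insert(op["index"], op["room_id"])
        else:
            raise ValueError(f"unknown op {kind!r}")

    def _insert(self, index: int, room_id: str) -> None:
        if self.slots.get(index) is not None:
            gap = self._closest_gap(index)
            if gap < index:
                # Shift entries (gap, index] one place to the left.
                for i in range(gap, index):
                    self.slots[i] = self.slots[i + 1]
            else:
                # Shift entries [index, gap) one place to the right.
                for i in range(gap, index, -1):
                    self.slots[i] = self.slots[i - 1]
        self.slots[index] = room_id

    def _closest_gap(self, index: int) -> int:
        # Assumption: only slots inside the window count; left wins at equal distance.
        for distance in range(1, self.hi - self.lo + 1):
            for candidate in (index - distance, index + distance):
                if self.lo <= candidate <= self.hi and self.slots[candidate] is None:
                    return candidate
        raise ValueError("no empty slot inside the window")

    def window(self) -> list:
        return [self.slots[i] for i in range(self.lo, self.hi + 1)]
```

Replaying the diagram's ops against ClientSlidingList(0, 4) reproduces each client list shown on the left, e.g. DELETE[4] then INSERT[0]=H turns [A,B,C,D,E] into [H,A,B,C,D].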

The sliding sync API exposes this API shape via the following request parameters:

{
  // Multiple lists can be requested
  "lists": {
    "list1": {
      // Multiple sliding windows inside a list can be requested. Integers are _inclusive_.
      "ranges": [ [0,9], [20,29] ],
      // How the list should be sorted on the server. The first value is applied first, then tiebreaks
      // are performed with the 2nd sort order, then the 3rd until there are no more sort orders left.
      "sort": [ "by_notification_level", "by_recency", "by_name" ],
      // Additional Room List request parameters omitted as they are
      // unrelated to the semantics of the sliding window, see previous section.
    }
  },
}

Which returns the following response parameters:

{
  // This object echoes back the list keys provided in the request.
  "lists": {
    "list1": {
      // The total number of entries in the list. Always present if this list is.
      "count": 1337,
      // The sliding list operations to perform.
      "ops": [
        {
          // The operation being performed.
          "op": "SYNC",
          // Which index positions are affected by this operation. These are both inclusive.
          "range": [0, 9],
          // Which room IDs are affected by this operation. These IDs match up to the positions
          // in the `range`, so the last room ID in this list matches index 9. The room data
          // is held in a separate object.
          "room_ids": [
            "!foo:bar", // ... 9 more room IDs
          ]
        }
      ]
    }
  },
  // The room data to use for each room ID. This data represents the point in time AFTER all
  // ops have been applied. For example, if a room had 2 new events which changed its list position
  // then you could see `ops` with DELETE[4,!foo:bar], INSERT[0,!foo:bar], DELETE[0,!foo:bar], INSERT[1,!foo:bar]
  // then the room !foo:bar in this map MUST contain both events.
  //
  // This map will only contain rooms which are present in the list `ops` above. If there are no
  // `ops` (because there are no `ranges`) then all rooms which match the list filters will be
  // present in this map, unordered. This functionality is useful for clients which do not want
  // to use sliding list semantics. This map is an aggregation of all rooms which can be returned
  // over all lists, including room subscriptions. This means if a room appears in 2 lists, only
  // 1 entry is present.
  "rooms": {
    "!foo:bar": {
      "name": "The calculated room name",
      // Additional response parameters omitted as they are
      // unrelated to the semantics of the sliding window.
      // See previous section on room list parameters.
    },
    // ... 9 more items
  },
}

Rationale: Prior versions of this MSC more tightly coupled room data and list operations. This became a problem if you did not want to use sliding windows, because the room data would be contained within list operations you don't care about. Now that this data is split out, it is easy for clients to opt out of sliding window semantics entirely (the ops key just disappears). Furthermore, the rooms map was originally split out per list / per room subscription, but this could cause needless duplication if a room appeared in more than one list. Each list can have different parameters associated with it (e.g required_state, timeline_limit), but these can be aggregated / UNION'd easily.

The possible sort operations are:

  • by_recency: Sort by origin_server_ts on the most recently received event in the room. Note that due to clock drift over federation it is possible for rooms to re-order such that the most recently received event in the entire list does not cause that room to go to index position 0. The highest origin_server_ts value comes first in the list.
  • by_notification_level: Sort based on the presence of non-zero values for highlight_count and notification_count. Rooms with a highlight_count > 0 come first, followed by rooms with a notification_count > 0 which are encrypted, followed by unencrypted rooms with a notification_count > 0, followed by all other rooms. See the "E2EE Handling" section for more information. Rooms are not sorted within each level: use an additional sort operation like by_recency to sort these groups. TODO: should we include unread indicator with this?
  • by_name: Sort by room name lexicographically. This requires servers to implement the room name calculation algorithm. The server MUST perform the following steps:
    • Calculate the room name from this user's perspective. This may vary depending on the user as DM rooms will have the room name set to the name of the other user. This is the value that will be returned in the name field for the room but is NOT the value that the server should perform sort operations on. See following steps.
    • Remove any of the following characters from the beginning/end of the calculated name: #!():_@. This ensures things like canonical aliases display in roughly the right alphabetical locations rather than being grouped together with all other rooms that start with #.
    • Lower-case the result using Unicode lower-casing rules. This ensures Matrix and matrix sort in the same locations.
    • Perform sort operations on this 'canonicalised' name. For clarity, the sort is ascending, so A comes before B.

Sorting algorithms MUST be stable and deterministic to avoid needless churn as otherwise identical rooms keep swapping positions. This can easily be achieved by including a final tiebreak based on the room ID (e.g lexicographical sort on the room ID) to guarantee stability and determinism. It is currently not possible to invert the sort order (ASC vs DESC). This may be added to this MSC if there is a community need for it.
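The sort orders, name canonicalisation and room ID tiebreak above can be sketched in Python. This is a minimal, non-authoritative sketch: the Room class, its field names and the SORT_KEYS table are hypothetical, and it assumes the calculated room name is already available. Each sort order becomes one element of a tuple key, so earlier sort orders take precedence and the room ID is appended as the final tiebreak.

```python
from dataclasses import dataclass

# Characters stripped from both ends of the calculated name before sorting.
STRIP_CHARS = "#!():_@"


@dataclass
class Room:  # hypothetical shape, for illustration only
    room_id: str
    name: str                 # calculated room name from this user's perspective
    highlight_count: int = 0
    notification_count: int = 0
    is_encrypted: bool = False
    last_event_ts: int = 0    # origin_server_ts of the most recently received event


def canonical_sort_name(name: str) -> str:
    # str.strip removes any run of these characters from both ends;
    # str.lower applies Unicode lower-casing so "Matrix" and "matrix" compare equal.
    return name.strip(STRIP_CHARS).lower()


def notification_level(room: Room) -> int:
    if room.highlight_count > 0:
        return 0
    if room.notification_count > 0 and room.is_encrypted:
        return 1
    if room.notification_count > 0:
        return 2
    return 3


SORT_KEYS = {
    "by_notification_level": notification_level,
    "by_recency": lambda room: -room.last_event_ts,  # highest timestamp first
    "by_name": lambda room: canonical_sort_name(room.name),
}


def sort_rooms(rooms: list[Room], sort: list[str]) -> list[Room]:
    # Tuple comparison applies the sort orders in sequence; the room ID makes it deterministic.
    return sorted(rooms, key=lambda room: tuple(SORT_KEYS[s](room) for s in sort) + (room.room_id,))
```

Without the room ID element, two rooms with identical names, counts and timestamps would keep whatever order they happened to arrive in, which is the churn the paragraph above warns about.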

NOTE: It is known that by forcing servers to calculate the room name there can be problems concerning multiple languages. "Alice and Bob" in English vs "Alice et Bob" in French for example, which may affect sort ordering. This can be mitigated by adding a lang sticky request parameter to control how i18n and l10n are done.

Rationale: The sort operations are restrictive and limited in scope on purpose. Alternatives such as arbitrary or more expansive sort orders were decided against as it would A) force servers to support nonsensical and potentially expensive operations and B) not produce the best sort order for specific use cases in Matrix such as alias handling. That being said, having some mechanism to support additional sort operations is useful, see the extensions section for more information.

The complete API shape for each operation is shown below (note that the key names vary by operation):

{
  "op": "DELETE",
  "index": 8
}

{
  "op": "INSERT",
  "index": 99,
  "room_id": "!foo:bar"
}

{
  "op": "INVALIDATE",
  "range": [100,199]
}

{
  "range": [100,117],
  "op": "SYNC",
  "room_ids": [
    // ... 18 room IDs
  ]
}

Note that clients will NOT be notified of any events or activity in rooms not in the sliding window. This can be a problem for some use cases:

  • Following a permalink to a random room which is not in the window should be possible.
  • Receiving a direct @mention in a room not in the window should notify the client.

For the first of these issues, the sliding sync API exposes a "room subscription" API. For the second issue, the sliding sync API exposes a "notifications" API.

Requesting all rooms

Sometimes clients may not wish to deal with sliding windows, and instead get all rooms on the user's account. For example, if your client is a bot or an application service, having sliding windows just adds extra complexity. To aid these use cases, any list can omit the ranges key and add a new sticky key at the same level: slow_get_all_rooms: true. If this is set, the ranges and sort keys are ignored and all rooms which match the list filters will be returned. If there are no filters for this list, then all rooms on the user's account will be returned. This gives additional flexibility as it allows clients to request all E2EE rooms in a separate list from the sliding windows. When operating in this mode, there will be no movement operations (DELETE followed by INSERT) as the client has the entire list and can work out whatever sort order they wish. There will still be DELETE and INSERT operations when rooms are left or joined respectively. In addition, there will be an initial SYNC operation to let the client know which rooms in the rooms object were from this list.

An example request:

{
  "lists": {
    // list will include all encrypted rooms in one go
    "list_all_encrypted": {
      "slow_get_all_rooms": true,
      "filters": {
        "is_encrypted": true
      }
    },
    // list will include the first 20 unencrypted rooms sorted accordingly
    "list_unencrypted": {
      "ranges": [ [0,19] ],
      "sort": [ "by_notification_level", "by_recency" ],
      "filters": {
        "is_encrypted": false
      }
    }
  },
}

Would return the response:

{
  "lists": {
    "list_all_encrypted": {
      "count": 1337,
      "ops": [
        {
          "op": "SYNC",
          "range": [0, 1336],
          "room_ids": [
            "!encrypted:bar", // ... 1336 more room IDs
          ]
        }
      ]
    },
    "list_unencrypted": {
      "count": 420,
      "ops": [
        {
          "op": "SYNC",
          "range": [0, 19],
          "room_ids": [
            "!unencrypted:bar", // ... 19 more room IDs
          ]
        }
      ]
    }
  },
  "rooms": {
    "!encrypted:bar": {
      // ...
    },
    // ... 1336 more items
    "!unencrypted:bar": {
      // ...
    },
    // ... 19 more items
  },
}
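How a server might build the per-list part of this response can be sketched in Python. This is a minimal, non-authoritative sketch: build_list_response and the room dict shape are hypothetical, only the is_encrypted filter is supported, and it assumes a range reaching past the end of the list is clamped to the rooms that exist. With slow_get_all_rooms, ranges and sort are ignored and a single SYNC covers every matching room.

```python
from typing import Callable


def build_list_response(rooms: list[dict], list_req: dict, sort_key: Callable) -> dict:
    """rooms: [{"room_id": ..., "is_encrypted": ..., "ts": ...}] (hypothetical shape)."""
    wanted = list_req.get("filters", {}).get("is_encrypted")
    matched = [r for r in rooms if wanted is None or r["is_encrypted"] == wanted]
    if list_req.get("slow_get_all_rooms"):
        # ranges and sort are ignored; the order of room_ids carries no meaning.
        ids = [r["room_id"] for r in matched]
        ops = [{"op": "SYNC", "range": [0, len(ids) - 1], "room_ids": ids}] if ids else []
    else:
        ordered = [r["room_id"] for r in sorted(matched, key=sort_key)]
        ops = []
        for start, end in list_req.get("ranges", []):
            ids = ordered[start:end + 1]  # clamp to the rooms that exist (assumption)
            if ids:
                ops.append({"op": "SYNC", "range": [start, start + len(ids) - 1], "room_ids": ids})
    return {"count": len(matched), "ops": ops}
```

Fed 1337 encrypted and 420 unencrypted rooms with the two lists from the request above, this yields the same count and range values as the example response.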

Room Subscription API

Sometimes clients know exactly which room they want to get information about e.g by following a permalink or by refreshing a webapp currently viewing a specific room. The sliding window API alone is insufficient for this use case because there's no way to say "please track this room explicitly". The room subscription API serves as a way to provide this tracking. At a high level, the client provides a map of room ID to room list parameters and the server then returns the response in the same format as the sliding window API, just without the operations/indexes.

To track a room !sub1:bar, the client would send the following request:

{
  "room_subscriptions": { // sticky
      "!sub1:bar": { // sticky
          "required_state": [ ["*","*"] ],
          "timeline_limit": 50,
          "include_old_rooms": { // See the "Tombstones" section for more information.
            "timeline_limit": 1,
            "required_state": [ ["m.room.tombstone", ""] ]
          },
      }
  }
}

This would return the following response:

{
  "rooms": {
    "!sub1:bar": {
        "name": "Alice and Bob",
        "required_state": [
          {"sender":"@alice:example.com","type":"m.room.create", "state_key":"", "content":{"creator":"@alice:example.com"}},
          {"sender":"@alice:example.com","type":"m.room.join_rules", "state_key":"", "content":{"join_rule":"invite"}},
          {"sender":"@alice:example.com","type":"m.room.history_visibility", "state_key":"", "content":{"history_visibility":"joined"}},
          {"sender":"@alice:example.com","type":"m.room.member", "state_key":"@alice:example.com", "content":{"membership":"join","displayname":"Alice"}},
          {"sender":"@alice:example.com","type":"m.room.member", "state_key":"@bob:example.com", "content":{"membership":"join","displayname":"Bob"}}
        ],
        "timeline": [
          {"sender":"@alice:example.com","type":"m.room.create", "state_key":"", "content":{"creator":"@alice:example.com"}},
          {"sender":"@alice:example.com","type":"m.room.join_rules", "state_key":"", "content":{"join_rule":"invite"}},
          {"sender":"@alice:example.com","type":"m.room.history_visibility", "state_key":"", "content":{"history_visibility":"joined"}},
          {"sender":"@alice:example.com","type":"m.room.member", "state_key":"@alice:example.com", "content":{"membership":"join","displayname":"Alice"}},
          {"sender":"@alice:example.com","type":"m.room.member", "state_key":"@bob:example.com", "content":{"membership":"join","displayname":"Bob"}},
          {"sender":"@alice:example.com","type":"m.room.message", "content":{"body":"A"}},
          {"sender":"@alice:example.com","type":"m.room.message", "content":{"body":"B"}},
        ],
        "limited": true,
        // ...
    }
  }
}

Any updates in this room would be returned in the same section of the sync response:

{
  "rooms": {
    "!sub1:bar": {
        "timeline": [
          {"sender":"@alice:example.com","type":"m.room.message", "content":{"body":"C"}},
        ]
    }
  }
}

Multiple rooms can be subscribed to by specifying additional keys in the room subscription map. If a room is subscribed to multiple times, the most recent subscription takes effect for the purposes of required_state and timeline_limit filtering.

To unsubscribe from a room, the client needs to send a request with the room ID to unsubscribe from in the unsubscribe_rooms array:

{
  "unsubscribe_rooms": [ "!sub1:bar" ]
}

This will delete that key from the room_subscriptions map on the server. It is common for clients to view one room and then switch to another. This can be modelled as a subscription to the new room coupled with unsubscribing from the old room. For example, if the client switched from viewing !sub1:bar to !sub2:bar:

{
  "room_subscriptions": {
      "!sub2:bar": {
          "required_state": [ ["*","*"] ],
          "timeline_limit": 50
      }
  },
  "unsubscribe_rooms": [ "!sub1:bar" ]
}

unsubscribe_rooms is cleared after every response; it is not sticky.

Rationale: By using a map, this supports clients who can show multiple room timelines in the UI e.g Hydrogen's grid view. The unsubscribe_rooms array allows rooms to be efficiently deleted from the map. An alternative would be to specify an empty JSON object in the room subscription but that feels less explicit than the array form.
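The server-side bookkeeping for room_subscriptions and unsubscribe_rooms can be sketched in Python. This is a minimal, non-authoritative sketch: apply_subscription_request is hypothetical, and it assumes that if a room appears in both fields of the same request, the unsubscribe wins (the MSC does not say). room_subscriptions entries persist across requests and a newer entry replaces an older one, while unsubscribe_rooms only acts on the request it arrives in.

```python
def apply_subscription_request(sticky_subs: dict, request: dict) -> dict:
    """Return the new sticky room_subscriptions map after processing one request."""
    subs = dict(sticky_subs)  # do not mutate the previous state
    # Sticky: the most recent subscription for a room replaces any earlier one.
    subs.update(request.get("room_subscriptions", {}))
    # Not sticky: only applies to this request. Assumption: unsubscribe wins ties.
    for room_id in request.get("unsubscribe_rooms", []):
        subs.pop(room_id, None)
    return subs
```

Switching rooms is then a single request carrying both the new subscription and the unsubscribe, exactly as in the !sub1:bar to !sub2:bar example.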

Commonalities between the Room Subscription API and Sliding Window API

In the request, both the sliding window API and the room subscription API use the same keys to extract room data. Both APIs also return that room data in the same part of the response. These keys are:

  • required_state: Required state for each room returned. An array of event type and state key tuples.
  • timeline_limit: The maximum number of timeline events to return per response.
  • include_old_rooms: Determines if predecessor rooms are included in the rooms response.

All room data is returned in a top-level rooms key in the response JSON, regardless of whether the room is being returned because of a room subscription or a list. This de-duplicates data when a room is present in more than 1 list. However, multiple lists may have different values for required_state or timeline_limit. In this case, these values are combined according to the following rules:

  • required_state: Combine all arrays and treat it as a single unified array.
  • timeline_limit: Take the highest value.
  • include_old_rooms: Presence of this field in any section turns this on. If there are multiple matches for the same room ID (e.g explicit subscription and present in a list) then the inner values of required_state and timeline_limit are unioned in the same way.

Due to this, clients need to take care to extract only the number of timeline events / state events they require from the rooms response, as it may include more data than they requested in a single list.
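The combination rules above can be sketched in Python. This is a minimal, non-authoritative sketch: merge_room_params is hypothetical, and it assumes an absent timeline_limit counts as 0 and that duplicate [type, state_key] pairs are dropped when the required_state arrays are combined. include_old_rooms is enabled if any input has it, and its inner values are merged with the same rules.

```python
def merge_room_params(configs: list[dict]) -> dict:
    """Combine room parameters from every list / subscription that matched one room."""
    required_state: list = []
    timeline_limit = 0  # assumption: a missing timeline_limit counts as 0
    old_room_configs = []
    for cfg in configs:
        # required_state: treat all arrays as one unified array (deduplicated here).
        for pair in cfg.get("required_state", []):
            if pair not in required_state:
                required_state.append(pair)
        # timeline_limit: take the highest value.
        timeline_limit = max(timeline_limit, cfg.get("timeline_limit", 0))
        # include_old_rooms: presence anywhere turns it on.
        if "include_old_rooms" in cfg:
            old_room_configs.append(cfg["include_old_rooms"])
    merged = {"required_state": required_state, "timeline_limit": timeline_limit}
    if old_room_configs:
        merged["include_old_rooms"] = merge_room_params(old_room_configs)
    return merged
```

A client that only asked for timeline_limit: 5 in one list may therefore receive 20 events for a room that another list also requested with a limit of 20, which is why clients must trim the data themselves.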

Tombstones

By default, sliding sync will not return "old" rooms in lists. This is generally the right thing to do, as many popular rooms have previous versions which would otherwise feature in the room list. This section details the semantics for how sliding sync does this, and how to opt-out of this behaviour.

There is no is_tombstoned filter in sliding sync. This is by design, as it is almost never what clients want: with a simple is_tombstoned filter, the moment another user upgrades a room, the room would disappear from the room list for all other users. Not all tombstoned rooms are equal. If the user has joined the replacement_room, then the previous room is treated as "old". If the user has not joined the replacement_room, then the room is treated as live and is eligible to be returned in sliding sync responses.

If the include_old_rooms field is set, the rooms field in the response may contain additional rooms. These rooms are the "old" rooms of every matched room in a particular list or a particular room subscription, depending on where include_old_rooms was set in the request. The user MUST be joined to old rooms for them to show up in the response.

TODO: we rely on include_old_rooms being set to "enable" this, but because requests are muxed together based on whether each field is nil, it is not possible to disable include_old_rooms by omitting it.

For example, given a list of joined rooms A, B, C, A2, A3 where A2 and A3 are newer versions of room A, sliding sync will not return rooms A or A2 by default. The client may send the following direct room subscription to include these rooms:

{
  "room_subscriptions": {
      "A3": {
          "required_state": [ ["*","*"] ],
          "timeline_limit": 50,
          "include_old_rooms": {
            "timeline_limit": 1,
            "required_state": [ ["m.room.create", ""] ]
          }
      }
  }
}

This will result in a rooms response for A, A2 and A3, where A and A2 use the timeline_limit: 1 and required_state: [ ["m.room.create", ""] ] values, and A3 uses timeline_limit: 50 and required_state: [ ["*","*"] ]. If a client explicitly subscribes to an old room, say A2, then include_old_rooms works backwards from that point, including A but not the newer room A3.

These options work on lists as well:

{
  "lists": {
    "a": {
      "include_old_rooms": {
        "timeline_limit": 1
      },
      "timeline_limit": 50,
      "filters": {
        "is_encrypted": true
      }
    }
  }
}

When applied to lists, old rooms MUST NOT be present in the list. They MUST be present in the rooms response only. The old rooms DO NOT need to meet the filter criteria. That is to say, if A was unencrypted and A2 and A3 were encrypted, this list would include only A3 (as old rooms must not be present), and would have a rooms response for A, A2 and A3: room A is included even though it is unencrypted, because "oldness" takes precedence. Conversely, if the filter was is_encrypted: false, then no rooms would be returned even though room A is joined and unencrypted, because it is old and hence ineligible for being returned in a list.
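The tombstone rules for lists can be sketched in Python. This is a minimal, non-authoritative sketch: the helper names are hypothetical, tombstones maps a room to the replacement_room from its m.room.tombstone event, predecessors maps a room to the room it replaced, and it assumes the backwards walk continues past rooms the user is not joined to (skipping them) rather than stopping there.

```python
def is_old(room_id: str, tombstones: dict[str, str], joined: set[str]) -> bool:
    # A tombstoned room is only "old" once the user has joined its replacement_room.
    replacement = tombstones.get(room_id)
    return replacement is not None and replacement in joined


def old_rooms_for(room_id: str, predecessors: dict[str, str], joined: set[str]) -> list[str]:
    # Walk backwards only, so starting from A2 yields A but never the newer A3.
    result, seen = [], {room_id}
    current = predecessors.get(room_id)
    while current is not None and current not in seen:
        seen.add(current)
        if current in joined:  # the user MUST be joined to old rooms
            result.append(current)
        current = predecessors.get(current)
    return result


def compute_list(joined, matches_filter, tombstones, predecessors, include_old_rooms):
    # Old rooms never appear in the list itself, whatever the filters say.
    list_rooms = sorted(r for r in joined
                        if not is_old(r, tombstones, joined) and matches_filter(r))
    response_rooms = set(list_rooms)
    if include_old_rooms:
        # Old rooms are added to the rooms response without checking the filters.
        for r in list_rooms:
            response_rooms.update(old_rooms_for(r, predecessors, joined))
    return list_rooms, response_rooms
```

With A unencrypted and A2/A3 encrypted, an is_encrypted: true list yields only A3 while the rooms response holds A, A2 and A3; an is_encrypted: false list yields nothing, matching the paragraph above.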

Lazy-Loading Room Members

Room members in a room can be lazily-loaded by requesting the special value $LAZY as the state key for the m.room.member event type in the required_state filter:

{
  "required_state": [
    ["m.room.member", "$LAZY"] // activate lazy loading
  ]
}

At a high level, this can be thought of as requesting the m.room.member events for a set of unknown user IDs. Typically, when you view a room, you want to retrieve all state events except for m.room.member events which you want to lazily load. To get this behaviour, clients can send the following:

{
  "required_state": [
    ["m.room.member", "$LAZY"], // activate lazy loading
    ["*", "*"] // request all state events _except_ for m.room.member events which are lazily loaded
  ]
}

Check the description of required_state for more information on this behaviour, as it is not specific to lazy-loading.

The server processes $LAZY according to the following rules:

  • Calculate the timeline entries that will be returned in this room.
  • For each timeline entry, ensure the m.room.member event for the sender of the timeline event is included exactly once per user ID.
    • This means if timeline_limit: 0 then no m.room.member events are returned.
  • The required state is always the current state, so if the timeline had [Alice join, msg, msg, Alice leave] then the leave m.room.member event should be returned in required_state, even though the state at the time of the messages was the join event.
    • This means that for a timeline like [Alice join, msg, msg, Alice change name to A] the m.room.member event will contain the display name "A" even though the display name was "Alice" at the time the messages were sent. Clients need to look at the unsigned.prev_content section of the "A" event to work out what the display name was at the time the messages were sent (rolling back state). Clients MUST NOT rely on seeing the correct "state before the event" value in required_state.
  • When the client is live streaming events, include the m.room.member event for the sender of a live event only if it has not already been sent during this connection. This means servers must remember which user IDs they have sent m.room.member events for, for the lifetime of the connection. If the live event is itself an m.room.member event, include it in both the timeline and required_state so that clients do not need to parse the timeline for current state. This is particularly important as the client CANNOT reliably work out the current state from the timeline entries in the face of state resolution.
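The $LAZY rules above can be sketched in Python. This is a minimal, non-authoritative sketch: lazy_load_members is hypothetical, current_members maps each user ID to its current m.room.member event (the current state, not the state at the time of each message), and sent_user_ids is the per-connection set the server remembers. It does not model the limit of timeline_limit itself; it just processes whatever timeline it is given.

```python
def lazy_load_members(timeline: list[dict], current_members: dict[str, dict],
                      sent_user_ids: set[str]) -> list[dict]:
    """Return the m.room.member events to put in required_state for this timeline."""
    required: dict[str, dict] = {}  # user ID -> member event; at most once per user
    for event in timeline:
        sender = event["sender"]
        # Include the sender's current member event unless already sent on this connection.
        if sender not in sent_user_ids and sender in current_members:
            required[sender] = current_members[sender]
        # A membership event in the timeline is always mirrored into required_state.
        if event.get("type") == "m.room.member":
            target = event["state_key"]
            if target in current_members:
                required[target] = current_members[target]
    sent_user_ids.update(required)  # remember for the lifetime of the connection
    return list(required.values())
```

For the timeline [Alice join, msg, msg, Alice leave], this returns only Alice's leave event, and an empty timeline (timeline_limit: 0) returns no member events at all.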

Note: It is strongly advised not to lazy-load members in encrypted rooms, as the client needs a complete room member list in order to determine which devices to encrypt messages for. It is possible to combine lazy-loading members with the /members endpoint to extract the complete list of joined users, but this is only really useful if done at the point of sending a message: if you do it when you view a room, you might as well request the complete member list via Sliding Sync. However, if the client waits until it sends a message to query /members, sending will take longer, as the client must retrieve device information for all the users before it can send the event. If the client instead retrieves this information when the room is first viewed, it has more time to fetch it pre-emptively, resulting in a snappier UX. Be careful when using /members: clients cannot use ?at= to avoid race conditions, because sliding sync streaming tokens are not compatible with other endpoints.

Notifications API

If you are tracking the top 5 rooms and an event arrives in the 6th room, you will be notified about the event ONLY IF the sort order means the room bumps into the top 5. If for example you sorted by_name then you won't be notified about the event in the 6th room, unless it's an m.room.name event which moves the room into the top 5. In most "recent" sort orders a new event will result in the 6th room bumping to the top of the list. A notable exception is when the rooms are sorted in alphabetical order (by_name), which is what some other chat clients do for example. In this case, you don't care about the event unless the event is a "highlightable" event (e.g direct @mention). The notifications API exists to provide a mechanism for clients to display "unread messages" indicators on the room list at positions not currently inside a sliding window.

TODO:

  • Unsure how much data to expose (probably index position + notif/highlight counts?). If we do counts then we are doomed to send a response to a client every time an event is sent in a noisy room, which seems rather wasteful. Perhaps make it configurable? @timokoesters mentions having approx counts to avoid the churn e.g only two digits of precision (21 -> 21, but 1234 -> 1200), this fits UIs very nicely.
  • This has not been fully specced yet because in practice most clients sort by recency so it's not urgent to include this. For clients who sort by name though, this is a show stopper.

Bandwidth optimisations for persistent clients

The Sliding Sync API assumes that room data is deleted on the client when:

  • the room falls out of the sliding window;
  • a window gets invalidated; or
  • the connection expires and a new connection is created.

The API will send the entire required_state and timeline again when the room re-appears for the 2nd time. This is wasteful if the client remembers the state/timeline and there have been no changes.

To resolve this, the API exposes an opt-in mechanism for providing efficient delta updates. This is encoded into a "delta token" which is an opaque string. If a request is missing a delta token, no bandwidth optimisations are applied. This token sits at the top-level of the request/response JSON as delta_token. If a delta token is provided, the server SHOULD remove events that have already been sent and acknowledged by the client. The list of fields which this can apply to is not fully determined, but SHOULD include:

  • required_state events
  • timeline events
  • Any extensions which return events e.g account_data. Extensions which make use of the delta token MUST state so in their MSC.

The delta token sits outside the scope of connections, and hence can be used to remember data between connections. The token remembers the following information for every room which has been sent to the client:

  • The event ID of the last sent timeline event.
  • The event ID of the last sent required_state event, keyed off (type, state_key).

When the client makes a new connection, or when a room re-appears inside a window, the following algorithm is applied:

  • Work out the required_state for this room as if there was no delta token.
  • Filter required_state by looking for event ID matches referenced by the delta token. If there is a match, remove that state event.
  • Work out the timeline for this room as if there was no delta token.
  • Attempt to find the last sent timeline event referenced by the delta token. If it is found, discard all events before this event, including the referenced event itself. The remaining timeline events are sent to the client.

This can create gaps in the timeline, but this could already happen between connections for persistent clients. It is up to the client to resolve gaps by querying /messages. The prev_batch token MUST be updated if events are filtered out.

A worked example:

  • Client hits /sync and accumulates some data for rooms. They are using a delta token.
  • The client goes offline for a while, and room data changes. The client's connection expires.
  • The client reappears and hits /sync to start a new connection. The delta token takes effect and returns a much smaller delta.
         Client                            Server
           |------------/sync---------------->|
           |       timeline_limit=4,          |
           |  required_state=[PL,avatar]      |    Generate response, store delta_token=X
           |                                  |    Room1, last_timeline_event=$D, m.room.power_levels=$B, m.room.avatar=$C
           |<-----------/sync-----------------|
           |  Room1,timeline[$A,$B,$C,$D]     |
           | req_state=[$B,$C],delta_token=X  |
           |                                  |
                  ... time passes ...             Room1 new events $E,$F, m.room.avatar updates to $F
           |                                  |
           |------------/sync---------------->|
           |  delta_token=X,timeline_limit=4  |  Generate response:      timeline=[$C,$D,$E,$F], PL=$B, avatar=$F
           |  required_state=[PL,avatar]      |  Compare with delta_token: last_timeline=$D         PL=$B, avatar=$C
           |                                  |  Diff:                         timeline=[$E,$F]         avatar=$F
           |<-----------/sync-----------------|
           |    Room1,timeline[$E,$F]         |
           |  req_state=[$F],delta_token=Y    |
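The filtering steps described above can be sketched in Python by replaying this worked example. This is a minimal sketch, not a normative implementation. `DeltaToken` and `apply_delta_token` are hypothetical names, events are reduced to bare event IDs, and updating the prev_batch token is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

StateKey = Tuple[str, str]  # (event type, state_key)


@dataclass
class DeltaToken:
    """Hypothetical per-room data remembered for a delta token."""
    last_timeline_event: Optional[str] = None
    # (type, state_key) -> event ID of the last sent required_state event
    state: Dict[StateKey, str] = field(default_factory=dict)


def apply_delta_token(
    timeline: List[str],
    required_state: Dict[StateKey, str],
    token: Optional[DeltaToken],
) -> Tuple[List[str], Dict[StateKey, str]]:
    """Strip data the client was already sent from a freshly computed room.

    `timeline` (oldest first) and `required_state` must be computed as if
    there were no delta token.
    """
    if token is None:
        return timeline, required_state
    # Remove state events whose event ID the client already has.
    already_sent = set(token.state.values())
    state = {k: ev for k, ev in required_state.items() if ev not in already_sent}
    # Discard everything up to and including the last sent timeline event.
    # If that event is not in the window, the timeline is sent unchanged.
    if token.last_timeline_event in timeline:
        cut = timeline.index(token.last_timeline_event) + 1
        timeline = timeline[cut:]
    return timeline, state


# Replaying the worked example: delta_token=X remembered $D, PL=$B, avatar=$C.
PL = ("m.room.power_levels", "")
AVATAR = ("m.room.avatar", "")
token_x = DeltaToken(last_timeline_event="$D", state={PL: "$B", AVATAR: "$C"})
timeline, state = apply_delta_token(
    ["$C", "$D", "$E", "$F"], {PL: "$B", AVATAR: "$F"}, token_x
)
```

This leaves timeline=[$E,$F] and only the updated avatar event $F, which matches the second response in the diagram.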

Limitations

The delta token will not work under the following scenarios:

  • The timeline is filtered in some way. Sliding Sync currently provides no filtering mechanism for timeline events, but it will in the future. Any filters need to be the same between connections for the bandwidth optimisations to work at all.
  • m.room.member events are excluded from these calculations. Delta tokens map to a lot of data server-side. In an effort to bound the growth of this data, m.room.member events MAY be sent redundantly even if the client has already been sent them. This also reduces the chances of missing an m.room.member event, which would risk causing E2EE key issues as the client would fail to encrypt for the target room member.
  • The server does not need to remember delta tokens and the associated data forever. The server can expire this data whenever it wants, which will result in more redundant information being sent to the client and a new delta token being generated. This MSC recommends that servers keep delta tokens valid for at least 7 days.
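A server could honour the last two limitations with a store along the following lines. This is a hypothetical sketch. The class, its in-memory dictionary, the record shape and the explicit `now` argument are assumptions made for illustration. The TTL simply mirrors the 7-day recommendation above.

```python
import secrets
from typing import Dict, Optional, Tuple

SEVEN_DAYS_SECONDS = 7 * 24 * 60 * 60  # recommended minimum validity


class DeltaTokenStore:
    """Hypothetical in-memory store: token -> (issued_at, per-room data)."""

    def __init__(self, ttl_seconds: float = SEVEN_DAYS_SECONDS) -> None:
        self.ttl_seconds = ttl_seconds
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def issue(self, rooms: dict, now: float) -> str:
        """Remember what was sent and return a new opaque token.

        rooms maps room_id -> {"last_timeline_event": event_id,
                               "state": {(type, state_key): event_id}}.
        """
        remembered = {}
        for room_id, data in rooms.items():
            # m.room.member events are not remembered, bounding storage;
            # they MAY therefore be sent again on the next connection.
            state = {
                key: event_id
                for key, event_id in data["state"].items()
                if key[0] != "m.room.member"
            }
            remembered[room_id] = {
                "last_timeline_event": data["last_timeline_event"],
                "state": state,
            }
        token = secrets.token_urlsafe(16)
        self._entries[token] = (now, remembered)
        return token

    def lookup(self, token: str, now: float) -> Optional[dict]:
        """Return remembered data, or None if the token is unknown or expired.

        None means the server falls back to sending data redundantly.
        """
        entry = self._entries.get(token)
        if entry is None:
            return None
        issued_at, remembered = entry
        if now - issued_at > self.ttl_seconds:
            del self._entries[token]  # servers may expire data at any time
            return None
        return remembered
```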

E2EE Handling

The server cannot calculate the highlight_count in E2EE rooms as it cannot read the message content. This is a problem when clients want to sort by the most recent highlight. By contrast, the server can calculate the name and unread_count, and work out the most recent timestamp, when sorting by those fields. What should the server do when the client wants to sort by the most recent highlight (which is pretty typical!)? It can:

  • Assume highlight_count == 1 whenever unread_count > 0. This ensures that E2EE rooms are always bumped above unreads in the list, but doesn't allow sorting within the list of highlighted rooms.
  • Assume highlight_count == 0 always. This will always sort E2EE rooms below the highlight list, even if the E2EE room has a @mention.
  • Sort E2EE rooms in their own dedicated list: {"filters": { "is_encrypted": true }}
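The first two server-side options can be sketched as a function that returns the highlight_count used for sorting. The enum and function names are hypothetical. The third option is a client-requested filter rather than a server policy, so it is not modelled here.

```python
from enum import Enum


class E2EEHighlightPolicy(Enum):
    """Hypothetical names for the first two server-side options above."""
    ASSUME_ONE_IF_UNREAD = "assume_one_if_unread"
    ASSUME_ZERO = "assume_zero"


def sort_highlight_count(
    is_encrypted: bool,
    highlight_count: int,
    unread_count: int,
    policy: E2EEHighlightPolicy,
) -> int:
    """highlight_count the server sorts by; not the true value for E2EE rooms."""
    if not is_encrypted:
        return highlight_count  # the server can compute this itself
    if policy is E2EEHighlightPolicy.ASSUME_ONE_IF_UNREAD:
        return 1 if unread_count > 0 else 0
    return 0  # ASSUME_ZERO
```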

In all cases, the client needs to do additional work to calculate the highlight_count. When the client is streaming, this work is very small as it concerns only a single event. However, when the client has been offline for a while there could be hundreds or thousands of missed events. There are 3 options here:

  • Do no work and immediately red-highlight the room. Risk of false positives.
  • Grab the last N messages and see if any of them are highlights. Current implementations using sync v2 do this.
  • Grab all the missed messages and see if any of them are highlights. Denial of service risk if there are thousands of messages.
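The second client-side option can be sketched as follows. `is_highlight` stands in for decrypting an event and evaluating the user's push rules, which is out of scope here. `mentions_alice` is a hypothetical toy check and does not reflect how push rules actually work.

```python
from typing import Callable, Sequence


def estimate_highlight_count(
    missed_events: Sequence[dict],
    n: int,
    is_highlight: Callable[[dict], bool],
) -> int:
    """Count highlights among the last n missed, already-decrypted events.

    If more than n events were missed, this is only a lower bound on the
    true highlight count.
    """
    if n <= 0:
        return 0  # guard: missed_events[-0:] would select every event
    return sum(1 for event in missed_events[-n:] if is_highlight(event))


def mentions_alice(event: dict) -> bool:
    # Hypothetical highlight check: a plain-text mention of the user.
    return "@alice" in event.get("body", "")
```

Passing `n = len(missed_events)` turns this into the third option, with the same denial-of-service caveat.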

Once the highlight count has been adequately estimated (it's only truly calculated if you grab all messages), this may affect the sort order for this room - it may diverge from that of the server. More specifically, it may bump the room up or down the list, depending on what the sort implementation is for E2EE rooms (top of list or below rooms with highlights).

Clients have two main choices here:

  • Lite: Keep E2EE rooms in the main list. This means the sort order won't always be strictly accurate for them, but it is fast. If you are sorting by highlight count and then by unread count (which is fairly typical), E2EE rooms with unreads will always be bumped above all the rooms that only have unread counts, if the server's resolution strategy is "Assume highlight_count == 1 whenever unread_count > 0".
  • Heavy: Sort E2EE rooms into a separate list. Manually mix together the E2EE list and the main list depending on highlight counts. This means the sort order will be more accurate but is slower and more complex to perform. This is why there is an is_encrypted filter on the room list parameters.

If you use the sort options ["by_notification_level", "by_recency"], this will implement the "Lite" option for you automatically. This creates the following groups (in priority order):

  • Unencrypted rooms with highlight_count > 0 appear first. (NB: you cannot get encrypted rooms with highlight_count > 0)
  • Encrypted rooms with notification_count > 0 appear next.
  • Unencrypted rooms with notification_count > 0 follow.
  • Rooms with highlight_count == 0 && notification_count == 0 appear last.

Within each group, the rooms are then sorted by recency (most recent first). This has the following negative side-effects:

  • An explicit @mention in an encrypted room will not bump the room to the top of the list if there are unencrypted rooms with highlight counts. Instead, the room will sort below every unencrypted room with a highlight count.
  • A newer unread notification for an unencrypted room will sort beneath older unread notifications for encrypted rooms.
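The grouping and ordering described above can be sketched as a sort key. `RoomSummary` and its fields are hypothetical simplifications. In particular, the MSC does not prescribe how a server derives the timestamp used for recency. The sample data reproduces the second side-effect: the newer unencrypted unread room sorts beneath the older encrypted one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RoomSummary:
    """Hypothetical room fields relevant to by_notification_level."""
    room_id: str
    is_encrypted: bool
    highlight_count: int
    notification_count: int
    recency_ts: int  # timestamp used by by_recency


def notification_level_group(room: RoomSummary) -> int:
    """Group index in priority order; lower sorts first."""
    if not room.is_encrypted and room.highlight_count > 0:
        return 0  # unencrypted rooms with highlights
    if room.is_encrypted and room.notification_count > 0:
        return 1  # encrypted rooms with notifications
    if not room.is_encrypted and room.notification_count > 0:
        return 2  # unencrypted rooms with notifications
    return 3  # everything else


def sort_by_notification_level_then_recency(rooms: List[RoomSummary]) -> List[RoomSummary]:
    # Group first, then most recent first within each group.
    return sorted(rooms, key=lambda r: (notification_level_group(r), -r.recency_ts))


rooms = [
    RoomSummary("!quiet", False, 0, 0, 500),
    RoomSummary("!old_e2ee_unread", True, 0, 2, 100),
    RoomSummary("!new_plain_unread", False, 0, 1, 400),
    RoomSummary("!mention", False, 1, 1, 50),
]
order = [r.room_id for r in sort_by_notification_level_then_recency(rooms)]
```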

If these trade-offs are unacceptable to a client implementation then they will need to sort encrypted rooms into their own list and manually mix rooms from each list together as per the "Heavy" description.

In the future, it may become impossible for servers to sort by room name due to E2EE. This proposal has no suggestion on how to handle encrypted room names beyond hoping that homomorphic encryption will allow sorting based on ciphertext: this is an active area of research in the computer science field.

Extensions

We anticipate that as more features land in Matrix, clients will also want different kinds of data synced to them. Sync v2 did not have any first-class support for opting in to new data. Sliding Sync does have support for this via "extensions". Extensions also allow this proposal to be broken up into more manageable sections. Extensions are requested by the client in a dedicated extensions block:

{
    "extensions": {
        "name_of_extension": { // sticky
            "enabled": true, // sticky
            "lists": ["rooms", "dms"], // sticky
            "rooms": ["!abcd:example.com"], // sticky
            "extension_arg": "value", // stickiness specified by the extension
            "extension_arg_2": true   // stickiness specified by the extension
        }
    }
}

Extensions MUST have an enabled flag which defaults to false. If a client sends an unknown extension name, the server MUST ignore it (otherwise backwards compatibility is broken when a newer client tries to communicate with an older server). Extension args may or may not be sticky; this depends on the extension.
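A server might track the sticky `enabled` flag as shown below. This is a hypothetical sketch. The extension names are illustrative only. Treating an explicit null like an omitted field is an assumption, since the text only defines omission.

```python
from typing import AbstractSet, Dict


def update_enabled_extensions(
    requested: Dict[str, dict],
    previous: Dict[str, bool],
    known: AbstractSet[str],
) -> Dict[str, bool]:
    """Apply one request's extensions block to the sticky `enabled` flags."""
    enabled = dict(previous)
    for name, config in requested.items():
        if name not in known:
            continue  # unknown extension names MUST be ignored, not rejected
        # Omitted (or, by assumption here, null) keeps the previous sticky value.
        if config.get("enabled") is not None:
            enabled[name] = bool(config["enabled"])
    return enabled


def is_enabled(enabled: Dict[str, bool], name: str) -> bool:
    return enabled.get(name, False)  # `enabled` defaults to false
```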

Extensions can leverage the data from the core API, notably which rooms are currently inside sliding windows as well as which rooms are explicitly subscribed to. By default, an extension is expected to be aware of and act on all sliding windows and all room subscriptions. However, this may mean the extension provides data that the client never uses. (For example, clients may be interested in seeing typing notifications for rooms in a sliding window, but ignore such notifications in a background list of all rooms.)

To avoid transferring useless data, the spec reserves a field lists, which is a sticky list of strings, namely the names of lists given to the Sliding Window API. There are four behaviours that the client can request of the extension:

{"lists": []}                    // Do not process any lists.
{"lists": ["rooms", "dms"]}      // Process only a subset of lists.
{"lists": ["*"]}                 // Process all lists defined in the Sliding Window API. (This is the default.)
{"lists": ["*", "junk", "here"]} // The same: anything whose first entry is `*` means "process all lists".
{"lists": null}                  // No change, use the `lists` sticky value from previous requests.
{} // field omitted              // The same: use the previous sticky value.

Similarly, we reserve a rooms field, which is a sticky list of room IDs given to the Room Subscription API. Again, there are four behaviours:

{"rooms": []}                    // Do not process any specific rooms.
{"rooms": ["!a:b", "!c:d"]}      // Process only a subset of room subscriptions.
{"rooms": ["*"]}                 // Process all room subscriptions defined in the Room Subscription API. (This is the default.)
{"rooms": ["*", "junk", "here"]} // The same: anything whose first entry is `*` means "process all room subscriptions".
{"rooms": null}                  // No change, use the `rooms` sticky value from previous requests.
{} // field omitted              // The same: use the previous sticky value.
Examples of using lists and rooms
{
   "enabled": false // extension completely disabled
}
{
   "enabled": true,  // extension enabled for all sliding windows and all room subscriptions
   "lists": ["*"],
   "rooms": ["*"]
}
{
   "enabled": true,   // extension enabled for all room subscriptions,
   "lists": [],       // but not enabled for sliding windows
   "rooms": ["*"]
}
{
   "enabled": true,   // extension enabled for all room subscriptions,
   "lists": ["dms"],  // and for the "dms" sliding window
   "rooms": ["*"]
}
{
   "enabled": true,   // extension enabled for all sliding windows and one specific room
   "lists": ["*"],
   "rooms": ["!myroom:example.com"]
}
{
   "enabled": true,   // extension enabled for the "dms" sliding window and one specific room
   "lists": ["dms"],
   "rooms": ["!myroom:example.com"]
}
{
   "lists": ["dms"]   // use "enabled" and "rooms" from the previous request,
   // and only enable the extension for the "dms" sliding window.
}
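The sticky resolution rules for lists and rooms can be sketched as two small helpers. The usage lines replay the last example above. Function names are hypothetical, and the same helpers apply unchanged to both fields.

```python
from typing import List, Optional

DEFAULT_SELECTION: List[str] = ["*"]  # default for both `lists` and `rooms`


def resolve_sticky_selection(new_value: Optional[List[str]], previous: List[str]) -> List[str]:
    """Apply one request's `lists` or `rooms` value to the sticky value."""
    if new_value is None:  # null or omitted: keep the previous sticky value
        return previous
    return list(new_value)


def is_selected(selection: List[str], name: str) -> bool:
    """Whether a sticky `lists`/`rooms` value selects a list name or room ID."""
    if selection and selection[0] == "*":
        return True  # a first entry of "*" means "all"
    return name in selection


# The last example above: only `lists` is sent, so `rooms` stays sticky.
previous_rooms = ["!myroom:example.com"]
request = {"lists": ["dms"]}
lists = resolve_sticky_selection(request.get("lists"), DEFAULT_SELECTION)
rooms = resolve_sticky_selection(request.get("rooms"), previous_rooms)
```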

The lists and rooms keys are independent and can be freely mixed (as in the core sliding sync API). It's possible that the same room appears in multiple sliding windows, or in both a sliding window and an explicit room subscription. In this case, the extension should process that room if the extension is configured to process any of the windows/subscriptions that contain the room. (The logic is a union of conditions, not an intersection.)
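The union rule can be expressed as a single predicate. In this sketch, `lists_containing_room` and `subscribed_room_ids` are hypothetical inputs standing in for whatever the server knows about the connection's current windows and subscriptions.

```python
from typing import Iterable, List, Set


def _selects(selection: List[str], name: str) -> bool:
    # A first entry of "*" selects everything; otherwise exact membership.
    if selection and selection[0] == "*":
        return True
    return name in selection


def extension_should_process(
    room_id: str,
    lists_containing_room: Iterable[str],
    subscribed_room_ids: Set[str],
    sticky_lists: List[str],
    sticky_rooms: List[str],
) -> bool:
    """Union of conditions: any selected list or selected subscription suffices."""
    via_list = any(_selects(sticky_lists, name) for name in lists_containing_room)
    via_subscription = room_id in subscribed_room_ids and _selects(sticky_rooms, room_id)
    return via_list or via_subscription
```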

Extensions SHOULD NOT attach their own semantics to the lists and rooms fields. Extensions are otherwise free to define and process their own config fields, which may be sticky. Such fields are ignored by the Core of sliding sync and transparently forwarded to extensions.

In an effort to reduce the size of this proposal, extensions will be done in separate MSCs. There will be extensions for:

  • To Device Messaging - MSC3885
  • End-to-End Encryption - MSC3884
  • Typing Notifications - MSC3961
  • Receipts - MSC3960
  • Presence - presence in sync v2: spec
  • Account Data - account_data in sync v2: MSC3959

Rationale: The name 'extensions' is inspired by the spec itself, which refers to "Extensions to /sync" multiple times. These additional bits of data are all generally outside the scope of the core room graph and room list, so they are well-placed for being treated separately. Furthermore, it is possible to make a meaningful client which only supports the core API and no extensions, as the core controls the room list and the ability to receive events and state in a room. Clients which don't do E2EE and don't handle presence/typing/receipts/other metadata can simply work with this MSC alone and in full. This is a good balance because it means this MSC alone is useful: it doesn't require additional extensions in order for a basic Matrix client to be written.

Filter and Sort Extensions

In addition to extending the sync API by adding more data to the response, the sync API needs to include additional sorting/filtering options. Clients may want to sort or filter the room list in more ways than this MSC provides (e.g. include historical rooms, include knocked rooms) in order to provide a good UI/UX. This is officially supported in the following way:

  • Sorting: Define a sort string (namespaced by MSC number when in the MSC process) and define exactly how the comparator function behaves (less than, equal to, greater than). Explain the room-specific data that is being operated on. This sort string can then appear in the sort array.
  • Filtering: Define a JSON object which represents the arguments for the filter. If there is only a single argument, the JSON object may be a JSON value, e.g. true or "room search query". Define a filter key name (namespaced by MSC number when in the MSC process). This filter can then appear in the filters object.

Caveats: It is not possible to specify ascending/descending when specifying a sort option. Furthermore, it is not possible to include AND/OR/NOT operators in filter operations (they are always AND'd). This is by design at present in order to restrain the scope and complexity of this MSC. Introducing options for these would cause scope creep, turning this MSC into an entire query language like SQL or GraphQL. The author wishes to see exactly what sorting/filtering extension MSCs are created in order to see if expanding the scope of the core MSC to include these options is sensible or not. Furthermore, it's not currently defined how servers should behave if they encounter a filter or sort operation they do not recognise. If the server rejects the request with an HTTP 400, backwards compatibility between new clients and old servers breaks. However, if the server silently ignores the unrecognised operations, the client would be unaware that only some of the sort/filter operations have taken effect. We may need to include a "warnings" section to indicate which sort/filter operations are unrecognised, allowing for some form of graceful degradation of service.
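The sort-string mechanism can be sketched as a chain of three-way comparators. Each entry in the sort array breaks ties left by the entries before it. `by_joined_count` and its namespaced name are hypothetical stand-ins for an extension MSC's sort. Skipping unrecognised sort strings is just one possible choice, because the behaviour is undefined as noted above.

```python
from functools import cmp_to_key
from typing import Callable, Dict, List

Comparator = Callable[[dict, dict], int]  # <0: a first, 0: equal, >0: b first


def by_recency(a: dict, b: dict) -> int:
    # Most recent first.
    return (b["ts"] > a["ts"]) - (b["ts"] < a["ts"])


def by_joined_count(a: dict, b: dict) -> int:
    # Hypothetical extension sort: most joined members first.
    return (b["joined"] > a["joined"]) - (b["joined"] < a["joined"])


SORTS: Dict[str, Comparator] = {
    "by_recency": by_recency,
    "org.example.by_joined_count": by_joined_count,  # hypothetical namespaced name
}


def apply_sort(rooms: List[dict], sort: List[str]) -> List[dict]:
    """Sort by each sort string in turn; later entries only break ties."""
    # Unrecognised sort strings are skipped here (behaviour is undefined).
    comparators = [SORTS[name] for name in sort if name in SORTS]

    def compare(a: dict, b: dict) -> int:
        for comparator in comparators:
            result = comparator(a, b)
            if result != 0:
                return result
        return 0

    return sorted(rooms, key=cmp_to_key(compare))
```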

Potential issues

This is a very large change to the Client-Server API, which affects the core data flows for every single client implementation. This means it will require a lot of work from client developers to support this MSC, especially given that in practice clients will need to support both sliding sync and /sync. This work will slow down adoption of sliding sync.

In addition, this API is more restrictive than sync v2 as not all data is returned to the client. It is possible that some data flows which are possible in sync v2 will not be possible in sliding sync due to sorting and filtering limitations such as, but not limited to:

  • More complex sorting operations beyond recency/name/unread counts e.g by number of joined members.
  • More complex filtering operations such as showing DMs from users who are currently in the viewed space, dependent on some flag in user settings.
  • More complex display operations such as showing summed total notification counts in spaces.
  • More complex space operations such as handling orphaned rooms and traversal of subspaces.
  • More complex bot requirements like knowing all rooms which have a certain custom state event in them, such that the presence of a state event becomes a filter.

It is expected that some of these use cases will be supported as this MSC is iterated upon. However, it is likely that some of these use cases will not be supported in this MSC, but may be supported via use of an extension MSC where applicable. Unfortunately, there may be some data flows which are genuinely impossible to perform due to limitations of server-side operations (e.g if the data is encrypted). In this case, clients will be forced to pull in all E2EE rooms to perform their data flows, which, whilst slow, should still perform better than sync v2.

This MSC alone won't meet the needs of the entire ecosystem in terms of sorting/filtering/data returned to the client. Extensions are a crucial part of this MSC to clearly define how the sync API can expand with changing requirements.

Alternatives

There are two main alternatives to this proposal:

  • Do nothing and keep using sync v2 in its current shape. Attempt to make it run faster.
  • Factor out some obviously expensive bits from sync v2 (e.g receipts) but keep returning all rooms in the response i.e no paginated sync.

Both alternatives will still scale based on the number of joined rooms on the user's account. Effective implementations may postpone the point at which sync times become long, but fundamentally won't prevent long sync times, given a sufficiently large account. The core assumption of this MSC is that user accounts will have 1000s and 10,000s of rooms per account as metadata rooms continue to be added (VoIP conference rooms, spaces, profiles-as-rooms, thread-per-room, etc). If this assumption is false and room counts remain reasonably well bounded then this MSC may not be required.

GraphQL

We could define schemas for querying sync data and expose a GraphQL server on every Matrix homeserver. This would have numerous benefits:

  • flexible query language,
  • SDKs exist which interact with GraphQL, e.g. for automatically handling pagination, streaming,
  • more standardised than a custom line protocol, i.e. if you know GraphQL already, it lowers the barrier to entry (e.g. using Subscriptions for real-time updates)
  • some would argue this is less complex than designing a custom API.

This would have the following drawbacks:

  • easy to design slow performing queries which work well for small accounts but degrade on large accounts,
  • Denial of Service risk, mitigated via strong rate limits (see GitHub v4 API),
  • higher bandwidth costs than a custom API (both for requests and responses),
  • easier to accidentally expose confidential information by not applying sufficient authentication checks,
  • some would argue this is more complex than designing a custom API,
  • it forces all Matrix developers to become familiar with GraphQL as the queries are crafted client-side,
  • it's difficult to cache responses, impacting speed.

Overall, GraphQL would be suitable for rapid prototyping, but does not meet the Goals of this API.

Security considerations

This API presents new ways to request data from the server which need appropriate authentication checks:

  • Room subscriptions: ensure the user is joined to the room ID in question.
  • Spaces filters: ensure the user is joined to the space room ID in question.
  • Timeline limits: ensure the user is allowed to see events as far back as they request (history visibility).

In addition, this API presents new ways for the server to filter/sort Matrix data, which may become impossible if they are end-to-end encrypted:

  • Room names, user display names, canonical aliases. These events are used to calculate room display names.
  • State event types and state keys. These are used in required_state filters and if they are encrypted it won't be possible for servers to return those specific events.
  • Highlight and notification counts imply the ability to inspect the event on the server. This is not possible in E2EE rooms. This is covered somewhat in the "E2EE Handling" section of this MSC. If a client decides to work out accurate counts for E2EE rooms, it must fetch all missed events in the room and decrypt them to work out the content. If there are 1000s of missed events, this effectively becomes a denial of service attack on the client, as downloading and decrypting all the events are expensive operations.

Furthermore, this API presents new ways for malicious users to modify other clients:

  • Specifying bogus origin_server_ts values on events will cause those rooms to be moved accordingly when the sort operation is by_recency. A malicious user could hide a room by forcibly sending events with a low origin_server_ts value. Conversely, they could force a room to be always near the top of the list by forcibly sending events with a high origin_server_ts. Servers could mitigate this by bounding the origin_server_ts used for sorting to within +/- 5 minutes of their own clock, whilst still sending the real origin_server_ts value in the event.
  • Subtly adjusting the events in the room could make the calculated room name inappropriate. For example, if the malicious user can engineer which heroes' display names are used when calculating the room name (say by joining/leaving fake accounts), then it's possible for those names to advertise spam or spell out offensive words.
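The origin_server_ts mitigation can be sketched as a clamp on the sort key only. This reads "their own clock" as the time at which the server received the event, which is an assumption. Clamping against the current time instead would collapse all older events onto the same value. The function name is hypothetical.

```python
FIVE_MINUTES_MS = 5 * 60 * 1000


def recency_sort_ts(origin_server_ts: int, received_ts: int) -> int:
    """Timestamp used for by_recency, bounded to received_ts +/- 5 minutes.

    Assumes received_ts is the server's own clock when the event arrived.
    Only the sort key is clamped; the event is still delivered with its
    real origin_server_ts.
    """
    lower = received_ts - FIVE_MINUTES_MS
    upper = received_ts + FIVE_MINUTES_MS
    return max(lower, min(origin_server_ts, upper))
```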

This API presents new ways for clients to request complex operations, which run the risk of denial of service attacks:

  • Complex or pathological filter/sort options (especially via extensions) may degrade performance on the server and client. This may affect other users on the server.
  • Excessively long lists, list keys, ranges, etc. Some limits are specified in this MSC to mitigate against this.
  • An excessive number of concurrent connections could consume large amounts of memory on the server for a single device. It is recommended that servers limit the number of concurrent connections per device to 5, and expire the oldest connection first.
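The connection limit can be sketched as a per-device registry that evicts in creation order. The class and method names are hypothetical. This sketch interprets "oldest" as earliest-created. A least-recently-used policy, which would call `move_to_end` on each request, is an equally plausible reading.

```python
from collections import OrderedDict
from typing import Any, List, Optional


class ConnectionLimiter:
    """Hypothetical per-device registry of sliding sync connections."""

    def __init__(self, max_connections: int = 5) -> None:
        self.max_connections = max_connections
        self._connections = OrderedDict()  # conn_id -> state, oldest first

    def open(self, conn_id: str, state: Optional[Any] = None) -> List[str]:
        """Register (or refresh) a connection; return IDs of expired connections."""
        if conn_id in self._connections:
            # Existing connection: update state but keep its original age.
            self._connections[conn_id] = state
            return []
        expired = []
        while len(self._connections) >= self.max_connections:
            oldest_id, _ = self._connections.popitem(last=False)
            expired.append(oldest_id)
        self._connections[conn_id] = state
        return expired

    def active(self) -> List[str]:
        return list(self._connections)
```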

Unstable prefix

Whilst this is in MSC review, the HTTP path will be /_matrix/client/unstable/org.matrix.msc3575/sync, with the intention of this eventually becoming (confusingly) /_matrix/client/v4/sync. As this is a brand new endpoint, no other keys or fields need prefixing.

Homeservers can advertise support for a sliding sync proxy by adding the following to their /.well-known/matrix/client config:

{
    "org.matrix.msc3575.proxy": {
        "url": "https://slidingsync.proxy.url.here"
    }
}

This allows servers to declare an "official" trusted proxy, rather than using other URLs which may be run by malicious actors who want to steal the access token for users.
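A client could read the advertised proxy from an already-fetched /.well-known/matrix/client body as sketched below. Several parts of this are assumptions rather than statements in this MSC: the function name, rejecting non-HTTPS URLs, and appending the unstable path to the advertised URL.

```python
import json
from typing import Optional
from urllib.parse import urlparse

UNSTABLE_SYNC_PATH = "/_matrix/client/unstable/org.matrix.msc3575/sync"


def sliding_sync_url(well_known_body: str) -> Optional[str]:
    """Return the sliding sync endpoint advertised in .well-known, if any."""
    try:
        config = json.loads(well_known_body)
    except json.JSONDecodeError:
        return None
    proxy = config.get("org.matrix.msc3575.proxy") if isinstance(config, dict) else None
    if not isinstance(proxy, dict):
        return None
    url = proxy.get("url")
    # Assumption: only HTTPS proxies are trusted with the access token.
    if not isinstance(url, str) or urlparse(url).scheme != "https":
        return None
    return url.rstrip("/") + UNSTABLE_SYNC_PATH
```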

Dependencies

There are no MSCs required for the core functionality to be implemented. Servers and clients need to be spaces-aware for spaces filters. Extension MSCs will depend on this MSC for their core functionality.

Implementation state

Proxy server (v0.99.0):

  • Sliding Window API:
    • Operation support
    • Required state with wildcards
    • Timeline limits
    • Calculated room names
    • Highlight/notification counts
    • Joined and invited member counts
    • Prev batch (token will cause duplicate events on /messages)
    • include_old_rooms
    • Sorting:
      • By recency
      • By highlight count
      • By notification count
      • By name (no locale flag)
    • Filtering:
      • is_dm
      • is_encrypted
      • is_invite
      • spaces
      • room_types and not_room_types
      • room_name_like
      • tags and not_tags
  • Room Subscription API
  • Notifications API (unspecced)
  • Bandwidth optimisations
  • E2EE highlight/notification count handling
  • Extensions:

Appendices

In order to aid implementations, a series of test cases are provided which demonstrate core functionality of this MSC. The intention of these test cases is to provide a way to automatically verify compliance with this MSC. As such, they are represented as a sequence of JSON objects. These test cases are not exhaustive, and don't account for authentication via access tokens or handling multiple user accounts. For brevity, only fields that concern sliding sync are included in event descriptions.

TODO