@@ -80,26 +80,63 @@ following _additional_ properties[^prop]:

To implement these properties, servers MUST:

* Attempt to send their own[^origin] sticky events to all joined servers, whilst respecting per-server backoff times.
  Large volumes of events to send MUST NOT cause the sticky event to be dropped from the send queue on the server.
* Ensure all sticky events are delivered to clients via `/sync` in a new section of the sync response,
  regardless of whether the sticky event falls within the timeline limit of the request.
* When a new server joins the room, existing servers MUST attempt delivery of all of their own sticky events[^newjoiner].
* Remember sticky events per-user, per-room such that the soft-failure checks can be re-evaluated.

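The queueing requirement above (never drop sticky events under load, respect per-server backoff) could be sketched roughly as follows; the class, field names, and the cap of 50 are illustrative assumptions, not part of this proposal:

```python
class StickySendQueue:
    """Sketch of a per-destination federation send queue in which sticky
    events are retained until delivered, regardless of queue volume."""

    def __init__(self, destination):
        self.destination = destination
        self.backoff_until = 0.0  # per-server backoff deadline (seconds)
        self.sticky = []          # MUST NOT be dropped while unexpired
        self.ordinary = []        # may be bounded / dropped under load

    def enqueue(self, event, is_sticky):
        if is_sticky:
            # Large volumes of other events never evict sticky events.
            self.sticky.append(event)
        else:
            self.ordinary.append(event)
            # Ordinary events may be capped; sticky events never are.
            self.ordinary = self.ordinary[-50:]

    def take_batch(self, now):
        """Return events to attempt sending, respecting the backoff window."""
        if now < self.backoff_until:
            return []
        return self.sticky + self.ordinary
```

Delivery acknowledgements (removing sent sticky events) and expiry are elided here; the point is only that the sticky list is unbounded while the ordinary list may be capped.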
When an event loses its stickiness, these properties disappear with the stickiness. Servers SHOULD NOT
eagerly synchronise such events anymore, nor send them down `/sync`, nor re-evaluate their soft-failure status.

Note: policy servers and other similar antispam techniques still apply to these events.

These messages may be combined with [MSC4140: Delayed Events](https://github.com/matrix-org/matrix-spec-proposals/pull/4140)
to provide heartbeat semantics (e.g. required for MatrixRTC). Note that the sticky duration in this proposal
is distinct from that of delayed events. The purpose of the sticky duration in this proposal is to ensure sticky events are cleaned up.

### Rate limits

As sticky events are sent to clients regardless of the timeline limit, care needs to be taken to ensure
that other room participants cannot send large volumes of sticky events.

Servers SHOULD rate limit sticky events over federation. Servers can choose one of two options to do this:

- A) Do not persist the sticky events and expect the other server to retry later.
- B) Persist the sticky events but wait a while before delivering them to clients.

Option A means servers don't need to store sticky events in their database, protecting disk usage at the cost of more bandwidth.
To implement this, servers MUST return a non-2xx status code from `/send` such that the sending server
*retries the request* in order to guarantee that the sticky event is eventually delivered. Servers MUST NOT
silently drop sticky events and return 200 OK from `/send`, as this breaks the eventual delivery guarantee.
Care must be taken with this approach, as all the PDUs in the transaction will be retried, even those for different rooms or non-sticky events.

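Option A might be sketched as follows. The handler shape, the choice of 429, and `process_pdu` are illustrative assumptions; the normative requirement is only "non-2xx when rate limited, never a silent 200 OK":

```python
# Sketch of Option A: reject the whole transaction when rate limited,
# so the origin server retries it later.
RATE_LIMITED_STATUS = 429  # any non-2xx works; 429 conventionally signals backoff

def handle_send_transaction(txn, is_rate_limited):
    """Hypothetical handler for PUT /_matrix/federation/v1/send/{txnId}.
    Returns an (http_status, body) pair."""
    if is_rate_limited:
        # MUST NOT silently drop sticky events behind a 200 OK: a non-2xx
        # response makes the origin retry, preserving eventual delivery.
        # Note: this retries *every* PDU in the transaction, including
        # PDUs for other rooms and non-sticky events.
        return RATE_LIMITED_STATUS, {"error": "Too many sticky events"}
    for pdu in txn.get("pdus", []):
        process_pdu(pdu)
    return 200, {"pdus": {}}

def process_pdu(pdu):
    pass  # persistence and event auth elided in this sketch
```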
Option B means servers have to store the sticky event in their database, protecting bandwidth at the cost of more disk usage.
This provides fine-grained control over when to deliver the sticky events to clients, as the server doesn't need
to wait for another request. Servers SHOULD deliver the event to clients before the sticky event expires. This may not
always be possible if the remaining time is very short.

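The scheduling decision in Option B can be reduced to a one-line clamp; the function and parameter names are hypothetical:

```python
def schedule_delivery(now_ms, expires_at_ms, throttle_delay_ms):
    """Option B sketch: pick when to deliver a persisted sticky event to
    clients, waiting out the throttle window but never past expiry.
    (If the remaining time is shorter than the throttle, delivery is
    clamped to the expiry instant, which may already be too late.)"""
    deliver_at = now_ms + throttle_delay_ms
    return min(deliver_at, expires_at_ms)
```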
### Federation behaviour

Servers are only responsible for sending sticky events originating from their own server. This ensures the server is aware
of the `prev_events` of all sticky events they send to other servers. This is important because the receiving server will
attempt to fetch those previous events if they are unaware of them, _rejecting the transaction_ if the sending server fails
to provide them. For this reason, it is not possible for servers to reliably deliver _other servers'_ sticky events.

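The "own events only" rule above amounts to filtering sticky events by the sender's domain. A minimal sketch, with hypothetical helper names:

```python
def sender_domain(event):
    """Domain part of the event's sender, e.g. '@alice:example.org' -> 'example.org'."""
    return event["sender"].split(":", 1)[1]

def own_sticky_events(sticky_events, origin_server):
    """Keep only sticky events that originated from this server, since we can
    vouch for their prev_events if the receiving server asks for them."""
    return [e for e in sticky_events if sender_domain(e) == origin_server]
```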
In the common case, sticky events are sent over federation like any other event and do not cause any behavioural changes.
The two cases where this differs are:

- when sending sticky events to newly joined servers
- when sending "old" but unexpired sticky events

Servers tend to maintain a sliding window of events to deliver to other servers, e.g. the most recent 50 PDUs. Sticky events
can fall outside this range, which is what we define as "old". On the receiving server, old events appear to have unknown
`prev_events`, which cannot be connected to any known part of the room DAG. Sending sticky events to newly joined servers can be seen
as a form of sending old but unexpired sticky events, so this proposal considers only the latter case. Sending these old events
will potentially increase the number of forward extremities in the room for the receiving server. This may impact state resolution
performance if there are many forward extremities. Servers MAY send dummy events to remove forward extremities (Synapse has had the
option to do this since 2019). Alternatively, servers MAY choose not to add old sticky events to their forward extremities, but
this A) reduces eventual delivery guarantees by reducing the frequency of transitive delivery of events, and B) reduces the convergence
rate when implementing ephemeral maps (see "Implementing an ephemeral map"), as that relies on servers referencing sticky events from other servers.

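As an illustration of the new-joiner case above, a server might select which of its own sticky events are still worth delivering by checking expiry. The `msc_sticky_duration_ms` field name below is a placeholder assumption for whatever sticky-duration field this proposal defines, not normative:

```python
def is_unexpired(event, now_ms):
    """A sticky event stays active until origin_server_ts plus its sticky
    duration (field name is a placeholder, not normative)."""
    expires_at = event["origin_server_ts"] + event["msc_sticky_duration_ms"]
    return now_ms < expires_at

def events_to_send_on_join(local_sticky_events, now_ms):
    # Expired events have lost their stickiness and are no longer
    # eagerly synchronised, so they are skipped on join.
    return [e for e in local_sticky_events if is_unexpired(e, now_ms)]
```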
### Sync API changes

The new `/sync` section looks like:

@@ -417,6 +454,7 @@ This becomes particularly important when room state is rolled back. For example,
then Bob kicks Charlie, but concurrently Alice kicks Bob, then whether or not a receiving server would accept E would depend
on whether they saw “Alice kicks Bob” or “Bob kicks Charlie”. If they saw “Alice kicks Bob” then E would be accepted. If they
saw “Bob kicks Charlie” then E would be rejected, and would need to be rolled back when they see “Alice kicks Bob”.

[^origin]: That is, the domain of the sender of the sticky event is the sending server.

[^newjoiner]: We restrict delivery of sticky events to ones sent locally to reduce the number of events sent on join. If
we sent all active sticky events then the number of events received by the new joiner would be `O(nm)`, where `n` = number of joined servers
and `m` = number of active sticky events.