From 81cf7282ffccbe3b366155ce1ec98684292a58f5 Mon Sep 17 00:00:00 2001
From: Kegan Dougal <7190048+kegsay@users.noreply.github.com>
Date: Thu, 25 Sep 2025 09:36:16 +0100
Subject: [PATCH] Update 4354-sticky-events.md

---
 proposals/4354-sticky-events.md | 52 ++++++++++++++++++++++++++++-----
 1 file changed, 45 insertions(+), 7 deletions(-)

diff --git a/proposals/4354-sticky-events.md b/proposals/4354-sticky-events.md
index 62c169720..19a293d7f 100644
--- a/proposals/4354-sticky-events.md
+++ b/proposals/4354-sticky-events.md
@@ -80,26 +80,63 @@ following _additional_ properties[^prop]:
 
 To implement these properties, servers MUST:
 
-* Attempt to send all sticky events to all joined servers, whilst respecting per-server backoff times.
+* Attempt to send their own[^origin] sticky events to all joined servers, whilst respecting per-server backoff times.
   Large volumes of events to send MUST NOT cause the sticky event to be dropped from the send queue on the server.
 * Ensure all sticky events are delivered to clients via `/sync` in a new section of the sync response,
   regardless of whether the sticky event falls within the timeline limit of the request.
-* When a new server joins the room, existing servers MUST attempt delivery of all sticky events _originating from their server only_[^newjoiner].
+* When a new server joins the room, existing servers MUST attempt delivery of all of their own sticky events[^newjoiner].
 * Remember sticky events per-user, per-room such that the soft-failure checks can be re-evaluated.
 
 When an event loses its stickiness, these properties disappear with the stickiness. Servers SHOULD NOT
 eagerly synchronise such events anymore, nor send them down `/sync`, nor re-evaluate their soft-failure status.
 Note: policy servers and other similar antispam techniques still apply to these events.
 
-Servers SHOULD rate limit sticky events over federation. If the rate limit kicks in, servers MUST
-return a non-2xx status code from `/send` such that the sending server *retries the request* in order
-to guarantee that the sticky event is eventually delivered. Servers MUST NOT silently drop sticky events
-and return 200 OK from `/send`, as this breaks the eventual delivery guarantee.
-
 These messages may be combined with [MSC4140: Delayed Events](https://github.com/matrix-org/matrix-spec-proposals/pull/4140)
 to provide heartbeat semantics (e.g required for MatrixRTC). Note that the sticky duration in this proposal
 is distinct from that of delayed events. The purpose of the sticky duration in this proposal is to ensure sticky events are cleaned up.
 
+### Rate limits
+
+As sticky events are sent to clients regardless of the timeline limit, care needs to be taken to ensure
+that other room participants cannot send large volumes of sticky events.
+
+Servers SHOULD rate limit sticky events over federation. Servers can choose one of two options to do this:
+ - A) Do not persist the sticky events and expect the other server to retry later.
+ - B) Persist the sticky events but wait a while before delivering them to clients.
+
+Option A means servers don't need to store sticky events in their database, protecting disk usage at the cost of more bandwidth.
+To implement this, servers MUST return a non-2xx status code from `/send` such that the sending server
+*retries the request* in order to guarantee that the sticky event is eventually delivered. Servers MUST NOT
+silently drop sticky events and return 200 OK from `/send`, as this breaks the eventual delivery guarantee.
+Care must be taken with this approach as all the PDUs in the transaction will be retried, even ones for different rooms / not sticky events.
+
+Option B means servers have to store the sticky event in their database, protecting bandwidth at the cost of more disk usage.
+This provides fine-grained control over when to deliver the sticky events to clients as the server doesn't need
+to wait for another request. Servers SHOULD deliver the event to clients before the sticky event expires. This may not
+always be possible if the remaining time is very short.
+
+### Federation behaviour
+
+Servers are only responsible for sending sticky events originating from their own server. This ensures the server is aware
+of the `prev_events` of all sticky events they send to other servers. This is important because the receiving server will
+attempt to fetch those previous events if they are unaware of them, _rejecting the transaction_ if the sending server fails
+to provide them. For this reason, it is not possible for servers to reliably deliver _other server's_ sticky events.
+
+In the common case, sticky events are sent over federation like any other event and do not cause any behavioural changes.
+The two cases where this is different is:
+ - when sending sticky events to newly joined servers
+ - when sending "old" but unexpired sticky events
+
+Servers tend to maintain a sliding window of events to deliver to other servers e.g the most recent 50 PDUs. Sticky events
+can fall outside this range, which is what we define as "old". On the receiving server, old events appear to have unknown
+`prev_events`, which cannot be connected to any known part of the room DAG. Sending sticky events to newly joined servers can be seen
+as a form of sending old but unexpired sticky events, and so this proposal only considers this case. Sending these old events
+will potentially increase the number of forward extremities in the room for the receiving server. This may impact state resolution
+performance if there are many forward extremities. Servers MAY send dummy events to remove forward extremities (Synapse has the
+option to do this since 2019). Alternatively, servers MAY choose not to add old sticky events to their forward extremities, but
+this A) reduces eventual delivery guarantees by reducing the frequency of transitive delivery of events, B) reduces the convergence
+rate when implementing ephemeral maps (see "Implementing an ephemeral map"), as that relies on servers referencing sticky events from other servers.
+
 ### Sync API changes
 
 The new `/sync` section looks like:
@@ -417,6 +454,7 @@ This becomes particularly important when room state is rolled back. For example,
 then Bob kicks Charlie, but concurrently Alice kicks Bob then whether or not a receiving server would accept E would depend
 on whether they saw “Alice kicks Bob” or “Bob kicks Charlie”. If they saw “Alice kicks Bob” then E would be accepted. If they
 saw “Bob kicks Charlie” then E would be rejected, and would need to be rolled back when they see “Alice kicks Bob”.
+[^origin]: That is, the domain of the sender of the sticky event is the sending server.
 [^newjoiner]: We restrict delivery of sticky events to ones sent locally to reduce the number of events sent on join. If we sent all active sticky
 events then the number of received events by the new joiner would be `O(nm)` where
 `n` = number of joined servers, `m` = number of active sticky events.