You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
matrix-spec-proposals/proposals/4297-state-resolution-v2_1.md

16 KiB

MSC4297: State Resolution v2.1

This MSC proposes two modifications to the existing state resolution algorithm which will improve security by reducing the frequency of "state resets". This proposal bases its changes on room version 11.

Background

Matrix is decentralised. This means there is no central entity which orders events. Ordering is critical to enforce access control. For example, in order for Bob to change the room name he needs to be a moderator/admin first. To model this, rooms are represented as a directed acyclic graph (DAG) of events. State events are operations like "Bob changing the room name" or "Bob gaining admin privileges". These events "point" to the most recent events that the server that created the event has seen in that room. In order for Bob to change the room name, his room name event MUST point either directly or indirectly to the event which gave him the right to change the room name. As Matrix is decentralised, Alice could independently demote Bob without him being aware of it at the same time he tries to change the room name. This is concurrent behaviour, and how this is managed is up to the state resolution algorithm. The algorithm:

  • selects which events are in conflict,
  • determines how to order these events,
  • filters out unauthorised events based on this ordering

For example, Alice's demotion may be applied first so Bob cannot change the room name.

However, the algorithm can cause surprising behaviour. If Alice cannot communicate Bob's demotion to Bob's server quickly, then Bob may perform many more privileged operations over minutes or hours. Eventually, Alice's demotion will arrive on Bob's server, which would cause all the operations Bob did to be "rolled back" or reverted. This can be unexpected to users on Bob's server, but is an expected consequence of Matrix being decentralised. A "state reset" is very similar to this in that it causes unexpected behaviour. However, the defining characteristic of a state reset is that this happens even when there is no revocation event such as a demotion.

Concurrent events can occur at any point in the room DAG. This means when the DAG is put into a total order from "oldest" to "newest", previously unseen events may appear at the "older" end of the ordering. These unseen events could affect whether later events are authorised or not, so we need to "replay" events from the unseen events to the latest events. To avoid replaying too many events, the algorithm intelligently calculates the difference between the sets of concurrent events to only replay what is necessary. It is desirable to prevent servers from adding concurrent events from an obviously long time ago, but since servers never coordinate (e.g via a consensus algorithm) they can never be sure that they have seen all concurrent events. The idea of making it impossible to add concurrent events to some sections of the graph is referred to as "causal stability" or "finality".

Matrix allows servers to partially synchronise the DAG. This allows servers that have been offline for a long time to quickly resynchronise without being forced to pull in the entire room history. To ensure authorisation events are applied correctly, events have another DAG formed of auth_events, called the "auth chain". These chains only consist of authorisation events and are fully synchronised. The auth_events cite all the historical events which authorise the event in question. Authorisation events are the following state events: m.room.create, m.room.member, m.room.power_levels and m.room.join_rules. A subset of authorisation events are power events.

[!NOTE] As a reminder, State Resolution v2.0 works by merging together sets of state across branches of the room DAG. State events common to both branches are called 'unconflicted'; state events which only exist on one branch or another or have different values are called 'conflicted'. The resolved state is calculated:

  • Start with the unconflicted state as our "base layer"
  • Consider the conflicted power events (PLs, kicks, bans, join_rules) and any conflicted authorisation events that are required to authorise said power events, use reverse topological ordering to provide a consistent ordering and then layer those by replaying them on top, incrementally authorising as you go.
  • Work out the sequence of the conflicted normal state events using mainline ordering (the 'backbone' of power level events through the conflicted set) and then similarly replay those on top too.
  • Reapply any unconflicted state keys which may have been overwritten in the previous steps.

Problems with the existing algorithm

The algorithm relies on two pieces of ordering information, from prev_events and auth_events. The prev_events ordering controls the input state sets to the algorithm. These state sets aren't guaranteed to map correctly onto the auth chain ordering induced by auth_events due to partial synchronisation. If these orderings disagree, the algorithm can select older state. This can happen due to federation outages or due to faulty implementations. The scenarios below require this to have happened, and are represented by "incorrect" state. This typically manifests as older state existing in a state set. State resets happen because the algorithm fails to determine all the events that need to be replayed when older state exists in a state set.

Problem A focuses on the "initial state" of the room, before the conflicted events are replayed. This is currently set to the "unconflicted" state of the room. The core idea is that if both forks agree on the exact event IDs for 99 members, but disagree on the exact event ID for the 100th member, we do not need to replay all 99 member events. The problem is that there is more than one way to "agree on the exact event IDs", which can cause state to reset under certain circumstances.

Problem B focuses on selecting which events are in conflict. The core idea is that we do not want to replay the entire history of the room every time there is a conflict, but only the differences between each fork (the auth difference). These differences don't include enough events under certain circumstances, causing state to reset.

[!NOTE] The following scenarios present rooms in two ways: via the familiar prev_events ordering which represents concurrent behaviour and the more unfamiliar auth_events ordering which represents "is authorised by" relationships. The auth_events graph contains many redundant edges (e.g every event references the create event), so we will only present the transitive reduction of this graph, which removes these edges. This helps illustrate the problems better.

The scenarios also use coloured diamond symbols to indicate the state of the room at a particular edge on the prev_events graph. Servers may incorrectly calculate this e.g due to partial synchronisation, which is indicated by a dashed arrow pointing from the correct state to the incorrect state. The colour of the diamond only serves to distinguish between other state sets, it has no other meaning, despite similar colours being used to indicate the sender of an event.

Problem A: Conflicting events can be unauthorised by the unconflicted state

Events can be conflicted in two ways: via room state's prev_events and via the auth chain's auth_events. The room state determines what the inputs to the state resolution algorithm are and hence what is in the unconflicted state. The auth chains determine which extra events are pulled in and the ordering between events. Room state can unexpectedly reset if these two orderings disagree, as outlined in the following example1:

In this scenario, the state of the room at the blue diamond is obtained via partial synchronisation, e.g /state{_ids}. This response contains an outdated join rules event, meaning both join rules events are now in conflict, even though all events on both forks agree on what the latest join rule event is via their auth chains. This causes the join rules to be replayed. However, both forks also agreed that Alice had left the room. As such, the unconflicted state starts with Alice not being a member in the room. When the join rules events get replayed, both fail since Alice (who set them) is not in the room. This causes the room to have no join rules event.

Problem B: Conflicting events need extra unconflicted events in order to be authorised

Similarly to Problem A, this occurs when the ordering between prev_events and auth_events differs. In this scenario, one fork can reference newer events via the auth chain whilst claiming the room state is an older event. When this happens, the auth difference does not include all relevant events between the old and new events as both sides have seen the events in-between via their auth chains. Most of the time this will not cause a state reset, but when there are chains of events which are dependent on one another, this can cause a state reset.

Note that the state of the room at the red diamond is obtained via partial synchronisation, e.g /state{_ids}. This response contains an outdated power levels event, meaning the power levels events are now in conflict.

In this example1, Alice is an Admin who promotes Bob. Bob then promotes Charlie. It is critical that Bob's promotion is applied before Charlie's promotion, or else it will be unauthorised. However, the auth difference calculation fails to include this event, leading to a state reset.

Proposal

Two modifications are made to the algorithm which are described below. Both changes relate to which events are selected for replaying. They do not modify how conflicted events are sorted nor do they modify the iterative auth checks.

Modification 1: Begin the first phase of iterative auth checks with an empty state map

This aims to fix problem A by disregarding the prev_events ordering entirely for determining the initial state. It does this by starting with an empty state map. This causes the iterative auth checks algorithm to load the auth chains, per the iterative auth checks definition:

If a (event_type, state_key) key that is required for checking the authorization rules is not present in the state, then the appropriate state event from the events auth_events is used if the auth event is not rejected.

This ensures that the algorithm replays events on top of the auth_events histories, rather than some unrelated history.

Step 2 of the state resolution algorithm is amended to state:

Apply the iterative auth checks algorithm, starting from the unconflicted state map an empty state map, to the list of events from the previous step to get a partially resolved state.

Note that even though we no longer insert the unconflicted state into the partially resolved state, Step 5 of the algorithm ensures that the unconflicted state is still in the final merged output, even though it may not have been during the resolution process:

Update the result by replacing any event with the event with the same key from the unconflicted state map, if such an event exists, to get the final resolved state.

Modification 2: Add events between the conflicted state set to the full conflicted set

This aims to fix problem B by including relevant intermediate events when performing state resolution. This modifies the definition of the "full conflicted set" to include all events which are a descendant of one conflicted event and an ancestor of another conflicted event. This forms a "conflicted subgraph" which is then replayed by the algorithm.

This means the full conflicted set contains:

  • the conflicted state events themselves AND
  • the auth difference AND
  • the events between the conflicted state events

The purpose of the auth difference is to replay the relevant auth history from each input state set. Most of the time it does this, but when the input state sets are derived from a partial sync response there's no guarantee that this will include the relevant history because the response may include erroneous older events. By including the conflicted state subgraph we ensure that input state sets with old events have the auth history from those old events replayed.

A new term is added in the state resolution algorithm:

Conflicted state subgraph. Starting from an event in the conflicted state set and following auth_events edges may lead to another event in the conflicted state set. The union of all such paths between any pair of events in the conflicted state set (including endpoints) forms a subgraph of the original auth_event graph, called the conflicted state subgraph.

And the following modification is made to the definition of "Full conflicted set":

Full conflicted set. The full conflicted set is the union of the conflicted state set, the conflicted state subgraph and the auth difference.

Potential issues

Performance

These modifications may impact performance in two ways:

  • more work is done on every state resolution to calculate the conflicted state subgraph.
  • potentially more events are replayed during resolution.

The data required to calculate the auth difference is also the same information required to calculate the conflicted state subgraph, so no extra database requests are needed with these changes. However, more CPU work needs to be performed to walk the auth chain to look for conflicted events on every resolution.

In the common case, the conflicted state subgraph overlaps entirely with the auth difference, meaning no extra events need to be replayed. This has been confirmed via partition/fault tolerance testing which tests extreme cases with large numbers of membership changes and federation outages, producing resolutions such as:

additional events replayed=0 num_conflicts=17 conflicted_subgraph=271 auth_difference=263

Further work

More changes are required in order to fix all cases of state resets. The changes proposed here are based on real world scenarios where state resolution has produced undesirable results. The underlying causes of these state resets is mismatched orderings. If the protocol had a single ordering then this would remove this entire class of issues. This will be explored in a future MSC.

Security considerations

The state resolution algorithm is a critical component in the overall security of a room. This proposal is modifying the algorithm so there are inevitable risks associated with it. These risks are mitigated because the proposal is not changing how events are ordered nor how events are authorised. It is purely adding events to be replayed and relying on the auth chains as the authoritative source to rebase changes onto.

Unstable prefix

This algorithm is in use for room version org.matrix.hydra.11.

Dependencies

This MSC has no dependencies.


  1. When used in conjunction with MSC4289 Alice should not be present in the m.room.power_levels event. The examples function in the same way. ↩︎