MSC4124: Simple server authorization

This MSC proposes simple authorization rules that consider the origin server of a given event, with the aim of replacing m.room.server_acl.

This is a compromise MSC based on MSC4099 that tries to use concepts even more in line with the current authorization events. This MSC was also created in reaction to MSC2870, which describes itself as a stop gap to cover what that MSC describes as shortcomings of m.room.server_acl. We also agree that MSC2870 is a stop gap and that m.room.server_acl has severe shortcomings, but we take the view that after 4 years of proposed stop-gapping, there has been enough time to introduce a more complete solution.

Related issues:

Context

NOTE: This context was added retroactively

Server Access Control Lists and their associated event, m.room.server_acl, are intended to control which servers can interact with a Matrix room.

Problem: ambient room access

Overwhelmingly, server ACL use in the federation is a reactive measure against servers that are known to proliferate abuse. These servers are often added to the deny list, and it is the author's subjective understanding that the majority of public Matrix rooms use an allow list of ["*"]. For any room with an allow list of ["*"], servers are given unrestricted authority to interact with the room, even to poison the DAG, before a room administrator is even aware of the attacking server's existence [1]. As the room admin is unaware of the existence of the attacking server, they or their tooling will be unable to research the reputation of the server and determine whether to deny it until an attack is already underway, which is too late. This is why we are introducing the m.server.knock_rule later in this proposal. DO NOT assume this is comparable to membership knocking until you have read the proposal.

Problem: leaky servers

Server ACLs are specified as restricting which requests a server can make in relation to a room from the server-server API.

The ACLs are applied to servers when they make requests

Note: Server ACLs do not restrict the events relative to the room DAG via authorisation rules, but instead act purely at the network layer to determine which servers are allowed to connect and interact with a given room.

There is one notable exception to this, which is currently poorly specified, in that when a denied server uses /federation/v1/send, any PDUs that originate from the denied server are failed. Note, these PDUs are only failed when the denied server is the sender and origin of the transaction.

Server ACL is currently effective even when the denied server is not cooperative. However, if there is a distinct uncooperative server in the room, the uncooperative server can leak events from the denied server to all other servers, either by using leaked events as forward extremities or including them in responses to /get_missing_events.

Mitigating this problem is hard because the m.room.server_acl event is specified to restrict servers at the "request" level, so there is no restriction on events themselves. If servers were to naively apply server ACL to events, they would likely soft fail all the events that the denied server sent throughout the room's history. This is because m.room.server_acl is intentionally not specified with any consideration of DAG semantics. We believe the intention of this is to stop a malicious server being able to add a boundless number of soft failed events to the DAG, which we believe would be possible with current soft fail checks and a DAG based server access system like the one within this proposal.

It should be clear that servers can leak events to conforming implementations unintentionally, depending on whether the implementers have implemented server access control in a conforming way.

Problem: size limit

It is theorized by me, completely from thin air, that the server ACL event can contain roughly 512 entries before the limit on event size is reached. A survey of #matrix-org-coc-bl:matrix.org suggests that most ACL events contain around 150 deny entries. However, there have been unstable periods in Matrix's history, caused by vulnerable servers, where the number of deny entries has been much greater. For example, Synapse previously had weak registration requirements by default that were exploited in a major incident. There could be a need to protect rooms from vulnerable servers again in the future.

Workaround for ambient room access: explicit allow list

An explicit allow list can be used to work around the issue of ambient room access. This means that only servers that are known to room admins can interact with the room.

Bootstrapping

This workaround has a bootstrapping problem, because any server, whether well established or completely new, will be unable to join a community unless it can contact room admins or their tooling out of band to request access.

For large public rooms with reputation, this is problematic as these out of band channels to request access are now the next target of abuse. Typically requesting access in this way requires manual intervention from the joining user and is slow.

Public Rooms

This workaround represents a serious user experience issue for both the users and the room administrators of public rooms, since vetting new users is time consuming for both parties.

Workaround for leaky servers: banning, kicking, redacting

A workaround for leaky servers is falling back to the existing room level user controls such as banning, kicking and redacting. However, room administrators can become overwhelmed if the leaking is part of an intentional attack. They could then change the join rule for the room, but this would cause a disruption to their service that outlives the attack.

Theorized workaround for leaky servers: leak detection

It has been suggested that it is possible to detect a leak by forensically analyzing the DAG in order to find which servers are accepting a denied server's events by analyzing which servers have used them as forward extremities.

The author does not consider this to be an acceptable workaround, because it is possible for a malicious leaking server to poison /get_missing_events and other endpoints to get spec conforming servers to accept events from denied servers without the malicious leaking server generating any suspicious PDUs themselves. In this instance, a naive leak detection tool would likely incriminate the wrong server.

Proposal

The m.server.knock authorization rule

This rule is to be inserted after rule 3 in version 11, the check for m.room.create's content field m.federate.

  1. If the type is m.server.knock:
    1. If the state_key does not contain the server name for the origin server, reject.
    2. If there is any current state for the origin server's m.server.knock, reject.
    3. If the origin server's current participation is permitted, allow.
    4. If the m.server.knock_rule is deny, reject.
    5. If the origin server's current participation is deny, reject.
    6. Otherwise allow.

The purpose of this rule is to allow a server to send a knock event, even if the sender has no membership event.

The purpose of rule 1.2 is to prevent denied servers from ever being given the ability to craft any event whatsoever in a room that has always had the active m.server.knock_rule. This is because an m.server.participation event set to deny will usually be topologically older than an m.server.knock due to the m.server.participation usually referencing a recent m.room.power_levels event. And so m.server.knock events could be crafted by malicious servers without restriction without rule 1.2.
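The m.server.knock rule above can be sketched in Python as follows. This is only an illustration under stated assumptions, not a reference implementation: the `RoomState` class, the `server_of` helper and the function name are hypothetical, and rule 1.1 ("contain") is read here as a plain equality between the state_key and the sender's server name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoomState:
    """Hypothetical, simplified view of the current room state."""
    knock_rule: Optional[str]        # "deny", "passive", "active", or None if unset
    participation: dict[str, str]    # server name -> "permitted" | "deny"
    knocks: set[str]                 # servers that already have an m.server.knock


def server_of(user_id: str) -> str:
    """Return the server name part of a user ID such as "@bob:example.com"."""
    return user_id.split(":", 1)[1]


def check_server_knock(sender: str, state_key: str, state: RoomState) -> bool:
    """Return True to allow an m.server.knock event, False to reject it."""
    origin = server_of(sender)
    if state_key != origin:              # 1.1 (assumed to mean equality)
        return False
    if origin in state.knocks:           # 1.2: a server may only knock once
        return False
    current = state.participation.get(origin)
    if current == "permitted":           # 1.3
        return True
    if state.knock_rule == "deny":       # 1.4
        return False
    if current == "deny":                # 1.5
        return False
    return True                          # 1.6
```

Note that an unknown server is allowed to knock once under the passive and active rules, while a second knock from the same server is always rejected by step 1.2.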

The m.server.participation authorization rule

This rule is to be inserted before rule 4 in version 11, the check for m.room.member, and after the m.server.knock rule described in this proposal.

  1. If the origin server's current participation state is not permitted:
    1. If the participation state is deny, reject.
    2. If the type is m.server.participation and the sender's origin server matches the state_key of the considered event:
      1. If the participation field of the considered event is not permitted, reject.
      2. If the sender is the same sender of m.room.create, then allow.
    3. If the m.server.knock_rule is deny, reject.
    4. If the m.server.knock_rule is anything other than passive, reject.

We allow the room creator to set their own participation and bypass the m.server.knock_rule provided their server has not been explicitly denied. This is because we want them to be able to set their participation at room creation without being blocked from doing so when the m.server.knock_rule is active.

We allow senders to add the participation of their own server, provided that they only do so to permit their own server (and not deny themselves as a foot gun). This is useful in cases where a room has a passive m.server.knock_rule and the room admins need to explicitly permit their own servers before changing the knock rule to active.
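A literal reading of the m.server.participation rule can be sketched as below. The function name, its parameters and the three-valued result are assumptions made for illustration; "continue" stands for falling through to the remaining authorization rules (such as the m.room.member check), which are not modelled here.

```python
from typing import Optional


def check_server_participation(
    event_type: str,
    sender: str,
    state_key: Optional[str],
    content: dict,
    room_creator: str,
    knock_rule: Optional[str],
    participation: dict[str, str],
) -> str:
    """Return "allow", "reject", or "continue" (fall through to later rules)."""
    origin = sender.split(":", 1)[1]
    current = participation.get(origin)

    if current == "permitted":
        return "continue"                # rule 1 only applies when not permitted
    if current == "deny":                # 1.1
        return "reject"
    if event_type == "m.server.participation" and state_key == origin:  # 1.2
        if content.get("participation") != "permitted":                 # 1.2.1
            return "reject"
        if sender == room_creator:                                      # 1.2.2
            return "allow"
    if knock_rule == "deny":             # 1.3
        return "reject"
    if knock_rule != "passive":          # 1.4 (also covers an unset rule, read literally)
        return "reject"
    return "continue"
```

Under this reading, a non-creator can only permit their own server while the knock rule is passive, and even then the event still has to pass the later rules.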

The m.server.participation authorization event, state_key: ${origin_server_name}

This is an authorization event that is used to authorize events originating from the server named in the state_key.

participation can be one of permitted or deny. participation is protected from redaction.

A denied server must not be sent an m.server.participation event unless the targeted server is already present within the room, or it has an existing m.server.knock event. This is to prevent malicious servers being made aware of rooms that they have not yet discovered.

A reason field can be present alongside participation in order to explain the reason why a server has been denied. This reason is to be shown to a joining, or previously present server, so that the server's users can understand why they are not being allowed to participate.
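As an illustration, the shape of the event and the delivery restriction described above might look like the following sketch. The helper names are hypothetical, and only the type, state_key and content fields are shown; a real event carries further fields such as sender and signatures.

```python
from typing import Optional


def build_participation_event(server: str, participation: str,
                              reason: Optional[str] = None) -> dict:
    """Build a partial m.server.participation event for the given server."""
    if participation not in ("permitted", "deny"):
        raise ValueError("participation must be 'permitted' or 'deny'")
    content = {"participation": participation}
    if reason is not None:
        content["reason"] = reason       # optional explanation shown to the server's users
    return {
        "type": "m.server.participation",
        "state_key": server,
        "content": content,
    }


def may_deliver_to_target(event: dict, servers_in_room: set[str],
                          knocked_servers: set[str]) -> bool:
    """Decide whether the event may be sent to the server it targets."""
    target = event["state_key"]
    if event["content"]["participation"] != "deny":
        return True                      # the restriction only concerns denied servers
    # A denial is only delivered to servers that already know about the room.
    return target in servers_in_room or target in knocked_servers
```

For example, a denial aimed at a server that has never joined or knocked would not be delivered to that server, so it does not learn that the room exists.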

The m.server.knock_rule event, state_key: ''

This event has one field, rule which can be one of the following:

  • deny: Users are unable to send the m.server.knock event unless there is an existing m.server.participation event for the server.
  • passive: Users can send the m.server.knock event without corresponding membership or server participation.
  • active: Users can send the m.server.knock event but cannot send any other event without a corresponding participation of permitted.

rule is protected from redaction.
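The redaction protection for the participation and rule fields could be sketched as follows. This covers only the content keys introduced by this proposal; the table and function names are hypothetical, and the full redaction algorithm for a room version (top-level keys, other event types) is not reproduced here.

```python
# Content keys that survive redaction, per event type introduced by this proposal.
PROTECTED_CONTENT_KEYS = {
    "m.server.participation": {"participation"},
    "m.server.knock_rule": {"rule"},
}


def redact_content(event_type: str, content: dict) -> dict:
    """Return the content of an event after redaction, keeping protected keys only."""
    keep = PROTECTED_CONTENT_KEYS.get(event_type, set())
    return {key: value for key, value in content.items() if key in keep}
```

With this sketch, redacting a denial removes its reason but leaves the deny decision in place, so redaction cannot be used to undo a server's participation state.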

The passive state allows rooms to operate as they do today: new servers can freely join a room and start sending events without prior approval from the administrators.

The active state allows for a much safer way to run public Matrix rooms: new servers can join a room and send the m.server.knock event, but cannot do more until a room administrator permits the new joiner with an m.server.participation event. We expect that in practice automated tooling will perform a simple reputation check and immediately permit a new server to participate. This is an essential part of the proposal, as the active mechanism eliminates a current shortfall: m.room.server_acl is a purely reactive tool in a join wave attack.
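The kind of automated tooling anticipated here might, as a rough sketch, respond to each knock like this. The `known_bad_servers` set stands in for whatever reputation source the tooling uses and is purely hypothetical, as are the function name and the reason text.

```python
def respond_to_knock(knock_event: dict, known_bad_servers: set[str]) -> dict:
    """Produce the m.server.participation event to send in response to a knock."""
    server = knock_event["state_key"]    # the knocking server names itself here
    denied = server in known_bad_servers
    content = {"participation": "deny" if denied else "permitted"}
    if denied:
        content["reason"] = "Server is on the room's deny list"
    return {
        "type": "m.server.participation",
        "state_key": server,
        "content": content,
    }
```

Because the knock arrives before the server can send anything else in an active room, the check runs before any other event from that server can be authorized.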

The m.server.knock event, state_key: ${origin_server_name}

This event has no fields, because it can only be sent once, and therefore cannot be edited if the wrong or malicious information is provided.

The intent of the event is only to make the room administrators explicitly aware of the server's existence.

The make_server_knock handshake

This MSC requires a very simple clone of the make_knock handshake for the purpose of signing and creating the m.server.knock event.

The details of this handshake are left outside the scope of the MSC, as it may be decided that an API providing an agnostic unification of make_knock and make_join should be used instead that signs both the membership event and the m.server.knock event templates.

We believe that the open choice here should not alone be a reason to block this MSC from consideration. But we will follow up with a clone of the make_knock handshake if requested.

Diagrams

These diagrams do not specify any behaviour and are provided only to help explain the proposal. Do not write any part of the specification based upon the diagram, only what you have read from the authorization rules.

Room creation flow

---
title: Room creation as of v1.10
---
stateDiagram-v2
    create: Alice creates the room
    aliceMembership: Alice joins the room
    create --> aliceMembership
    alicePL: Alice sends the default power levels event from the createRoom template
    aliceMembership --> alicePL
    aliceJoinRules: Alice sets the join rules
    alicePL --> aliceJoinRules
---
title: Room creation with simple server authorization
---
stateDiagram-v2
    create: Alice creates the room
    aliceServerParticipation: Alice permits her server to participate
    aliceMembership: Alice joins the room
    create --> aliceServerParticipation
    aliceServerParticipation --> aliceMembership
    alicePL: Alice sends the default power levels event from the createRoom template
    aliceMembership --> alicePL
    aliceJoinRules: Alice sets the join rules
    alicePL --> aliceJoinRules
    aliceKnockRule: Alice sets the server knock rule
    alicePL --> aliceKnockRule

Join flow

This explains the order of events for joining the room. It does not explain how the make_join handshake is amended.

---
title: Joining the room with "passive" knock rule
---
stateDiagram-v2
    create: Alice creates the room and sets the knock rule to "passive"
    bobJoin: Bob sends a membership event with membership join
    create --> bobJoin
    bobHey: Bob sends m.room.message "Hello!"
    bobJoin --> bobHey
    state aliceChoice <<choice>>
    aliceChoiceText: Alice decides whether to explicitly set Bob's participation
    bobHey --> aliceChoiceText
    aliceChoiceText --> aliceChoice
    aliceDeny: Alice sets Bob's participation to "deny"
    alicePermit: Alice sets Bob's participation to "permitted"
    aliceImplicit: Alice does not make a decision
    aliceChoice --> aliceDeny
    aliceChoice --> alicePermit
    aliceChoice --> aliceImplicit
    bobCoolRoom: Bob sends m.room.message "This is a cool room!"
    alicePermit --> bobCoolRoom
    aliceImplicit --> bobCoolRoom
---
title: Joining the room with "active" knock rule
---
stateDiagram-v2
    create: @alice#colon;matrix.org creates the room and sets the knock rule to "active"
    bobKnock: @bob#colon;example.com sends a knock event for his server#colon; example.com
    create --> bobKnock
    state aliceChoice <<choice>>
    aliceChoiceText: @alice#colon;matrix.org decides whether to explicitly set example.com's participation
    bobKnock --> aliceChoiceText
    aliceChoiceText --> aliceChoice
    aliceDeny: @alice#colon;matrix.org sets example.com's participation to "deny"
    alicePermit: @alice#colon;matrix.org sets example.com's participation to "permitted"
    aliceImplicit: @alice#colon;matrix.org does not make a decision
    aliceChoice --> aliceDeny
    aliceChoice --> alicePermit
    aliceChoice --> aliceImplicit
    bobJoin: @bob#colon;example.com sends a membership event with membership join
    bobHello: @bob#colon;example.com sends m.room.message "Hello!"
    bobCannotParticipate: @bob#colon;example.com cannot participate and example.com <br>  cannot craft any authorizable event anywhere in the DAG
    aliceDeny --> bobCannotParticipate
    aliceImplicit --> bobCannotParticipate
    alicePermit --> bobJoin
    bobJoin --> bobHello

Potential issues

Racing with m.server.knock_rule?

We will embed m.server.knock_rule in m.room.create if someone raises concerns about a potential race condition or another issue with this conflicting with m.server.participation. However, stating that there might be a race without elaboration is not helpful; I'd need to know how the race works. If there is insistence, then we will embed it within the m.room.create event.

Changing the m.server.knock_rule from passive to active or deny

Server admins can unintentionally lock themselves out of their room unless they are the room creator under the current proposal.

Soft failure of messages

Servers that had a participation of permitted and are later denied via deny can have some of their messages soft failed while the forks synchronise, similar to https://github.com/matrix-org/synapse/issues/9329.

This could be addressed with MSC4104.

Mismatch with m.room.power_levels

There is an argument to be made that the ability to manage m.server.participation should not be flat in the way that m.room.server_acl is. Consider Alice being the room creator and Bob being an admin. Bob could create an m.server.participation event that denies Alice's server from participating, even if Alice is the same or a higher power level.

Alternatives

  • MSC4099 Participation based authorization for servers in the Matrix DAG
  • MSC3953 Server capability DAG

Security considerations

None considered.

Unstable prefix

me.marewolf.msc4124.*

Dependencies

No direct dependencies. See the make_server_knock handshake.


  1. It is possible to craft and send events initially to servers that room admins do not reside on. Though most rooms are probably vulnerable to less than this, and to somewhat simple spam join attacks.