pull/4318/head
Hugh Nimmo-Smith 12 months ago
parent f97e4e64cd
commit b626d512f0

# MSC4143: MatrixRTC
This MSC defines the modules with which the MatrixRTC (Matrix Real Time Communication) signalling system is built.
MatrixRTC specifies how a real time session is described in a room and how Matrix users can connect to
a session.
The MatrixRTC specification is separated into different modules:
- The MatrixRTC room state that defines the state of the real time session.\
  It is the source of truth for:
  - Who is part of a session
  - Who is connected via what technology/backend
  - Metadata per device used by other participants to decide whether the streams
    from this source are of interest / need to be subscribed.
- The MatrixRTC backend.
  - Allows for multiple backend implementations to be used.
  - It defines how to discover the available backend(s).
  - It defines how to connect the participating peers.
  - A proposal utilising LiveKit is the standard for this as of writing.
  - Another planned backend is a full mesh implementation based on [MSC3401](https://github.com/matrix-org/matrix-spec-proposals/pull/3401).
- The MatrixRTC application.
  - Each application type can have its own spec.
  - Voice and video conferencing can be done with an application of type `m.call`.
  - The application defines all the details of the RTC experience:
    - How to interpret the metadata of the member events.
    - What streams to connect to.
    - What data in which format to send over the RTC channels.
    - What MatrixRTC backends are supported.
- End-to-end encryption of media streams.
This MSC will focus on the Matrix room state, which is responsible for the high-level
signalling of an RTC session:
## Proposal
Each RTC session is made up of a collection of `m.rtc.member` room state events.
Each `m.rtc.member` event defines who (the `member`) is a participant of which session (the `session`).
### The MatrixRTC room state
All data related to a MatrixRTC session
(current session, session history, join/leave events, ...) requires only one event type.
We use a set of `m.rtc.member` state events (one for each participant) to represent a session.
Based on its content, an `m.rtc.member` state event represents either a connected or a disconnected member.
#### Joining a session
Sending a well-formed `m.rtc.member` event that describes a connected state for a state key that is not yet used or contains a disconnected `m.rtc.member` event represents a join action.
The fields are as follows:
- `member` required object - describes the participant of the RTC session:
  - `id` required string - a unique identifier for this session membership as defined above. Recommended to be a UUID. It can be reused if the user leaves and rejoins the session.
    It should be unique across all devices of the user. TODO: define grammar
  - `device_id` required string - the Matrix device ID of the device that is joining the session. This is used when sending
    [to-device messages](https://spec.matrix.org/v1.11/client-server-api/#send-to-device-messaging).
  - `user_id` required string - the Matrix user ID of the user that is joining the session. This is needed as we cannot rely
    on the owner of the state event, as it might have been modified by an admin or similar.
- `session` required object - an object that is used to uniquely identify this session across RTC member events
  of the Matrix room:
  - `application` required string - a recognised application type, e.g. `m.call` as linked below.
  - additional fields as defined by the application type
- `created_ts` optional integer - timestamp in milliseconds since the UNIX epoch.
  - This should **not** be present the first time that the `m.rtc.member` event is sent.
  - If the `m.rtc.member` event is sent again, the `created_ts` should be populated with the `origin_server_ts`
    that was given to the previous version of the state event.
- `focus_active` required Focus object - specifies the algorithm that defines how to choose a Focus for this member. See below for details.
- `foci_preferred` required array of Focus objects - specifies the input data for this algorithm contributed by this member. See below for details.
Additional fields may be added depending on the application type.
A full `m.rtc.member` state event for a joined member looks like this:
```json
// event type: "m.rtc.member"
// state key: see next section for definition
{
  "session": {
    "application": "m.call"
    // further fields for the application
  },
  "member": {
    "id": "xyzABCDEF10123",
    "device_id": "DEVICEID",
    "user_id": "@user:matrix.domain"
  },
  "created_ts": Time | undefined,
  "expires_after": Duration,
  "focus_active": {...FOCUS_A},
  "foci_preferred": [
    {...FOCUS_1},
    {...FOCUS_2}
  ]
}
```
This gives us the information that user `@user:matrix.domain`, with member ID `xyzABCDEF10123` on device `DEVICEID`,
is part of a session identified by `{"application": "m.call"}` using the application type `m.call`, connected over `FOCUS_A`.
This is sufficient information for another room member to detect the running session and join it.
`created_ts` is an optional property that caches the time of creation. It is not required
for an event that has not yet been updated; in that case the `origin_server_ts` is used.
> [!NOTE]
> We introduce `created_ts()` as the notation for `created_ts ?? origin_server_ts`
Once the event gets updated, the `origin_server_ts` needs to be copied into the `created_ts` field.
An existing `created_ts` field implies that this is a state event updating the current session
and a missing `created_ts` field implies that it is a join state event.
All membership events that belong to one member session can be grouped with the index
`created_ts()`+`state_key`. This is why the `m.rtc.member` events deliberately do NOT include something akin to a `membership_id`.
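The `created_ts()` notation and the grouping index can be sketched in TypeScript as follows. This is a minimal sketch: the `RtcMemberEvent` shape, the helper names and the `|` separator in the group key are illustrative assumptions. It reads "copied" as copying `created_ts()` of the previous version, which equals its `origin_server_ts` on the first update and keeps the original join time on later updates.

```typescript
// Minimal, hypothetical shape of an m.rtc.member state event (only the fields used here).
interface RtcMemberEvent {
  state_key: string;
  origin_server_ts: number;
  content: { created_ts?: number; [field: string]: unknown };
}

// created_ts() := created_ts ?? origin_server_ts
function createdTs(ev: RtcMemberEvent): number {
  return ev.content.created_ts ?? ev.origin_server_ts;
}

// Content for re-sending (updating) an existing membership: the creation time of the
// previous version is carried forward, so all versions share the same created_ts().
function updatedContent(
  prev: RtcMemberEvent,
  changes: Record<string, unknown>,
): Record<string, unknown> {
  return { ...prev.content, ...changes, created_ts: createdTs(prev) };
}

// created_ts() + state_key groups every version of one member session.
// The "|" separator is an arbitrary choice of this sketch.
function memberSessionKey(ev: RtcMemberEvent): string {
  return `${createdTs(ev)}|${ev.state_key}`;
}
```

A join event has no `created_ts`, so its group key uses `origin_server_ts`; every later update copies that value and therefore lands in the same group.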
Other than the membership events, there is **no event** to represent an RTC session (containing all members).
Such an event would include shared information, and it is not trivial to decide who has authority over it.
Instead the session is a computed value based on `m.rtc.member` events.
The list of events with the same `session` content represents one session.
This list allows computing fields like participant count, start time, etc.
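As a rough illustration of computing sessions from room state, the following TypeScript sketch groups the current `m.rtc.member` state events by their `session` content. It assumes that "same content" means structurally equal JSON, compared via a key-sorted serialisation (`canonical`, a helper of this sketch), and it takes the start time as the earliest `created_ts()` among the current members, which is later than the true start if the earliest participants have already left. All names are hypothetical.

```typescript
// Hypothetical minimal event shape: content has no `session` once the member has left.
interface RtcMemberEvent {
  state_key: string;
  origin_server_ts: number;
  content: Record<string, unknown>;
}

interface ComputedSession {
  session: unknown; // the shared `session` content
  participantCount: number;
  startTs: number; // earliest created_ts() among the current members
}

// Key-sorted JSON serialisation so that equal `session` objects map to equal strings.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonical).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj).sort().map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

function computeSessions(currentState: RtcMemberEvent[]): ComputedSession[] {
  const bySession = new Map<string, ComputedSession>();
  for (const ev of currentState) {
    const session = ev.content["session"];
    if (session === undefined) continue; // disconnected member
    const ct = ev.content["created_ts"];
    const created = typeof ct === "number" ? ct : ev.origin_server_ts; // created_ts()
    const key = canonical(session);
    const existing = bySession.get(key);
    if (existing) {
      existing.participantCount += 1;
      existing.startTs = Math.min(existing.startTs, created);
    } else {
      bySession.set(key, { session, participantCount: 1, startTs: created });
    }
  }
  return Array.from(bySession.values());
}
```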
Based on the value of `application`, the event might include additional fields
that provide application-specific session parameters.
> A [Third Room](https://thirdroom.io)-like experience could include the information of an approximate position
> on the map, so that clients can omit connecting to participants that are not in their
> area of interest.
#### State key for `m.rtc.member`
The state key is generated from the `member` field of the `m.rtc.member` event.
We want to choose a state key that is compatible with whichever state protection proposal is accepted to ensure that
users cannot modify one another's sessions.
For [MSC3757](https://github.com/matrix-org/matrix-spec-proposals/pull/3757) we generate the state key by
concatenating the following strings:
- the Matrix ID of the user
- an `_` (underscore)
- the `member`.`id` field
For example with a `member`.`id` of `xyzABCDEF10123` for user `@user:matrix.domain` the state key would be `@user:matrix.domain_xyzABCDEF10123`.
For a client parsing the state key, the Matrix ID of the user is everything before the first `_` that comes after the
first `:` (a user ID localpart may itself contain `_`), and everything after that `_` is the `member`.`id` field.
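A minimal TypeScript sketch of building and parsing this state key follows; the helper names are hypothetical. Because the localpart of a user ID may itself contain `_` (for example `@alice_b:example.org`), the parser does not split at the very first `_`. It relies on the user ID grammar instead: a localpart cannot contain `:` and a server name cannot contain `_`, so the first `_` after the first `:` is the separator.

```typescript
// The `member` object of an m.rtc.member event.
interface RtcMember {
  id: string;
  device_id: string;
  user_id: string;
}

// <user_id>_<member.id>, e.g. "@user:matrix.domain_xyzABCDEF10123"
function buildStateKey(member: RtcMember): string {
  return `${member.user_id}_${member.id}`;
}

function parseStateKey(stateKey: string): { userId: string; memberId: string } | null {
  // The localpart cannot contain ':', so the first ':' ends the localpart.
  const colon = stateKey.indexOf(":");
  if (!stateKey.startsWith("@") || colon < 0) return null;
  // Server names cannot contain '_', so the first '_' after the ':' is the separator.
  const sep = stateKey.indexOf("_", colon);
  if (sep < 0) return null;
  return { userId: stateKey.slice(0, sep), memberId: stateKey.slice(sep + 1) };
}
```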
#### Leaving a session
Sending an empty `m.rtc.member` event represents a leave action. The state key must be the same as the one used when joining.
There is an optional `leave_reason` field that can be used to provide a reason for leaving the session:
- `leave_reason` optional string - one of: `lost_connection`
An example of leaving a session where the user explicitly disconnects:
```json
// event type: "m.rtc.member"
// state key: "@user:matrix.domain_xyzABCDEF10123"
{
}
```
The client should use the `unsigned.prev_content` field of the [room state event](https://spec.matrix.org/v1.11/client-server-api/#room-event-format)
to determine the details of the leave event.
For example:
```json
// event type: "m.rtc.member"
// state key: "@user:matrix.domain_xyzABCDEF10123"
{
  "content": {
    "leave_reason": "lost_connection"
  },
  "unsigned": {
    "prev_content": {
      "session": {
        "application": "m.call",
        "call_id": ""
      },
      "member": {
        "id": "xyzABCDEF10123",
        "device_id": "DEVICEID",
        "user_id": "@user:matrix.domain"
      },
      "created_ts": 123456,
      "focus_active": {...FOCUS_A},
      "foci_preferred": [
        {...FOCUS_1},
        {...FOCUS_2}
      ]
    }
  }
}
```
#### Reliability requirements for the room state
Room state is a very well-suited place to store the data for a MatrixRTC session.
It allows:
- The client to determine current ongoing sessions without loading history for every room,
or doing any additional work other than the sync loop that needs to run anyway.
- The client to compute/access data of past sessions without any additional redundant data.
- Sessions (start/end/participant count) to be federated without redundant data storage that
could result in conflicts or get out of sync. The room state events are part of the DAG, so consistency
is handled like for any other Persistent Data Unit (PDU) in Matrix.
However, a challenging circumstance with using the room state to represent a session is
the disconnection behaviour. If the client disconnects from a call because of a network issue,
an application crash or a user forcefully quitting the client, the room state cannot be updated anymore.
The client is required to leave by sending a new empty state event, which cannot happen once the connection is lost.
If the state is not updated correctly, we end up with incorrect session end timestamps and a room state that is not
correctly representing the current RTC session state. Historic and current MatrixRTC session data would be broken.
For an acceptable solution, the following requirements need to be taken into consideration:
- Room state is set to empty if the client loses connection. (A heartbeat-like system is desired.)
- The best source of truth for a call participation is a working connection to the SFU.
  It is desired that the disconnect of the member on the SFU gets propagated to the room state.
- It should be possible to update the room state without the client being online.
- All of this should be compatible with Matrix using cryptographic identities (e.g.
[MSC4080](https://github.com/matrix-org/matrix-spec-proposals/pull/4080)).
[MSC4140](https://github.com/matrix-org/matrix-spec-proposals/pull/4140) proposes a concept that fulfils
all of these requirements: the leave events are delayed until one of the leave conditions
(heartbeat or SFU disconnect) occurs.
A MatrixRTC client has to first send/schedule the following delayed leave event:
```json
// event type: "m.rtc.member"
// state key: "@user:matrix.domain_xyzABCDEF10123"
{
  "leave_reason": "lost_connection"
}
```
Only after that can the actual state event be sent, so that we guarantee that the state will eventually be empty.
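The required ordering can be sketched as follows. `DelayedStateTransport`, its two methods and `joinWithDelayedLeave` are hypothetical names standing in for whatever client library exposes MSC4140 delayed events; they are not a real SDK API. The periodic restart of the delay (the heartbeat) is left out.

```typescript
// Hypothetical transport: these method names are placeholders, not a real SDK API.
interface DelayedStateTransport {
  // Schedule a state event that the homeserver publishes after `delayMs` unless the delay is restarted.
  scheduleDelayedState(eventType: string, stateKey: string, content: object, delayMs: number): Promise<string>;
  // Send a state event immediately.
  sendState(eventType: string, stateKey: string, content: object): Promise<void>;
}

// Join so that the member state is guaranteed to become empty eventually.
async function joinWithDelayedLeave(
  transport: DelayedStateTransport,
  stateKey: string,
  memberContent: object,
  delayMs: number,
): Promise<string> {
  // 1. Schedule the leave first: if the client crashes after this point,
  //    the homeserver still sends the leave once the delay expires.
  const delayId = await transport.scheduleDelayedState(
    "m.rtc.member", stateKey, { leave_reason: "lost_connection" }, delayMs);
  // 2. Only then publish the connected membership.
  await transport.sendState("m.rtc.member", stateKey, memberContent);
  // The caller keeps restarting the delay (heartbeat) with this id while connected.
  return delayId;
}
```

Reversing the two steps would open a window in which a crash leaves a connected membership with no pending leave.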
The `leave_reason` is added so clients can be more informative about why a user disconnected from a call.
It lets other participants in a session know whether the user disconnected intentionally or lost connection.
Receiving clients will be able to detect whether the delayed event request was recognised by the presence of the `has_delayed_overwrite: true`
unsigned property. If the property is missing, the event is invalid.
This also ensures that delayed leave events that are incorrectly sent with a non-empty membership content are invalidated,
as they will not contain the `has_delayed_overwrite: true` unsigned property.
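A receiving client could apply this rule with a check like the following TypeScript sketch. It assumes that a member counts as connected when its content has a `session` field (an assumption of this sketch), and the `ReceivedMemberEvent` shape is hypothetical.

```typescript
// Hypothetical minimal shape of a received m.rtc.member state event.
interface ReceivedMemberEvent {
  content: Record<string, unknown>;
  unsigned?: { has_delayed_overwrite?: boolean; [field: string]: unknown };
}

// Assumption: a connected member is one whose content carries a `session`.
function isConnected(ev: ReceivedMemberEvent): boolean {
  return ev.content["session"] !== undefined;
}

// A connected membership is only valid if the homeserver confirmed that a
// delayed leave event will overwrite it.
function isValidConnectedMember(ev: ReceivedMemberEvent): boolean {
  return isConnected(ev) && ev.unsigned?.has_delayed_overwrite === true;
}
```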
#### Session history
Since there is no single entry for a historic session (because of the ownership discussion),
historic sessions need to be computed and most likely cached on the client.
Each state event can mark either a join or a leave:
- join: `prev_state.session != current_state.session` &&
`current_state.session != undefined`
(where an empty `m.rtc.member` event would imply `state.session == undefined`)
- leave: `prev_state.session != current_state.session` &&
`current_state.session == undefined`
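The join/leave rules above translate into a small classifier. In this TypeScript sketch, `!=` on `session` is interpreted as structural inequality of the JSON content, computed with a key-sorted serialisation (`canonical`, a helper of this sketch); a change that keeps the same `session` (for example a Focus update) is reported as `unchanged`. The names are hypothetical.

```typescript
type Content = Record<string, unknown>;
type MembershipChange = "join" | "leave" | "unchanged";

// Key-sorted JSON serialisation, so {a, b} and {b, a} compare as equal.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonical).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj).sort().map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// `prev` is undefined when the state key has never been used.
function classify(prev: Content | undefined, current: Content): MembershipChange {
  const prevSession = prev?.["session"];
  const currentSession = current["session"]; // undefined for an empty (leave) event
  const sameSession =
    prevSession === undefined || currentSession === undefined
      ? prevSession === currentSession
      : canonical(prevSession) === canonical(currentSession);
  if (sameSession) return "unchanged";
  return currentSession === undefined ? "leave" : "join";
}
```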
Based on this, one can find user sessions: the range between a join and a leave
event gives the specific times and duration of a user session.
The collection of all overlapping user sessions with the same `session` contents
defines one MatrixRTC history event.
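A minimal TypeScript sketch of this computation follows. It assumes the client has the chronological `m.rtc.member` state changes of the room at hand, compares `session` contents via a key-sorted serialisation (`canonical`), treats touching intervals (a join at the exact time of another member's leave) as overlapping, and uses `end: null` for sessions that are still ongoing. These choices and all names are assumptions of the sketch.

```typescript
// Hypothetical minimal event shape; content without `session` means the member left.
interface RtcMemberEvent {
  state_key: string;
  origin_server_ts: number;
  content: Record<string, unknown>;
}
interface UserSession { stateKey: string; sessionKey: string; start: number; end: number | null; }
interface HistoryEvent { sessionKey: string; start: number; end: number | null; participants: UserSession[]; }

function canonical(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonical).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj).sort().map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Walk the state changes in time order and cut them into per-member sessions.
function userSessions(events: RtcMemberEvent[]): UserSession[] {
  const done: UserSession[] = [];
  const open = new Map<string, UserSession>();
  const sorted = [...events].sort((a, b) => a.origin_server_ts - b.origin_server_ts);
  for (const ev of sorted) {
    const session = ev.content["session"];
    const key = session === undefined ? undefined : canonical(session);
    const current = open.get(ev.state_key);
    if (current && current.sessionKey !== key) {
      current.end = ev.origin_server_ts; // leave (or switch to another session)
      done.push(current);
      open.delete(ev.state_key);
    }
    if (key !== undefined && !open.has(ev.state_key)) {
      open.set(ev.state_key, { stateKey: ev.state_key, sessionKey: key, start: ev.origin_server_ts, end: null });
    }
  }
  return done.concat(Array.from(open.values()));
}

// Merge overlapping user sessions with the same `session` content into history events.
function historyEvents(sessions: UserSession[]): HistoryEvent[] {
  const sorted = [...sessions].sort((a, b) => a.sessionKey.localeCompare(b.sessionKey) || a.start - b.start);
  const result: HistoryEvent[] = [];
  for (const us of sorted) {
    const last = result.length > 0 ? result[result.length - 1] : undefined;
    if (last && last.sessionKey === us.sessionKey && (last.end === null || us.start <= last.end)) {
      last.participants.push(us);
      last.end = last.end === null || us.end === null ? null : Math.max(last.end, us.end);
    } else {
      result.push({ sessionKey: us.sessionKey, start: us.start, end: us.end, participants: [us] });
    }
  }
  return result;
}
```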
### The RTC backend
The backend is referred to as a Focus or as Foci in plural.
Note that these backends are independent of the application (e.g. `m.call`) being used in the session.
A Focus is represented as a JSON object with one mandatory field:
- `type` required string: The type of the Focus as defined by an RTC backend.
Additional fields will be present depending on `type`.
Only users with the same Focus type can connect in one session. If a client does
not support the type in use, it cannot connect.
Each Focus type will get its own MSC in which the detailed procedure to get from
the foci information to working WebRTC connections to the streams of all the
participants is explained.
Foci are represented in three places:
- `focus_active` of `m.rtc.member` state event - specifies the algorithm that defines how to choose a Focus for this member.
- `foci_preferred` of `m.rtc.member` state event - specifies the input data for this algorithm contributed by this member.
- `m.rtc_foci` of the `.well-known/matrix/client` - specifies the list of available Foci for the homeserver.
The `focus_active` algorithm needs to be designed so that all participants converge to the same SFU/Focus.
The following Focus `type` values are defined:
- `livekit` - a backend using the [LiveKit](https://livekit.io/) SFU as described in
[MSC4195](https://github.com/matrix-org/matrix-spec-proposals/pull/4195).
- `full_mesh` - a backend using a full-mesh approach based on [MSC3401](https://github.com/matrix-org/matrix-spec-proposals/pull/3401).
#### Choosing the value of `foci_preferred` for the `m.rtc.member` state event
At some point session participants have to decide/propose which Focus they will use.
Depending on the Focus type and application, the method for choosing the contents of the `foci_preferred` field of the `m.rtc.member` event
can differ.
There are three guidelines which should be obeyed by a client when building the `foci_preferred` list:
1. It is always desired to have as few Focus switches as possible.
   If there are other participants in the session (i.e. other `m.rtc.member` events), the client should calculate which Focus it should connect to
   based on the `m.rtc.member` events of the existing participants.
   This should happen reactively on each `m.rtc.member` state event change.
   Each MatrixRTC frontend is responsible for gracefully handling Focus switches caused by changing state. This is part of the design of MatrixRTC and a requirement for an eventually consistent distributed system.
   The calculated Focus should then be present at the start of the `foci_preferred` list.
2. The client should look up the suggested Foci from the homeserver `.well-known/matrix/client` as defined below.
   MatrixRTC is designed around the same culture that makes Matrix possible: a large amount of infrastructure in the form of homeservers is provided by the users.
   To achieve a stable and healthy ecosystem, backend RTC infrastructure should be thought of as a part of a homeserver.
   It is very similar to a TURN server: mostly traffic and little CPU load.
   To end up in a world where the traffic is split over multiple SFUs, rather than every user relying on one central SFU,
   it is important that we leverage the SFU distribution on the homeserver federation.
   These proposals from **your own** homeserver should come next in the `foci_preferred` list of the member event.
3. Clients should not use a hard-coded Focus.
   Taking the preferred Foci from the client's own configuration is toxic to a federated system. If the majority of users
   decide to use the same client, all of the users will use one Focus. This destroys the passive security mechanism that
   each instance is not an interesting attack vector since it is only a fraction of the network.
   Additionally, it will result in poor performance if every user on Matrix were to use the same Focus.
   However, there are cases where this is acceptable:
   - Transitioning to MatrixRTC. Here it might be beneficial to have a client that has a fallback Focus
     so calls also work with homeservers not supporting it.
   - For testing purposes, where a different Focus should be tested but one does not want to touch the `.well-known`.
   - For custom deployments that benefit from having the Focus configuration on a per-client basis instead of per homeserver.

   Therefore, if a client does use a hard-coded Focus, it should come last in the `foci_preferred` list.
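The ordering required by these guidelines can be sketched in TypeScript as below. How the Focus in use is calculated from the other participants' `m.rtc.member` events depends on the Focus type and is outside this sketch, so it is passed in as `inUse`. Removing duplicates (compared via a key-sorted serialisation) is a design choice of this sketch rather than a requirement stated above, and all names are hypothetical.

```typescript
// A Focus object: `type` is mandatory, everything else depends on the type.
interface Focus {
  type: string;
  [field: string]: unknown;
}

function canonical(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonical).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj).sort().map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Guideline order: 1. Focus already in use, 2. own homeserver's .well-known, 3. hard-coded fallback.
function buildFociPreferred(
  inUse: Focus | undefined,
  wellKnownFoci: Focus[],
  hardCodedFoci: Focus[] = [],
): Focus[] {
  const candidates: Focus[] = [];
  if (inUse) candidates.push(inUse);
  candidates.push(...wellKnownFoci, ...hardCodedFoci);
  const seen = new Set<string>();
  const result: Focus[] = [];
  for (const focus of candidates) {
    const key = canonical(focus);
    if (!seen.has(key)) {
      seen.add(key); // keep only the first (highest priority) occurrence
      result.push(focus);
    }
  }
  return result;
}
```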
#### Discovery of Foci using `.well-known/matrix/client`
> [!NOTE]
> Backend **infrastructure** in this context can be anything that can serve as the backend for a
> MatrixRTC session. In most cases this is an SFU, but a full mesh implementation could also
> be such an infrastructure. Not all kinds of infrastructure require a way of sourcing a backend resource
> (e.g. full-mesh). In this MSC we only refer to infrastructure where it is necessary to have access to additional
> data to participate in the MatrixRTC session.
We use an `m.rtc_foci` key in the homeserver `.well-known/matrix/client` that can be used to expose
a sorted (by priority) list of Focus description objects.
For example in generic form:
```json
{
  "m.rtc_foci": [
    {
      "type": "some-focus-type",
      "additional-type-specific-field": "https://my_focus.domain",
      "another-additional-type-specific-field": ["with", "Array", "type"]
    }
  ]
}
```
Or a concrete example for a `livekit` Focus:
```json
{
  "m.rtc_foci": [
    {
      "type": "livekit",
      "livekit_service_url": "https://livekit-jwt.call.element.io"
    }
  ]
}
```
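Once the client has fetched and JSON-decoded its homeserver's `.well-known/matrix/client` document, extracting the Foci could look like the TypeScript sketch below. The function name is hypothetical, fetching is left out, and silently skipping entries without a string `type` is an assumption of this sketch; the array order, which carries the priority, is preserved.

```typescript
interface Focus {
  type: string;
  [field: string]: unknown;
}

// Returns the advertised Foci in priority order, skipping malformed entries.
function parseRtcFoci(wellKnown: unknown): Focus[] {
  if (wellKnown === null || typeof wellKnown !== "object") return [];
  const foci = (wellKnown as Record<string, unknown>)["m.rtc_foci"];
  if (!Array.isArray(foci)) return [];
  return foci.filter(
    (f): f is Focus =>
      f !== null && typeof f === "object" && typeof (f as Record<string, unknown>)["type"] === "string",
  );
}
```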
### The RTC application types
Each application type might have its own specification in how the different streams
are interpreted and even what Focus type to use. This makes this proposal extremely
flexible. A Jitsi conference could be added by introducing a new `application`
and a new Focus type and would be MatrixRTC compatible. It would not be compatible
with applications that do not use the Jitsi Focus but clients would know that there
is an ongoing session of unknown type and unknown Focus and could display/represent
this in the user interface.
To make it easy for clients to support different application types, the recommended
approach is to provide a Matrix widget for each application type. This way the
client developers can use the widget as the first implementation if they want to
support this RTC application type.
Each application should get its own MSC in which all the additional
fields are explained and how the communication with the possible foci is
defined:
- `m.call` - voice and video conferencing described by [MSC4196](https://github.com/matrix-org/matrix-spec-proposals/pull/4196).
#### Interoperability between applications
There is a use case in which an `m.call` app might want to participate in a session of type (application) `custom-call-with-more-features`. For example, a native mobile Matrix client might support `m.call` and be the client at hand for joining the feature-rich application/session.
There could be fallback mechanisms, but the most flexible approach is to treat it per application type. If it makes sense for an application type to fully conform to `m.call`, a client that can connect to an `m.call` RTC session (application) could claim that it is also compatible with `custom-call-with-more-features`. It is then the job of the `custom-call-with-more-features` session type (application) to define some kind of feature list, so that it can tell whether users are joining with an `m.call` client or a dedicated `custom-call-with-more-features` client.
### End-to-end encryption of media streams
We define how the key material is shared between the participants of the call to facilitate end-to-end encryption of the media streams.
The backend (e.g. LiveKit) MSC defines how the key material is actually used.
#### Shared password
A shared password that has been distributed ahead of time to the participants may be used to encrypt the media streams sent via the RTC backend.
For example, it could be in the query parameter of a private URL attached to a calendar invitation.
#### Per-participant sender key
A participant can share its chosen key with other participants by sending Matrix [to-device messages](https://spec.matrix.org/v1.11/client-server-api/#send-to-device-messaging) to the other participants.
The key is sent as an event of type `m.rtc.encryption_keys` as an encrypted to-device message.
The device ID that is being sent to is the `member`.`device_id` from the `m.rtc.member` events.
The event contains the following fields:
- `session` required object: The contents of the `session` from the `m.rtc.member` event.
- `member` required object: The contents of the `member` from the corresponding `m.rtc.member` event.
- `keys` required array of objects: The sender keys to be distributed to the participant:
  - `key` required string: The base64 encoded key material.
  - `index` required int: The index of the key, used to distinguish it from other keys. This must be an integer between 0 and 255 inclusive.
    In some implementations of MatrixRTC this may correspond to the `keyID` field of the WebRTC [SFrame](https://www.w3.org/TR/webrtc-encoded-transform/#sframe) header.
  - `invalidates_key_index` optional int: The index of the key that is invalidated by this key. If this is set, the application should invalidate the key identified
    by `invalidates_key_index` once it receives a frame encrypted with the new `index`. This is to protect against an exfiltrated key being used to forge frames.
  - `invalidates_after_ms` optional int: The number of milliseconds after which the key identified by `invalidates_key_index` is invalidated, even if no frames
    encrypted with the new key have been received. Again, this is to protect against an exfiltrated key being used to forge frames.
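
As an illustration of the field constraints above, here is a minimal Python sketch of how a receiving client might sanity-check the decrypted content of an `m.rtc.encryption_keys` to-device message before using it. The function name `validate_encryption_keys` is hypothetical. It checks only the fields listed in this section, ignores any application-specific extra fields, and does not try to decode the base64 `key` string.

```python
from typing import Any


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_encryption_keys(content: dict[str, Any]) -> list[str]:
    """Return the problems found in an m.rtc.encryption_keys content.

    An empty list means the fields listed in this section look well-formed.
    Extra, application-specific fields are ignored (hypothetical helper).
    """
    problems: list[str] = []
    for name in ("session", "member"):
        if not isinstance(content.get(name), dict):
            problems.append(f"'{name}' must be an object")
    keys = content.get("keys")
    if not isinstance(keys, list):
        problems.append("'keys' must be an array")
        return problems
    for i, entry in enumerate(keys):
        if not isinstance(entry, dict):
            problems.append(f"keys[{i}] must be an object")
            continue
        if not isinstance(entry.get("key"), str):
            problems.append(f"keys[{i}].key must be a string")
        index = entry.get("index")
        if not _is_int(index) or not 0 <= index <= 255:
            problems.append(f"keys[{i}].index must be an int in 0..255")
        # The two invalidation fields are optional, but must be ints if present.
        for optional in ("invalidates_key_index", "invalidates_after_ms"):
            if optional in entry and not _is_int(entry[optional]):
                problems.append(f"keys[{i}].{optional} must be an int")
    return problems
```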
Depending on the RTC application, additional fields may be added to this event.
An example to-device event:
```json5
// event type: "m.rtc.encryption_keys"
{
  "session": {
    "application": "m.call",
    "call_id": "",
    "scope": "m.room"
  },
  "member": {
    "id": "xyzABCDEF10123",
    "device_id": "DEVICEID",
    "user_id": "@user:matrix.domain"
  },
  "room_id": "!roomid:matrix.domain",
  "keys": [
    {
      "index": 10,
      "key": "base64encodedkey",
      "invalidates_key_index": 9,
      "invalidates_after_ms": 5000
    },
  ],
}
```
On receipt of an `m.rtc.encryption_keys` event, the application can associate the received key with the RTC session by matching the `session` and `member` contents against the corresponding `m.rtc.member` event.
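
The matching step could look like the following minimal Python sketch. `find_matching_membership` is a hypothetical helper. The caller is assumed to pass three things: the decrypted to-device content, the to-device `sender`, and the `m.rtc.member` contents of the room named by the content's `room_id` (as in the example above). The sketch interprets "matching" as exact equality of the `session` and `member` objects. It also checks that `member`.`user_id` equals the to-device sender; this MSC does not state that check, so it is an added assumption.

```python
from typing import Any, Optional


def find_matching_membership(
    content: dict[str, Any],
    sender: str,
    memberships: list[dict[str, Any]],
) -> Optional[dict[str, Any]]:
    """Return the m.rtc.member content that a received key belongs to, or None.

    `content` is the decrypted m.rtc.encryption_keys to-device content,
    `sender` is the to-device event's sender and `memberships` are the
    m.rtc.member state contents of the room (hypothetical helper).
    """
    member = content.get("member") or {}
    # Assumption, not stated in this MSC: only accept keys for a member
    # whose user ID is the user who actually sent the to-device message.
    if member.get("user_id") != sender:
        return None
    for membership in memberships:
        # Match on the full `session` and `member` objects.
        if membership.get("session") == content.get("session") and membership.get("member") == member:
            return membership
    return None
```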
When the application joins the session, it should send its key to all existing participants.
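
A minimal Python sketch of that fan-out builds one `m.rtc.encryption_keys` content per participant device in the same session. It makes two assumptions: `member` in the content identifies the sender (the owner of the key), and each recipient device comes from that participant's `m.rtc.member` event. The result has the `{user_id: {device_id: content}}` shape of the `messages` object in the Client-Server API `sendToDevice` request body. The contents are still plaintext, though: Olm encryption by the client's crypto layer is out of scope here. The name `build_key_fanout` is hypothetical.

```python
from typing import Any


def build_key_fanout(
    own_member: dict[str, Any],
    session: dict[str, Any],
    room_id: str,
    keys: list[dict[str, Any]],
    memberships: list[dict[str, Any]],
) -> dict[str, dict[str, dict[str, Any]]]:
    """Map user_id -> device_id -> plaintext m.rtc.encryption_keys content.

    One entry is produced for every other device that is a member of the
    same session; each content still has to be Olm-encrypted before sending.
    """
    messages: dict[str, dict[str, dict[str, Any]]] = {}
    for membership in memberships:
        if membership.get("session") != session:
            continue  # a different RTC session in the same room
        member = membership["member"]
        if member == own_member:
            continue  # do not send our own key to ourselves
        content = {
            "session": session,
            "member": own_member,  # identifies the key owner (assumption)
            "room_id": room_id,
            "keys": keys,
        }
        messages.setdefault(member["user_id"], {})[member["device_id"]] = content
    return messages
```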
To ensure forward secrecy and post-compromise security, the key material should be rotated (i.e. a new key generated) whenever a participant joins or leaves the session.
Key rotation is done as follows:
- the sending application generates the new key material for the participant.
- the sending application sends the new key material to all the participants with a new `index` value and `invalidates_key_index` set to the current `index`.
- the receiving application stores the new key material for the specified `index`.
- the sending application continues to use the old/current key to encrypt media.
- the sending application waits for a period of time. The default should be 3 seconds.
  An application may override this default if it has specific security requirements or wants to minimize missed stream data.
  Alternatively, an application can define a negotiation approach, for example using data channels to confirm that every participant has received the next key.
- the sending application starts to use the new key to encrypt media.
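- the receiving application invalidates the key identified by `invalidates_key_index` once it receives a frame encrypted with the new key, or once `invalidates_after_ms` has elapsed if that field is set.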
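
The rotation steps can be modeled as two small state machines, one for the sender and one for each remote participant on the receiver. This is a minimal Python sketch under several assumptions. Times are passed in as millisecond timestamps so the logic can be tested without sleeping. The next index is chosen by incrementing modulo 256, although this MSC only requires a new index in 0..255. `invalidates_after_ms` is counted from when the new key is received. Key generation, sending, encryption and decryption are left out, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

DEFAULT_ROTATION_DELAY_MS = 3000  # the default wait described in the steps above


@dataclass
class SenderKeyState:
    """Sender side: which key index encrypts outgoing media."""

    current_index: int
    pending_index: Optional[int] = None
    switch_at_ms: Optional[int] = None

    def rotate(self, now_ms: int, delay_ms: int = DEFAULT_ROTATION_DELAY_MS) -> int:
        """Choose the index for a new key and schedule the switch to it.

        The caller sends the new key with this index and with
        `invalidates_key_index` set to the current index.
        """
        self.pending_index = (self.current_index + 1) % 256  # stay within 0..255
        self.switch_at_ms = now_ms + delay_ms
        return self.pending_index

    def index_for_encryption(self, now_ms: int) -> int:
        """Index of the key to encrypt media with at time `now_ms`."""
        if self.pending_index is not None and self.switch_at_ms is not None and now_ms >= self.switch_at_ms:
            self.current_index = self.pending_index
            self.pending_index = None
            self.switch_at_ms = None
        return self.current_index


@dataclass
class ReceiverKeyStore:
    """Receiver side: one remote participant's keys, by index."""

    keys: dict[int, bytes] = field(default_factory=dict)
    # old index -> (index of the key replacing it, deadline in ms or None)
    pending: dict[int, tuple[int, Optional[int]]] = field(default_factory=dict)

    def on_key(
        self,
        index: int,
        key: bytes,
        now_ms: int,
        invalidates_key_index: Optional[int] = None,
        invalidates_after_ms: Optional[int] = None,
    ) -> None:
        """Store a received key and remember which key it will invalidate."""
        self.keys[index] = key
        if invalidates_key_index is not None:
            deadline = None if invalidates_after_ms is None else now_ms + invalidates_after_ms
            self.pending[invalidates_key_index] = (index, deadline)

    def key_for_frame(self, index: int, now_ms: int) -> Optional[bytes]:
        """Key to decrypt a frame sent with `index`, or None if unknown or invalidated."""
        for old, (new, deadline) in list(self.pending.items()):
            # Invalidate when the deadline passes or a frame under the new key arrives.
            if (deadline is not None and now_ms >= deadline) or new == index:
                self.keys.pop(old, None)
                del self.pending[old]
        return self.keys.get(index)
```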
### Discovery/negotiation of application types
Problem: If a user wants to make a call to a user or room, then which call/application options should the client present to the user?
This should also take into account non-MatrixRTC calling: legacy 1:1 VoIP and Jitsi calls set up via a room state widget.
TODO: write up notes.
## Potential issues
## Alternatives
### One state event per user
[MSC3401](https://github.com/matrix-org/matrix-spec-proposals/pull/3401) proposed to have one state event per user with that state event containing an array of memberships.
This introduces two problems:
- potential inconsistency, where one of a user's devices overwrites the state of another of their devices during a concurrent update.
- when handling client disconnects, the MSC3757 proposal could not be used, as the correct
  state at the time of the disconnect would not be known.
### One state event per device
This would mean no longer using `member`.`id` in the state key. Race conditions would then have to be resolved by the client, which would need to manage multiple sessions at once.
### A separate system not associated with Matrix accounts
This MSC proposes to combine the MatrixRTC backend infrastructure with the homeserver.
Other possible sources for the backend are:
- A separate system not associated with Matrix accounts
  (for example, a user would need both a Matrix account and a "LiveKit provider" account).
- A backend link provided by the client.
- A centralized solution.
A centralized solution would not fit Matrix. A separate system would match the distributed
nature of Matrix but would not meet the user experience goals for MatrixRTC calls.
Having the client define which SFU is used is the current solution. The problem with this is that clients
are, in general, less distributed than homeservers: a large percentage of users use only a limited set of clients.
Using clients as the source for the infrastructure would result in just a handful of very large infrastructure
hosts.
This is harder to scale, and it is harder to determine who covers the costs. (For Matrix homeservers, this
is an already solved problem: individuals, communities and institutions have their own
answers for how and why they provide the infrastructure.)
### `m.rtc.encryption_keys` room event
Earlier iterations of this MSC used an encrypted `m.rtc.encryption_keys` room event to distribute the per-participant sender keys.
Whilst reducing traffic by only needing to send one event per participant, this approach does not allow for perfect forward secrecy
as the keys are stored in the room history.
The encrypted content of the `m.rtc.encryption_keys` event was as follows:
```json
{
  "session": {
    "application": "m.call",
    "call_id": ""
  },
  "member": {
    "id": "xyzABCDEF10123",
    "device_id": "DEVICEID",
    "user_id": "@user:matrix.domain"
  },
  "keys": [
    {
      "index": 0,
      "key": "base64encodedkey"
    }
  ]
}
```
## Security considerations
### Discoverability of infrastructure
The `.well-known/matrix/client` file is publicly readable, so anyone can discover
the infrastructure, which could lead to resource "stealing".
However, each infrastructure has its own authentication mechanism, defined in the infrastructure specification.
For instance, such a mechanism can use a service that interacts with the homeserver to decide whether a user
is allowed to use the infrastructure.
This is defined in the respective infrastructure MSC.
### Forward secrecy for end-to-end encryption of media streams
The considerations to ensure forward secrecy are described in the [End-to-end encryption of media streams](#end-to-end-encryption-of-media-streams)
section above.
### End-to-end media encryption key rotation lag
The proposed key rotation semantics mean that a participant could continue to decrypt media sent during the rotation delay
(three seconds by default) after they leave the session.
## Unstable prefix
The state events and the `.well-known` key introduced in this MSC use the unstable prefix
`org.matrix.msc4143.` instead of `m.` as used in the text, unless stated otherwise below.
Use `org.matrix.msc3401.call.member` as the state event type in place of `m.rtc.member`.
For discovery via `.well-known/matrix/client`, the key `org.matrix.msc4158.rtc_foci` is used in place of `m.rtc_foci`.
Use `io.element.call.encryption_keys` in place of the `m.rtc.encryption_keys` room event and to-device event types.
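
For implementers, the identifier substitutions above fit in a small lookup table. This is a minimal Python sketch: `unstable_identifier` and `read_rtc_foci` are hypothetical helpers, and the table copies only the substitutions listed in this section. Preferring the stable `.well-known` key over the unstable one is an assumption about a transition period, not something this MSC specifies. The structure of the foci value itself is left opaque.

```python
from typing import Any, Optional

# Stable identifier used in the text of this MSC -> unstable identifier on the wire.
UNSTABLE_IDENTIFIERS = {
    "m.rtc.member": "org.matrix.msc3401.call.member",
    "m.rtc_foci": "org.matrix.msc4158.rtc_foci",
    "m.rtc.encryption_keys": "io.element.call.encryption_keys",
}


def unstable_identifier(stable: str) -> str:
    """Return the unstable identifier for a stable one.

    Identifiers not listed above are returned unchanged; values such as the
    `m.call` application type are governed by their own proposals.
    """
    return UNSTABLE_IDENTIFIERS.get(stable, stable)


def read_rtc_foci(well_known: dict[str, Any]) -> Optional[Any]:
    """Read the foci entry from a parsed `.well-known/matrix/client` document,
    preferring the stable key and falling back to the unstable one."""
    for key in ("m.rtc_foci", unstable_identifier("m.rtc_foci")):
        if key in well_known:
            return well_known[key]
    return None
```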
## Dependencies
This proposal depends on
[MSC3757: Restricting who can overwrite a state event](https://github.com/matrix-org/matrix-spec-proposals/pull/3757)
to provide access control for the decentralised management of call membership state. However, an alternative such
as [MSC3779: "Owned" State Events](https://github.com/matrix-org/matrix-spec-proposals/pull/3779) could be used instead with
some adaptations.
Possible values inside the `m.rtc.member` event (like `m.call`) will use a prefix defined in the
related PR (TODO: create and link the `m.call` application type PR).
This proposal also depends on [MSC4140: Cancellable delayed events](https://github.com/matrix-org/matrix-spec-proposals/pull/4140)
to provide a mechanism for clients to ensure that they can update the room state even if they lose connection.
