diff --git a/changelogs/client_server/newsfragments/1511.feature b/changelogs/client_server/newsfragments/1511.feature new file mode 100644 index 00000000..d3526b8e --- /dev/null +++ b/changelogs/client_server/newsfragments/1511.feature @@ -0,0 +1 @@ +Update VoIP spec for [MSC2746](https://github.com/matrix-org/matrix-spec-proposals/pull/2746). diff --git a/content/client-server-api/modules/voip_events.md b/content/client-server-api/modules/voip_events.md index 90471cae..75cc7757 100644 --- a/content/client-server-api/modules/voip_events.md +++ b/content/client-server-api/modules/voip_events.md @@ -6,11 +6,128 @@ This module outlines how two users in a room can set up a Voice over IP WebRTC 1.0 standard. Call signalling is achieved by sending [message events](#events) to the room. In this version of the spec, only two-party communication is supported (e.g. between two peers, or between a peer -and a multi-point conferencing unit). This means that clients MUST only -send call events to rooms with exactly two participants. +and a multi-point conferencing unit). Calls can take place in rooms with +multiple members, but only two devices can take part in the call. + +All VoIP events have a `version` field. This is used to determine whether +devices support this new version of the protocol. For example, clients can use +this field to know whether to expect an `m.call.select_answer` event from their +opponent. If clients see events with `version` other than `0` or `"1"` +(including, for example, the numeric value `1`), they should treat these the +same as if they had `version` == `"1"`. + +Note that this implies any and all future versions of VoIP events should be +backwards-compatible. If it does become necessary to introduce a non +backwards-compatible VoIP spec, the intention would be for it to simply use a +separate set of event types. 
+ +#### Party Identifiers +Whenever a client first participates in a new call, it generates a `party_id` for itself to use for the +duration of the call. This needs to be long enough that the chance of multiple devices +generating an answer at the same time and choosing the same party ID is vanishingly small: 8 uppercase + +lowercase alphanumeric characters is recommended. Parties in the call are identified by the tuple of +`(user_id, party_id)`. + +The client adds a `party_id` field containing this ID to the top-level of the content of all VoIP events +it sends on the call, including `m.call.invite`. Clients use this to identify remote echo of their own +events: since a user may call themselves, they cannot simply ignore events from their own user. This +field also identifies different answers sent by different clients to an invite, and matches `m.call.candidates` +events to their respective answer/invite. + +A client implementation may choose to use the device ID used in end-to-end cryptography for this purpose, +or it may choose, for example, to use a different one for each call to avoid leaking information on which +devices were used in a call (in an unencrypted room) or if a single device (i.e. access token) were used to +send signalling for more than one call party. + +A grammar for `party_id` is defined [below](#grammar-for-voip-ids). + +#### Politeness +In line with [WebRTC perfect negotiation](https://w3c.github.io/webrtc-pc/#perfect-negotiation-example) +there are rules to establish which party is polite in the process of renegotiation. The callee is +always the polite party. In a glare situation, the politeness of a party is therefore determined by +whether the inbound or outbound call is used: if a client discards its outbound call in favour of +an inbound call, it becomes the polite party. + +#### Call Event Liveness +`m.call.invite` contains a `lifetime` field that indicates how long the offer is valid for. 
When +a client receives an invite, it should use the event's `age` field in the sync response plus the +time since it received the event from the homeserver to determine whether the invite is still valid. +The use of the `age` field ensures that incorrect clocks on client devices don't break calls. + +If the invite is still valid *and will remain valid for long enough for the user to accept the call*, +it should signal an incoming call. The amount of time allowed for the user to accept the call may +vary between clients. For example, it may be longer on a locked mobile device than on an unlocked +desktop device. + +The client should only signal an incoming call in a given room once it has completed processing the +entire sync response and, for encrypted rooms, attempted to decrypt all encrypted events in the +sync response for that room. This ensures that if the sync response contains subsequent events that +indicate the call has been hung up, rejected, or answered elsewhere, the client does not signal it. + +If on startup, after processing locally stored events, the client determines that there is an invite +that is still valid, it should still signal it but only after it has completed a sync from the homeserver. + +The minimal recommended lifetime is 90 seconds - this should give the user enough time to actually pick +up the call. + +#### ICE Candidate Batching +Clients should aim to send a small number of candidate events, with guidelines: + * ICE candidates which can be discovered immediately or almost immediately in the invite/answer + event itself (eg. host candidates). If server reflexive or relay candidates can be gathered in + a sufficiently short period of time, these should be sent here too. A delay of around 200ms is + suggested as a starting point. + * The client should then allow some time for further candidates to be gathered in order to batch them, + rather than sending each candidate as it arrives. 
A delay of 2 seconds after sending the + invite or 500ms after sending the answer is suggested as a starting point (since a delay is natural + anyway after the invite whilst the client waits for the user to accept it). + +#### End-of-candidates +An ICE candidate whose value is the empty string means that no more ICE candidates will +be sent. Clients must send such a candidate in an `m.call.candidates` message. +The WebRTC spec requires browsers to generate such a candidate, however note that at time of writing, +not all browsers do (Chrome does not, but does generate an `icegatheringstatechange` event). The +client should send any remaining candidates once candidate generation finishes, ignoring the timeouts above. +This allows bridges to batch the candidates together when bridging to protocols that don't support +trickle ICE. + +#### DTMF +Matrix clients can send DTMF as specified by WebRTC. The WebRTC standard as of August +2020 does not support receiving DTMF but a Matrix client can receive and interpret the DTMF sent +in the RTP payload. + +#### Grammar for VoIP IDs +`call_id`s and `party_id`s are explicitly defined to be between 1 and 255 characters long, consisting +of the characters `[0-9a-zA-Z._~-]`. + +(Note that this matches the grammar of 'opaque IDs' from +[MSC1597](https://github.com/matrix-org/matrix-spec-proposals/blob/rav/proposals/id_grammar/proposals/1597-id-grammar.md#opaque-ids), +and that of the `id` property of the + [`m.login.sso` flow schema](#definition-mloginsso-flow-schema).) + +#### Behaviour on Room Leave +If the client sees the user it is in a call with leave the room, the client should treat this +as a hangup event for any calls that are in progress. No specific requirement is given for the +situation where a client has sent an invite and the invitee leaves the room, but the client may +wish to treat it as a rejection if there are no more users in the room who could answer the call +(eg. 
the user is now alone or the `invitee` field was set on the invite). + +The same behaviour applies when a client is looking at historic calls. + +#### Supported Codecs +The Matrix spec does not mandate particular audio or video codecs, but instead defers to the +WebRTC spec. A compliant Matrix VoIP client will behave in the same way as a supported 'browser' +in terms of what codecs it supports and what variants thereof. The latest WebRTC specification +applies, so clients should keep up to date with new versions of the WebRTC specification whether +or not there have been any changes to the Matrix spec. #### Events +##### Common Fields + +{{% event-fields event_type="call_event" %}} + +##### Events + {{% event-group group_name="m.call" %}} #### Client behaviour @@ -25,6 +142,7 @@ A call is set up with message events exchanged as follows: [..candidates..] --------> [Answers call] <--------------- m.call.answer + m.call.select_answer -----------> [Call is active and ongoing] <--------------- m.call.hangup ``` @@ -42,6 +160,43 @@ Or a rejected call: Calls are negotiated according to the WebRTC specification. +In response to an incoming invite, a client may do one of several things: + * Attempt to accept the call by sending an `m.call.answer`. + * Actively reject the call everywhere: send an `m.call.reject` as per above, which will stop the call from + ringing on all the user's devices and the caller's client will inform them that the user has + rejected their call. + * Ignore the call: send no events, but stop alerting the user about the call. The user's other + devices will continue to ring, and the caller's device will continue to indicate that the call + is ringing, and will time the call out in the normal way if no other device responds. + +##### Streams + +Clients are expected to send one stream with one track of kind `audio` (creating a +voice call). They can optionally send a second track in the same stream of kind +`video` (creating a video call). 
+ +Clients implementing this specification use the first stream and will ignore +any streamless tracks. Note that in the JavaScript WebRTC API, this means +`addTrack()` must be passed two parameters: a track and a stream, not just a +track, and in a video call the stream must be the same for both audio and video +track. + +A client may send other streams and tracks but the behaviour of the other party +with respect to presenting such streams and tracks is undefined. + +##### Invitees +The `invitee` field should be added whenever the call is intended for one +specific user, and should be set to the Matrix user ID of that user. Invites +without an `invitee` field are defined to be intended for any member of the +room other than the sender of the event. + +Clients should consider an incoming call if they see a non-expired invite event where the `invitee` field is either +absent or equal to their user's Matrix ID, however they should evaluate whether or not to ring based on their +user's trust relationship with the callers and/or where the call was placed. As a starting point, it is +suggested that clients ignore call invites from users in public rooms. It is strongly recommended that +when clients do not ring for an incoming call invite, they still display the call invite in the room and +annotate that it was ignored. 
+ ##### Glare "Glare" is a problem which occurs when two users call each other at diff --git a/data/event-schemas/examples/m.call.answer.yaml b/data/event-schemas/examples/m.call.answer.yaml index aaa4da71..78b48878 100644 --- a/data/event-schemas/examples/m.call.answer.yaml +++ b/data/event-schemas/examples/m.call.answer.yaml @@ -2,7 +2,8 @@ "$ref": "core/room_event.json", "type": "m.call.answer", "content": { - "version" : 0, + "version" : "1", + "party_id": "67890", "call_id": "12345", "answer": { "type" : "answer", diff --git a/data/event-schemas/examples/m.call.candidates.yaml b/data/event-schemas/examples/m.call.candidates.yaml index 8f1f807a..23d0a178 100644 --- a/data/event-schemas/examples/m.call.candidates.yaml +++ b/data/event-schemas/examples/m.call.candidates.yaml @@ -2,7 +2,8 @@ "$ref": "core/room_event.json", "type": "m.call.candidates", "content": { - "version" : 0, + "version" : "1", + "party_id": "67890", "call_id": "12345", "candidates": [ { diff --git a/data/event-schemas/examples/m.call.hangup.yaml b/data/event-schemas/examples/m.call.hangup.yaml index 295f16e4..a505bc36 100644 --- a/data/event-schemas/examples/m.call.hangup.yaml +++ b/data/event-schemas/examples/m.call.hangup.yaml @@ -2,7 +2,9 @@ "$ref": "core/room_event.json", "type": "m.call.hangup", "content": { - "version" : 0, - "call_id": "12345" + "version" : "1", + "party_id": "67890", + "call_id": "12345", + "reason": "user_hangup" } } diff --git a/data/event-schemas/examples/m.call.invite.yaml b/data/event-schemas/examples/m.call.invite.yaml index fa482bd9..45600001 100644 --- a/data/event-schemas/examples/m.call.invite.yaml +++ b/data/event-schemas/examples/m.call.invite.yaml @@ -2,7 +2,8 @@ "$ref": "core/room_event.json", "type": "m.call.invite", "content": { - "version" : 0, + "version" : "1", + "party_id": "67890", "call_id": "12345", "lifetime": 60000, "offer": { diff --git a/data/event-schemas/examples/m.call.negotiate.yaml b/data/event-schemas/examples/m.call.negotiate.yaml 
new file mode 100644 index 00000000..f4ad8587 --- /dev/null +++ b/data/event-schemas/examples/m.call.negotiate.yaml @@ -0,0 +1,14 @@ +{ + "$ref": "core/room_event.json", + "type": "m.call.negotiate", + "content": { + "version" : "1", + "party_id": "67890", + "call_id": "12345", + "lifetime": 10000, + "offer": { + "type" : "offer", + "sdp" : "v=0\r\no=- 6584580628695956864 2 IN IP4 127.0.0.1[...]" + } + } +} diff --git a/data/event-schemas/examples/m.call.reject.yaml b/data/event-schemas/examples/m.call.reject.yaml new file mode 100644 index 00000000..2014566c --- /dev/null +++ b/data/event-schemas/examples/m.call.reject.yaml @@ -0,0 +1,9 @@ +{ + "$ref": "core/room_event.json", + "type": "m.call.reject", + "content": { + "version" : "1", + "party_id": "67890", + "call_id": "12345" + } +} diff --git a/data/event-schemas/examples/m.call.select_answer.yaml b/data/event-schemas/examples/m.call.select_answer.yaml new file mode 100644 index 00000000..fbd6ad16 --- /dev/null +++ b/data/event-schemas/examples/m.call.select_answer.yaml @@ -0,0 +1,10 @@ +{ + "$ref": "core/room_event.json", + "type": "m.call.select_answer", + "content": { + "version" : "1", + "call_id": "12345", + "party_id": "67890", + "selected_party_id": "111213" + } +} diff --git a/data/event-schemas/schema/core-event-schema/call_event.yaml b/data/event-schemas/schema/core-event-schema/call_event.yaml new file mode 100644 index 00000000..a8175fc8 --- /dev/null +++ b/data/event-schemas/schema/core-event-schema/call_event.yaml @@ -0,0 +1,25 @@ +description: "The content of all call events shares a set of common fields: those + of room events and some additional VoIP specific fields." +properties: + call_id: + type: string + description: The ID of the call this event relates to. + version: + type: string + description: The version of the VoIP specification this message adheres to. + This specification is version 1. This field is a string such that experimental + implementations can use non-integer versions. 
This field was an integer + in the previous spec version and implementations must accept an integer + 0. + party_id: + type: string + description: 'This identifies the party that sent this event. A client may + choose to re-use the device ID from end-to-end cryptography for the value + of this field.' + x-addedInMatrixVersion: "1.7" +required: +- call_id +- version +- party_id +title: CallEvent +type: object diff --git a/data/event-schemas/schema/m.call.answer.yaml b/data/event-schemas/schema/m.call.answer.yaml index e84cf6f8..163690be 100644 --- a/data/event-schemas/schema/m.call.answer.yaml +++ b/data/event-schemas/schema/m.call.answer.yaml @@ -7,11 +7,10 @@ "properties": { "content": { "type": "object", + "allOf": [{ + "$ref": "core-event-schema/call_event.yaml" + }], "properties": { - "call_id": { - "type": "string", - "description": "The ID of the call this event relates to." - }, "answer": { "type": "object", "title": "Answer", @@ -28,13 +27,9 @@ } }, "required": ["type", "sdp"] - }, - "version": { - "type": "number", - "description": "The version of the VoIP specification this messages adheres to. This specification is version 0." } }, - "required": ["call_id", "answer", "version"] + "required": ["answer"] }, "type": { "type": "string", diff --git a/data/event-schemas/schema/m.call.candidates.yaml b/data/event-schemas/schema/m.call.candidates.yaml index 7426717c..6aa16229 100644 --- a/data/event-schemas/schema/m.call.candidates.yaml +++ b/data/event-schemas/schema/m.call.candidates.yaml @@ -7,11 +7,10 @@ "properties": { "content": { "type": "object", + "allOf": [{ + "$ref": "core-event-schema/call_event.yaml" + }], "properties": { - "call_id": { - "type": "string", - "description": "The ID of the call this event relates to." 
- }, "candidates": { "type": "array", "description": "Array of objects describing the candidates.", @@ -34,13 +33,9 @@ }, "required": ["candidate", "sdpMLineIndex", "sdpMid"] } - }, - "version": { - "type": "integer", - "description": "The version of the VoIP specification this messages adheres to. This specification is version 0." } }, - "required": ["call_id", "candidates", "version"] + "required": ["candidates"] }, "type": { "type": "string", diff --git a/data/event-schemas/schema/m.call.hangup.yaml b/data/event-schemas/schema/m.call.hangup.yaml index 116d5af7..65d697ab 100644 --- a/data/event-schemas/schema/m.call.hangup.yaml +++ b/data/event-schemas/schema/m.call.hangup.yaml @@ -1,35 +1,54 @@ -{ - "type": "object", - "description": "Sent by either party to signal their termination of the call. This can be sent either once the call has has been established or before to abort the call.", - "allOf": [{ - "$ref": "core-event-schema/room_event.yaml" - }], - "properties": { - "content": { - "type": "object", - "properties": { - "call_id": { - "type": "string", - "description": "The ID of the call this event relates to." - }, - "version": { - "type": "integer", - "description": "The version of the VoIP specification this message adheres to. This specification is version 0." - }, - "reason": { - "type": "string", - "description": "Optional error reason for the hangup. This should not be provided when the user naturally ends or rejects the call. When there was an error in the call negotiation, this should be `ice_failed` for when ICE negotiation fails or `invite_timeout` for when the other party did not answer in time.", - "enum": [ - "ice_failed", - "invite_timeout" - ] - } - }, - "required": ["call_id", "version"] - }, - "type": { - "type": "string", - "enum": ["m.call.hangup"] - } - } -} +--- +type: object +description: | + Sent by either party to signal their termination of the call. 
This can + be sent either once the call has been established or before to abort the call. + + The meanings of the `reason` field are as follows: + * `ice_failed`: ICE negotiation has failed and a media connection could not be established. + * `ice_timeout`: The connection failed after some media was exchanged (as opposed to `ice_failed` + which means no media connection could be established). Note that, in the case of an ICE + renegotiation, a client should be sure to send `ice_timeout` rather than `ice_failed` if media + had previously been received successfully, even if the ICE renegotiation itself failed. + * `invite_timeout`: The other party did not answer in time. + * `user_hangup`: Clients must now send this code when the user chooses to end the call, although + for backwards compatibility with version 0, clients should treat an absence of the `reason` + field as `user_hangup`. + * `user_media_failed`: The client was unable to start capturing media in such a way that it is unable + to continue the call. + * `user_busy`: The user is busy. Note that this exists primarily for bridging to other networks such + as the PSTN. A Matrix client that receives a call whilst already in a call would not generally reject + the new call unless the user had specifically chosen to do so. + * `unknown_error`: Some other failure occurred that meant the client was unable to continue the call + rather than the user choosing to end it. +allOf: +- "$ref": core-event-schema/room_event.yaml +properties: + content: + type: object + allOf: + - "$ref": core-event-schema/call_event.yaml + properties: + reason: + type: string + description: Reason for the hangup. Note that this was optional in + previous versions of the spec, so a missing value should be + treated as `user_hangup`. + x-changedInMatrixVersion: + 1.7: |- + Additional values were added. 
+ enum: + - ice_timeout + - ice_failed + - invite_timeout + - user_hangup + - user_media_failed + - user_busy + - unknown_error + required: + - reason + type: + type: string + enum: + - m.call.hangup + diff --git a/data/event-schemas/schema/m.call.invite.yaml b/data/event-schemas/schema/m.call.invite.yaml index 65796e1e..72020b26 100644 --- a/data/event-schemas/schema/m.call.invite.yaml +++ b/data/event-schemas/schema/m.call.invite.yaml @@ -7,11 +7,10 @@ "properties": { "content": { "type": "object", + "allOf": [{ + "$ref": "core-event-schema/call_event.yaml" + }], "properties": { - "call_id": { - "type": "string", - "description": "A unique identifier for the call." - }, "offer": { "type": "object", "title": "Offer", @@ -29,16 +28,17 @@ }, "required": ["type", "sdp"] }, - "version": { - "type": "integer", - "description": "The version of the VoIP specification this message adheres to. This specification is version 0." - }, "lifetime": { "type": "integer", "description": "The time in milliseconds that the invite is valid for. Once the invite age exceeds this value, clients should discard it. They should also no longer show the call as awaiting an answer in the UI." + }, + "invitee": { + "type": "string", + "description": "The ID of the user being called. If omitted, any user in the room can answer.", + "x-addedInMatrixVersion": "1.7", } }, - "required": ["call_id", "offer", "version", "lifetime"] + "required": ["offer", "lifetime"] }, "type": { "type": "string", diff --git a/data/event-schemas/schema/m.call.negotiate.yaml b/data/event-schemas/schema/m.call.negotiate.yaml new file mode 100644 index 00000000..abc5ef1d --- /dev/null +++ b/data/event-schemas/schema/m.call.negotiate.yaml @@ -0,0 +1,74 @@ +--- +type: object +description: | + Provides SDP negotiation semantics for media pause, hold/resume, ICE restarts + and voice/video call up/downgrading. 
Clients should implement and honour hold + functionality as per [WebRTC's recommendation](https://www.w3.org/TR/webrtc/#hold-functionality). + + If both the invite event and the accepted answer event have `version` equal + to `"1"`, either party may send `m.call.negotiate` with a `description` field + to offer new SDP to the other party. This event has `call_id` with the ID of + the call and `party_id` equal to the client's party ID for that call. The + caller ignores any negotiate events with `party_id` + `user_id` tuple not + equal to that of the answer it accepted and the callee ignores any negotiate + events with `party_id` + `user_id` tuple not equal to that of the caller. + Clients should use the `party_id` field to ignore the remote echo of their + own negotiate events. + + This has a `lifetime` field as in `m.call.invite`, after which the sender of + the negotiate event should consider the negotiation failed (timed out) and + the recipient should ignore it. + + The `description` field is the same as the `offer` field in `m.call.invite` + and `answer` field in `m.call.answer` and is an `RTCSessionDescriptionInit` + object as per https://www.w3.org/TR/webrtc/#dom-rtcsessiondescriptioninit. + + Once an `m.call.negotiate` event is received, the client must respond with + another `m.call.negotiate` event, with the SDP answer (with `"type": "answer"`) + in the `description` property. + + In the `m.call.invite` and `m.call.answer` events, the `offer` and `answer` + fields respectively are objects of type `RTCSessionDescriptionInit`. Hence + the `type` field, whilst redundant in these events, is included for ease of + working with the WebRTC API and is mandatory. Receiving clients should not + attempt to validate the `type` field, but simply pass the object into the + WebRTC API. 
+x-addedInMatrixVersion: "1.7" +allOf: +- "$ref": core-event-schema/room_event.yaml +properties: + content: + type: object + allOf: + - "$ref": core-event-schema/call_event.yaml + properties: + offer: + type: object + title: Offer + description: The session description object + properties: + type: + type: string + enum: + - offer + description: The type of session description. + sdp: + type: string + description: The SDP text of the session description. + required: + - type + - sdp + lifetime: + type: integer + description: The time in milliseconds that the invite is valid for. + Once the invite age exceeds this value, clients should discard it. + They should also no longer show the call as awaiting an answer in the + UI. + required: + - offer + - lifetime + type: + type: string + enum: + - m.call.negotiate + diff --git a/data/event-schemas/schema/m.call.reject.yaml b/data/event-schemas/schema/m.call.reject.yaml new file mode 100644 index 00000000..39726c1a --- /dev/null +++ b/data/event-schemas/schema/m.call.reject.yaml @@ -0,0 +1,28 @@ +--- +type: object +description: | + If the `m.call.invite` event has `version` `"1"`, a client wishing to + reject the call sends an `m.call.reject` event. This rejects the call on all devices, + but if the calling device sees an `answer` before the `reject`, it disregards the + reject event and carries on. The reject has a `party_id` just like an answer, and + the caller sends a `select_answer` for it just like an answer. If another client + had already sent an answer and sees the caller select the reject response instead + of its answer, it ends the call. If the `m.call.invite` event has `version` `0`, + the callee sends an `m.call.hangup` event. If the calling user chooses to end the + call before setup is complete, the client sends `m.call.hangup` as previously. + + Note that, unlike `m.call.hangup`, this event has no `reason` field: the rejection of + a call is always implicitly because the user chose not to answer it. 
+x-addedInMatrixVersion: "1.7" +allOf: +- "$ref": core-event-schema/room_event.yaml +properties: + content: + type: object + allOf: + - "$ref": core-event-schema/call_event.yaml + type: + type: string + enum: + - m.call.reject + diff --git a/data/event-schemas/schema/m.call.select_answer.yaml b/data/event-schemas/schema/m.call.select_answer.yaml new file mode 100644 index 00000000..b47c1352 --- /dev/null +++ b/data/event-schemas/schema/m.call.select_answer.yaml @@ -0,0 +1,27 @@ +{ + "type": "object", + "description": "This event is sent by the caller's client once it has decided which other client to talk to, by selecting one of multiple possible incoming `m.call.answer` events. Its `selected_party_id` field indicates the answer it's chosen. The `call_id` and `party_id` of the caller is also included. If the callee's client sees a `select_answer` for an answer with party ID other than the one it sent, it ends the call and informs the user the call was answered elsewhere. It does not send any events. Media can start flowing before this event is seen or even sent. Clients that implement previous versions of this specification will ignore this event and behave as they did before.", + "x-addedInMatrixVersion": "1.7", + "allOf": [{ + "$ref": "core-event-schema/room_event.yaml" + }], + "properties": { + "content": { + "type": "object", + "allOf": [{ + "$ref": "core-event-schema/call_event.yaml" + }], + "properties": { + "selected_party_id": { + "type": "string", + "description": "The `party_id` field from the answer event that the caller chose." + }, + }, + "required": ["selected_party_id"] + }, + "type": { + "type": "string", + "enum": ["m.call.select_answer"] + } + } +}
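
Illustrative note (not part of the patch): several client-side rules introduced by this change are easier to check against a small sketch. The TypeScript below is a minimal, non-normative illustration of four of them: the `version` fallback, the invite liveness calculation from `lifetime` and `age`, the VoIP ID grammar, and the default hangup `reason`. All function and type names are hypothetical and chosen for this example only; just the field names (`version`, `lifetime`, `age`, `party_id`, `call_id`, `reason`) come from the text above.

```typescript
// Hypothetical helpers; none of these names are defined by the spec.

type VoipVersion = "legacy" | "v1";

// The integer 0 is the previous protocol; any other value, including the
// number 1 or an unknown string, is treated the same as the string "1".
function classifyVersion(version: unknown): VoipVersion {
  return version === 0 ? "legacy" : "v1";
}

// Milliseconds of validity left on an invite. `ageMs` is the event's `age`
// from the sync response; `receivedAtMs` and `nowMs` are local clock readings,
// so only their difference matters and clock skew cancels out.
function remainingInviteMs(
  lifetimeMs: number,
  ageMs: number,
  receivedAtMs: number,
  nowMs: number,
): number {
  return lifetimeMs - (ageMs + (nowMs - receivedAtMs));
}

// Ring only if the invite stays valid long enough for the user to answer.
// `answerWindowMs` is a client-chosen value, not something the spec fixes.
function shouldRing(remainingMs: number, answerWindowMs: number): boolean {
  return remainingMs >= answerWindowMs;
}

// Grammar for `call_id` and `party_id`: 1 to 255 characters from [0-9a-zA-Z._~-].
const VOIP_ID_PATTERN = /^[0-9a-zA-Z._~-]{1,255}$/;

function isValidVoipId(id: string): boolean {
  return VOIP_ID_PATTERN.test(id);
}

// A hangup without `reason` (as sent by version 0 clients) means `user_hangup`.
function effectiveHangupReason(content: { reason?: string }): string {
  return content.reason ?? "user_hangup";
}
```

For example, an invite with `lifetime` 60000 and `age` 5000 that was received 10 seconds ago has 45000 ms left, so a client that allows the user 30 seconds to answer would still ring under this sketch.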