From 278fd42f957f599d3a51030dacf0d3892a2072a5 Mon Sep 17 00:00:00 2001 From: Michael Weimann Date: Fri, 9 Sep 2022 12:49:24 +0200 Subject: [PATCH 1/8] Add MSC3888: Voice Broadcast --- proposals/3888-voice-broadcast.md | 119 ++++++++++++++++++++++++++++++ 1 file changed, 119 insertions(+) create mode 100644 proposals/3888-voice-broadcast.md diff --git a/proposals/3888-voice-broadcast.md b/proposals/3888-voice-broadcast.md new file mode 100644 index 00000000..9cb3375a --- /dev/null +++ b/proposals/3888-voice-broadcast.md @@ -0,0 +1,119 @@ +# MSC3888: Voice Broadcast + +As a user I want to be able to send a voice broadcast to all users of a room, +so that I can easily provide information by just talking. +Compared to already existing voice messages it should be possible to listen to +voice broadcast at the time they are being recorded. +Think of this being a live-podcast over Matrix. + +Some use-case scenarios for this feature can be to verbally provide updates, +trainings, or talk to a large number of people. + + +## Proposal + +This MSC proposes using chunks of voice messages to implement the Voice +Broadcast feature. It will built on top of [MSC3245: Voice Messages][MSC3245] +and is heavily inspired by [MSC3489: Live Location Sharing][MSC3489]. + +A new state event `m.voice_broadcast_info` will be introduced. This state event +identifies a broadcast and provides its state, such as "started" or "paused". +In addition to that `m.voice` messages will receive a relation to the state +event to mark them as voice broadcast chunks. + +`m.voice_broadcast_info` event example: + +```json +{ + "type": "m.voice_broadcast_info, + "state_key": "@matthew:matrix.org", + "content": { + "state": "started", + } +} +``` + +- `type` is `m.voice_broadcast_info`. +- `state_key` contains the broadcaster's MXID. +- `state` describes the broadcast state as listed: + - `running` flags a voice broadcast as currently being live. + - `paused` stands for a paused broadcast that may be resumed. 
+ - `stopped` marks a broadcast as finished. + +`m.voice` messages that belong to the broadcast will have a [MSC3267][MSC3267] +relation to the identifying state event. + +Example of a voice message as part of a broadcast: + +```json +{ + "type": "m.voice", + "content": { + "m.relates_to": { + "rel_type": "m.reference", + "event_id": "$voice_broadcast_info_event_id", + } + } +} +``` + + +## Potential Issues + +### Not Actually Being Live + +Compared to a streaming solution the chunked voice message broadcast is not +actually live. Instead there will always be an offset of the chosen length of +the single voice chunks. For example if someone asks for a live response in chat +during his broadcast, he won't receive an immediate response. + + +### Client Fallback Behaviour + +Depending on the chosen length of each broadcast chunk clients not supporting +this MSC will receive a number of message. Example for an 1 hour broadcast and +a five minute chunk length: 60 / 5 = 20 messages. This can be quite annoying. + +TODO: Can we disable notifications for N > 1 chunk messages? + + +## Alternatives + +### Video/Voice rooms with recording + +Server-side recording would be required for reliable recording. It would be +quite challenging to do this with maintaining end-to-end-encryption. + + +#### Element Call + +In order for voice broadcasts to support a large number of listeners, +it would rely on SFU (selective forwarding unit), which is not yet ready. 
+ + +### Streaming File Transfer + + +## Unstable Prefix + +Until this MSC lands, the following unstable prefixes should be used: + +`m.voice_broadcast_info` → `org.matrix.msc3888.voice_broadcast_info` +`state` → `org.matrix.msc3888.state` + +Example of the state event with unstable prefix: + +```json +{ + "type": "m.voice_broadcast_info, + "state_key": "@matthew:matrix.org", + "content": { + "org.matrix.msc3888.state": "started", + } +} +``` + + +[MSC3245]: https://github.com/matrix-org/matrix-spec-proposals/blob/travis/msc/voice-messages/proposals/3245-voice-messages.md +[MSC3489]: https://github.com/matrix-org/matrix-spec-proposals/blob/matthew/location-streaming/proposals/3489-location-streaming.md +[MSC3267]: https://github.com/matrix-org/matrix-spec-proposals/blob/aggregations-references/proposals/3267-reference-relations.md From 49c5aa919eff50c598cf26842b647f3cf847dc70 Mon Sep 17 00:00:00 2001 From: Michael Weimann Date: Mon, 12 Sep 2022 16:05:18 +0200 Subject: [PATCH 2/8] Update proposals/3888-voice-broadcast.md Co-authored-by: Kim Brose <2803622+HarHarLinks@users.noreply.github.com> --- proposals/3888-voice-broadcast.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/3888-voice-broadcast.md b/proposals/3888-voice-broadcast.md index 9cb3375a..733c78aa 100644 --- a/proposals/3888-voice-broadcast.md +++ b/proposals/3888-voice-broadcast.md @@ -25,7 +25,7 @@ event to mark them as voice broadcast chunks. 
```json { - "type": "m.voice_broadcast_info, + "type": "m.voice_broadcast_info", "state_key": "@matthew:matrix.org", "content": { "state": "started", From d29ef9ae333890656eb7a505582a65731c09d4f2 Mon Sep 17 00:00:00 2001 From: Michael Weimann Date: Tue, 13 Sep 2022 09:49:36 +0200 Subject: [PATCH 3/8] Mark MSC as on hold --- proposals/3888-voice-broadcast.md | 109 +----------------------------- 1 file changed, 2 insertions(+), 107 deletions(-) diff --git a/proposals/3888-voice-broadcast.md b/proposals/3888-voice-broadcast.md index 733c78aa..bcc4ab7a 100644 --- a/proposals/3888-voice-broadcast.md +++ b/proposals/3888-voice-broadcast.md @@ -9,111 +9,6 @@ Think of this being a live-podcast over Matrix. Some use-case scenarios for this feature can be to verbally provide updates, trainings, or talk to a large number of people. +## State of this MSC -## Proposal - -This MSC proposes using chunks of voice messages to implement the Voice -Broadcast feature. It will built on top of [MSC3245: Voice Messages][MSC3245] -and is heavily inspired by [MSC3489: Live Location Sharing][MSC3489]. - -A new state event `m.voice_broadcast_info` will be introduced. This state event -identifies a broadcast and provides its state, such as "started" or "paused". -In addition to that `m.voice` messages will receive a relation to the state -event to mark them as voice broadcast chunks. - -`m.voice_broadcast_info` event example: - -```json -{ - "type": "m.voice_broadcast_info", - "state_key": "@matthew:matrix.org", - "content": { - "state": "started", - } -} -``` - -- `type` is `m.voice_broadcast_info`. -- `state_key` contains the broadcaster's MXID. -- `state` describes the broadcast state as listed: - - `running` flags a voice broadcast as currently being live. - - `paused` stands for a paused broadcast that may be resumed. - - `stopped` marks a broadcast as finished. - -`m.voice` messages that belong to the broadcast will have a [MSC3267][MSC3267] -relation to the identifying state event. 
- -Example of a voice message as part of a broadcast: - -```json -{ - "type": "m.voice", - "content": { - "m.relates_to": { - "rel_type": "m.reference", - "event_id": "$voice_broadcast_info_event_id", - } - } -} -``` - - -## Potential Issues - -### Not Actually Being Live - -Compared to a streaming solution the chunked voice message broadcast is not -actually live. Instead there will always be an offset of the chosen length of -the single voice chunks. For example if someone asks for a live response in chat -during his broadcast, he won't receive an immediate response. - - -### Client Fallback Behaviour - -Depending on the chosen length of each broadcast chunk clients not supporting -this MSC will receive a number of message. Example for an 1 hour broadcast and -a five minute chunk length: 60 / 5 = 20 messages. This can be quite annoying. - -TODO: Can we disable notifications for N > 1 chunk messages? - - -## Alternatives - -### Video/Voice rooms with recording - -Server-side recording would be required for reliable recording. It would be -quite challenging to do this with maintaining end-to-end-encryption. - - -#### Element Call - -In order for voice broadcasts to support a large number of listeners, -it would rely on SFU (selective forwarding unit), which is not yet ready. 
- - -### Streaming File Transfer - - -## Unstable Prefix - -Until this MSC lands, the following unstable prefixes should be used: - -`m.voice_broadcast_info` → `org.matrix.msc3888.voice_broadcast_info` -`state` → `org.matrix.msc3888.state` - -Example of the state event with unstable prefix: - -```json -{ - "type": "m.voice_broadcast_info, - "state_key": "@matthew:matrix.org", - "content": { - "org.matrix.msc3888.state": "started", - } -} -``` - - -[MSC3245]: https://github.com/matrix-org/matrix-spec-proposals/blob/travis/msc/voice-messages/proposals/3245-voice-messages.md -[MSC3489]: https://github.com/matrix-org/matrix-spec-proposals/blob/matthew/location-streaming/proposals/3489-location-streaming.md -[MSC3267]: https://github.com/matrix-org/matrix-spec-proposals/blob/aggregations-references/proposals/3267-reference-relations.md +This MSC is on hold until Element Call with SFU is ready. From 65dc824dc930330d641412b69519cd1d9fd1260e Mon Sep 17 00:00:00 2001 From: Michael Weimann Date: Tue, 13 Sep 2022 18:02:35 +0200 Subject: [PATCH 4/8] Update ToDo state --- proposals/3888-voice-broadcast.md | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/proposals/3888-voice-broadcast.md b/proposals/3888-voice-broadcast.md index bcc4ab7a..a80f7da7 100644 --- a/proposals/3888-voice-broadcast.md +++ b/proposals/3888-voice-broadcast.md @@ -9,6 +9,11 @@ Think of this being a live-podcast over Matrix. Some use-case scenarios for this feature can be to verbally provide updates, trainings, or talk to a large number of people. -## State of this MSC +## Proposal -This MSC is on hold until Element Call with SFU is ready. +We'd use [MSC3401: Native Group VoIP Signalling][MSC3401] in combination with +"voice/video rooms" to provide a broadcast feature. 
+ +TODO: Describe how that works + +[MSC3401]: https://github.com/matrix-org/matrix-spec-proposals/pull/3401 From b94ae5fe0d8c0f3de094b955a74b0f9a16a5554b Mon Sep 17 00:00:00 2001 From: Michael Weimann Date: Wed, 24 May 2023 12:06:30 +0200 Subject: [PATCH 5/8] Copy spec from discussion --- proposals/3888-voice-broadcast.md | 203 ++++++++++++++++++++++++++++-- 1 file changed, 192 insertions(+), 11 deletions(-) diff --git a/proposals/3888-voice-broadcast.md b/proposals/3888-voice-broadcast.md index a80f7da7..39139200 100644 --- a/proposals/3888-voice-broadcast.md +++ b/proposals/3888-voice-broadcast.md @@ -1,19 +1,200 @@ # MSC3888: Voice Broadcast -As a user I want to be able to send a voice broadcast to all users of a room, -so that I can easily provide information by just talking. -Compared to already existing voice messages it should be possible to listen to -voice broadcast at the time they are being recorded. -Think of this being a live-podcast over Matrix. +**Is your feature request related to a problem? Please describe.** -Some use-case scenarios for this feature can be to verbally provide updates, -trainings, or talk to a large number of people. +As a user I want to be able to send a voice broadcast to all users of a room, so that I can easily provide information by just talking. Compared to already existing voice messages it should be possible to listen to voice broadcast at the time they are being recorded. Think of this being a live-podcast over Matrix. -## Proposal +Some use-case scenarios for this feature can be to verbally provide updates, trainings, or talk to a large number of people. -We'd use [MSC3401: Native Group VoIP Signalling][MSC3401] in combination with -"voice/video rooms" to provide a broadcast feature. +**Describe the solution you'd like** -TODO: Describe how that works +This issue proposes using a series of voice messages to implement the Voice Broadcast feature. +A new state event `m.voice_broadcast_info` will be introduced. 
This state event identifies a broadcast and provides its state, such as `started` or `paused`. In addition, this proposal introduces a new event type `m.voice_broadcast_chunk` with a relation to the state event to mark chunks as belonging to a specific voice broadcast. + +The following diagram shows a typical event flow for a voice broadcast: + +```mermaid +stateDiagram-v2 + started: m.voice_broadcast_info started + paused: m.voice_broadcast_info paused + resumed: m.voice_broadcast_info resumed + stopped: m.voice_broadcast_info stopped + chunks1: m.voice_broadcast_chunk … + chunks2: m.voice_broadcast_chunk … + [*] --> started + started --> chunks1 + chunks1 --> paused + paused --> resumed + resumed --> chunks2 + chunks2 --> stopped + stopped --> [*] +``` + +`m.voice_broadcast_info` event example: + +```json +{ + "type": "m.voice_broadcast_info", + "state_key": "@martin:example.com", + "content": { + "device_id": "ABCDEFG", + "state": "started" + }, + "…": "other state event fields" +} +``` + +`m.voice_broadcast_info` properties: + +| Name | Required | Description | +| --- | --- | --- | +| `type` | yes | Must be `m.voice_broadcast_info` | +| `state_key` | yes | Must contain the broadcaster's MXID | +| `state` | yes | Must contain the broadcast state:
  • `started` a new voice broadcast has been started and is currently live
  • `paused` stands for a paused broadcast that may be resumed
  • `resumed` flags a voice broadcast previously paused as resumed
  • `stopped` marks a broadcast as finished
| +| `device_id` | Only if `state` is `started` | Must contain the ID of the device from which the broadcast has been started | +| `m.relates_to` | Only if `state` is not `started` | [MSC3267][MSC3267] `m.reference` relation to the `started` `m.voice_broadcast_info` state event | +| `last_chunk_sequence` | Only if `state` is `paused` or `stopped` | Must be the sequence of the last sent chunk | + +Every subsequent `m.voice_broadcast_info` should have an [MSC3267][MSC3267] `m.reference` relation to the `started` state event: + +```json +{ + "type": "m.voice_broadcast_info", + "state_key": "@martin:example.com", + "content": { + "device_id": "ABCDEFG", + "state": "paused", + "last_chunk_sequence": 5, + "m.relates_to": { + "rel_type": "m.reference", + "event_id": "$voice_broadcast_info_started_event_id" + } + }, + "…": "other state event fields" +} +``` + +This diagram shows the possible `state` transitions: + +```mermaid +stateDiagram-v2 + [*] --> started + started --> paused + paused --> resumed + resumed --> paused + resumed --> stopped + paused --> stopped + started --> stopped + stopped --> [*] +``` + +:information_source: This proposal does not suggest implementing any server-side support to enforce the correct event order. Clients should be robust enough to handle any `state` transition and receive broadcast chunks in any state. + +The message type of a voice broadcast chunk is `m.voice_broadcast_chunk`. The message content uses the `info`, `file` and `url` properties from [`m.audio` `m.room.message` events][m_audio]. + +`m.voice_broadcast_chunk` messages that belong to the broadcast will have an [MSC3267][MSC3267] `m.reference` +relation to the `started` `m.voice_broadcast_info` state event. + +To inform clients without voice broadcast support that a broadcast will take place, the first chunk message should contain an [MSC1767][MSC1767] extensible event `m.text` block.
+ +Example of a first broadcast chunk: + +```json +{ + "type": "m.voice_broadcast_chunk", + "content": { + "m.text": [ + { + "body": "Alice has started a voice broadcast. Unfortunately, your client does not seem to support playback of it." + } + ], + "m.relates_to": { + "rel_type": "m.reference", + "event_id": "$voice_broadcast_info_started_event_id" + }, + "info": { + "duration": 2140786, + "mimetype": "audio/mpeg", + "size": 1563685 + }, + "url": "mxc://example.org/ffed755USFFxlgbQYZGtryd", + "sequence": 1 + } +} +``` + +Example of a subsequent broadcast chunk: + +```json +{ + "type": "m.voice_broadcast_chunk", + "content": { + "m.relates_to": { + "rel_type": "m.reference", + "event_id": "$voice_broadcast_info_started_event_id" + }, + "info": { + "duration": 2140786, + "mimetype": "audio/mpeg", + "size": 1563685 + }, + "url": "mxc://example.org/ffed755USFFxlgbQYZGtryd", + "sequence": 23 + } +} +``` + +`m.voice_broadcast_chunk` properties: + +| Name | Required | Description | +| --- | --- | --- | +| `m.relates_to` | yes | [MSC3267][MSC3267] `m.reference` relation to the `started` `m.voice_broadcast_info` state event | +| `info` | yes | [See `m.audio` ][m_audio] | +| `url`/`file` | yes | [See `m.audio` ][m_audio] | +| `sequence` | yes | The sequence number to determine the correct order of the chunks starting at `1` | + +--- + +**Describe alternatives you've considered** + +_Implementation based on [MSC3401: Native Group VoIP Signalling][MSC3401]_ + +- VoIP can’t do mix down while preserving E2EE without a high effort +- It would rely on a media-server SFU which doesn’t exist at the time writing this MSC + +_Implementation based on [MSC4016: Streaming E2EE file transfers with random access]_ + +Streaming file transfer could be relatively easy on the server-side, given Glow exists and works (other than the operational unpleasantness of introducing an entirely new server), but client-side it requires major changes on all three platforms: + +- Switch the whole
file-upload/download pipeline to be streamed rather than in-memory blobs +- Switch crypto to AES-GCM (and spec it) to solve hashing causality + - Switch the whole voice message UI to calculate waveforms locally as the file is received (rescaling the waveform as more message is received!?), and to allow scrubbing within a file whose length is changing from under you +- Optional: random access to the download, letting the receiver jump to the end of the voice message without having to download the rest of the file first. (a 2 hour voice message is roughly 20MB, so it might not be a disaster to have to download the whole VM first rather than doing random access). This would mean: +- Change crypto to send in self-contained AES-GCM blocks (of 16KB or so), to support random access +- Support scrubbing into chunks of the file which haven’t been downloaded yet + +**Unstable prefixes** + +While this MSC is unstable, these event types should be used: + +| Stable value | Unstable value | +| --- | --- | +| `m.voice_broadcast_info` | `org.matrix.msc3888.voice_broadcast_info` | +| `m.voice_broadcast_chunk` | `org.matrix.msc3888.voice_broadcast_chunk` | + +**Additional context** + +Clients could use [MSC3912: Relation-based redactions][MSC3912] to redact a voice broadcast and all related events. + +:information_source: This discussion describes an intermediate solution to implement voice broadcasts. The final solution will be described in [MSC3888][MSC3888]. 
+ +[m_audio]: https://spec.matrix.org/v1.6/client-server-api/#maudio +[MSC1767]: https://github.com/matrix-org/matrix-spec-proposals/pull/1767 +[MSC3245]: https://github.com/matrix-org/matrix-spec-proposals/blob/travis/msc/voice-messages/proposals/3245-voice-messages.md +[MSC3267]: https://github.com/matrix-org/matrix-spec-proposals/blob/aggregations-references/proposals/3267-reference-relations.md [MSC3401]: https://github.com/matrix-org/matrix-spec-proposals/pull/3401 +[MSC3489]: https://github.com/matrix-org/matrix-spec-proposals/blob/matthew/location-streaming/proposals/3489-location-streaming.md +[MSC3888]: https://github.com/matrix-org/matrix-spec-proposals/pull/3888 +[MSC3912]: https://github.com/matrix-org/matrix-spec-proposals/pull/3912 +[MSC4016]: https://github.com/matrix-org/matrix-spec-proposals/pull/4016 From 2e69c06b81cc313a587023a7a0efebf9b0f9c066 Mon Sep 17 00:00:00 2001 From: Michael Weimann Date: Wed, 24 May 2023 14:11:14 +0200 Subject: [PATCH 6/8] Fix broken link --- proposals/3888-voice-broadcast.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/3888-voice-broadcast.md b/proposals/3888-voice-broadcast.md index 39139200..21af1ca3 100644 --- a/proposals/3888-voice-broadcast.md +++ b/proposals/3888-voice-broadcast.md @@ -163,7 +163,7 @@ _Implementation based on [MSC3401: Native Group VoIP Signalling][MSC3401]_ - VoIP can’t do mix down while preserving E2EE without a high effort - It would rely on a media-server SFU which doesn’t exist at the time writing this MSC -_Implementation based on [MSC4016: Streaming E2EE file transfers with random access]_ +_Implementation based on [MSC4016: Streaming E2EE file transfers with random access][MSC4016]_ Streaming file transfer could be relatively easy on the server-side, given Glow exists and works (other than the operational unpleasantness of introducing an entirely new server), but client-side it requires major changes on all three platforms: From 
42e715880c2d5fc4fc9c3dc33cc67c9a7f35fc90 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Mon, 5 Jun 2023 15:09:30 +0100 Subject: [PATCH 7/8] Update 3888-voice-broadcast.md tweak MSC4016 references --- proposals/3888-voice-broadcast.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/proposals/3888-voice-broadcast.md b/proposals/3888-voice-broadcast.md index 21af1ca3..3d3d8383 100644 --- a/proposals/3888-voice-broadcast.md +++ b/proposals/3888-voice-broadcast.md @@ -167,9 +167,10 @@ _Implementation based on [MSC4016: Streaming E2EE file transfers with random acc Streaming file transfer could be relatively easy on the server-side, given Glow exists and works (other than the operational unpleasantness of introducing an entirely new server), but client-side it requires major changes on all three platforms: -- Switch the whole file-upload/download pipeline to be streamed rather than in-memory blobs -- Switch crypto to AES-GCM (and spec it) to solve hashing causality - - Switch the whole voice message UI to calculate waveforms locally as the file is received (rescaling the waveform as more message is received!?), and to allow scrubbing within a file whose length is changing from under you +- Switch the whole file-upload/download pipeline to be streamed rather than in-memory blobs (desirable anyway for memory usage of clients, and ability to send > ~1.5GB files) +- Switch crypto to AES-GCM to solve hashing causality +- Switch to use async file uploads (MSC2246)[https://github.com/matrix-org/matrix-spec-proposals/pull/2246] so you can send the event to reference the URL before finishing uploading the media +- Switch the whole voice message UI to calculate waveforms locally as the file is received (rescaling the waveform as more message is received!?), and to allow scrubbing within a file whose length is changing from under you - Optional: random access to the download, letting the receiver jump to the end of the voice message without having to 
download the rest of the file first. (a 2 hour voice message is roughly 20MB, so it might not be a disaster to have to download the whole VM first rather than doing random access). This would mean: - Change crypto to send in self-contained AES-GCM blocks (of 16KB or so), to support random access - Support scrubbing into chunks of the file which haven’t been downloaded yet From 4930208f5fcf89d039757b65925e04ecccbba7bf Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Mon, 5 Jun 2023 15:09:57 +0100 Subject: [PATCH 8/8] Update 3888-voice-broadcast.md typo --- proposals/3888-voice-broadcast.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/3888-voice-broadcast.md b/proposals/3888-voice-broadcast.md index 3d3d8383..2f17d211 100644 --- a/proposals/3888-voice-broadcast.md +++ b/proposals/3888-voice-broadcast.md @@ -169,7 +169,7 @@ Streaming file transfer could be relatively easy on the server-side, given Glow - Switch the whole file-upload/download pipeline to be streamed rather than in-memory blobs (desirable anyway for memory usage of clients, and ability to send > ~1.5GB files) - Switch crypto to AES-GCM to solve hashing causality -- Switch to use async file uploads (MSC2246)[https://github.com/matrix-org/matrix-spec-proposals/pull/2246] so you can send the event to reference the URL before finishing uploading the media +- Switch to use async file uploads [MSC2246](https://github.com/matrix-org/matrix-spec-proposals/pull/2246) so you can send the event to reference the URL before finishing uploading the media - Switch the whole voice message UI to calculate waveforms locally as the file is received (rescaling the waveform as more message is received!?), and to allow scrubbing within a file whose length is changing from under you - Optional: random access to the download, letting the receiver jump to the end of the voice message without having to download the rest of the file first. 
(a 2 hour voice message is roughly 20MB, so it might not be a disaster to have to download the whole VM first rather than doing random access). This would mean: - Change crypto to send in self-contained AES-GCM blocks (of 16KB or so), to support random access
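
For illustration of the client-side handling that patch 5/8 leaves to implementers (ordering chunks by `sequence`, comparing against `last_chunk_sequence`, and tolerating unexpected `state` transitions), a minimal sketch might look like the following. The helper names and the plain-dict representation of event content are assumptions made for this example; they are not defined by MSC3888 or by any Matrix SDK.

```python
# Sketch only: helper names and the dict-based event content are hypothetical,
# chosen for illustration; MSC3888 does not define them.

# Expected `state` transitions, taken from the state diagram in the proposal.
# `None` stands for "no broadcast info event seen yet".
EXPECTED_TRANSITIONS = {
    None: {"started"},
    "started": {"paused", "stopped"},
    "paused": {"resumed", "stopped"},
    "resumed": {"paused", "stopped"},
    "stopped": set(),
}


def is_expected_transition(previous, new):
    """Return True if `previous -> new` appears in the proposal's diagram.

    The proposal asks clients to cope with any transition, so a False result
    should be treated as a hint for logging, not as a reason to drop events.
    """
    return new in EXPECTED_TRANSITIONS.get(previous, set())


def order_chunks(chunks):
    """Sort chunk contents by `sequence`, keeping the first copy of duplicates."""
    by_sequence = {}
    for content in chunks:
        by_sequence.setdefault(content["sequence"], content)
    return [by_sequence[seq] for seq in sorted(by_sequence)]


def missing_sequences(chunks, last_chunk_sequence):
    """List sequence numbers in 1..last_chunk_sequence that have not arrived."""
    seen = {content["sequence"] for content in chunks}
    return [seq for seq in range(1, last_chunk_sequence + 1) if seq not in seen]
```

A client could call `missing_sequences` when it receives a `paused` or `stopped` info event to decide whether it still needs to wait for, or backfill, earlier chunks before treating playback as complete.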