MSC3291: Muting in VoIP calls

During VoIP calls, it is common for a user to mute their microphone or camera. Ideally, the other side should be able to see that the opponent's camera is muted, so that it can reflect this in the UI (e.g. show the user's avatar instead of their camera feed). Changes in the mute state should also reach the other side quickly.

Using pure WebRTC there are two ways to do muting and both have their issues:

  • Disabling the corresponding track
  • Setting the corresponding track as recvonly/inactive

The Alternatives section describes the issues with using these alone.

Proposal

This MSC proposes extending the sdp_stream_metadata object (see MSC3077) to allow indicating the mute state to the other side using the following fields:

  • audio_muted - a boolean indicating the current audio mute state
  • video_muted - a boolean indicating the current video mute state

This MSC also adds a new call event, m.call.sdp_stream_metadata_changed, which has the common VoIP fields as specified in MSC2746 (version, call_id, party_id) and an sdp_stream_metadata object in the same format as the sdp_stream_metadata in m.call.negotiate, m.call.invite and m.call.answer. The client sends this event when the sdp_stream_metadata has changed but no negotiation is required (e.g. the user mutes their camera/microphone).
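As a rough illustration of the event described above, the following minimal sketch builds the content of an m.call.sdp_stream_metadata_changed event. The function name `build_metadata_changed_content` and the `streams` argument are hypothetical, chosen only for this example; the field names come from this proposal and MSC2746. It assumes the client always sends the complete current metadata object rather than a delta.

```python
from typing import Any, Dict

EVENT_TYPE = "m.call.sdp_stream_metadata_changed"


def build_metadata_changed_content(
    call_id: str,
    party_id: str,
    streams: Dict[str, Dict[str, Any]],
) -> Dict[str, Any]:
    """Build event content (hypothetical helper, not part of any SDK)."""
    return {
        # Common VoIP fields from MSC2746.
        "version": "1",
        "call_id": call_id,
        "party_id": party_id,
        # Same shape as sdp_stream_metadata in m.call.invite/answer/negotiate.
        # Each entry is copied so later changes to `streams` do not leak in.
        "sdp_stream_metadata": {
            stream_id: dict(metadata) for stream_id, metadata in streams.items()
        },
    }


# Example: the user mutes their camera but keeps the microphone on.
streams = {
    "2311546231": {
        "purpose": "m.usermedia",
        "audio_muted": False,
        "video_muted": True,
    }
}
content = build_metadata_changed_content(
    "1414213562373095", "1732050807568877", streams
)
```

The resulting `content` would be sent as a room event of type `EVENT_TYPE`; no SDP renegotiation is involved.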

All tracks should be assumed unmuted unless specified otherwise.
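The default rule can be sketched as a small reader for a single stream's metadata entry. The helper name `mute_state` is hypothetical. Treating any value other than a literal `true` as "unmuted" is an assumption of this sketch; the proposal itself only says that tracks are unmuted unless specified otherwise.

```python
from typing import Any, Mapping, Tuple


def mute_state(stream_metadata: Mapping[str, Any]) -> Tuple[bool, bool]:
    """Return (audio_muted, video_muted) for one stream's metadata entry.

    Missing fields default to unmuted. Non-boolean values are also treated
    as unmuted, which is an assumption of this sketch.
    """
    audio_muted = stream_metadata.get("audio_muted", False) is True
    video_muted = stream_metadata.get("video_muted", False) is True
    return audio_muted, video_muted


# Metadata from a client that does not send the mute fields at all.
legacy_entry = {"purpose": "m.usermedia"}
# Metadata that only specifies the audio mute state.
audio_only_entry = {"purpose": "m.usermedia", "audio_muted": True}

legacy_state = mute_state(legacy_entry)
audio_only_state = mute_state(audio_only_entry)
```

Here `legacy_state` is `(False, False)` and `audio_only_state` is `(True, False)`.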

Clients are recommended not to mute the audio of WebRTC tracks locally when an incoming stream has the audio_muted field set to true. This is because when the other user unmutes themselves, there may be a slight delay between their client sending audio and the m.call.sdp_stream_metadata_changed event arriving. If the receiving client has set the track's enabled property to false, any audio sent between those two events will not be heard. The other user's client still stops transmitting audio once they mute on their side, so no audio is sent without the user's knowledge.

The same recommendation does not apply to video_muted: there, clients should mute the video locally, so that the receiving side doesn't see black video.
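The asymmetry between audio and video on the receiving side can be sketched as a pure decision function. The `RemoteStreamView` class and its field names are hypothetical and stand in for whatever UI and track state a real client keeps; the code does not touch any actual WebRTC API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteStreamView:
    """How the receiving client presents one remote stream (hypothetical)."""

    play_audio: bool           # whether the local audio track stays enabled
    show_video: bool           # whether the video feed is rendered
    show_muted_mic_icon: bool  # whether the UI indicates a muted microphone


def view_for(audio_muted: bool, video_muted: bool) -> RemoteStreamView:
    return RemoteStreamView(
        # Audio is never disabled locally, so nothing is lost if the remote
        # user's audio arrives before the metadata event announcing the unmute.
        play_audio=True,
        # Video is hidden locally so the user does not see black frames.
        show_video=not video_muted,
        # The audio mute state is only reflected in the UI.
        show_muted_mic_icon=audio_muted,
    )


both_muted = view_for(audio_muted=True, video_muted=True)
```

With both fields set to true, `both_muted` keeps audio playback enabled, hides the video and shows the muted-microphone indicator.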

Example

{
    "type": "m.call.sdp_stream_metadata_changed",
    "room_id": "!roomId",
    "content": {
        "version": "1",
        "call_id": "1414213562373095",
        "party_id": "1732050807568877",
        "sdp_stream_metadata": {
            "2311546231": {
                "purpose": "m.usermedia",
                "audio_muted": true,
                "video_muted": true
            }
        }
    }
}

This event indicates that both audio and video are muted. The receiving client should hide the video track of stream 2311546231 in the UI (probably replacing it with an avatar). It should also show an indication that the audio track is muted, but it should not mute the audio on the receiving side.

Potential issues

When the user mutes their camera, some browsers may keep sending meaningless data which will waste bandwidth.

Alternatives

Only disabling the corresponding track

This is the solution that some clients (e.g. Element Android) use at the moment. While this is almost instantaneous, it doesn't allow the other side to know the opponent's mute state. This leads to the opponent showing a black screen for a muted video track and not doing anything for a muted audio track which is bad for UX.

Setting the corresponding track as recvonly/inactive

While this would be beneficial for low bandwidth connections, it takes time because it requires renegotiation. The delay might be acceptable for video but isn't for audio (where users expect the mute state change to be instantaneous). This is also problematic since it could be confused with holding (as defined in MSC2746).

Using a separate event for muting

While this might feel clearer initially, it doesn't have much real benefit. The mute state is in fact meta information about the stream, and using sdp_stream_metadata also covers the case where the user joins a call already muted. It is also more flexible in general and would be useful if we ever decided to do what is described in the next section.

A combination of disabling tracks, sdp_stream_metadata and SDP

An option would be using the current method in combination with setting the corresponding track as recvonly/inactive. Along with this clients would need to set the mute state in sdp_stream_metadata to avoid conflicts with holding (as defined in MSC2746). While this solution might be the most flexible solution as it would allow clients to choose between bandwidth and a mute state change delay for each track, it would be harder to implement and feels generally disjointed.

Security considerations

None that I can think of.

Dependencies

Unstable prefix

| Release | Development |
|---------|-------------|
| m.call.sdp_stream_metadata_changed | org.matrix.call.sdp_stream_metadata_changed |
| sdp_stream_metadata | org.matrix.msc3077.sdp_stream_metadata |

We use an unstable prefix for sdp_stream_metadata to match MSC3077.
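A client that wants to interoperate during the unstable period might accept both sets of identifiers. The sketch below is one way to do that; the helper names are hypothetical, and accepting both names while preferring the stable one is an assumption of this example, not something the proposal mandates.

```python
from typing import Any, Dict, Mapping

STABLE_EVENT = "m.call.sdp_stream_metadata_changed"
UNSTABLE_EVENT = "org.matrix.call.sdp_stream_metadata_changed"
STABLE_KEY = "sdp_stream_metadata"
UNSTABLE_KEY = "org.matrix.msc3077.sdp_stream_metadata"


def is_metadata_changed(event_type: str) -> bool:
    """True for either the stable or the unstable event type."""
    return event_type in (STABLE_EVENT, UNSTABLE_EVENT)


def extract_metadata(content: Mapping[str, Any]) -> Dict[str, Any]:
    """Return the stream metadata object, preferring the stable key.

    Returns an empty dict when neither key holds an object.
    """
    for key in (STABLE_KEY, UNSTABLE_KEY):
        value = content.get(key)
        if isinstance(value, dict):
            return value
    return {}


unstable_content = {
    "version": "1",
    "call_id": "1414213562373095",
    "party_id": "1732050807568877",
    UNSTABLE_KEY: {"2311546231": {"purpose": "m.usermedia", "video_muted": True}},
}
metadata = extract_metadata(unstable_content)
```

In this example `metadata` holds the entry for stream 2311546231 even though it was sent under the unstable key.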