From ef5baec8a421a9d30d0adcb7ec1a654436336941 Mon Sep 17 00:00:00 2001
From: Tulir Asokan
Date: Mon, 19 Feb 2024 18:12:44 +0200
Subject: [PATCH] MSC2530: Body field as media caption (#2530)

* Proposal to use body field as media caption
* Add paragraph about relation-based captions being difficult for bridges
* Clarify how to treat body when filename is not present
* Refactor proposal text
* Fix heading size
* Add problem statement
* Add links to and quotes from current spec
* Adjust wording and quote m.audio body spec
* Clarify that m.location and m.sticker are out of scope for this proposal
* Add examples and summary of changes
* Fix JSON syntax in example
---
 proposals/2530-body-as-caption.md | 144 ++++++++++++++++++++++++++++++
 1 file changed, 144 insertions(+)
 create mode 100644 proposals/2530-body-as-caption.md

diff --git a/proposals/2530-body-as-caption.md b/proposals/2530-body-as-caption.md
new file mode 100644
index 00000000..246b3f97
--- /dev/null
+++ b/proposals/2530-body-as-caption.md
@@ -0,0 +1,144 @@
+# Body field as media caption
+
+When sending images or other attachments, users often want to include text to
+convey additional information. Most chat platforms offer media captions as a
+first-class feature, allowing users to choose the attachment and write text,
+then send both together in one message.
+
+Matrix currently does not enable this on the protocol level: at best, clients
+can emulate the behavior by sending two messages quickly; at worst, the user
+has to do that manually. Sending separate messages means it's possible for
+the second message to be delayed or lost if something goes wrong.
+
+## Proposal
+
+This proposal allows the `filename` field from [`m.file`], and the `format` and
+`formatted_body` fields from [`m.text`] for all media msgtypes (`m.image`,
+`m.audio`, `m.video`, `m.file`). This proposal does not affect the `m.location`
+msgtype, nor the separate `m.sticker` event type: stickers already use `body`
+as a description, and locations don't have file names.
+
+If the `filename` field is present in a media message, clients should treat
+`body` as a caption instead of a file name. If the `format`/`formatted_body`
+fields are present in addition to `filename` and `body`, then they should take
+priority as the caption text. Formatted text in media captions is rendered the
+same way as formatted text in `m.text` messages.
+
+The current spec is somewhat ambiguous as to how `body` should be handled and
+the definition varies across different message types. The current spec for
+[`m.image`] describes `body` as
+
+> A textual representation of the image. This could be the alt text of the
+> image, the filename of the image, or some kind of content description for
+> accessibility e.g. ‘image attachment’.
+
+while [`m.audio`] describes it as
+
+> A description of the audio e.g. ‘Bee Gees - Stayin’ Alive’, or some kind of
+> content description for accessibility e.g. ‘audio attachment’.
+
+In practice, clients (or at least Element) use it as the file name. As a part
+of adding captions, the `body` field for all media message types is explicitly
+defined to be used as the file name when the `filename` field is not present.
+
+For `m.file` messages, the [current (v1.9) spec][`m.file`] confusingly defines
+`filename` as "The original filename of the uploaded file" and simultaneously
+recommends that `body` is "the filename of the original upload", effectively
+saying both fields should have the file name. In order to avoid (old) messages
+with both fields being misinterpreted as having captions, the `body` field
+should not be used as a caption when it's equal to `filename`.
+
+[`m.file`]: https://spec.matrix.org/v1.9/client-server-api/#mfile
+[`m.text`]: https://spec.matrix.org/v1.9/client-server-api/#mtext
+[`m.image`]: https://spec.matrix.org/v1.9/client-server-api/#mimage
+[`m.audio`]: https://spec.matrix.org/v1.9/client-server-api/#maudio
+
+### Examples
+
+Image with caption
+
+```json
+{
+  "msgtype": "m.image",
+  "url": "mxc://maunium.net/HaIrXlnKfEEHvMNKzuExiYlv",
+  "filename": "cat.jpeg",
+  "body": "this is a cat picture :3",
+  "info": {
+    "w": 479,
+    "h": 640,
+    "mimetype": "image/jpeg",
+    "size": 27253
+  },
+  "m.mentions": {}
+}
+```
+
+
+
+File with formatted caption
+
+```json
+{
+  "msgtype": "m.file",
+  "url": "mxc://maunium.net/TizWsLhHfDCETKRXdDwHoAGn",
+  "filename": "hello.txt",
+  "body": "this caption is longer than the file itself 🤔",
+  "format": "org.matrix.custom.html",
+  "formatted_body": "this caption is longer than the file itself 🤔",
+  "info": {
+    "mimetype": "text/plain",
+    "size": 14
+  },
+  "m.mentions": {}
+}
+```
+
+
+
+### Summary
+* `filename` is defined for all media msgtypes.
+* `body` is defined to be a caption when `filename` is present and not equal to `body`.
+  * `format` and `formatted_body` are allowed as well for formatted captions.
+* `body` is defined to be the file name when `filename` is not present.
+
+## Potential issues
+
+In clients that don't show the file name anywhere, the caption would not be
+visible at all. However, extensible events would run into the same issue.
+Clients having captions implemented beforehand may even help eventually
+implementing extensible events.
+
+Old clients may default to using the caption as the file name when the user
+wants to download a file, which will be somewhat weird UX.
+
+## Alternatives
+
+### [MSC2529](https://github.com/matrix-org/matrix-spec-proposals/pull/2529)
+
+MSC2529 would allow existing clients to render captions without any changes,
+but the use of relations makes implementation more difficult, especially for
+bridges. It would require either waiting a predefined amount of time for the
+caption to come through, or editing the message on the target platform (if
+edits are supported).
+
+The format proposed by MSC2529 would also make it technically possible to use
+other message types as captions without changing the format of the events,
+which is not possible with this proposal.
+
+### Extensible events
+
+Like MSC2529, this would be obsoleted by [extensible events](https://github.com/matrix-org/matrix-spec-proposals/pull/3552).
+However, fully switching to extensible events requires significantly more
+implementation work, and it may take years for the necessary time to be
+allocated for that.
+
+## Security considerations
+
+This proposal doesn't involve any security-sensitive components.
+
+## Unstable prefix
+
+The fields being added already exist in other msgtypes, so unstable prefixes
+don't seem necessary. Additionally, using `body` as a caption could already be
+considered spec-compliant due to the ambiguous definition of the field, and
+only adding unstable prefixes for the other fields would be silly.
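
Review note, not part of the patch: the caption rules from the Proposal and Summary sections can be sketched from a receiving client's point of view as below. This is a minimal illustration rather than a reference implementation. The function name `split_caption` and its return shape are hypothetical, and treating a missing or empty `body` as "no caption" is an assumption the proposal does not spell out.

```python
from typing import Optional, Tuple

# Media msgtypes covered by the proposal (m.location and m.sticker are out of scope).
MEDIA_MSGTYPES = frozenset({"m.image", "m.audio", "m.video", "m.file"})


def split_caption(content: dict) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Return (file_name, caption, formatted_caption) for a media message content.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    msgtype = content.get("msgtype")
    if msgtype not in MEDIA_MSGTYPES:
        raise ValueError(f"not a media msgtype: {msgtype!r}")

    body = content.get("body")
    filename = content.get("filename")

    # No `filename`: `body` is defined to be the file name, so there is no caption.
    if not filename:
        return body, None, None

    # `body` equal to `filename` (old m.file events) must not be read as a caption.
    # Assumption: a missing or empty `body` is also treated as "no caption".
    if not body or body == filename:
        return filename, None, None

    # `format`/`formatted_body`, when both present, take priority as the caption text.
    formatted = None
    if content.get("format") and content.get("formatted_body"):
        formatted = content["formatted_body"]
    return filename, body, formatted
```

A renderer would show `formatted_caption` when it is not `None` and fall back to `caption` otherwise, always using `file_name` for downloads.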