From 209a99a658b69da74544f907082f6b5fa5920991 Mon Sep 17 00:00:00 2001 From: Travis Ralston Date: Mon, 29 Jan 2024 10:49:35 -0700 Subject: [PATCH] MSC2702: Specifying semantics for Content-Disposition on media (#2702) * Proposal to introduce sane Content-Disposition semantics * Update proposals/2702-media-content-disposition.md Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com> * Update for the year 2023 * Clarify thumbnails receiving Content-Disposition header * Clarify layering * Update proposals/2702-media-content-disposition.md Co-authored-by: David Baker * Fix wording * Apply suggestions from code review Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com> * Clarify wording * Add clarifying words * Adjust security section --------- Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com> Co-authored-by: David Baker Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com> --- proposals/2702-media-content-disposition.md | 128 ++++++++++++++++++++ 1 file changed, 128 insertions(+) create mode 100644 proposals/2702-media-content-disposition.md diff --git a/proposals/2702-media-content-disposition.md b/proposals/2702-media-content-disposition.md new file mode 100644 index 00000000..a96eb6c9 --- /dev/null +++ b/proposals/2702-media-content-disposition.md @@ -0,0 +1,128 @@ +# MSC2702: `Content-Disposition` usage in the media repo + +The current specification does not clarify how to treat `Content-Disposition` on responses to +[`/download`](https://spec.matrix.org/v1.8/client-server-api/#get_matrixmediav3downloadservernamemediaid) +and [`/thumbnail`](https://spec.matrix.org/v1.8/client-server-api/#get_matrixmediav3thumbnailservernamemediaid) +requests. This has led to clients (and most notably, IRC bridges) relying on implementation-specific +behaviour for how their uploads will be represented. This reliance has caused issues in the past where +a security decision was made to use an `attachment` disposition on the download endpoint, however this +caused an IRC bridge's use of the media repo to break when trying to send large messages: users on the +IRC side were forced to download and open a text file instead of seeing it in their browser. Similar +problems have occurred in the past with respect to clients using the download endpoint as an "open in +browser" button. + +This proposal clarifies the exact behaviour and introduces a set of suggestions for servers to follow +with respect to `Content-Disposition`. + +## Context + +Headers: +* [`Content-Disposition`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition) + +Endpoints affected: + +* [`/download`](https://spec.matrix.org/v1.8/client-server-api/#get_matrixmediav3downloadservernamemediaid) +* [`/thumbnail`](https://spec.matrix.org/v1.8/client-server-api/#get_matrixmediav3thumbnailservernamemediaid) + +Upload endpoints: +* [`POST /upload`](https://spec.matrix.org/v1.8/client-server-api/#post_matrixmediav3upload) +* [`PUT /upload`](https://spec.matrix.org/v1.8/client-server-api/#put_matrixmediav3uploadservernamemediaid) + +## Proposal + +For `/download`: + +* `Content-Disposition` MUST be returned, and MUST be one of `inline` or `attachment`. + + * `inline` *should* be used when one of the "safe" content types listed below is being served. + +* When uploads are made with a `filename`, the `Content-Disposition` header MUST contain the same + `filename`. Otherwise, `filename` is excluded from the header. + + * If the media being downloaded is remote, the remote server's `filename` in the `Content-Disposition` + header is used as the `filename` instead. When the header is not supplied, or does not supply a + `filename`, the local download response does not include a `filename` (though does still contain + a generated `Content-Disposition`). + +For `/thumbnail`: + +* `Content-Disposition` MUST be returned, and MUST be `inline` (see below). +* `Content-Disposition` SHOULD include a server-generated `filename`. For example, `thumbnail.png`. + +Note that in both endpoints `Content-Disposition` becomes required, though the legal set of parameters is +intentionally different. Specifically, because `/thumbnail` returns server-generated content, that content +is safe to serve inline relative to a given user upload and therefore can be inline. It is however atypical +for a client to link to `/thumbnail` directly, but in the event they do we provide a safe default. + +The following content types are considered "safe" for `inline` usage. For `/download`, servers SHOULD +use `attachment` if the returned `Content-Type` is not on the list. For `/thumbnail`, servers SHOULD +only generate thumbnails with a `Content-Type` listed below. + +* `text/css` +* `text/plain` +* `text/csv` +* `application/json` +* `application/ld+json` +* `image/jpeg` +* `image/gif` +* `image/png` +* `image/apng` +* `image/webp` +* `image/avif` +* `video/mp4` +* `video/webm` +* `video/ogg` +* `video/quicktime` +* `audio/mp4` +* `audio/webm` +* `audio/aac` +* `audio/mpeg` +* `audio/ogg` +* `audio/wave` +* `audio/wav` +* `audio/x-wav` +* `audio/x-pn-wav` +* `audio/flac` +* `audio/x-flac` + +`Content-Type` additionally becomes a required header on responses to both `/download` and `/thumbnail`, as `Content-Disposition` +without `Content-Type` is effectively useless in HTTP. The `Content-Type` header is the `Content-Type` supplied by +the client during `/upload`. If no `Content-Type` was supplied during upload, `application/octet-stream` is used. + +Clients SHOULD NOT rely on servers returning `inline` rather than `attachment` on `/download`. Server +implementations might decide out of an abundance of caution that all downloads are responded to with +`attachment`, regardless of content type - clients should not be surprised by this behaviour. + +## Potential issues + +This proposal does not require the usage of `inline` on `/download`, making it harder for IRC and similar +bridges to rely on "pastebin" behaviour. For example, when a large message is posted on the Matrix side of +the bridge, the IRC bridge might upload it as a text file due to limits on the IRC side. Ideally, that text +file would be rendered inline by the server. Bridges are encouraged to use "proxy APIs" to serve the text +file instead, where they can better control the user experience. + +## Alternatives + +No major alternatives identified. + +## Security considerations + +This MSC fixes a possible Cross-Site Scripting issue in environments with insecure configurations. +For example, if a Matrix web client is also hosted on the same domain as the media download URL. This +is mitigated by only suggesting `inline` as a disposition on content types which are not likely to +execute code within the browser upon being viewed. A browser may still have further bugs which reveal +information, though those issues quickly become impractical for the Matrix specification to mitigate. + +No new security issues are identified, and careful consideration was put into `inline` to ensure an +extremely limited set of possible media types. + +The allowable content types for `inline` were inspired by [the react-sdk](https://github.com/matrix-org/matrix-react-sdk/blob/a70fcfd0bcf7f8c85986da18001ea11597989a7c/src/utils/blobs.ts#L51), +and extended to include what is present in [a related Synapse PR](https://github.com/matrix-org/synapse/pull/15988). +The react-sdk list of types has been known to be safe for several years now, and the added types in the Synapse +PR are not subject to XSS or similar attacks. Note that `image/svg`, `text/javascript`, and `text/html` are +explicitly not allowed. + +## Unstable prefix + +This MSC has no particular unstable prefix requirements. Servers are already able to return arbitrary +`Content-Disposition` headers on the affected endpoints.