From d8be2ad942c7f04fb2537f996f6c1c9221207c6d Mon Sep 17 00:00:00 2001
From: Johannes Marbach
Date: Fri, 26 Sep 2025 16:36:34 +0200
Subject: [PATCH] The `server-name` segment of MXC URIs is sanitised differently from the `media-id` segment (#2217)

Fixes: #1990

Signed-off-by: Johannes Marbach
---
 .../client_server/newsfragments/2217.clarification |  1 +
 content/client-server-api/modules/content_repo.md  | 11 ++++++++---
 2 files changed, 9 insertions(+), 3 deletions(-)
 create mode 100644 changelogs/client_server/newsfragments/2217.clarification

diff --git a/changelogs/client_server/newsfragments/2217.clarification b/changelogs/client_server/newsfragments/2217.clarification
new file mode 100644
index 00000000..ea895054
--- /dev/null
+++ b/changelogs/client_server/newsfragments/2217.clarification
@@ -0,0 +1 @@
+The `server-name` segment of MXC URIs is sanitised differently from the `media-id` segment.
diff --git a/content/client-server-api/modules/content_repo.md b/content/client-server-api/modules/content_repo.md
index c70d04fc..80898d3a 100644
--- a/content/client-server-api/modules/content_repo.md
+++ b/content/client-server-api/modules/content_repo.md
@@ -134,9 +134,14 @@ entity isn't in the room.
 `mxc://` URIs are vulnerable to directory traversal attacks such as
 `mxc://127.0.0.1/../../../some_service/etc/passwd`. This would cause the target
 homeserver to try to access and return this file. As such,
-homeservers MUST sanitise `mxc://` URIs by allowing only alphanumeric
-(`A-Za-z0-9`), `_` and `-` characters in the `server-name` and
-`media-id` values. This set of whitelisted characters allows URL-safe
+homeservers MUST sanitise `mxc://` URIs by:
+
+- restricting the `server-name` segment to valid
+  [server names](/appendices/#server-name)
+- allowing only alphanumeric (`A-Za-z0-9`), `_` and `-` characters in
+  the `media-id` segment
+
+The resulting set of whitelisted characters allows URL-safe
 base64 encodings specified in RFC 4648. Applying this character whitelist
 is preferable to blacklisting `.` and `/` as there are techniques around
 blacklisted characters (percent-encoded characters,