MSC3468: MXCs to Hashes

Currently, Matrix media (content) repositories work with an MXC-to-blob mapping, fetching the media from the domain embedded in the MXC to present it to the user.

However, this becomes a problem when media retention, redaction, and resiliency come into play: the single MXC URI becomes a point of failure once the backing server retracts it, either deliberately (the aforementioned redaction) or accidentally (through a server reset or loss of the backing media).

This is at odds with how MXCs are used in Matrix today. Much like Discord media URLs, they are treated as immutable and always online, and links are copied and reused across rooms.

Proposal

I propose that MXCs be reworked into pointers to hashes.

This has the added benefit of decoupling aliasing pointers (which is what an MXC is) from the underlying media.

Alongside this change, I also propose an additional client-side endpoint which can quickly "clone" an MXC. The server does this by looking up the MXC's hash and then creating a new MXC that references the same hash.

The client-server content API would expose a method for the client to retrieve the hash of a particular MXC, alongside the aforementioned method to clone it.

The server-server content API would add dedicated methods for fetching the hash for an MXC and for fetching the media for a hash.
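
The pointer model described above can be illustrated with a minimal in-memory sketch. It is not part of the proposal: the class name `MediaStore`, its methods, and the choice of SHA-256 are assumptions made purely for illustration, and a real clone would usually go from a remote server to the local one rather than within a single store.

```python
import hashlib
import secrets


class MediaStore:
    """Toy media repository (hypothetical): MXC -> hash -> blob."""

    def __init__(self, server_name: str) -> None:
        self.server_name = server_name
        self.mxc_to_hash = {}   # alias layer: MXC URI -> hex hash
        self.hash_to_blob = {}  # content-addressed layer: hex hash -> bytes

    def _new_mxc(self) -> str:
        # Random media ID; the ID format is an assumption of this sketch.
        return f"mxc://{self.server_name}/{secrets.token_hex(8)}"

    def upload(self, blob: bytes) -> str:
        digest = hashlib.sha256(blob).hexdigest()  # assumed hash function
        self.hash_to_blob.setdefault(digest, blob)  # identical blobs are stored once
        mxc = self._new_mxc()
        self.mxc_to_hash[mxc] = digest
        return mxc

    def clone(self, mxc: str) -> str:
        # Look up the MXC's hash, then mint a new MXC pointing at the same hash.
        digest = self.mxc_to_hash[mxc]
        new_mxc = self._new_mxc()
        self.mxc_to_hash[new_mxc] = digest
        return new_mxc

    def retract(self, mxc: str) -> None:
        # Removes only the alias; other aliases to the same hash keep working.
        self.mxc_to_hash.pop(mxc, None)

    def fetch(self, mxc: str) -> bytes:
        return self.hash_to_blob[self.mxc_to_hash[mxc]]
```

In this sketch, retracting the original MXC leaves a cloned MXC resolvable, which is the decoupling of alias and media that the proposal is after.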

Specification

Client-Server

This proposal adds the following two endpoints to the client-server API:

POST _matrix/media/v1/clone/{serverName}/{mediaId}

Rate-limited: Yes
Authentication: Yes

Responses:
  200: JSON (see below)
  429: Ratelimited
  503: Could not fetch remote MXC-to-hash mapping

200 response:

{
  "m.clone.mxc": "mxc://local.server/media_id"
}

GET _matrix/media/v1/hash/{serverName}/{mediaId}

Rate-limited: Yes
Authentication: Yes

Responses:
  200: JSON (see below)
  429: Ratelimited
  503: Could not fetch remote MXC-to-hash mapping

200 response:

{
  "m.mxc.hash": "1234567890abcdef" // hex-encoded hash
}
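
As a rough client-side illustration, the request paths for the two endpoints above could be derived from an MXC URI as follows. The helper names `parse_mxc`, `clone_path`, and `hash_path` are hypothetical, and the leading slash and percent-encoding are assumptions of this sketch rather than requirements stated by the proposal.

```python
from urllib.parse import quote, urlsplit


def parse_mxc(mxc: str) -> tuple[str, str]:
    """Split 'mxc://server/media_id' into (server_name, media_id)."""
    parts = urlsplit(mxc)
    media_id = parts.path.lstrip("/")
    if parts.scheme != "mxc" or not parts.netloc or not media_id or "/" in media_id:
        raise ValueError(f"not a valid MXC URI: {mxc!r}")
    return parts.netloc, media_id


def _media_path(action: str, mxc: str) -> str:
    server_name, media_id = parse_mxc(mxc)
    # Percent-encode both components so they are safe as path segments.
    return (
        f"/_matrix/media/v1/{action}/"
        f"{quote(server_name, safe='')}/{quote(media_id, safe='')}"
    )


def clone_path(mxc: str) -> str:
    """Path for the proposed POST .../clone/{serverName}/{mediaId} endpoint."""
    return _media_path("clone", mxc)


def hash_path(mxc: str) -> str:
    """Path for the proposed GET .../hash/{serverName}/{mediaId} endpoint."""
    return _media_path("hash", mxc)
```

A client would POST to `clone_path(...)` and read `m.clone.mxc` from the response, or GET `hash_path(...)` and read `m.mxc.hash`.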

Server-Server

This proposal adds the following two endpoints to the server-server API:

GET _matrix/federation/v1/media/hash

Rate-limited: No
Authentication: Yes

Query parameters:
  media_id: string, the local part of an MXC for which the hash is queried

Responses:
  200: Pure-binary encoding of corresponding hash
  404: Media ID does not exist

GET _matrix/media/v1/media/fetch/{hash}

Rate-limited: Yes
Authentication: Yes

Responses:
  200: Blob of data corresponding to hash
  404: Hash-media not found
  429: Ratelimited
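
The two server-server endpoints imply a two-step fetch: resolve the media ID to a hash, then fetch the blob for that hash and check it. The sketch below shows that flow with the network calls injected as plain callables, so nothing here is a real federation client. `fetch_verified` and `HashMismatch` are hypothetical names, and the hash is assumed to be a raw SHA-256 digest, which the proposal does not fix.

```python
import hashlib
from typing import Callable


class HashMismatch(Exception):
    """Raised when a fetched blob does not match the hash it was requested by."""


def fetch_verified(
    media_id: str,
    fetch_hash: Callable[[str], bytes],
    fetch_blob: Callable[[bytes], bytes],
) -> bytes:
    # Step 1: ask the origin server which hash the media ID points to.
    expected = fetch_hash(media_id)
    # Step 2: fetch the blob by hash (this is the extra round trip).
    blob = fetch_blob(expected)
    # Step 3: verify locally, so the blob's source does not need to be trusted.
    actual = hashlib.sha256(blob).digest()  # assumed hash function
    if actual != expected:
        raise HashMismatch(f"expected {expected.hex()}, got {actual.hex()}")
    return blob
```

Because the check in step 3 is local, `fetch_blob` could in principle talk to any server that holds the blob, not only the origin.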

"Which hash?"

Note: this section is a request for feedback and will be removed in the final draft.

So far, the definition of "hash" has been vague. I think converging on a specific hash function now could lock out future expansion.

So, I'd like to propose using multihash for this purpose, as it provides a common, self-describing format for the hashes used.

For now, only a fixed set of hash functions would be allowed (see the multihash table for the full list). This set can be expanded or deprecated in subsequent Matrix spec releases without changing the hash format, documenting checks to differentiate the types of hash used, or reinventing multihash.

However, this is up for debate.
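
To show what "self-describing" means in practice, here is a small sketch of the multihash layout for SHA2-256: a function code (0x12), a digest length (0x20, i.e. 32 bytes), and then the digest itself. The function names are hypothetical, only SHA2-256 is handled, and both prefix fields are treated as single bytes, which holds for values below 128 but is a simplification of multihash's varint encoding.

```python
import hashlib

SHA2_256_CODE = 0x12    # multihash function code for sha2-256
SHA2_256_LENGTH = 0x20  # digest length in bytes (32)


def multihash_sha2_256(data: bytes) -> bytes:
    """Return <code><length><digest> for the SHA2-256 hash of data."""
    digest = hashlib.sha256(data).digest()
    return bytes([SHA2_256_CODE, SHA2_256_LENGTH]) + digest


def verify_multihash(mh: bytes, data: bytes) -> bool:
    """Check data against a multihash; only sha2-256 is supported in this sketch."""
    if len(mh) < 2:
        raise ValueError("multihash too short")
    code, length, digest = mh[0], mh[1], mh[2:]
    if code != SHA2_256_CODE or length != SHA2_256_LENGTH or len(digest) != length:
        raise ValueError("unsupported or malformed multihash")
    return hashlib.sha256(data).digest() == digest
```

A verifier that later learns a second hash function only needs another branch on `code`; the stored values themselves do not change format.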

Motivation

This MSC aims to unblock efforts for media retention and redaction.

With the /clone endpoint, any client wishing to preserve media can do so simply by having it fetched and stored locally, reducing the link rot that remote servers redacting media could cause.

This MSC also aims to make Matrix more flexible for diverse media delivery systems.

Mapping MXCs to hashes could allow the hashes themselves to become self-verifying keys in any (centralized or distributed) KV store.

This, in turn, could better prepare Matrix for P2P efforts.

This MSC also aims to make Matrix content delivery more resilient. Apart from the step of mapping an MXC alias to a hash, the media for a hash can be retrieved from anywhere and still be self-verifying. This considerably reduces reliance on any single server and allows for better load distribution (see the first "future extension" in the section below).

Potential issues

This could incur a slight performance hit, as an extra round trip between servers is needed to fetch the actual media after fetching the hash corresponding to it.

I think this is an acceptable tradeoff. An alternative would be to side-channel the hash in a header on an endpoint that fetches the media directly from an MXC.

Future extensions

Note: this is free-form speculation, and serves to illustrate how future MSCs can extend the behavior this MSC is enabling.

A possible extension would be a server-server endpoint that asks a server which content endpoints it recommends for fetching hashes.

(For example, a server would query /media/endpoints, and the remote server could respond with ["https://common.caching.server", "https://matrix.org"], in decreasing order of priority.)

This can be helpful when servers share a common "media server", as is the case today with matrix-media-repo, which "tricks" federation by redirecting any request for media to itself. This future extension would formalize this process.

This would also help with "thundering herds", as servers can be directed to multiple servers from which to fetch the media for a hash.

(However, as-is, this could have security problems with DoS-ing, issues with cache invalidation after redacting media, and possibly more. This is only to illustrate flexibility.)
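
Purely as an illustration of this speculative extension, a fetching server could walk a priority-ordered list of sources and accept the first blob that verifies against the hash. Everything here is hypothetical: `fetch_from_any` is not a proposed API, the sources are plain callables standing in for network requests, and SHA-256 is an assumed hash function.

```python
import hashlib
from typing import Callable, Iterable, Optional


def fetch_from_any(
    expected_sha256: bytes,
    sources: Iterable[Callable[[bytes], Optional[bytes]]],
) -> Optional[bytes]:
    """Try each source in priority order; return the first blob that verifies."""
    for source in sources:
        try:
            blob = source(expected_sha256)
        except OSError:
            # Source unreachable (e.g. connection error): try the next one.
            continue
        if blob is None:
            continue  # source does not have this hash
        if hashlib.sha256(blob).digest() == expected_sha256:
            return blob
        # Blob did not match the hash: ignore it and keep looking.
    return None
```

A mismatching or unreachable source costs a round trip but cannot cause wrong media to be accepted, since every candidate is checked against the hash.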

Another possible extension could be to tap natively into decentralized media stores, which often key their data by hash. This could make P2P media easier to implement and work with.

One last possible extension is to add a 410 response to every endpoint that fetches media. This could help communicate to servers and clients that media has been deleted.

Security considerations

A big part of this MSC's motivation is to unblock media redaction/retention efforts. However, that does not mean this MSC should be blind to the struggle of containing unsavory media across federation.

This MSC adds a /clone endpoint, by which a client, on any server, could easily "copy" media, seemingly making containment efforts useless.

However, at the room level, and possibly the server level, the hashes themselves could be banned. This could be implementation-specific, or be built into bots like mjolnir.

Unstable prefix

This MSC uses the unstable prefix nl.automatia.msc3468:

  • _matrix/media/nl.automatia.msc3468/clone/{serverName}/{mediaId}
  • _matrix/media/nl.automatia.msc3468/hash/{serverName}/{mediaId}
  • _matrix/federation/nl.automatia.msc3468/media/hash
  • _matrix/media/nl.automatia.msc3468/media/fetch/{hash}
  • nl.automatia.msc3468.clone.mxc
  • nl.automatia.msc3468.mxc.hash