diff --git a/proposals/4101-unencrypted-media-hashes.md b/proposals/4101-unencrypted-media-hashes.md new file mode 100644 index 00000000..0d3287ee --- /dev/null +++ b/proposals/4101-unencrypted-media-hashes.md @@ -0,0 +1,172 @@ +# MSC4101: Hashes for unencrypted media + +A typical flow for unencrypted media being sent in a room looks like this: + +``` ++---------+ +---------+ +---------+ +---------+ +| ClientA | | Origin | | Remote | | ClientB | ++---------+ +---------+ +---------+ +---------+ + | | | | + | /upload | | | + |-------------------------->| | | + | | | | + | content_uri | | | + |<--------------------------| | | + | | | | + | /send/m.room.message | | | + |-------------------------->| | | + | | | | + | | Append PDU fields | | + | |------------------ | | + | | | | | + | |<----------------- | | + | | | | + | | /send (federation) | | + | |------------------------>| | + | | | | + | | | /sync | + | | |--------------->| + | | | | + | | | /download | + | | |<---------------| + | | | | + | | /download | | + | |<------------------------| | + | | | | + | | bytes | | + | |------------------------>| | + | | | | + | | | bytes | + | | |--------------->| + | | | | +``` + + +For encrypted rooms, the media is encrypted before being uploaded, and the decryption key material is +further encrypted before `/send`ing an event to the origin server. The (encrypted) `file` information +includes a sha256 hash of the *encrypted* blob that was uploaded to the server, described by +[`EncryptedFile`](https://spec.matrix.org/v1.9/client-server-api/#sending-encrypted-attachments). + +Because the hash is encrypted by the sending client, the server is unable to meaningfully change the +content of that file. Any difference in the encrypted blob would result in a mismatched hash, which +the server cannot modify because it can't see the hash itself. This effectively authenticates the +media blob to the event (and thus the DAG) from the view of the client. + +However, unencrypted media does not have similar authentication measures. When responding to the remote +server's `/download` request, the origin server could serve a completely different file without either +user being aware. Further, if a user does report that they are seeing something potentially unexpected, +the origin server has plausible deniability that the wrong file was served. + +For maximum security against this problem, rooms should be encrypted. This proposal introduces an +optional sha256 hash on unencrypted media to remove *part* of the plausible deniability problem, but +does not solve it. An origin server can still modify both the upload *and* hashes in an event before +that event is converted to a PDU and sent to other servers. Once the PDU is sent though, the download +is authenticated by the hash present in the DAG. + +## Proposal + +Similar to the `EncryptedFile` schema, a new `hashes` field is introduced to `m.room.message` events +containing file/media references, including the thumbnail if present. An example image message would +be: + +```jsonc +{ + "type": "m.room.message", + "content": { + "msgtype": "m.image", + "body": "image.png", + "url": "mxc://example.org/abc123", + "info": { + "size": 33186, + "mimetype": "image/png", + "w": 500, + "h": 500, + "hashes": { // NEW! + "sha256": "" + }, + "thumbnail_url": "mxc://example.org/def456", + "thumbnail_info": { + "size": 3816, + "mimetype": "image/png", + "w": 128, + "h": 128, + "hashes": { // NEW! + "sha256": "" + } + } + } + } +} +``` + +Similar to encrypted files, the sha256 hash is encoded using [Unpadded Base64](https://spec.matrix.org/v1.9/appendices/#unpadded-base64) +and covers the blob uploaded to the homeserver. Unlike `EncryptedFile` though, we place the hashes +inside the `[thumbnail_]info` object rather than the non-existent `file` object. This existing +inconsistency is expected to be resolved by future MSCs, such as [MSC3551](https://github.com/matrix-org/matrix-spec-proposals/pull/3551) +for Extensible Events. + +`hashes` is optional, but when supplied *must* contain `sha256` at a minimum. When using `EncryptedFile`, +the `hashes` object described by this MSC serves no purpose and *must* be ignored by clients (if present). + +Clients *should* verify the hash when downloading the media, and refuse to render/offer to save the +media when the hash is mismatched, or when `hashes` is malformed. In future, [`GET /download`](https://spec.matrix.org/v1.9/client-server-api/#get_matrixmediav3downloadservernamemediaid) +could be expanded to take a sha256 parameter to avoid "wasting" the client's bandwidth, however many +implementations already stream the media from origin to local clients while concurrently caching for +future requests. + +## Potential issues + +Several issues with this proposal are discussed in the security considerations section. + +## Alternatives + +No alternatives identified. + +## Security considerations + +This proposal increases security when an entity is attempting to tie a media blob to the DAG, but is +still vulnerable to a replacement attack during the original upload and sending process. Because the +hashes and media itself are not protected by a meaningful form of encryption, the origin server is +still capable of replacing the media blob and intercepting the client's event send request to change +the hash to match the malicious blob. Some clients will detect that their event changed when submitted +to the homeserver, though most will not. + +Similarly, a local (remote) server could change the presented hash in an event before sending it down +to clients. Clients will believe these changes in most cases because they do not have the capability +to validate the DAG itself. + +This proposal does *not* attempt to fix either tampering issue for unencrypted media. Encrypting events +(and thus media) already solves these issues. Instead, this proposal ties a blob to the DAG itself, +allowing entities processing that DAG to authenticate the media accordingly. This may be useful in +cases where a well-behaved remote server is attempting to prove that a user did in fact receive a +corrupt or maliciously modified file, or when a server is counting references to media before purging +it from a local cache. + +(Servers which use reference counters should note that encrypted events can reference *unencrypted* +media as well, so should take care to not delete media they may not be able to re-request when a +client requests it.) + +## Unstable prefix + +While this proposal is not considered stable, implementations should use `org.matrix.msc4101.hashes` +in place of `hashes` in events. + +## Dependencies + +This MSC has no dependencies, but does interact with MSCs which link events to media. For example, +[MSC3911](https://github.com/matrix-org/matrix-spec-proposals/pull/3911) may have increased security +if intermediate servers can verify not only that a user has access to the specific blob URI, but also +that the blob tied to that event is exactly what was sent. Further iteration may be required to support +encrypted media meaningfully in this scenario.