matrix-spec-proposals/proposals/4095-bundled-url-previews.md

Bundled URL previews

Currently, URL previews in Matrix are generated on the server when requested by a client using the /_matrix/media/v3/preview_url endpoint. This is a relatively good approach, but a major downside is that the user's homeserver gets all links the user's client wants to show a preview for, which means using it in encrypted rooms will effectively leak parts of messages.

Proposal

The proposed solution is allowing clients to bundle URL preview metadata inside events.

A new field called m.url_previews is added. The field is an array of objects, where each object contains OpenGraph data representing a single URL to preview, similar to what the /preview_url endpoint currently returns:

  • matrix:matched_url - The URL that is present in body and triggered this preview to be generated. This is optional and should be omitted if the link isn't present in the body.
  • matrix:image:encryption - An EncryptedFile object for encrypted thumbnail images. Similar to encrypted image messages, the URL is inside this object, and not in og:image.
  • matrix:image:size - The byte size of the image, like in /preview_url.
  • og:image - An mxc:// URI for unencrypted images, like in /preview_url.
  • og:url - Standard OpenGraph tag for the canonical URL of the previewed page.
  • Any other standard OpenGraph tags.

At least one of matrix:matched_url and og:url MUST be present. All other fields are optional.
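The presence rule above can be checked mechanically. The following is a minimal sketch, not part of the proposal: the function names `is_valid_preview` and `valid_previews` are hypothetical, and treating empty strings as absent is an assumption.

```python
from __future__ import annotations

from typing import Any


def is_valid_preview(entry: Any) -> bool:
    """Return True if the entry carries at least one identifying URL."""
    if not isinstance(entry, dict):
        return False
    candidates = (entry.get("matrix:matched_url"), entry.get("og:url"))
    # Assumption: an empty string is treated the same as a missing field.
    return any(isinstance(value, str) and value for value in candidates)


def valid_previews(content: dict) -> list[dict]:
    """Return the entries of m.url_previews that satisfy the presence rule."""
    previews = content.get("m.url_previews")
    if not isinstance(previews, list):
        return []
    return [entry for entry in previews if is_valid_preview(entry)]
```

An entry that only carries, say, og:title would be dropped by `valid_previews`, while an entry with just og:url would be kept.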

URL previews are primarily meant for text-based message types (m.text, m.notice, m.emote), but they may be used with any message type, as even media messages may have captions in the future.

Allowing the omission of matrix:matched_url is effectively a new feature: it lets clients send URL previews without a link in the message text.

Extensible events

The definition of matrix:matched_url changes from "present in body" to "present in m.text", but otherwise the proposal is directly compatible with extensible events.

Client behavior

Sending preview data

When sending previews to encrypted rooms, clients should encrypt preview images and put them in the matrix:image:encryption field. Other og:image:* and the matrix:image:size field can still be used for image metadata, but the og:image field should be omitted for encrypted thumbnails.

If clients use the /preview_url endpoint as a helper for generating preview data, they should reupload the thumbnail image (if there is one) to create a persistent mxc:// URI, as well as encrypt it if applicable. A future MSC could also extend /preview_url with a parameter to request a persistent URI.
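As an illustration of the sending rules above, a client could assemble a bundled entry from a /preview_url response roughly as follows. This is a sketch under assumptions: `build_bundled_preview` is a hypothetical helper, and the reupload and encryption steps are left to the caller, which passes in either a persistent mxc:// URI or an EncryptedFile object.

```python
from __future__ import annotations


def build_bundled_preview(
    matched_url: str,
    og_data: dict,
    *,
    persistent_mxc: str | None = None,
    encrypted_file: dict | None = None,
) -> dict:
    """Build one m.url_previews entry from /preview_url-style OpenGraph data."""
    # Never reuse the og:image returned by the server: it may not be persistent.
    entry = {key: value for key, value in og_data.items() if key != "og:image"}
    entry["matrix:matched_url"] = matched_url

    if encrypted_file is not None:
        # Encrypted thumbnail: the URL lives inside the EncryptedFile object,
        # so og:image stays omitted.
        entry["matrix:image:encryption"] = encrypted_file
    elif persistent_mxc is not None:
        # Unencrypted thumbnail that the caller has already reuploaded.
        entry["og:image"] = persistent_mxc
    else:
        # No thumbnail at all: drop metadata that would describe a missing image.
        entry = {
            key: value
            for key, value in entry.items()
            if not key.startswith("og:image:") and key != "matrix:image:size"
        }
    return entry
```

Image metadata such as og:image:width is kept in both thumbnail cases, matching the rule that only og:image itself is omitted for encrypted thumbnails.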

Receiving messages with m.url_previews

If an object in the list contains only matrix:matched_url or og:url (but not both) and no other fields, receiving clients should fall back to the old behavior of requesting a preview using /preview_url.

Clients may choose to ignore bundled data and ask the homeserver for a preview even if bundled data is present, as a security measure against faking preview data.

Clients may also choose to verify that the matrix:matched_url is present in the body field before displaying a full preview. However, in order to avoid losing data, clients SHOULD still display ignored entries somehow, e.g. by rendering just the link (either og:url or matrix:matched_url) instead of a full preview.

Note: ignoring bundled data does not mean ignoring the m.url_previews field: even when ignoring bundled data and/or verifying that matrix:matched_url is present in body, clients should only display previews for URLs that are present in the list, and should never display previews for URLs that aren't present in the list.

If the m.url_previews field is not present at all, clients should fall back to the old behavior of searching body.

The above points effectively make this an alternative for MSC2385.
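The receiving rules in this section can be summarised as a small decision function. This is only a sketch: the `Action` enum and `plan_previews` are hypothetical names, and treating a malformed (non-list) field as "no previews" is an assumption that the proposal does not specify.

```python
from __future__ import annotations

from enum import Enum


class Action(Enum):
    SCAN_BODY = "scan_body"              # old behaviour: search body for links
    FETCH_FROM_HOMESERVER = "fetch"      # old behaviour: call /preview_url
    SHOW_BUNDLED = "show_bundled"        # render the bundled OpenGraph data


def plan_previews(content: dict) -> list[tuple[Action, str | None]]:
    """Decide what to do for each preview of an event's content."""
    if "m.url_previews" not in content:
        return [(Action.SCAN_BODY, None)]

    previews = content["m.url_previews"]
    if not isinstance(previews, list):
        return []  # assumption: malformed field means no previews

    plan: list[tuple[Action, str | None]] = []
    for entry in previews:
        if not isinstance(entry, dict):
            continue
        url = entry.get("matrix:matched_url") or entry.get("og:url")
        if not url:
            continue  # violates the "at least one URL" rule
        if len(entry) == 1:
            # Only a single URL field and nothing else: ask the homeserver.
            plan.append((Action.FETCH_FROM_HOMESERVER, url))
        else:
            plan.append((Action.SHOW_BUNDLED, url))
    return plan
```

An empty list yields an empty plan, so no previews are shown and body is not scanned. A client that distrusts bundled data could map every `SHOW_BUNDLED` result to `FETCH_FROM_HOMESERVER` while still previewing only the listed URLs.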

Examples

Normal preview
{
  "type": "m.room.message",
  "content": {
    "msgtype": "m.text",
    "body": "https://matrix.org",
    "m.url_previews": [
      {
        "matrix:matched_url": "https://matrix.org",
        "matrix:image:size": 16588,
        "og:description": "Matrix, the open protocol for secure decentralised communications",
        "og:image": "mxc://maunium.net/zeHhTqqUtUSUTUDxQisPdwZO",
        "og:image:height": 400,
        "og:image:type": "image/jpeg",
        "og:image:width": 800,
        "og:title": "Matrix.org",
        "og:url": "https://matrix.org/"
      }
    ],
    "m.mentions": {}
  }
}
Preview with encrypted thumbnail image
{
  "type": "m.room.message",
  "content": {
    "msgtype": "m.text",
    "body": "https://matrix.org",
    "m.url_previews": [
      {
        "matrix:matched_url": "https://matrix.org",
        "og:url": "https://matrix.org/",
        "og:title": "Matrix.org",
        "og:description": "Matrix, the open protocol for secure decentralised communications",
        "matrix:image:size": 16588,
        "og:image:width": 800,
        "og:image:height": 400,
        "og:image:type": "image/jpeg",
        "matrix:image:encryption": {
          "key": {
            "k": "GRAgOUnbbkcd-UWoX5kTiIXJII81qwpSCnxLd5X6pxU",
            "alg": "A256CTR",
            "ext": true,
            "kty": "oct",
            "key_ops": [
              "encrypt",
              "decrypt"
            ]
          },
          "iv": "kZeoJfx4ehoAAAAAAAAAAA",
          "hashes": {
            "sha256": "WDOJYFegjAHNlaJmOhEPpE/3reYeD1pRvPVcta4Tgbg"
          },
          "v": "v2",
          "url": "mxc://beeper.com/53207ac52ce3e2c722bb638987064bfdc0cc257b"
        }
      }
    ],
    "m.mentions": {}
  }
}
Message indicating it should not have any previews
{
  "type": "m.room.message",
  "content": {
    "msgtype": "m.text",
    "body": "https://matrix.org",
    "m.url_previews": [],
    "m.mentions": {}
  }
}
Message indicating a preview should be fetched from the homeserver
{
  "type": "m.room.message",
  "content": {
    "msgtype": "m.text",
    "body": "https://matrix.org",
    "m.url_previews": [
      {
        "matrix:matched_url": "https://matrix.org"
      }
    ],
    "m.mentions": {}
  }
}
Preview in extensible event
{
  "type": "m.message",
  "content": {
    "m.text": [
      {"body": "matrix.org/support"}
    ],
    "m.url_previews": [
      {
        "matrix:matched_url": "matrix.org/support",
        "matrix:image:size": 16588,
        "og:description": "Matrix, the open protocol for secure decentralised communications",
        "og:image": "mxc://maunium.net/zeHhTqqUtUSUTUDxQisPdwZO",
        "og:image:height": 400,
        "og:image:type": "image/jpeg",
        "og:image:width": 800,
        "og:title": "Support Matrix",
        "og:url": "https://matrix.org/support/"
      }
    ],
    "m.mentions": {}
  }
}

Potential issues

Fake preview data

The message sender can fake previews quite trivially. This is considered an acceptable compromise to achieve non-leaking URL previews in encrypted rooms.

As mentioned in the client behavior section, clients may choose to ignore embedded preview data in unencrypted rooms and always use the /preview_url endpoint, effectively only using m.url_previews as a whitelist of URLs to preview.

More image uploads

Currently previews are generated by the server, which lets the server apply caching and delete thumbnail images quickly. If the data were embedded in events instead, the server would not be able to clean up images the same way.

Web clients

Web clients likely can't generate previews themselves due to CORS and other such protections.

Clients could use the existing URL preview endpoint to generate a preview and bundle that data in events, which has the benefit of only leaking the link to one homeserver (the sender's) instead of all servers. When doing this, clients would have to download the preview image and reupload it to get a persistent mxc:// URI, and possibly encrypt it before uploading.

Alternatively, clients could simply not include preview data at all and have receiving clients fall back to the old behavior (meaning no previews in encrypted rooms unless the receiver opts in).

Security considerations

Fake preview data is covered in the potential issues section above.

Visibility in old clients (T&S)

Clients that don't support this MSC will not display any of the data in the m.url_previews field, which could be abused by spammers if all moderators in a room are using old clients.

Generating previews will leak IPs

The sender's client will leak its IP when it fetches previews for URLs typed by the user. This is generally an acceptable tradeoff, as long as clients take care never to generate previews for links the user did not type.

For example, if a client generates reply fallbacks, it MUST NOT generate previews for links in the fallback. Clients should also be careful with links when starting to edit a message, possibly by not generating new previews at all.
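One way to satisfy the reply fallback rule is to collect preview candidates from the text the user typed before the fallback is attached. The sketch below assumes a hypothetical `prepare_reply` helper, a deliberately simple URL regex, and a fallback string that the client has already generated.

```python
from __future__ import annotations

import re

# Simplistic URL matcher, for illustration only.
URL_RE = re.compile(r"https?://[^\s<>]+")


def extract_urls(text: str) -> list[str]:
    """Return the URLs found in text, de-duplicated, in order of appearance."""
    return list(dict.fromkeys(URL_RE.findall(text)))


def prepare_reply(user_text: str, fallback: str) -> tuple[str, list[str]]:
    """Return the full reply body and the URLs that may be previewed."""
    # Candidates come only from what the user typed, never from the fallback.
    candidates = extract_urls(user_text)
    body = f"{fallback}\n\n{user_text}"
    return body, candidates
```

Because the URL list is computed before the bodies are joined, a link quoted from the replied-to message can never trigger a request from the sender's client.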

Clients may also provide extra safeguards, such as only offering a button to generate previews, rather than generating them immediately after the user types a URL. However, this is a UX decision and is therefore ultimately up to the client to decide.

Clients could also use a privacy-preserving TCP relay to proxy all URL preview requests like Signal does. That way the client wouldn't leak its IP, and the relay wouldn't see previewed URLs. However, running such a proxy has several potential security issues for the server administrators, so it is out of scope for this MSC.

Previewing code must be implemented carefully

When generating URL previews, clients are parsing completely untrusted data. Parsing responses must be done with care to prevent content-based attacks, such as the billion laughs attack.
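As a rough illustration of defensive parsing, the sketch below extracts og:* meta tags with Python's standard html.parser while capping the input size, the number of tags and the length of each value. The limits and the `parse_open_graph` name are assumptions chosen for the example, not values from this proposal.

```python
from __future__ import annotations

from html.parser import HTMLParser

MAX_BYTES = 1024 * 1024   # assumed cap on how much of the response is parsed
MAX_TAGS = 64             # assumed cap on the number of og:* tags kept
MAX_VALUE_LEN = 2048      # assumed cap on the length of each tag value


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> tags with hard limits."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.tags: dict[str, str] = {}

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag != "meta" or len(self.tags) >= MAX_TAGS:
            return
        attributes = dict(attrs)
        prop = attributes.get("property")
        content = attributes.get("content")
        # Keep only the first occurrence of each og:* property.
        if prop and content and prop.startswith("og:") and prop not in self.tags:
            self.tags[prop] = content[:MAX_VALUE_LEN]


def parse_open_graph(raw: bytes) -> dict[str, str]:
    """Parse at most MAX_BYTES of untrusted HTML and return its og:* tags."""
    parser = OpenGraphParser()
    parser.feed(raw[:MAX_BYTES].decode("utf-8", errors="replace"))
    parser.close()
    return parser.tags
```

A real client would also need to bound the download itself (size and time) and validate any image it fetches; this sketch only covers the metadata parsing step.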

Local IPs should not be previewed by default

Clients should prevent previewing non-public IP addresses by default. To do this, clients must check the DNS records of a domain before connecting to the resolved IP, as public domains may point to private IPs. For web clients, these limits are generally handled by the browser (see the Private Network Access spec).
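The DNS check described above could look roughly like this for a native client. It is a sketch under assumptions: `resolve_public_addresses` is a hypothetical helper, rejecting a host if any of its resolved addresses is non-public is a conservative choice made here, and the injectable `resolver` parameter exists only so the logic can be exercised without network access.

```python
from __future__ import annotations

import ipaddress
import socket


def is_public_address(ip: str) -> bool:
    """Return True for globally routable addresses, False for private/loopback/etc."""
    return ipaddress.ip_address(ip).is_global


def resolve_public_addresses(host: str, port: int = 443, resolver=socket.getaddrinfo) -> list[str]:
    """Resolve host and return its addresses, or raise if any is non-public."""
    infos = resolver(host, port, type=socket.SOCK_STREAM)
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    addresses = [info[4][0] for info in infos]
    if not addresses:
        raise ValueError(f"{host} did not resolve to any address")
    for address in addresses:
        if not is_public_address(address):
            raise ValueError(f"{host} resolves to non-public address {address}")
    return addresses
```

To avoid a second lookup returning a different answer, the client should then connect to one of the returned addresses directly rather than resolving the hostname again.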

Alternatives

Different generation methods

Previews could be generated by the receiving client, which would both avoid leaking links to the user's homeserver and prevent fake previews. However, this would leak the user's IP address to the server behind every link they receive, so it is not an acceptable solution.

The original design notes for URL previews from 2016 also have a list of options that were considered at the time: https://github.com/matrix-org/matrix-spec/blob/main/attic/drafts/url_previews.md. Option 2 is what was implemented then, and this proposal adds option 4. That document also mentions the combination of options 2 and 4 as probably the best solution.

The document also mentions the possibility of an AS or HS scanning messages and injecting preview data, but that naturally won't function with encryption at all, and is therefore not an alternative.

The fifth option mentioned in the document, a centralized previewing service which is configured per-room, could technically work, but would likely be worse than HS-generated previews in practice: users wouldn't know to configure a different previewing service, so clients would probably have to automatically pick one.

Unstable prefix

Until this MSC is accepted, implementations should apply the following renames:

  • com.beeper.linkpreviews instead of m.url_previews
  • beeper:image:encryption instead of matrix:image:encryption
  • matched_url instead of matrix:matched_url
    • note: this was implemented without a prefix before the MSC was made, which is why the "unstable prefix" is no prefix in this case.
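The renames listed above can be applied mechanically when sending. The sketch below assumes a hypothetical `to_unstable` helper that takes event content written with the stable names and returns a copy using the unstable ones; fields not listed above, such as matrix:image:size, are left unchanged.

```python
from __future__ import annotations

STABLE_FIELD = "m.url_previews"
UNSTABLE_FIELD = "com.beeper.linkpreviews"

# Per-entry key renames taken from the list above.
UNSTABLE_KEYS = {
    "matrix:image:encryption": "beeper:image:encryption",
    "matrix:matched_url": "matched_url",
}


def to_unstable(content: dict) -> dict:
    """Return a copy of the event content using the unstable names."""
    converted = {key: value for key, value in content.items() if key != STABLE_FIELD}
    if STABLE_FIELD in content:
        converted[UNSTABLE_FIELD] = [
            {UNSTABLE_KEYS.get(key, key): value for key, value in entry.items()}
            for entry in content[STABLE_FIELD]
        ]
    return converted
```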