diff --git a/proposals/2312-matrix-uri.md b/proposals/2312-matrix-uri.md new file mode 100644 index 00000000..45bb2187 --- /dev/null +++ b/proposals/2312-matrix-uri.md @@ -0,0 +1,769 @@ +# URI scheme for Matrix + +This is a proposal of a URI scheme to identify Matrix resources in a wide +range of applications (web, desktop, or mobile) both throughout Matrix software +and (especially) outside it. It supersedes +[MSC455](https://github.com/matrix-org/matrix-doc/issues/455) in order +to continue the discussion in the modern GFM style. + +While Matrix has its own resource naming system that allows it to identify +resources without resolving them, there is a common need to provide URIs +to Matrix resources (e.g., rooms, users, PDUs) that could be transferred +outside of Matrix and then resolved in a uniform way - matching URLs +in World Wide Web. + +Specific use cases include: +1. Representation: as a Matrix user I want to refer to Matrix entities + in the same way as for web pages, so that others could unambiguously identify + the resource, regardless of the context or used medium to identify it to them + (within or outside Matrix, e.g., in a web page or an email message). +1. Inbound integration: as an author of Matrix software, I want to have a way + to invoke my software from the operating environment to resolve a Matrix URI + passed from another program. This is a case of, e.g., + opening a Matrix client by clicking on a link from an email message. +1. Outbound integration: as an author of Matrix software, I want to have a way + to export identifiers of Matrix resources to non-Matrix environment + so that they could be resolved in another time-place in a uniform way. + An example of this case is the "Share via…" action in a mobile Matrix client. + +Matrix identifiers as defined by the current specification have a form distinct +enough from other identifiers to mostly fulfil the representation use case. +Since they are not URIs, they can not cover the two integration use cases. +https://matrix.to somehow compensates for this; however: +* it requires a web browser to run JavaScript code that resolves identifiers + (basically limiting first-class support to browser-based clients), and +* it relies on matrix.to as an intermediary that provides that JavaScript code. + +To cover the use cases above, the following scheme is proposed for Matrix URIs +(`[]` enclose optional parts, `{}` enclose variables): +```text +matrix:[//{authority}/]{type}/{id without sigil}[/{type}/{id without sigil}...][?{query}][#{fragment}] +``` +with `{type}` defining the resource type (such as `r`, `u` or `roomid` - see +the "Path" section in the proposal) and `{query}` containing additional hints +or request details on the Matrix entity (see "Query" in the proposal). +`{authority}` and `{fragment}` parts are reserved for future use; this proposal +does not define them and implementations SHOULD ignore them for now. + +This MSC does not introduce new Matrix entities, nor API endpoints - +it merely defines a mapping between URIs with the scheme name `matrix:` +and Matrix identifiers, as well as operations on them. The MSC should be +sufficient to produce an implementation that would convert Matrix URIs to +a series of [CS API](https://matrix.org/docs/spec/client_server/latest) calls, +entirely on the client side. It is recognised, however, that most of +the URI processing logic can and should (eventually) be on the server side +in order to facilitate adoption of Matrix URIs; further MSCs are needed +to define details for that, as well as to extend the mapping to more resources +(including those without equivalent Matrix identifiers, such as room state or +user profile data). + +The Matrix identifier (or identifiers) can be reconstructed from +`{id without sigil}` by prepending a sigil character corresponding to `{type}`. +To support a hierarchy of Matrix resources, more `/{type}/{id without sigil}` +pairs can be appended, identifying resources within other resources. +As of now, there's only one such case, with exactly one additional pair - +pointing to an event in a room. + +Examples: +* Room `#someroom:example.org`: + `matrix:r/someroom:example.org` +* User `@me:example.org`: + `matrix:u/me:example.org` +* Event in a room: + `matrix:r/someroom:example.org/e/Arbitrary_Event_Id` +* [A commit like this](https://github.com/her001/steamlug.org/commit/2bd69441e1cf21f626e699f0957193f45a1d560f) + could make use of a Matrix URI in the form of + `{Matrix identifier}`. + + +## Proposal + +### Definitions + +Further text uses the following terms: +- Matrix identifier - one of identifiers defined by the current +[Matrix Specification](https://matrix.org/docs/spec/appendices.html#identifier-grammar), +- Matrix URI - a uniform resource identifier proposed hereby, following + the RFC-compliant URI format. +- MUST/SHOULD/MAY etc. follow the conventions of + [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt). + + +### Requirements + +The following considerations drive the requirements for Matrix URIs: +1. Follow existing standards and practices. +1. Endorse the principle of the least surprise. +1. Humans first, machines second. +1. Cover as many entities as practical. +1. URIs are expected to be extremely portable and stable; + you cannot rewrite them once they are released to the world. +1. Ease of implementation, allowing reuse of existing codes. + +The following requirements resulted from these drivers: +1. Matrix URI MUST comply with + [RFC 3986](https://tools.ietf.org/html/rfc3986) and + [RFC 7595](https://tools.ietf.org/html/rfc7595). +1. By definition, Matrix URI MUST unambiguously identify a resource + in a Matrix network, across servers and types of resources. + This means, in particular, that two Matrix identifiers distinct by + [Matrix Specification](https://matrix.org/docs/spec/appendices.html#identifier-grammar) + MUST NOT have Matrix URIs that are equal in + [RFC 3986](https://tools.ietf.org/html/rfc3986) sense + (but two distinct Matrix URIs MAY map to the same Matrix identifier). +1. References to the following entities MUST be supported: + 1. User IDs (`@user:example.org`) + 1. Room IDs (`!roomid:example.org`) + 1. Room aliases (`#roomalias:example.org`) + 1. Event IDs (`$arbitrary_eventid_with_or_without_serverpart`) +1. The mapping MUST take into account that some identifiers + (e.g. aliases) can have non-ASCII characters - reusing + [RFC 3987](https://tools.ietf.org/html/rfc3987) is RECOMMENDED, + but an alternative encoding can be used if there are reasons for that. +1. The mapping between Matrix identifiers and Matrix URIs MUST + be extensible (without invalidating previous URIs) to: + 1. new classes of identifiers (there MUST be a meta-rule to produce + a new mapping for IDs following the `&somethingnew:example.org` + pattern assumed for Matrix identifiers); + 1. new ways to navigate to and interact with objects in Matrix + (e.g., we might eventually want to have a mapping for + room-specific user profiles). +1. The mapping MUST support decentralised as well as centralised IDs. + This basically means that the URI scheme MUST have provisions + for mapping of identifiers with `:` but it MUST NOT require + `:` to be there. +1. Matrix URI SHOULD allow encoding of action requests such as joining a room. +1. Matrix URI SHOULD have a human-readable, if not necessarily + human-friendly, representation - to allow visual sanity-checks. + In particular, characters escaping/encoding should be reduced + to bare minimum in that representation. As food for thought, see + [Wikipedia: Clean URL, aka SEF URL](https://en.wikipedia.org/wiki/Clean_URL) and + [a use case from RFC 3986](https://tools.ietf.org/html/rfc3986#section-1.2.1). +1. It SHOULD be easy to parse Matrix URI in popular programming + languages: e.g., one should be able to use `parseUri()` + to dissect a Matrix URI into components in JavaScript. +1. The mapping SHOULD be consistent across different classes of + Matrix identifiers. +1. The mapping SHOULD support linking to unfederated servers/networks + (see also + [matrix-doc#2309](https://github.com/matrix-org/matrix-doc/issues/2309) + that calls for such linking). + +The syntax and mapping discussed below meet all these requirements except +the last one that will be addressed separately. +Further extensions MUST NOT reduce the supported set of requirements. + + +### Syntax and high-level processing + +The proposed generic Matrix URI syntax is a subset of the generic +URI syntax +[defined by RFC 3986](https://tools.ietf.org/html/rfc3986#section-3): +```text +MatrixURI = "matrix:" hier-part [ "?" query ] [ "#" fragment ] +hier-part = [ "//" authority "/" ] path +``` +As mentioned above, this MSC assumes client-side URI processing +(i.e. mapping to Matrix identifiers and CS API requests). +However, even when URI processing is shifted to the server side +the client will still have to parse the URI at least to remove +the authority and fragment parts (if either exists) +before sending the request to the server (more on that below). + +#### Scheme name +The proposed scheme name is `matrix`. +[RFC 7595](https://tools.ietf.org/html/rfc7595) states: + + if there’s one-to-one correspondence between a service name and + a scheme name then the scheme name should be the same as + the service name. + +Other considered options were `mx` and `web+matrix`; +[comments to MSC455](https://github.com/matrix-org/matrix-doc/issues/455) +mention two scheme names proposed and one more has been mentioned +in `#matrix-core:matrix.org`. + +The scheme name is a definitive indication of a Matrix URI and MUST NOT +be omitted. As can be seen below, Matrix URI rely heavily on [relative +references](https://tools.ietf.org/html/rfc3986#section-4.2) and +omitting the scheme name makes them indistinguishable from a local path +that might have nothing to do with Matrix. Clients MUST NOT try to +parse pieces like `r/MyRoom:example.org` as Matrix URIs; instead, +users should be encouraged to use Matrix identifiers for in-text references +(`#MyRoom:example.org`) and client applications SHOULD turn them into +hyperlinks to Matrix URIs. + +#### Authority + +Basing on +[the definition in RFC 3986](https://tools.ietf.org/html/rfc3986#section-3.2), +this MSC restricts the authority part to never have a userinfo component, +partially to prevent confusion concerned with the `@` character that has its +own meaning in Matrix, but also because this component has historically been +a popular target of abuse. +```text +authority = host [ ":" port ] +``` +Further definition of syntax or semantics for the authority part is left for +future MSCs. Clients MUST parse the authority part as per RFC 3986 (i.e. +the presence of an authority part MUST NOT break URI parsing) but SHOULD NOT +use data from the authority part other than for experiments or research. + +The authority part may eventually be used to indicate access to a Matrix +resource (such as a room or a user) specifically through a given entity. +See "Ideas for further evolution". + +#### Path +This MSC restricts +[the very wide definition of path in RFC 3986](https://tools.ietf.org/html/rfc3986#section-3.3), +to a simple pattern that allows to easily reconstruct a Matrix identifier or +a chain of identifiers and also to locate a certain sub-resource in the scope +of a given Matrix entity: +```text +path = entity-descriptor ["/" entity-descriptor] +entity-descriptor = nonid-segment / type-qualifier id-without-sigil +nonid-segment = segment-nz ; as defined in RFC 3986, see also below +type-qualifier = segment-nz "/" ; as defined in RFC 3986, see also below +id-without-sigil = string ; as defined in Matrix identifier spec, see below +``` +The path component consists of 1 or more descriptors separated by a slash +(`/`) character. This is a generic pattern intended for reusing in future +extensions. + +This MSC only proposes mappings along `type-qualifier id-without-sigil` syntax; +`nonid-segment` is unused and reserved for future use. +For the sake of integrity future `nonid-segment` extensions must follow +[the ABNF for `segment-nz` as defined in RFC 3986](https://tools.ietf.org/html/rfc3986#appendix-A). + +This MSC defines the following `type` specifiers: `u` (user id, sigil `@`), +`r` (room alias, sigil `#`), `roomid` (room id, sigil `!`), and +`e` (event id, sigil `$`). This MSC does not define a type specifier for sigil `+` +([groups](https://github.com/matrix-org/matrix-doc/issues/1513) aka communities +or, in the more recent incarnation, +[spaces](https://github.com/matrix-org/matrix-doc/pull/1772)); a separate MSC +can introduce the specifier, along with the parsing/construction logic and +relevant CS API invocations, following the framework of this proposal. + +The following type specifiers proposed in earlier editions of this MSC and +already in use in several implementations, are deprecated: `user`, `room`, and +`event`. Client applications MAY parse these specifiers as if they were +`u`, `r`, and `e` respectively; they MUST NOT emit URIs with the deprecated +specifiers. The rationale behind the switch is laid out in "Alternatives". + +As of this MSC, `u`, `r`, and `roomid` can only be at the top +level. The type `e` (event) can only be used on the 2nd level and only under +`r` or `roomid`; this is driven by the current shape of Client-Server API +that does not provide a non-deprecated way to retrieve an event without knowing +the room (see [MSC2695](https://github.com/matrix-org/matrix-doc/pull/2695) and +[MSC2779](https://github.com/matrix-org/matrix-doc/issues/2779) that may +change this). + +Further MSCs may introduce navigation to more top-level as well as +non-top-level objects; see "Ideas for further evolution" to get inspired. These +new proposals SHOULD follow the generic grammar laid out above, adding new +`type` and `nonid-segment` specifiers and/or allowing them in other levels, +rather than introduce a new grammar. It is recommended to only use abbreviated +single-letter specifiers if they are expected to be user visible and convenient +for type-in; if a URI for a given resource type is usually generated +(e.g. because the corresponding identifier is not human-friendly), it's +RECOMMENDED to use full (though short) words to avoid ambiguity and confusion. + +`id-without-sigil` is defined as the `string` part of Matrix +[Common identifier format](https://matrix.org/docs/spec/appendices#common-identifier-format) +with percent-encoded characters that are NEITHER unreserved, sub-delimiters, `:` nor `@`, +[as per RFC 3986 rule for pchar](https://tools.ietf.org/html/rfc3986#appendix-A). +This notably exempts `:` from percent-encoding but includes `/`. + +See the rationale behind dropping sigils and the respective up/downsides in +"Discussion points and tradeoffs" as well as "Alternatives" below. + +#### Query + +Matrix URI can optionally have +[the query part](https://tools.ietf.org/html/rfc3986#section-3.4). +This MSC defines the general form for the query and two "standard" query items; +further MSCs may add to this as long as RFC 3986 is followed. +```text +query = query-element *( "&" query-item ) +query-item = action / routing / custom-query-item +action = "action=" ( "join" / "chat" ) +routing = "via=” authority +custom-query-item = custom-item-name "=" custom-item-value +custom-item-name = 1*unreserved ; reverse-DNS name; see below +custom-item-value = ; see below +``` + +The `action` query item is used in contexts where, on top of identifying +the Matrix entity, a certain action is requested on it. This proposal +describes two possible actions: +* `action=join` is only valid in a URI resolving to a Matrix room; + applications MUST ignore it if found in other contexts and MUST NOT generate + it for other Matrix resources. This action means that a client application + SHOULD attempt to join the room specified by the URI path using the standard + CS API means. +* `action=chat` is only valid in a URI resolving to a Matrix user; + applications MUST ignore it if found in other contexts and MUST NOT generate + it for other Matrix resources. This action means that a client application + SHOULD open a direct chat window with the user specified by the URI path; + clients supporting + [canonical direct chats](https://github.com/matrix-org/matrix-doc/pull/2199) + SHOULD open the canonical direct chat. + +For both actions, where applicable, client applications SHOULD ask for user +confirmation or at least notify the user before joining or creating a new room. +Conversely, no additional confirmation/notification is necessary when +the action leads to opening a room the user is already a member of. + +It is worth reiterating on the (blurry) distinction between URIs with `action` +and those without: +- a URI with no `action` simply _identifies_ the resource; if the context + implies an operation, it is usually focused on the retrieval of the resource, + in line with RFC 3986 (see also the next paragraph); +- a URI with `action` in the query means that a client application should (but + is not obliged to) perform that action, with precautions as described above. + +In some cases a client application may have no meaningful way to immediately +perform the default operation suggested by this MSC (see below); e.g., +the client may be unable to display a room before joining it, while the URI +doesn't have `action=join`. In these cases client applications are free to do +what's best for user experience (e.g., suggest joining the room), even if that +means performing an action on a URI with no `action` in the query. + +The routing query (`via=`) indicates servers that are likely involved in +the room (see also +[the feature of matrix.to](https://matrix.org/docs/spec/appendices#routing)). +In the meantime, it is proposed that this routing query be used not only with +room ids in a public federation but also when a URI refers to a resource in +a non-public Matrix network (see the question about closed federations in +"Discussion points and tradeoffs"). Note that `authority` in the definition +above is only a part of the _query parameter_ grammar; it is not proposed here +to generate or interpret the _authority part_ of the URI. + +Clients MAY introduce and recognise custom query items, according to +the following rules: +- the name of a custom item MUST follow the reverse-DNS (aka "Java package") + naming convention, as per + [MSC2758](https://github.com/matrix-org/matrix-doc/pull/2758) - e.g., + a custom action item for Element clients would be named `io.element.action`, + for Quaternion - `com.github.quaternion.action`, etc. +- the value of the item can be any content but its representation in the URI + MUST follow the general RFC requirements for the query part; on top of that, + if the raw value contains `&` it MUST be percent-encoded. +- clients SHOULD respect standard query items over their own ones; e.g., + if a URI contains both `action` and the custom client action, the standard + action should be respected as much as possible. Client authors SHOULD strive + for consistent experience across their and 3rd party clients, anticipating + that the same user may happen to have both their client and a 3rd party one. + +Client authors are strongly encouraged to standardise custom query elements +that gain adoption by submitting an MSC defining them in a way compatible +across the client ecosystem. + + +### Recommended implementation + +#### URI parsing algorithm + +The reference algorithm of parsing a Matrix URI follows. Note that, although +clients are encouraged to use lower-case strings in their URIs, all string +comparisons are case-INsensitive. + +1. Parse the URI into main components (`scheme name`, `authority`, `path`, + `query`, and `fragment`), decoding special or international characters + as directed by [RFC 3986](https://tools.ietf.org/html/rfc3986) and + (for IRIs) [RFC 3987](https://tools.ietf.org/html/rfc3987). Authors are + strongly RECOMMENDED that they find an existing implementation of that step + for their language and SDK, rather than implement it from scratch based + on RFCs. + +1. Check that `scheme name` is exactly `matrix`, case-insensitive. If + the scheme name doesn't match, exit parsing: this is not a Matrix URI. + +1. Split the `path` into segments separated by `/` character; several + subsequent `/` characters delimit empty segments, as advised by RFC 3986. + +1. Check that the URI contains either 2 or 4 segments; if it's not the case, + fail parsing; the Matrix URI is invalid. + +1. To construct the top-level (primary) Matrix identifier: + + a. Pick the leftmost segment of `path` until `/` (path segment) and match + it against the following list to produce `sigil-1`: + - `u` (or, optionally, `user` - see "Path") -> `@` + - `r` (or, optionally, `room`) -> `#` + - `roomid` -> `!` + - any other string, including an empty one -> fail parsing: + the Matrix URI is invalid. + + b. Pick the next (2nd) leftmost path segment: + - if the segment is empty, fail parsing; + - otherwise, percent-decode the segment (unless the initial URI parse + has already done that) and make `mxid-1` by prepending `sigil-1`. + +1. If `sigil-1` is `!` or `#` and the URI path has exactly 4 segments, + it may be possible to construct the 2nd-level Matrix identifier to + point to an event inside the room identified by `mxid-1`: + + a. Pick the next (3rd) path segment: + - if the segment is exactly `e` (or, optionally, `event`), proceed; + - otherwise, including the case of an empty segment (trailing `/`, e.g.), + fail parsing. + + b. Pick the next (4th) leftmost path segment: + - if the segment is empty, fail parsing; + - otherwise, percent-decode the segment (unless the initial URI parse + has already done that) and make `mxid-2` by prepending `$`. + +1. Split the `query` into items separated by `&` character; several subsequent + `&` characters delimit empty items, ignored by this algorithm. + + a. If `query` contains one or more items starting with `via=`: for each item, treat + the rest of the item as a percent-encoded homeserver name to be used in + [routing](https://matrix.org/docs/spec/appendices#routing). + + b. If `query` contains one or more items starting with `action=`: treat + _the last_ such item as an instruction, as this proposal defines in [query](#query). + +Clients MUST implement proper percent-decoding of the identifiers; there's no +liberty similar to that of matrix.to. + +#### Operations on Matrix URIs + +The main purpose of a Matrix URI is accessing the resource specified by the +identifier. This MSC defines the "default" operation +([in the sense of RFC 7595](https://tools.ietf.org/html/rfc7595#section-3.4)) +that a client application SHOULD perform when the user activates +(e.g. clicks on) a URI; further MSCs may introduce additional operations +enabled either by passing an `action` value in the query part, or by other +means. + +The classes of URIs and corresponding default operations (along with relevant +CS API calls) are collected below. The table assumes that the operations are +performed on behalf (using the access token) of the user `@me:example.org`: + +| URI class/example | Interactive operation | Non-interactive operation / Involved CS API | +| ----------------- | --------------------- | --------------------------------------------- | +| User Id (no `action` in URI):
`matrix:u/her:example.org` | _Outside the room context_: show user profile
_Inside the room context:_ mention the user in the current room (client-local operation) | No default non-interactive operation
`GET /profile/@her:example.org/display_name`
`GET /profile/@her:example.org/avatar_url` | +| User Id (`action=chat`):
`matrix:u/her:example.org?action=chat` | 1. Confirm with the local user if needed (see "Query")
2. Open the room as defined in the next column | If [canonical direct chats](https://github.com/matrix-org/matrix-doc/pull/2199) are supported: `GET /_matrix/client/r0/user/@me:example.org/dm?involves=@her:example.org`
Without canonical direct chats:
1. `GET /user/@me:example.org/account_data/m.direct`
2. Find the room id for `@her:example.org` in the event content
3. if found, return this room id; if not, `POST /createRoom` with `"is_direct": true` and return id of the created room | +| Room (no `action` in URI):
`matrix:roomid/rid:example.org`
`matrix:r/us:example.org` | Attempt to "open" (usually: display the timeline at the latest or last remembered position) the room | No default non-interactive operation
API: Find the respective room in the local `/sync` cache or
`GET /rooms/!rid:example.org/...`
| +| Room (`action=join`):
`matrix:roomid/rid:example.org?action=join&via=example2.org`
`matrix:r/us:example.org?action=join` | 1. Confirm with the local user if needed (see "Query")
2. Attempt to join the room | `POST /join/!rid:example.org?server_name=example2.org`
`POST /join/#us:example.org` | +| Event:
`matrix:r/us:example.org/e/lol823y4bcp3qo4`
`matrix:roomid/rid:example.org/event/lol823y4bcp3qo4?via=example2.org` | 1. For room aliases, resolve an alias to a room id (see the next column)
2. Attempt to retrieve (see the next column) and display the event;
3. If the event could not be retrieved due to access denial and the current user is not a member of the room, the client MAY offer the user to join the room and try to open the event again | Non-interactive operation: return event or event content, depending on context
API: find the event in the local `/sync` cache or
`GET /directory/room/%23us:example.org` (to resolve alias to id)
`GET /rooms/!rid:example.org/event/lol823y4bcp3qo4?server_name=example2.org`
| + + +#### URI construction algorithm + +The following algorithm assumes a Matrix identifier that follows +the high-level grammar described in the specification. Clients MUST ensure +compliance of identifiers passed to this algorithm. + +For room and user identifiers (including room aliases): +1. Remove the sigil character from the identifier and match it against + the following list to produce `prefix-1`: + - `@` -> `u/` + - `#` -> `r/` + - `!` -> `roomid/` +2. Build the Matrix URI as a concatenation of: + - literal `matrix:`; + - `prefix-1`; + - the remainder of identifier (`id without sigil`), percent-encoded as per + [RFC 3986](https://tools.ietf.org/html/rfc3986). + +For event identifiers (assuming they need the room context, see +[MSC2695](https://github.com/matrix-org/matrix-doc/pull/2695) and +[MSC2779](https://github.com/matrix-org/matrix-doc/issues/2779) that +may change this): +1. Take the event's room id or canonical alias and build a Matrix URI for them + as described above. +2. Append to the result of previous step: + - literal `e/`; + - the event id after removing the sigil (`$`) and percent-encoding. + +Clients MUST implement proper percent-encoding of the identifiers; there's no +liberty similar to that of matrix.to. + + +## Discussion and non-normative statements + +### Ideas for further evolution + +This MSC is obviously just the first step, keeping the door open for +extensions. Here are a few ideas: + +* Add new actions; e.g. leaving a room (`action=leave`). + +* Add specifying a segment of the room timeline (`from=$evtid1&to=$evtid2`). + +* Unlock bare event ids (`matrix:e/$event_id`) - subject to change in + other areas of the specification. + +* Bring tangible semantics to the authority part. The main purpose of + the authority part, + [as per RFC 3986](https://tools.ietf.org/html/rfc3986#section-3.2), + is to identify the entity governing the namespace for the rest of the URI. + The current MSC rules out the userinfo component but leaves it to a separate + MSC to define semantics of the remaining`host[:port]` piece. + + Importantly, future MSCs are advised against using the authority part for + _routing over federation_ (the case for `via=` query items), as it would be + against the spirit of RFC 3986. The authority part can be used in cases when + a given Matrix entity is only available from certain servers (the case of + closed federations or non-federating servers). + + While being a part of the original proposal in an attempt to address + [the respective case](https://github.com/matrix-org/matrix-doc/issues/2309), + the definition of the authority semantics has been dropped as a result of + [the subsequent discussion](https://github.com/matrix-org/matrix-doc/pull/2312#discussion_r348960282). + A further MSC may approach the same case (and/or others) and define the + meaning of the authority part (either on the client- or even on + the server-side - provided that using Matrix URIs on the server-side brings + some other value along the way). This might not necessarily be actual DNS + hostnames even - one (quite far-fetched for now) idea to entertain might be + introducing some decentralised system of "network names" in order to equalise + "public" and "non-public" federations. + + Along the same lines, if providing any part of user credentials via + the authority part is found to be of considerable value in some case, + a separate MSC could both reinstate it in the grammar and define how + to construct, parse, and use it - provided that the same MSC addresses + the security concerns associated with such URIs. + +* One could conceive a URI mapping of avatars in the form of + `matrix:u/uid:matrix.org/avatar/room:matrix.org` + (a user’s avatar for a given room). + +* As described in "Alternatives", a synonymous system can be introduced that + uses Matrix identifiers with sigils by adding another path prefix (e.g., + `matrix:id/%23matrix:matrix.org`). However, such MSC would have to address + the concerns of possible confusion arising from having two similar but + distinct notations. + +* Interoperability of Matrix URIs with + [Linked Data](https://en.wikipedia.org/wiki/Linked_data). + + +### Past discussion points and tradeoffs + +The below documents the discussion and outcomes in various prior forums; +further discussion should happen in GitHub comments. +1. _Why no double-slashes in a typical URI?_ + Because `//` is used to mark the beginning of an authority + part. RFC 3986 explicitly forbids to start the path component with + `//` if the URI doesn't have an authority component. In other words, + `//` implies a centre of authority, and the (public) Matrix + federation is not supposed to have one; hence no `//` in most URIs. +1. ~~_Why do type specifiers use singular rather than plural + as is common in RESTful APIs?_~~ + This is no more relevant with single-letter type specifiers. The answer + below is provided for history only. + Unlike in actual RESTful APIs, this MSC does not see `rooms/` or + `users/` as collections to browse. The type specifier completes + the id specification in the URI, defining a very specific and + easy to parse syntax for that. Future MSCs may certainly add + collection URIs, but it is recommended to use more distinct naming + for such collections. In particular, `rooms/` is ambiguous, as + different sets of rooms are available to any user at any time + (e.g., all rooms known to the user; or all routable rooms; or + public rooms known to the user's homeserver). +1. _Should we advise using the query part for collections then?_ + Not in this MSC but that can be considered in the future. +1. _Why can't event URIs use the fragment part for the event ID?_ + Because fragment is a part processed exclusively by the client + in order to navigate within a larger document, and room cannot + be considered a "document". Each event can be retrieved from the server + individually, so each event can be viewed as a self-contained document. + When/if URI processing is shifted to the server-side, servers are not even + going to receive fragments (as per RFC 3986), which is why usage of + fragments to remove the need for percent-encoding in other identifiers + would lead to URIs that cannot be resolved on servers. Effectively, all + clients would have to implement full URI processing with no chance + to offload that to the server. For that reason fragments, if/when ever + employed in Matrix, only should be used to pinpoint a position within events + and for similar strictly client-side operations. +1. _How does this MSC work with closed federations?_ ~~If you need to + communicate a URI to the bigger world where you cannot expect + the consumer to know in advance which federation they should use - + supply any server of the closed federation in the authority part. + Users inside the closed federation can omit the authority part if + they know the URI is not going to be used outside this federation. + Clients can facilitate that by having an option to always add or omit + the authority part in generated URIs for a given user account.~~ + As of now, use `via=` in order to point to a homeserver in the closed + federation. The authority part may eventually be used for that (or for some + other case - see the previous section). + + +### Alternatives + +#### Using full words for all types + +During its draft state, this MSC was proposing type specifiers using full words +(`user`, `room`, `event` etc.), arguing that abbreviations can be introduced +separately as synonyms. Full words have several shortcomings pointed out in +discussions across the whole period of preparation, namely: +- The singular vs. plural choice (see also "Past discussion points") +- Using English words raises a question about eventual support of localised + URI variants (`matrix:benutzer/...`, `matrix:usuario/...` etc.) catering to + international audience, that would add complication to the Matrix technology. +- Abbreviated forms are popularised by Reddit and make URIs shorter which is + crucial for the outbound integration case (see the introduction). + +Meanwhile, using `u`/`r`/`e` for users, rooms and events has the following +advantages: +1. there's a strong Reddit legacy, with users across the world quite familiar + with the abbreviated forms (and `r/` coincidentally standing for sub-Reddits + links to which have basically the same place in the Reddit ecosystem as + Matrix room aliases have in the Matrix ecosystem); +2. matrix.to links to users and room aliases are heavily used throughout Matrix, + specifically in end-user-facing contexts (see also use cases in the + introductory section of this MSC); +3. the singular vs. plural (`room` or `rooms`?) confusion is avoided; +4. it's shorter, which is crucial for typing the URI in an external medium. + +The rationale behind not abbreviating `roomid/` is a better distinction between +room aliases and room ids; also, since room ids are almost never typed in +manually, the advantages (3) and (4) above don't hold. + +For these reasons, it was decided in the end to use the single-letter style +for types most used in the outbound integration case. It's still possible to +reinstate full words as synonyms some time down the road, with the caveat that +a canonicalisation service from homeservers may be needed to avoid having +to enable synonyms at each client individually. + +#### URNs + +The discussion in +[MSC455](https://github.com/matrix-org/matrix-doc/issues/455) +mentions an option to standardise URNs rather than URLs/URIs, +with the list of resolvers being user-specific. While a URN namespace +such as `urn:matrix:`, along with a URN scheme, might be deemed useful +once we shift to (even) more decentralised structure of the network, +`urn:` URIs must be managed entities (see +[RFC 8141](https://tools.ietf.org/html/rfc8141)) which is not always +the case in Matrix (consider room aliases, e.g.). + +With that said, a URN-styled (`matrix:room:example.org:roomalias`) +option was considered. However, Matrix already uses colon (`:`) as +a delimiter of id parts and, as can be seen above, reversing the parts +to meet the URN's hierarchical order would look confusing for Matrix +users (as in example above - is `room` a part of the identifier or +the type signifier?). + +#### "Full REST" + +Yet another alternative considered was to go "full REST" and structure +URLs in a more traditional way with serverparts coming first, followed +by type grouping (sic - not specifiers), and then by localparts, +i.e. `matrix://example.org/rooms/roomalias`. This is even more difficult +to comprehend for a Matrix user than the previous alternative and besides it +conflates the notion of an authority server with that of a namespace +discriminator: clients would not connect to `example.org` to resolve the alias +above, they would still connect to their own homeserver. + +#### Minimal syntax + +One early proposal was to simply prepend `matrix:` to a Matrix identifier +(without encoding it), assuming that it will only be processed on the client +side. The massive downside of this option is that such strings are not actual +URIs even though they look like ones: most URI parsers won't handle them +correctly. As laid out in the beginning of this proposal, Matrix URIs are +not striving to preempt Matrix identifiers; instead of trying to produce +an equally readable string, one should just use identifiers where they work. +Why Matrix identifiers look the way they look is way out of the MSC scope +to discuss here. + +#### Minimal syntax based on the path component and percent-encoding + +A simple modification of the previous option is much more viable: +proper percent-encoding of the Matrix identifier allows to use it as +a URI path part. A single identifier packed in a URI could look like +`matrix:/encoded_id_with_sigil`; an event-in-a-room URI would be something +like `matrix:/roomid_or_alias/$event_id` (NB: RFC 3986 doesn't require `$` +to be encoded). This is considerably more concise and encoding is only +needed for `#`. + +Quite unfortunately, `#` is one of the two sigils in Matrix most relevant +to integration cases. The other one is `@`; it doesn't need encoding except +in the authority part - which is why the form above uses a leading `/` that +puts the identifier in the path part instead of what parsers treat as +the authority part. `#` has to be encoded wherever it appears, making a URI +for Matrix HQ, the first chat room many new users join, look like +`matrix:/%23matrix:matrix.org`. Beyond first-time usage, this generally impacts +[the "napkin" case](https://tools.ietf.org/html/rfc3986#section-1.2.1) from +RFC 3986 that the Requirements section of this MSC mentions. Until we have +applications generally recognising Matrix identifiers in the same way e-mail +addresses are recognised without prefixing `mailto:`, we should live with +the fact that people will have to produce Matrix URIs by hand in various +instances, from pen-and-paper to other instant messengers. + +Putting the whole id to the URI fragment (`matrix:#id_with_sigil` or, +following on the `matrix.to` tradition, `matrix:#/id_with_sigil` for +readability) allows using `#` without encoding on many URI parsers. It is +still not fully RFC-compliant and rules out using URIs by homeservers +(see also "Past discussion points" on using fragments to address events). + +Regardless of the placement (the fragment or the path), one more consideration +is that the character space for sigils is extremely limited and +Matrix identifiers are generally less expressive than full-blown URI paths. +Not that Matrix showed a tendency to produce many classes of objects that would +warrant a dedicated sigil but that cannot be ruled out. Rather than rely +on the institute of sigils, this proposal gives an alternative more +extensible syntax that can be used for more advanced cases - as a uniform way +to represent arbitrary sub-objects (with or without Matrix identifier) such as +user profiles, or a notifications feed for the room - and also, if ever needed, +as an escape hatch to a bigger namespace if we hit shortage of sigils. + +The current proposal is also flexible enough to incorporate the minimal +syntax of this option as an alternative to its own notation - e.g., a further +MSC could enable `matrix:id/%23matrix:matrix.org` as a synonym for +`matrix:room/matrix:matrix.org`. + + +## Potential issues + +Despite the limited functionality of URIs as proposed in this MSC, +Matrix authors are advised to use tools that would process URIs just +like an HTTP(S) URI instead of making home-baked parsers/emitters. +Even with that in mind, not all tools normalise and sanitise all cases +in a fully RFC-compliant way. This MSC tries to keep the required +transformations to the minimum and will likely not bring much grief even +with naive implementations; however, as functionality of Matrix URI grows, +the number of corner cases will increase. + + +## Security/privacy considerations + +This MSC mostly builds on RFC 3986 but tries to reduce the scope +as much as possible. Notably, it avoids introducing complex traversable +structures and further restricts the URI grammar to the necessary subset. +In particular, dot path segments (`.` and `..`), while potentially useful +when URIs become richer, would come too much ahead of time for now. Care +is taken to not make essential parts of the URI omittable to avoid +even accidental misrepresentation of a local resource for a remote one +in Matrix and vice versa. + +As mentioned in the authority part section, the MSC intentionally doesn't +support conveying any kind of user information in URIs. + +The MSC strives to not be prescriptive in treating URIs except the `action` +query parameter. Actions without user confirmation may lead to unintended +leaks of certain metadata and/or changes in the account state with respect +to Matrix. To reiterate, clients SHOULD ask for a user consent if/when they +can unless applying the action doesn't lead to sending persistent (message +or state) events on user's behalf. + + +## Conclusion + +A dedicated URI scheme is well overdue for Matrix. Many other networks +already have got one for themselves, benefiting both in terms of +branding (compare `matrix:r/weruletheworld:example.org` vs. +`#weruletheworld:example.org` from the standpoint of someone who +hasn't been to Matrix) and interoperability (`matrix.to` requires +opening a browser while clicking a `tg:` link dumped to the terminal +application will open the correct application for Telegram without +user intervention or can even offer to install one, if needed). +The proposed syntax makes conversion between Matrix URIs +and Matrix identifiers as easy as a bunch of string comparisons or +regular expressions; so even though client-side processing of URIs +might not be optimal longer-term, it's a very simple and quick way +that allows plenty of experimentation early on.