diff --git a/proposals/1708-well-known-for-federation.md b/proposals/1708-well-known-for-federation.md
index 8fa66c7a9..6e857b854 100644
--- a/proposals/1708-well-known-for-federation.md
+++ b/proposals/1708-well-known-for-federation.md
@@ -1,21 +1,17 @@
 # MSC1708: .well-known support for server name resolution
 
 Currently, mapping from a server name to a hostname for federation is done via
-`SRV` records. This presents two principal difficulties:
-
- * SRV records are not widely used, and administrators may be unfamiliar with
-   them, and there may be other practical difficulties in their deployment such
-   as poor support from hosting providers. [^1]
-
- * [MSC1711](https://github.com/matrix-org/matrix-doc/pull/1711) proposes
-   requiring valid X.509 certificates on the
-   federation endpoint. It will then be necessary for the homeserver to present
-   a certificate which is valid for the server name. This presents difficulties
-   for hosted server offerings: BigCorp may be reluctant to hand over the
-   keys for `bigcorp.com` to the administrators of the `bigcorp.com` matrix
-   homeserver.
-
-Here we propose to solve these problems by augmenting the current `SRV` record
+`SRV` records. However,
+[MSC1711](https://github.com/matrix-org/matrix-doc/pull/1711) proposes
+requiring valid X.509 certificates on the federation endpoint. It will then be
+necessary for the homeserver to present a certificate which is valid for the
+server name. This presents difficulties for hosted server offerings: BigCorp
+may want to delegate responsibility for running its Matrix homeserver to an
+outside supplier, but it may be difficult for that supplier to obtain a TLS
+certificate for `bigcorp.com` (and BigCorp may be reluctant to let them have
+one).
+
+This MSC proposes to solve this problem by augmenting the current `SRV` record
 with a `.well-known` lookup.
 
 ## Proposal
@@ -24,59 +20,80 @@ For reference, the current
 [specification for resolving server names](https://matrix.org/docs/spec/server_server/unstable.html#resolving-server-names)
 is as follows:
 
-* If the hostname is an IP literal, then that IP address should be used,
-  together with the given port number, or 8448 if no port is given.
+1. If the hostname is an IP literal, then that IP address should be used,
+   together with the given port number, or 8448 if no port is given.
+
+2. Otherwise, if the port is present, then an IP address is discovered by
+   looking up an AAAA or A record for the hostname, and the specified port is
+   used.
+
+3. If the hostname is not an IP literal and no port is given, the server is
+   discovered by first looking up a `_matrix._tcp` SRV record for the
+   hostname, which may give a hostname (to be looked up using AAAA or A queries)
+   and port.
+
+4. Finally, the server is discovered by looking up an AAAA or A record on the
+   hostname, and taking the default fallback port number of 8448.
+
+We insert the following between Steps 3 and 4:
+
+If the SRV record does not exist, the requesting server should make a `GET`
+request to `https:///.well-known/matrix/server`, with normal
+X.509 certificate validation. If the request does not return a 200, continue
+to step 4, otherwise:
+
+XXX: should we follow redirects?
 
-* Otherwise, if the port is present, then an IP address is discovered by
-  looking up an AAAA or A record for the hostname, and the specified port is
-  used.
+The response must have a `Content-Type` of `application/json`, and must be
+valid JSON which follows the structure documented below. Otherwise, the
+request is aborted.
 
-* If the hostname is not an IP literal and no port is given, the server is
-  discovered by first looking up a `_matrix._tcp` SRV record for the
-  hostname, which may give a hostname (to be looked up using AAAA or A queries)
-  and port. If the SRV record does not exist, then the server is discovered by
-  looking up an AAAA or A record on the hostname and taking the default
-  fallback port number of 8448.
+If the response is valid, the `m.server` property is parsed as
+`[:]`, and processed as follows:
 
-  Homeservers may use SRV records to load balance requests between multiple TLS
-  endpoints or to failover to another endpoint if an endpoint fails.
+   a. If `` is an IP literal, then that IP address should
+      be used, together with ``, or 8448 if no port is
+      given. The server should present a valid TLS certificate for
+      ``.
 
-The first two points remain unchanged: if the server name is an IP literal, or
-contains a port, then requests will be made directly as before.
+   b. Otherwise, if the port is present, then an IP address is discovered by
+      looking up an AAAA or A record for ``, and the
+      specified port is used. The server should present a valid TLS certificate
+      for ``.
 
-If the hostname is neither an IP literal, nor does it have an explicit port,
-then the requesting server should continue to make an SRV lookup as before, and
-use the result if one is found.
+      (In other words, the federation connection is made to
+      `https://:`).
 
-If *no* SRV result is found, the requesting server should make a `GET` request
-to `https://\/.well-known/matrix/server`, with normal X.509
-certificate validation. If the request fails in any way, then we fall back as
-before to using using port 8448 on the hostname.
+   c. If the hostname is not an IP literal and no port is given, a second SRV
+      record is looked up; this time for `_matrix._tcp.`,
+      which may give yet another hostname (to be looked up using A/AAAA queries)
+      and port. The server must present a TLS cert for the
+      `` from the .well-known.
 
-Rationale: Falling back to port 8448 (rather than aborting the request) is
-necessary to maintain compatibility with existing deployments, which may not
-present valid certificates on port 443, or may return 4xx or 5xx errors.
+   d. If no SRV record is found, the server is discovered by looking up an AAAA
+      or A record on ``, and taking the default fallback
+      port number of 8448.
 
-If the GET request succeeds, it should result in a JSON response, with contents
-structured as shown:
+      (In other words, the federation connection is made to
+      `https://:8448`).
+
+### Structure of the `.well-known` response
+
+The contents of the `.well-known` response should be structured as shown:
 
 ```json
 {
-    "server": "[:]"
+    "m.server": "[:]"
 }
 ```
 
-The `server` property should be a hostname or IP address, followed by an
+The `m.server` property should be a hostname or IP address, followed by an
 optional port. If the response cannot be parsed as JSON, or lacks a valid
 `server` property, the request is considered to have failed, and no fallback
 to port 8448 takes place.
 
-Otherwise, the requesting server performs an `AAAA/A` lookup on the hostname
-(if necessary), and connects to the resultant address and the specifed
-port. The port defaults to 8448, if unspecified.
-
 (The formal grammar for the `server` property is identical to that of a
 [server name](https://matrix.org/docs/spec/appendices.html#server-name).)
 
@@ -92,18 +109,10 @@ sensible default: 24 hours is suggested.
 Because there is no way to request a revalidation, it is also recommended that
 requesting servers cap the expiry time. 48 hours is suggested.
 
-Similarly, a failure to retrieve the `.well-known` file should be cached for
-a reasonable period. 24 hours is suggested again.
-
-### The future of SRV records
-
-It's worth noting that this proposal is very clear in that we will maintain
-support for SRV records for the immediate future; there are no current plans to
-deprecate them.
-
-However, clearly a `.well-known` file can provide much of the functionality of
-an SRV record, and having to support both may be undesirable. Accordingly, we
-may consider sunsetting SRV record support at some point in the future.
+A failure to retrieve the `.well-known` file should also be cached, though care
+must be taken that a single 500 error or connection failure should not break
+federation for an extended period. A short cache time of about an hour might be
+appropriate; alternatively, servers might use an exponential backoff.
 
 ### Outstanding questions
 
@@ -127,7 +136,6 @@ as soon as possible, to maximise uptake in the ecosystem.
 It is likely that, as we approach Matrix 1.0, there will be sufficient other new
 features (such as new Room versions) that upgrading will be necessary anyway.
 
-
 ## Security considerations
 
 The `.well-known` file potentially broadens the attack surface for an attacker
@@ -138,6 +146,3 @@ wishing to intercept federation traffic to a particular server.
 This proposal adds a new mechanism, alongside the existing `SRV` record lookup for
 finding the server responsible for a particular matrix server_name, which will
 allow greater flexibility in deploying homeservers.
-
-
-[^1] For example, Cloudflare automatically "flattens" SRV record responses.
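
Illustrative note, not part of the patch: the sub-steps a to d added by this diff could be sketched roughly as below. This is a minimal sketch under stated assumptions, not the proposal's reference implementation. The names `Delegation`, `split_host_port`, `is_ip_literal` and `parse_well_known` are hypothetical, as is `matrix.example.com` in the usage note. The sketch only parses the response body and picks the applicable step. The HTTPS request, the `Content-Type` check, DNS lookups, certificate validation and caching are all left out.

```python
import ipaddress
import json
from typing import NamedTuple, Optional, Tuple

DEFAULT_PORT = 8448  # fallback federation port named in the proposal


class Delegation(NamedTuple):
    host: str            # hostname or IP literal taken from m.server
    port: Optional[int]  # port to connect to, or None if SRV must decide
    step: str            # "a", "b" or "c" (step "d" is the fallback from "c")


def split_host_port(value: str) -> Tuple[str, Optional[int]]:
    """Split `host[:port]`, keeping the brackets on IPv6 literals."""
    if value.startswith("["):
        end = value.find("]")
        if end == -1:
            raise ValueError("unterminated IPv6 literal")
        host, rest = value[: end + 1], value[end + 1:]
        if rest and not rest.startswith(":"):
            raise ValueError("unexpected text after IPv6 literal")
        port_text = rest[1:] if rest else None
    else:
        host, sep, tail = value.partition(":")
        port_text = tail if sep else None
    if not host:
        raise ValueError("empty hostname")
    if port_text is None:
        return host, None
    if not (port_text.isascii() and port_text.isdigit()):
        raise ValueError("port must be decimal digits")
    return host, int(port_text)


def is_ip_literal(host: str) -> bool:
    """True for an IPv4 address or a bracketed IPv6 address."""
    try:
        ipaddress.ip_address(host.strip("[]"))
    except ValueError:
        return False
    return True


def parse_well_known(body: str) -> Delegation:
    """Parse a .well-known body; raise ValueError if it is unusable."""
    data = json.loads(body)  # json.JSONDecodeError is a ValueError
    value = data.get("m.server") if isinstance(data, dict) else None
    if not isinstance(value, str):
        raise ValueError("missing or invalid m.server property")
    host, port = split_host_port(value)
    if is_ip_literal(host):
        # Step a: connect to the IP directly, defaulting the port.
        return Delegation(host, DEFAULT_PORT if port is None else port, "a")
    if port is not None:
        # Step b: A/AAAA lookup on the host, explicit port.
        return Delegation(host, port, "b")
    # Step c: SRV lookup on the delegated name; if no SRV record exists,
    # the caller falls back to step d (A/AAAA lookup, port 8448).
    return Delegation(host, None, "c")
```

For example, `parse_well_known('{"m.server": "matrix.example.com:443"}')` yields step `b` with port 443, while a body without a port yields step `c` and leaves the port to the SRV lookup. Raising `ValueError` on a malformed body mirrors the rule that an invalid response aborts the request rather than falling back to port 8448.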