Do a SRV lookup before .well-known lookup

also other clarifications and corrections.
pull/977/head
Richard van der Hoff 7 years ago
parent e789eb186a
commit f33a540e6d

```diff
@@ -1,21 +1,17 @@
 # MSC1708: .well-known support for server name resolution

 Currently, mapping from a server name to a hostname for federation is done via
-`SRV` records. This presents two principal difficulties:
-
-* SRV records are not widely used, and administrators may be unfamiliar with
-them, and there may be other practical difficulties in their deployment such
-as poor support from hosting providers. [^1]
-
-* [MSC1711](https://github.com/matrix-org/matrix-doc/pull/1711) proposes
-requiring valid X.509 certificates on the
-federation endpoint. It will then be necessary for the homeserver to present
-a certificate which is valid for the server name. This presents difficulties
-for hosted server offerings: BigCorp may be reluctant to hand over the
-keys for `bigcorp.com` to the administrators of the `bigcorp.com` matrix
-homeserver.
-
-Here we propose to solve these problems by augmenting the current `SRV` record
+`SRV` records. However,
+[MSC1711](https://github.com/matrix-org/matrix-doc/pull/1711) proposes
+requiring valid X.509 certificates on the federation endpoint. It will then be
+necessary for the homeserver to present a certificate which is valid for the
+server name. This presents difficulties for hosted server offerings: BigCorp
+may want to delegate responsibility for running its Matrix homeserver to an
+outside supplier, but it may be difficult for that supplier to obtain a TLS
+certificate for `bigcorp.com` (and BigCorp may be reluctant to let them have
+one).
+
+This MSC proposes to solve this problem by augmenting the current `SRV` record
 with a `.well-known` lookup.

 ## Proposal
```
````diff
@@ -24,59 +20,80 @@ For reference, the current [specification for resolving server
 names](https://matrix.org/docs/spec/server_server/unstable.html#resolving-server-names)
 is as follows:

-* If the hostname is an IP literal, then that IP address should be used,
+1. If the hostname is an IP literal, then that IP address should be used,
 together with the given port number, or 8448 if no port is given.

-* Otherwise, if the port is present, then an IP address is discovered by
-looking up an AAAA or A record for the hostname, and the specified port is
-used.
-
-* If the hostname is not an IP literal and no port is given, the server is
-discovered by first looking up a `_matrix._tcp` SRV record for the
-hostname, which may give a hostname (to be looked up using AAAA or A queries)
-and port. If the SRV record does not exist, then the server is discovered by
-looking up an AAAA or A record on the hostname and taking the default
-fallback port number of 8448.
-
-Homeservers may use SRV records to load balance requests between multiple TLS
-endpoints or to failover to another endpoint if an endpoint fails.
-
-The first two points remain unchanged: if the server name is an IP literal, or
-contains a port, then requests will be made directly as before.
-
-If the hostname is neither an IP literal, nor does it have an explicit port,
-then the requesting server should continue to make an SRV lookup as before, and
-use the result if one is found.
-
-If *no* SRV result is found, the requesting server should make a `GET` request
-to `https://\<server_name>/.well-known/matrix/server`, with normal X.509
-certificate validation. If the request fails in any way, then we fall back as
-before to using using port 8448 on the hostname.
-
-Rationale: Falling back to port 8448 (rather than aborting the request) is
-necessary to maintain compatibility with existing deployments, which may not
-present valid certificates on port 443, or may return 4xx or 5xx errors.
-
-If the GET request succeeds, it should result in a JSON response, with contents
-structured as shown:
+2. Otherwise, if the port is present, then an IP address is discovered by
+looking up an AAAA or A record for the hostname, and the specified port is
+used.
+
+3. If the hostname is not an IP literal and no port is given, the server is
+discovered by first looking up a `_matrix._tcp` SRV record for the
+hostname, which may give a hostname (to be looked up using AAAA or A queries)
+and port.
+
+4. Finally, the server is discovered by looking up an AAAA or A record on the
+hostname, and taking the default fallback port number of 8448.
+
+We insert the following between Steps 3 and 4:
+
+If the SRV record does not exist, the requesting server should make a `GET`
+request to `https://<server_name>/.well-known/matrix/server`, with normal
+X.509 certificate validation. If the request does not return a 200, continue
+to step 4, otherwise:
+
+XXX: should we follow redirects?
+
+The response must have a `Content-Type` of `application/json`, and must be
+valid JSON which follows the structure documented below. Otherwise, the
+request is aborted.
+
+If the response is valid, the `m.server` property is parsed as
+`<delegated_server_name>[:<delegated_port>]`, and processed as follows:
+
+a. If `<delegated_server_name>` is an IP literal, then that IP address should
+be used, together with `<delegated_port>`, or 8448 if no port is
+given. The server should present a valid TLS certificate for
+`<delegated_server_name>`.
+
+b. Otherwise, if the port is present, then an IP address is discovered by
+looking up an AAAA or A record for `<delegated_server_name>`, and the
+specified port is used. The server should present a valid TLS certificate
+for `<delegated_server_name>`.
+
+(In other words, the federation connection is made to
+`https://<delegated_server_name>:<delegated_port>`).
+
+c. If the hostname is not an IP literal and no port is given, a second SRV
+record is looked up; this time for `_matrix._tcp.<delegated_server_name>`,
+which may give yet another hostname (to be looked up using A/AAAA queries)
+and port. The server must present a TLS cert for the
+`<delegated_server_name>` from the .well-known.
+
+d. If no SRV record is found, the server is discovered by looking up an AAAA
+or A record on `<delegated_server_name>`, and taking the default fallback
+port number of 8448.
+
+(In other words, the federation connection is made to
+`https://<delegated_server_name>:8448`).
+
+### Structure of the `.well-known` response
+
+The contents of the `.well-known` response should be structured as shown:

 ```json
 {
-    "server": "<server>[:<port>]"
+    "m.server": "<server>[:<port>]"
 }
 ```

-The `server` property should be a hostname or IP address, followed by an
+The `m.server` property should be a hostname or IP address, followed by an
 optional port.

 If the response cannot be parsed as JSON, or lacks a valid `server` property,
 the request is considered to have failed, and no fallback to port 8448 takes
 place.

-Otherwise, the requesting server performs an `AAAA/A` lookup on the hostname
-(if necessary), and connects to the resultant address and the specifed
-port. The port defaults to 8448, if unspecified.
-
 (The formal grammar for the `server` property is identical to that of a [server
 name](https://matrix.org/docs/spec/appendices.html#server-name).)
````
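The revised resolution order introduced by this commit can be sketched in Python. It covers steps 1-3, then the inserted `.well-known` step with its sub-steps a-d, and finally step 4. This is a minimal illustration under stated assumptions, not a reference implementation:

- `resolve`, `Target`, `ResolutionAborted` and the injected `srv_lookup`/`http_get` hooks are hypothetical names. The hooks are injected so the logic runs without network access.
- An SRV answer is reduced to a single `(host, port)` pair, ignoring priority and weight.
- A failed HTTP request is treated like a non-200 response.
- `Content-Type` parameters such as `charset` are ignored.
- Whether redirects are followed (the open `XXX` question) is left entirely to `http_get`.

```python
import ipaddress
import json
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

DEFAULT_PORT = 8448

# Hypothetical injected hooks (assumptions, not a real library API):
#   srv_lookup("_matrix._tcp.<name>") -> (target_host, port) or None
#   http_get(url) -> (status, content_type, body) or None if the request failed
SrvLookup = Callable[[str], Optional[Tuple[str, int]]]
HttpGet = Callable[[str], Optional[Tuple[int, str, bytes]]]


class ResolutionAborted(Exception):
    """A 200 .well-known response that is unusable: the request is aborted."""


@dataclass(frozen=True)
class Target:
    host: str      # IP literal, or hostname still to be resolved via AAAA/A
    port: int
    tls_name: str  # name the presented certificate must be valid for


def parse_server_name(name: str) -> Tuple[str, Optional[int]]:
    """Split '<host>[:<port>]', allowing bracketed IPv6 such as '[::1]:8448'."""
    if name.startswith("["):
        end = name.find("]")
        if end == -1:
            raise ValueError(f"unterminated IPv6 literal in {name!r}")
        host, rest = name[: end + 1], name[end + 1:]
    else:
        host, sep, port_text = name.partition(":")
        rest = sep + port_text
    if not host:
        raise ValueError(f"empty host in {name!r}")
    if not rest:
        return host, None
    if not rest.startswith(":") or not rest[1:].isdigit():
        raise ValueError(f"invalid port in {name!r}")
    port = int(rest[1:])
    if not 0 < port < 65536:
        raise ValueError(f"port out of range in {name!r}")
    return host, port


def is_ip_literal(host: str) -> bool:
    bare = host[1:-1] if host.startswith("[") and host.endswith("]") else host
    try:
        ipaddress.ip_address(bare)
    except ValueError:
        return False
    return True


def _literal_port_or_srv(host: str, port: Optional[int],
                         srv_lookup: SrvLookup) -> Optional[Target]:
    """Steps 1-3 for the server name, or steps a-c for the delegated name."""
    if is_ip_literal(host):
        return Target(host, port if port is not None else DEFAULT_PORT, host)
    if port is not None:
        return Target(host, port, host)
    srv = srv_lookup(f"_matrix._tcp.{host}")
    if srv is not None:
        return Target(srv[0], srv[1], host)  # certificate still checked for `host`
    return None  # no SRV record: the caller falls through to the next step


def _fetch_delegation(host: str, http_get: HttpGet) -> Optional[Tuple[str, Optional[int]]]:
    """The inserted .well-known step; None means 'continue to step 4'."""
    response = http_get(f"https://{host}/.well-known/matrix/server")
    if response is None or response[0] != 200:
        return None
    _, content_type, body = response
    if content_type.split(";")[0].strip().lower() != "application/json":
        raise ResolutionAborted("Content-Type is not application/json")
    try:
        m_server = json.loads(body)["m.server"]
        if not isinstance(m_server, str):
            raise ValueError("m.server is not a string")
        return parse_server_name(m_server)
    except (ValueError, KeyError, TypeError) as exc:
        raise ResolutionAborted(f"unusable .well-known response: {exc}") from exc


def resolve(server_name: str, srv_lookup: SrvLookup, http_get: HttpGet) -> Target:
    host, port = parse_server_name(server_name)
    target = _literal_port_or_srv(host, port, srv_lookup)      # steps 1-3
    if target is not None:
        return target
    delegated = _fetch_delegation(host, http_get)              # inserted step
    if delegated is None:
        return Target(host, DEFAULT_PORT, host)                 # step 4
    d_host, d_port = delegated
    target = _literal_port_or_srv(d_host, d_port, srv_lookup)   # steps a-c
    if target is not None:
        return target
    return Target(d_host, DEFAULT_PORT, d_host)                 # step d
```

Consider `example.org` with no SRV record that serves `{"m.server": "matrix.example.net"}`, where `_matrix._tcp.matrix.example.net` points at `fed.example.net:443`. For this input, `resolve` returns `Target(host='fed.example.net', port=443, tls_name='matrix.example.net')`. Injecting the lookups keeps the ordering logic separate from any particular DNS or HTTP client.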
```diff
@@ -92,18 +109,10 @@ sensible default: 24 hours is suggested.

 Because there is no way to request a revalidation, it is also recommended that
 requesting servers cap the expiry time. 48 hours is suggested.

-Similarly, a failure to retrieve the `.well-known` file should be cached for
-a reasonable period. 24 hours is suggested again.
-
-### The future of SRV records
-
-It's worth noting that this proposal is very clear in that we will maintain
-support for SRV records for the immediate future; there are no current plans to
-deprecate them.
-
-However, clearly a `.well-known` file can provide much of the functionality of
-an SRV record, and having to support both may be undesirable. Accordingly, we
-may consider sunsetting SRV record support at some point in the future.
+A failure to retrieve the `.well-known` file should also be cached, though care
+must be taken that a single 500 error or connection failure should not break
+federation for an extended period. A short cache time of about an hour might be
+appropriate; alternatively, servers might use an exponential backoff.

 ### Outstanding questions
```
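The caching rules in the third hunk could be combined roughly as follows. Those rules are:

- a 24-hour default expiry;
- a 48-hour cap on any expiry the response requests;
- failures cached only briefly, or with an exponential backoff.

This is a hedged sketch, not the proposal's prescribed behaviour. `WellKnownCache` and its methods are hypothetical, and the 60-second initial backoff is an assumption. Capping the backoff at one hour combines the two alternatives the text offers. The caller is assumed to have already derived any explicit expiry (`max_age`, in seconds) from the response.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

DEFAULT_TTL = 24 * 3600   # suggested default when the response gives no expiry
MAX_TTL = 48 * 3600       # suggested cap on any expiry the response asks for
FAILURE_BASE = 60         # assumption: a first failure is cached for one minute
FAILURE_CAP = 3600        # failures are never cached for more than about an hour


@dataclass
class _Entry:
    value: Optional[str]  # the m.server value, or None for a cached failure
    expires_at: float
    failures: int = 0     # consecutive failures, drives the backoff


class WellKnownCache:
    """Hypothetical cache of .well-known results keyed by server name."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, _Entry] = {}

    def get(self, server_name: str) -> Tuple[bool, Optional[str]]:
        """Return (hit, value); a hit whose value is None is a cached failure."""
        entry = self._entries.get(server_name)
        if entry is None or self._clock() >= entry.expires_at:
            return False, None
        return True, entry.value

    def record_success(self, server_name: str, value: str,
                       max_age: Optional[float] = None) -> None:
        # Use the default when no expiry was given, and never exceed the cap.
        ttl = DEFAULT_TTL if max_age is None else min(max(max_age, 0.0), MAX_TTL)
        self._entries[server_name] = _Entry(value, self._clock() + ttl)

    def record_failure(self, server_name: str) -> None:
        previous = self._entries.get(server_name)
        # Keep counting only while the previous result was also a failure, so a
        # single 500 or connection error is cached briefly and a success resets it.
        failures = previous.failures + 1 if previous and previous.value is None else 1
        backoff = min(FAILURE_BASE * 2 ** (failures - 1), FAILURE_CAP)
        self._entries[server_name] = _Entry(None, self._clock() + backoff, failures)
```

On a miss, the requesting server would perform the `.well-known` request and then call `record_success` or `record_failure`. Injecting `clock` lets the expiry behaviour be checked without waiting.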
```diff
@@ -127,7 +136,6 @@ as soon as possible, to maximise uptake in the ecosystem. It is likely that, as
 we approach Matrix 1.0, there will be sufficient other new features (such as
 new Room versions) that upgrading will be necessary anyway.

-
 ## Security considerations

 The `.well-known` file potentially broadens the attack surface for an attacker
```
```diff
@@ -138,6 +146,3 @@ wishing to intercept federation traffic to a particular server.
 This proposal adds a new mechanism, alongside the existing `SRV` record lookup
 for finding the server responsible for a particular matrix server_name, which
 will allow greater flexibility in deploying homeservers.
-
-[^1] For example, Cloudflare automatically "flattens" SRV record responses.
```
