@@ -1,21 +1,17 @@
 # MSC1708: .well-known support for server name resolution

 Currently, mapping from a server name to a hostname for federation is done via
-`SRV` records. This presents two principal difficulties:
-
-* SRV records are not widely used, and administrators may be unfamiliar with
-them, and there may be other practical difficulties in their deployment such
-as poor support from hosting providers. [^1]
-
-* [MSC1711](https://github.com/matrix-org/matrix-doc/pull/1711) proposes
-requiring valid X.509 certificates on the
-federation endpoint. It will then be necessary for the homeserver to present
-a certificate which is valid for the server name. This presents difficulties
-for hosted server offerings: BigCorp may be reluctant to hand over the
-keys for `bigcorp.com` to the administrators of the `bigcorp.com` matrix
-homeserver.
-
-Here we propose to solve these problems by augmenting the current `SRV` record
+`SRV` records. However,
+[MSC1711](https://github.com/matrix-org/matrix-doc/pull/1711) proposes
+requiring valid X.509 certificates on the federation endpoint. It will then be
+necessary for the homeserver to present a certificate which is valid for the
+server name. This presents difficulties for hosted server offerings: BigCorp
+may want to delegate responsibility for running its Matrix homeserver to an
+outside supplier, but it may be difficult for that supplier to obtain a TLS
+certificate for `bigcorp.com` (and BigCorp may be reluctant to let them have
+one).
+
+This MSC proposes to solve this problem by augmenting the current `SRV` record
 with a `.well-known` lookup.

 ## Proposal
@@ -24,59 +20,80 @@ For reference, the current [specification for resolving server
 names](https://matrix.org/docs/spec/server_server/unstable.html#resolving-server-names)
 is as follows:

-* If the hostname is an IP literal, then that IP address should be used,
-together with the given port number, or 8448 if no port is given.
+1. If the hostname is an IP literal, then that IP address should be used,
+together with the given port number, or 8448 if no port is given.

+2. Otherwise, if the port is present, then an IP address is discovered by
+looking up an AAAA or A record for the hostname, and the specified port is
+used.
+
+3. If the hostname is not an IP literal and no port is given, the server is
+discovered by first looking up a `_matrix._tcp` SRV record for the
+hostname, which may give a hostname (to be looked up using AAAA or A queries)
+and port.
+
+4. Finally, the server is discovered by looking up an AAAA or A record on the
+hostname, and taking the default fallback port number of 8448.
+
+We insert the following between Steps 3 and 4:
+
+If the SRV record does not exist, the requesting server should make a `GET`
+request to `https://<server_name>/.well-known/matrix/server`, with normal
+X.509 certificate validation. If the request does not return a 200, continue
+to step 4, otherwise:
+
+XXX: should we follow redirects?
+
-* Otherwise, if the port is present, then an IP address is discovered by
-looking up an AAAA or A record for the hostname, and the specified port is
-used.
+The response must have a `Content-Type` of `application/json`, and must be
+valid JSON which follows the structure documented below. Otherwise, the
+request is aborted.

-* If the hostname is not an IP literal and no port is given, the server is
-discovered by first looking up a `_matrix._tcp` SRV record for the
-hostname, which may give a hostname (to be looked up using AAAA or A queries)
-and port. If the SRV record does not exist, then the server is discovered by
-looking up an AAAA or A record on the hostname and taking the default
-fallback port number of 8448.
+If the response is valid, the `m.server` property is parsed as
+`<delegated_server_name>[:<delegated_port>]`, and processed as follows:

-Homeservers may use SRV records to load balance requests between multiple TLS
-endpoints or to failover to another endpoint if an endpoint fails.
+a. If `<delegated_server_name>` is an IP literal, then that IP address should
+be used, together with `<delegated_port>`, or 8448 if no port is
+given. The server should present a valid TLS certificate for
+`<delegated_server_name>`.

-The first two points remain unchanged: if the server name is an IP literal, or
-contains a port, then requests will be made directly as before.
+b. Otherwise, if the port is present, then an IP address is discovered by
+looking up an AAAA or A record for `<delegated_server_name>`, and the
+specified port is used. The server should present a valid TLS certificate
+for `<delegated_server_name>`.

-If the hostname is neither an IP literal, nor does it have an explicit port,
-then the requesting server should continue to make an SRV lookup as before, and
-use the result if one is found.
+(In other words, the federation connection is made to
+`https://<delegated_server_name>:<delegated_port>`).

-If *no* SRV result is found, the requesting server should make a `GET` request
-to `https://\<server_name>/.well-known/matrix/server`, with normal X.509
-certificate validation. If the request fails in any way, then we fall back as
-before to using using port 8448 on the hostname.
+c. If the hostname is not an IP literal and no port is given, a second SRV
+record is looked up; this time for `_matrix._tcp.<delegated_server_name>`,
+which may give yet another hostname (to be looked up using A/AAAA queries)
+and port. The server must present a TLS cert for the
+`<delegated_server_name>` from the .well-known.

-Rationale: Falling back to port 8448 (rather than aborting the request) is
-necessary to maintain compatibility with existing deployments, which may not
-present valid certificates on port 443, or may return 4xx or 5xx errors.
+d. If no SRV record is found, the server is discovered by looking up an AAAA
+or A record on `<delegated_server_name>`, and taking the default fallback
+port number of 8448.

-If the GET request succeeds, it should result in a JSON response, with contents
-structured as shown:
+(In other words, the federation connection is made to
+`https://<delegated_server_name>:8448`).
+
+### Structure of the `.well-known` response
+
+The contents of the `.well-known` response should be structured as shown:

 ```json
 {
-    "server": "<server>[:<port>]"
+    "m.server": "<server>[:<port>]"
 }
 ```

-The `server` property should be a hostname or IP address, followed by an
+The `m.server` property should be a hostname or IP address, followed by an
 optional port.

 If the response cannot be parsed as JSON, or lacks a valid `server` property,
 the request is considered to have failed, and no fallback to port 8448 takes
 place.

-Otherwise, the requesting server performs an `AAAA/A` lookup on the hostname
-(if necessary), and connects to the resultant address and the specifed
-port. The port defaults to 8448, if unspecified.
-
 (The formal grammar for the `server` property is identical to that of a [server
 name](https://matrix.org/docs/spec/appendices.html#server-name).)

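Note on the hunk above: the validation of the `.well-known` response and the a/b/c/d branching on `m.server` can be sketched roughly as follows. This is a minimal illustration, not the proposal's reference implementation. The names `extract_m_server`, `parse_m_server`, `resolution_step` and `Delegation` are hypothetical. Tolerating media-type parameters such as `; charset=utf-8` on the `Content-Type` is an assumption the text does not settle. No DNS or HTTP requests are made; the last function only reports which branch would apply.

```python
import ipaddress
import json
from typing import NamedTuple, Optional

DEFAULT_PORT = 8448  # fallback port named in the proposal


class Delegation(NamedTuple):
    host: str
    port: Optional[int]
    is_ip_literal: bool


def extract_m_server(content_type: str, body: str) -> str:
    """Return the raw `m.server` string, or raise ValueError (request aborted)."""
    # Assumption: parameters after ";" in the Content-Type are ignored.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "application/json":
        raise ValueError("unexpected Content-Type")
    doc = json.loads(body)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(doc, dict) or not isinstance(doc.get("m.server"), str):
        raise ValueError("missing or invalid m.server property")
    return doc["m.server"]


def _parse_port(text: str) -> int:
    if not (text.isascii() and text.isdigit()) or not 0 < int(text) <= 65535:
        raise ValueError(f"invalid port: {text!r}")
    return int(text)


def parse_m_server(value: str) -> Delegation:
    """Split `<delegated_server_name>[:<delegated_port>]` into its parts."""
    if value.startswith("["):  # bracketed IPv6 literal, e.g. "[::1]:8448"
        host, bracket, rest = value[1:].partition("]")
        if not bracket or (rest and not rest.startswith(":")):
            raise ValueError(f"malformed IPv6 literal: {value!r}")
        ipaddress.IPv6Address(host)  # raises a ValueError subclass if invalid
        port = _parse_port(rest[1:]) if rest else None
        return Delegation(host, port, True)
    host, colon, port_text = value.partition(":")
    if not host:
        raise ValueError("empty server name")
    port = _parse_port(port_text) if colon else None
    try:
        ipaddress.IPv4Address(host)
        is_ip = True
    except ValueError:
        is_ip = False
    return Delegation(host, port, is_ip)


def resolution_step(d: Delegation) -> str:
    """Describe which of steps a-d applies; performs no network I/O."""
    if d.is_ip_literal:
        return f"a: connect to {d.host} on port {d.port or DEFAULT_PORT}"
    if d.port is not None:
        return f"b: AAAA/A lookup for {d.host}, connect on port {d.port}"
    return (f"c/d: SRV lookup for _matrix._tcp.{d.host}, "
            f"else AAAA/A lookup on port {DEFAULT_PORT}")
```

In every branch the TLS certificate is expected to be valid for `<delegated_server_name>`, so a real implementation would carry `Delegation.host` through to certificate verification even when an SRV record redirects the connection to another hostname.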
@@ -92,18 +109,10 @@ sensible default: 24 hours is suggested.
 Because there is no way to request a revalidation, it is also recommended that
 requesting servers cap the expiry time. 48 hours is suggested.

-Similarly, a failure to retrieve the `.well-known` file should be cached for
-a reasonable period. 24 hours is suggested again.
-
-### The future of SRV records
-
-It's worth noting that this proposal is very clear in that we will maintain
-support for SRV records for the immediate future; there are no current plans to
-deprecate them.
-
-However, clearly a `.well-known` file can provide much of the functionality of
-an SRV record, and having to support both may be undesirable. Accordingly, we
-may consider sunsetting SRV record support at some point in the future.
+A failure to retrieve the `.well-known` file should also be cached, though care
+must be taken that a single 500 error or connection failure should not break
+federation for an extended period. A short cache time of about an hour might be
+appropriate; alternatively, servers might use an exponential backoff.

 ### Outstanding questions

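Note on the hunk above: the suggested cache lifetimes could be combined along these lines. This is a sketch under stated assumptions, not something the proposal prescribes. The function names are hypothetical, and so are the five-minute starting delay and the choice to cap the failure backoff at the "about an hour" figure. How the advertised lifetime is obtained from the response is left out entirely.

```python
from typing import Optional

DEFAULT_LIFETIME = 24 * 3600   # suggested default when no lifetime is known
MAX_LIFETIME = 48 * 3600       # suggested cap on any advertised lifetime
FAILURE_BASE = 5 * 60          # assumption: first retry delay after a failure
FAILURE_MAX = 3600             # assumption: never cache a failure beyond ~1 hour


def success_cache_seconds(advertised: Optional[float]) -> float:
    """Lifetime for a successful response, capped because revalidation is impossible."""
    if advertised is None:
        return DEFAULT_LIFETIME
    return max(0.0, min(advertised, MAX_LIFETIME))


def failure_cache_seconds(consecutive_failures: int) -> float:
    """Exponential backoff for failed lookups, bounded by FAILURE_MAX."""
    exponent = max(consecutive_failures - 1, 0)
    return min(FAILURE_BASE * 2 ** exponent, FAILURE_MAX)
```

Bounding the failure backoff keeps a single 500 error or dropped connection from suppressing `.well-known` lookups for long, which is the concern the new paragraph raises.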
@@ -127,7 +136,6 @@ as soon as possible, to maximise uptake in the ecosystem. It is likely that, as
 we approach Matrix 1.0, there will be sufficient other new features (such as
 new Room versions) that upgrading will be necessary anyway.

-
 ## Security considerations

 The `.well-known` file potentially broadens the attack surface for an attacker
@@ -138,6 +146,3 @@ wishing to intercept federation traffic to a particular server.
 This proposal adds a new mechanism, alongside the existing `SRV` record lookup
 for finding the server responsible for a particular matrix server_name, which
 will allow greater flexibility in deploying homeservers.
-
-
-[^1] For example, Cloudflare automatically "flattens" SRV record responses.