Merge pull request #1598 from matrix-org/rav/proposals/id_grammar
MSC 1597: Better spec for matrix identifiers
commit
d9135ef902
# Grammars for identifiers in the Matrix protocol

## Background

Matrix uses client- or server-generated identifiers in a number of
places. Historically the grammars for these have been underspecified, which
leads to confusion about what is or is not a valid identifier, and to the
possibility of incompatibility between implementations.

This proposal presents tightly-specified grammars for a number of
identifiers.

## Common Identifiers

[Spec](https://matrix.org/docs/spec/appendices.html#common-identifier-format)

Proposal:

> `localpart` may not include `:`. When parsing a Common Identifier, it should
> be split at the *leftmost* `:`.

Rationale: server names may contain multiple `:`s (think IPv6 literals), so the
first colon is the only sane place to split them. This is a Known Thing, but I
don't think we spell it out anywhere in the spec.

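As a minimal sketch of the leftmost-colon rule (the function name and the example server name are illustrative assumptions, not taken from the spec), splitting might look like this in Python:

```python
from typing import Tuple


def split_common_identifier(identifier: str) -> Tuple[str, str, str]:
    """Split a Common Identifier into (sigil, localpart, server_name).

    Hypothetical helper: it only demonstrates splitting at the leftmost ':'
    and does not validate the localpart or the server name.
    """
    if len(identifier) < 2:
        raise ValueError("identifier is too short")
    sigil, rest = identifier[0], identifier[1:]
    # str.partition splits at the first (leftmost) occurrence of ':'.
    localpart, sep, server_name = rest.partition(":")
    if not sep:
        raise ValueError("identifier has no ':' separator")
    return sigil, localpart, server_name


# A server name holding an IPv6 literal and a port keeps all of its colons.
parts = split_common_identifier("@alice:[2001:db8::1]:8448")
```

Splitting at the rightmost colon instead would cut the example above inside the port or the IPv6 literal, which is why the leftmost colon is the safe choice.
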
## User IDs

User IDs are
[well-specified](https://matrix.org/docs/spec/appendices.html#user-identifiers).
However, we should consider dropping `/` from the list of allowed characters,
because HTTP proxies might rewrite
`/_matrix/client/r0/profile/@foo%2Fbar:matrix.org/displayname` to
`/_matrix/client/r0/profile/@foo/bar:matrix.org/displayname`, messing things
up.

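To illustrate the problem, here is a small sketch using Python's standard `urllib.parse`. The "proxy" is simulated by a single `unquote` call, which is an assumption about how a misbehaving proxy might normalise the path, not a description of any particular product:

```python
from urllib.parse import quote, unquote

user_id = "@foo/bar:matrix.org"

# Percent-encode the user ID for use as a single path segment.
# "/" is not in the safe set, so it becomes "%2F".
encoded = quote(user_id, safe="@:")
request_path = "/_matrix/client/r0/profile/" + encoded + "/displayname"

# Simulated proxy that decodes the whole path before forwarding it.
rewritten_path = unquote(request_path)

# The decoded path has one more "/"-separated segment than the original,
# so the server can no longer tell where the user ID ends.
original_segments = request_path.split("/")
rewritten_segments = rewritten_path.split("/")
```
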
History: `/` was introduced with the intention of acting as a hierarchical
namespacing character, particularly with consideration to the gitter protocol
which uses it as a hierarchical separator. However, this was not as effective
as hoped because `@foo/bar:example.com` looks like the ID is partitioned into
`@foo` and `bar:example.com`.

Proposal:

> Remove `/` from the list of allowed characters in User IDs.

`/` will of course be maintained under the grammar of "historical user
IDs". Sorting out that mess is a longer-term project.

## Room IDs and Event IDs

[Issue](https://github.com/matrix-org/matrix-doc/issues/667)
[Spec](https://matrix.org/docs/spec/appendices.html#room-ids-and-event-ids)

These currently have similar formats, though it is likely that event ids will
be replaced with something else due to
[#1127](https://github.com/matrix-org/matrix-doc/issues/1127).

Currently they are both specified as `?opaque_id:domain` (where `?` stands for
the sigil), without clues as to what the `opaque_id` should be.

Synapse uses: `[A-Za-z]{18}`.
[Dendrite](https://github.com/matrix-org/dendrite/blob/b71d922/src/github.com/matrix-org/dendrite/clientapi/routing/createroom.go#L125)
uses (I think) `[A-Za-z0-9]{16}` via
[json.go](https://github.com/matrix-org/util/blob/master/json.go#L185). However,
some server implementations/forks are known to generate event IDs (and possibly
room IDs) using a wide alphabet, which means that there exist rooms that
include unusual event IDs.

Proposal:

> The opaque_id part must not be empty, and must consist entirely of the
> characters `[0-9a-zA-Z.=_-]`.
>
> The total length (including sigil and domain) must not exceed 255 characters.
>
> This is only enforced for v2 rooms - servers and clients wishing to support
> v1 rooms should be more tolerant.

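A rough sketch of how a v2-room check along these lines might look in Python. The function name is hypothetical, the domain part is deliberately left unvalidated, and length is counted in Python string characters, which is an assumption about how "characters" should be measured:

```python
import re

# Character set from the proposal; "-" is last so it is taken literally.
OPAQUE_ID_RE = re.compile(r"[0-9a-zA-Z.=_-]+")
MAX_ID_LENGTH = 255  # includes the sigil and the domain


def is_valid_v2_id(identifier: str, sigil: str) -> bool:
    """Check a room ID ('!') or event ID ('$') against the proposed rules."""
    if len(identifier) > MAX_ID_LENGTH or not identifier.startswith(sigil):
        return False
    # Split at the leftmost ':' so that domains containing ':' survive intact.
    opaque_id, sep, domain = identifier[len(sigil):].partition(":")
    if not sep or not domain:
        return False
    return OPAQUE_ID_RE.fullmatch(opaque_id) is not None


ok = is_valid_v2_id("!abcDEF123:example.com", "!")
bad = is_valid_v2_id("$abc/def:example.com", "$")
```

A server that still has to handle v1 rooms would presumably skip or relax this check for them, as the proposal notes.
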
## Key IDs (for federation, e2e, and identity servers)

These are always of the form `<algorithm>:<tok>`.

Valid algorithms are defined at
https://matrix.org/docs/spec/client_server/unstable.html#key-algorithms, though
we should define the alphabet for future algorithms.

Proposal:

> Future algorithm identifiers will be assigned from the alphabet `[a-z0-9_.]`
> and will be at most 31 characters in length.

For federation keys,
[Synapse](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/config/key.py#L159)
generates key ids as `ed25519:a_[A-Za-z]{4}`, though an HS admin can configure
them manually to be anything without whitespace.

Key IDs end up in an Authorization header which looks like `X-Matrix
origin=origin.example.com,key="keyId",sig="ABCDEF..."`. The Synapse
implementation splits on `,` and `=` without regard to quoting so this
currently precludes the use of `,` or `=` in a key ID.

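The following is a deliberately naive parser written for illustration only. It is not Synapse's code, and the function name and sample key IDs are made up. It shows how splitting on `,` and `=` without honouring quotes breaks as soon as a key ID contains either character:

```python
def naive_parse_x_matrix(header: str) -> dict:
    """Parse an X-Matrix Authorization header, ignoring quoting entirely."""
    scheme, _, params = header.partition(" ")
    if scheme != "X-Matrix":
        raise ValueError("not an X-Matrix header")
    result = {}
    for kv in params.split(","):
        pieces = kv.split("=")
        # A quoted value containing ',' or '=' yields the wrong number of pieces.
        if len(pieces) != 2:
            raise ValueError("cannot parse parameter %r" % kv)
        name, value = pieces
        result[name] = value.strip('"')
    return result


# Parses as expected while the key ID avoids ',' and '='.
parsed = naive_parse_x_matrix(
    'X-Matrix origin=origin.example.com,key="ed25519:a_AbCd",sig="ABCDEF"'
)
```

Passing a header whose key is `"ed25519:a=b"` or `"ed25519:a,b"` makes this sketch raise `ValueError`; a quote-aware parser would be needed to accept such key IDs.
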
For e2e, device keys have a `tok` corresponding to the device id, whilst
one-time keys are generated by libolm, which uses a base64-encoded 32-bit int,
i.e. `[A-Za-z0-9+/]{6}`.

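To see where the six characters come from: four bytes encode to eight base64 characters, two of which are `=` padding. The sketch below assumes big-endian byte order and unpadded output; it is an illustration of the arithmetic, not libolm's actual implementation:

```python
import base64
import struct


def encode_key_counter(counter: int) -> str:
    """Base64-encode an unsigned 32-bit integer and drop the '=' padding."""
    raw = struct.pack(">I", counter)  # 4 bytes; byte order is an assumption
    return base64.b64encode(raw).decode("ascii").rstrip("=")


small = encode_key_counter(1)           # "AAAAAQ"
awkward = encode_key_counter(0xFBFFFFFF)  # "+////w", contains '+' and '/'
```

The second value shows why the standard base64 alphabet is a problem here: `+` and `/` are both outside the `tok` character set proposed below.
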
A key ID needs to be unique over the lifetime of the server (for federation) or
the device (for e2e). However, they are used fairly widely, so making them long
is unattractive as they could significantly increase the amount of data being
transmitted. Let's limit the `tok` part of the key to 31 characters too.

Proposal:

> Key IDs use the following ABNF (RFC 5234) grammar:
>
> ```
> key_id = algorithm ":" tok
>
> algorithm = 1*31 alg_chars
>
> tok = 1*31 tok_chars
>
> alg_chars = %x61-7a / %x30-39 / "_" / "."
>               ; a-z 0-9 _ .
>
> tok_chars = ALPHA / DIGIT / "." / "=" / "_" / "-"
>               ; A-Z a-z 0-9 . = _ -
> ```

Note that enforcing this grammar will mean:

* Making sure that synapse handles "=" characters in key IDs (easy).

* Making libolm not put + and / characters in key IDs (easy enough, but there
  will be a bunch of malformed unique keys out there in the wild. Possibly they
  would just get thrown away. Servers may need to continue to tolerate `+` and
  `/` in e2e keys for a while.)

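The key ID grammar above translates fairly directly into a regular expression. The following Python sketch is one possible rendering; the names are hypothetical, and it implements only the strict grammar, without the transitional tolerance for `+` and `/`:

```python
import re
from typing import Tuple

# algorithm = 1*31 of [a-z0-9_.], tok = 1*31 of [A-Za-z0-9.=_-]
KEY_ID_RE = re.compile(
    r"(?P<algorithm>[a-z0-9_.]{1,31}):(?P<tok>[A-Za-z0-9.=_-]{1,31})"
)


def parse_key_id(key_id: str) -> Tuple[str, str]:
    """Return (algorithm, tok), or raise ValueError if the key ID is malformed."""
    match = KEY_ID_RE.fullmatch(key_id)
    if match is None:
        raise ValueError("malformed key ID: %r" % key_id)
    return match.group("algorithm"), match.group("tok")


algorithm, tok = parse_key_id("ed25519:a_AbCd")
```

Because neither character class contains `:`, the single colon in the pattern is unambiguous and no leftmost/rightmost splitting question arises.
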
## Opaque IDs

[Issue](https://github.com/matrix-org/matrix-doc/issues/666)

This is a class of identifier types where nobody is really meant to parse any
part of the ID - they are just unique identifiers (with varying scopes of
uniqueness). See below for discussion on what is currently in use.

I propose to specify almost the same grammar for all of these, for
simplicity and consistency.

Proposal:

> Opaque IDs must be strings consisting entirely of the characters
> `[0-9a-zA-Z.=_-]`. Their length must not exceed 255 characters and they must
> not be empty.

For almost all of the current implementations I have looked at (listed below),
the grammar above is a superset of the generated identifiers, and a subset of
the understood identifiers. There should therefore be no
backwards-compatibility problems with its introduction.

The exception is transaction IDs generated by some clients. I think that we'll
just have to fix those clients and accept that old versions may not work with
future servers.

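A minimal sketch of the opaque ID check in Python. The function name and the sample values are hypothetical; the samples merely imitate the shapes of identifiers described in the subsections that follow:

```python
import re

# 1 to 255 characters drawn from the proposed alphabet.
OPAQUE_ID_RE = re.compile(r"[0-9a-zA-Z.=_-]{1,255}")


def is_valid_opaque_id(value: str) -> bool:
    """Check a string against the proposed opaque ID grammar."""
    return OPAQUE_ID_RE.fullmatch(value) is not None


# Shaped like a timestamp-plus-counter transaction ID: fits the grammar.
fits = is_valid_opaque_id("m1537888896123.0")
# Shaped like a room ID plus a timestamp: '!' and ':' are not allowed.
rejected = is_valid_opaque_id("!abc:example.com1537888896123")
```
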
### Call IDs

[Spec](https://matrix.org/docs/spec/client_server/unstable.html#m-call-invite)

These are only used within the body of `m.call.*` events, as far as I am
aware. They should be unique within the lifetime of a room. (Some
implementations currently treat them as globally unique, but that is considered
an implementation bug.)

[matrix-js-sdk](https://github.com/matrix-org/matrix-js-sdk/blob/4d310cd4618db4e98a8e6b5eb812480102ee4dee/src/webrtc/call.js#L72) uses `c[0-9.]{32}`.
[matrix-android-sdk](https://github.com/matrix-org/matrix-android-sdk/blob/5c6f785e53632e7b6fb3f3859a90c3d85b040e7f/matrix-sdk/src/main/java/org/matrix/androidsdk/call/MXWebRtcCall.java#L221) uses `c[0-9]{13}`.

Additional proposal:

> Call IDs should be long enough to make clashes unlikely.

### Media IDs

[Spec](https://matrix.org/docs/spec/client_server/r0.3.0.html#id67)

These are generated by the server on upload, and then embedded in `mxc://` URIs
and used in the C-S API and the S-S API.

They must be URI-safe to be sensibly embedded in `mxc://` URIs.

[Synapse](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/rest/media/v1/media_repository.py#L153)
uses `[A-Za-z]{24}`, though it also uses `[0-9A-Za-z_-]{27}` for
[URL previews](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/rest/media/v1/preview_url_resource.py#L285).

[matrix-media-repo](https://github.com/turt2live/matrix-media-repo/blob/539f25ee75ba6cdbb0410314b29978f4b8b1d7fe/src/github.com/turt2live/matrix-media-repo/controllers/upload_controller/upload_controller.go#L50)
uses `[A-Za-z0-9]{32}`, via [random.go](https://github.com/turt2live/matrix-media-repo/blob/539f25ee75ba6cdbb0410314b29978f4b8b1d7fe/src/github.com/turt2live/matrix-media-repo/util/random.go#L18-L27).

### Filter IDs

[Spec](https://matrix.org/docs/spec/client_server/unstable.html#post-matrix-client-r0-user-userid-filter)

These are generated by the server and then used in the CS API. They are only
required to be unique for a given user. `{` is already forbidden by the spec.

[Synapse](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/storage/filtering.py#L70-L73)
uses a stringified int.

### Auth Session IDs

[Spec](https://matrix.org/docs/spec/client_server/r0.3.0.html#user-interactive-authentication-api)

These are generated by the server during auth, and then used in the CS
API. They need to be unique for a given server.

[Synapse](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/handlers/auth.py#L494) uses `[A-Za-z]{24}`.

### Transaction IDs (for federation)

[Spec](https://matrix.org/docs/spec/server_server/unstable.html#put-matrix-federation-v1-send-txnid)

These are generated by the sending server. They need to be unique for a given
pair of servers.

[Synapse](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/federation/transaction_queue.py#L593) uses a stringified int and accepts pretty much anything.

### Transaction IDs (for C-S API)

[Spec](https://matrix.org/docs/spec/client_server/unstable.html#put-matrix-client-r0-rooms-roomid-send-eventtype-txnid)

These are generated by the client. They only need to be unique within the
context of a single access_token/device.

Synapse doesn't appear to do any sanity-checking here currently.

[matrix-js-sdk](https://github.com/matrix-org/matrix-js-sdk/blob/c6b500bc09994ab5924ef8aab9bd10fc7ded5dae/src/base-apis.js#L123)
uses `m[0-9]{13}\.[0-9]{1,}`.
[matrix-android-sdk](https://github.com/matrix-org/matrix-android-sdk/blob/088414fb187cae341690c3a01493b87d97f0169f/matrix-sdk/src/main/java/org/matrix/androidsdk/rest/model/Event.java#L503)
uses a room ID plus a timestamp, so the result could contain almost anything,
but it will certainly include a `!`.

### Device IDs

[Spec](https://matrix.org/docs/spec/client_server/unstable.html#relationship-between-access-tokens-and-devices)

These are normally generated by the server on login. It's possible for clients
to present their own device_ids, but we're not aware of this feature being
widely used.

They are used between users and across federation for E2E and to-device
messages. They need to be unique for a particular user. They also appear in key
IDs and must therefore be a subset of that grammar.

[Synapse](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/handlers/device.py#L89)
generates device IDs with `[A-Z]{10}`. It appears to do little sanity-checking
of client-generated device IDs currently.

Additional proposal:

> Device IDs must not exceed 31 characters in length.

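As a hedged illustration, the sketch below generates a device ID of the `[A-Z]{10}` shape mentioned above and checks a candidate against the combined constraints (opaque ID alphabet, at most 31 characters, so that it also fits the `tok` part of a key ID). Both function names are hypothetical, and the generator only imitates the pattern; it is not Synapse's code:

```python
import re
import secrets
import string

# Opaque ID alphabet, capped at 31 characters so it is a valid key ID `tok`.
DEVICE_ID_RE = re.compile(r"[0-9a-zA-Z.=_-]{1,31}")


def generate_device_id(length: int = 10) -> str:
    """Generate a random device ID made of upper-case ASCII letters."""
    return "".join(secrets.choice(string.ascii_uppercase) for _ in range(length))


def is_valid_device_id(device_id: str) -> bool:
    """Check a (possibly client-supplied) device ID against the proposal."""
    return DEVICE_ID_RE.fullmatch(device_id) is not None


device_id = generate_device_id()
accepted = is_valid_device_id(device_id)
```
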
### Message IDs

These are used in the server-server API for
[Send-to-device messaging](https://matrix.org/docs/spec/server_server/unstable.html#send-to-device-messaging).

Synapse uses `[A-Za-z]{16}`, and accepts anything that fits in a postgres TEXT
field. Ref: [devicemessage.py](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/handlers/devicemessage.py#L102).

## Room Aliases

These are a complex topic and are discussed in [MSC
1608](https://github.com/matrix-org/matrix-doc/issues/1608).