diff --git a/specification/intro.rst b/specification/intro.rst index 650f1fc5f..959d5c0ff 100644 --- a/specification/intro.rst +++ b/specification/intro.rst @@ -94,11 +94,8 @@ instant messages, VoIP call setups, or any other objects that need to be reliably and persistently pushed from A to B in an inter-operable and federated manner. -Overview --------- - Architecture -~~~~~~~~~~~~ +------------ Matrix defines APIs for synchronising extensible JSON objects known as "events" between compatible clients, servers and services. Clients are @@ -142,7 +139,7 @@ a long-lived GET request. | V | V +------------------+ +------------------+ | |---------( HTTPS )--------->| | - | homeserver | | homeserver | + | homeserver | | homeserver | | |<--------( HTTPS )----------| | +------------------+ Server-Server API +------------------+ History Synchronisation @@ -150,22 +147,19 @@ a long-lived GET request. Users -+++++ +~~~~~ Each client is associated with a user account, which is identified in Matrix -using a unique "User ID". This ID is namespaced to the homeserver which +using a unique "user ID". This ID is namespaced to the homeserver which allocated the account and has the form:: @localpart:domain -The ``localpart`` of a user ID may be a user name, or an opaque ID identifying -this user. The ``domain`` of a user ID is the domain of the homeserver. - -.. TODO-spec - - Need to specify precise grammar for Matrix IDs +See the `Identifier Grammar`_ section for full details of the structure of +user IDs. Events -++++++ +~~~~~~ All data exchanged over Matrix is expressed as an "event". Typically each client action (e.g. sending a message) correlates with exactly one event. Each event @@ -180,7 +174,7 @@ of a "Room". .. _package naming conventions: https://en.wikipedia.org/wiki/Java_package#Package_naming_conventions Event Graphs -++++++++++++ +~~~~~~~~~~~~ .. _sect:event-graph: @@ -204,7 +198,7 @@ of its parents. The root event should have a depth of 1. Thus if one event is before another, then it must have a strictly smaller depth. Room structure -++++++++++++++ +~~~~~~~~~~~~~~ A room is a conceptual place where users can send and receive events. Events are sent to a room, and all participants in that room with sufficient access will @@ -215,8 +209,12 @@ which have the form:: There is exactly one room ID for each room. Whilst the room ID does contain a domain, it is simply for globally namespacing room IDs. The room does NOT -reside on the domain specified. Room IDs are not meant to be human readable. -They are case-sensitive. The following conceptual diagram shows an +reside on the domain specified. + +See the `Identifier Grammar`_ section for full details of the structure of +a room ID. + +The following conceptual diagram shows an ``m.room.message`` event being sent to the room ``!qporfwt:matrix.org``:: { @alice:matrix.org } { @bob:domain.com } @@ -229,7 +227,7 @@ They are case-sensitive. The following conceptual diagram shows an | | V | +------------------+ +------------------+ - | homeserver | | homeserver | + | homeserver | | homeserver | | matrix.org | | domain.com | +------------------+ +------------------+ | ^ @@ -283,23 +281,21 @@ from the other servers participating in a room. Room Aliases -^^^^^^^^^^^^ +++++++++++++ Each room can also have multiple "Room Aliases", which look like:: #room_alias:domain -.. TODO - - Need to specify precise grammar for Room Aliases +See the `Identifier Grammar`_ section for full details of the structure of +a room alias. A room alias "points" to a room ID and is the human-readable label by which rooms are publicised and discovered. The room ID the alias is pointing to can be obtained by visiting the domain specified. Note that the mapping from a room alias to a room ID is not fixed, and may change over time to point to a different room ID. For this reason, Clients SHOULD resolve the room alias to a -room ID once and then use that ID on subsequent requests. Room aliases MUST NOT -exceed 255 bytes (including the domain). - +room ID once and then use that ID on subsequent requests. When resolving a room alias the server will also respond with a list of servers that are in the room that can be used to join via. @@ -319,16 +315,16 @@ that are in the room that can be used to join via. |________________________________| Identity -++++++++ +~~~~~~~~ -Users in Matrix are identified via their matrix user ID (MXID). However, +Users in Matrix are identified via their Matrix user ID. However, existing 3rd party ID namespaces can also be used in order to identify Matrix users. A Matrix "Identity" describes both the user ID and any other existing IDs from third party namespaces *linked* to their account. Matrix users can *link* third-party IDs (3PIDs) such as email addresses, social network accounts and phone numbers to their user ID. Linking 3PIDs creates a mapping from a 3PID to a user ID. This mapping can then be used by Matrix -users in order to discover the MXIDs of their contacts. +users in order to discover the user IDs of their contacts. In order to ensure that the mapping from 3PID to user ID is genuine, a globally federated cluster of trusted "Identity Servers" (IS) are used to verify the 3PID and persist and replicate the mappings. @@ -339,7 +335,7 @@ user IDs using 3PIDs. Profiles -++++++++ +~~~~~~~~ Users may publish arbitrary key/value data associated with their account - such as a human readable display name, a profile photo URL, contact information @@ -350,7 +346,7 @@ as a human readable display name, a profile photo URL, contact information names allowed to be? Private User Data -+++++++++++++++++ +~~~~~~~~~~~~~~~~~ Users may also store arbitrary private key/value data in their account - such as client preferences, or server configuration settings which lack any other @@ -361,6 +357,220 @@ dedicated API. The API is symmetrical to managing Profile data. private user data, but with different ACLs? +Identifier Grammar +------------------ + +Server Name +~~~~~~~~~~~ + +A homeserver is uniquely identified by its server name. This value is used in a +number of identifiers, as described below. + +The server name represents the address at which the homeserver in question can +be reached by other homeservers. The complete grammar is:: + + server_name = dns_name [ ":" port] + dns_name = host + port = *DIGIT + +where ``host`` is as defined by `RFC3986, section 3.2.2 +`_. + +Examples of valid server names are: + +* ``matrix.org`` +* ``matrix.org:8888`` +* ``1.2.3.4`` (IPv4 literal) +* ``1.2.3.4:1234`` (IPv4 literal with explicit port) +* ``[1234:5678::abcd]`` (IPv6 literal) +* ``[1234:5678::abcd]:5678`` (IPv6 literal with explicit port) + + +Common Identifier Format +~~~~~~~~~~~~~~~~~~~~~~~~ + +The Matrix protocol uses a common format to assign unique identifiers to a +number of entities, including users, events and rooms. Each identifier takes +the form:: + + &localpart:domain + +where ``&`` represents a 'sigil' character; ``domain`` is the `server name`_ of +the homeserver which allocated the identifier, and ``localpart`` is an +identifier allocated by that homeserver. + +The sigil characters are as follows: + +* ``@``: User ID +* ``!``: Room ID +* ``$``: Event ID +* ``#``: Room alias + +The precise grammar defining the allowable format of an identifier depends on +the type of identifier. + +User Identifiers +++++++++++++++++ + +Users within Matrix are uniquely identified by their Matrix user ID. The user +ID is namespaced to the homeserver which allocated the account and has the +form:: + + @localpart:domain + +The ``localpart`` of a user ID is an opaque identifier for that user. It MUST +NOT be empty, and MUST contain only the characters ``a-z``, ``0-9``, ``.``, +``_``, ``=``, and ``-``. + +The ``domain`` of a user ID is the `server name`_ of the homeserver which +allocated the account. + +The length of a user ID, including the ``@`` sigil and the domain, MUST NOT +exceed 255 characters. + +The complete grammar for a legal user ID is:: + + user_id = "@" user_id_localpart ":" server_name + user_id_localpart = 1*user_id_char + user_id_char = DIGIT + / %x61-7A ; a-z + / "-" / "." / "=" / "_" + +.. admonition:: Rationale + + A number of factors were considered when defining the allowable characters + for a user ID. + + Firstly, we chose to exclude characters outside the basic US-ASCII character + set. User IDs are primarily intended for use as an identifier at the protocol + level, and their use as a human-readable handle is of secondary + benefit. Furthermore, they are useful as a last-resort differentiator between + users with similar display names. Allowing the full unicode character set + would make very difficult for a human to distinguish two similar user IDs. The + limited character set used has the advantage that even a user unfamiliar with + the Latin alphabet should be able to distinguish similar user IDs manually, if + somewhat laboriously. + + We chose to disallow upper-case characters because we do not consider it + valid to have two user IDs which differ only in case: indeed it should be + possible to reach ``@user:matrix.org`` as ``@USER:matrix.org``. However, + user IDs are necessarily used in a number of situations which are inherently + case-sensitive (notably in the ``state_key`` of ``m.room.member`` + events). Forbidding upper-case characters (and requiring homeservers to + downcase usernames when creating user IDs for new users) is a relatively simple + way to ensure that ``@USER:matrix.org`` cannot refer to a different user to + ``@user:matrix.org``. + + Finally, we decided to restrict the allowable punctuation to a very basic set + to ensure that the identifier can be used as-is in as wide a number of + situations as possible, without requiring escaping. For instance, allowing + "%" or "/" would make it harder to use a user ID in a URI. "*" is used as a + wildcard in some APIs (notably the filter API), so it also cannot be a legal + user ID character. + + The length restriction is derived from the limit on the length of the + ``sender`` key on events; since the user ID appears in every event sent by the + user, it is limited to ensure that the user ID does not dominate over the actual + content of the events. + +Matrix user IDs are sometimes informally referred to as MXIDs. + +Historical User IDs +<<<<<<<<<<<<<<<<<<< + +Older versions of this specification were more tolerant of the characters +permitted in user ID localparts. There are currently active users whose user +IDs do not conform to the permitted character set, and a number of rooms whose +history includes events with a ``sender`` which does not conform. In order to +handle these rooms successfully, clients and servers MUST accept user IDs with +localparts from the expanded character set:: + + extended_user_id_char = %x21-7E + +Mapping from other character sets +<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< + +In certain circumstances it will be desirable to map from a wider character set +onto the limited character set allowed in a user ID localpart. Examples include +a homeserver creating a user ID for a new user based on the username passed to +``/register``, or a bridge mapping user ids from another protocol. + +.. TODO-spec + + We need to better define the mechanism by which homeservers can allow users + to have non-Latin login credentials. The general idea is for clients to pass + the non-Latin in the ``username`` field to ``/register`` and ``/login``, and + the HS then maps it onto the MXID space when turning it into the + fully-qualified ``user_id`` which is returned to the client and used in + events. + +Implementations are free to do this mapping however they choose. Since the user +ID is opaque except to the implementation which created it, the only +requirement is that the implemention can perform the mapping +consistently. However, we suggest the following algorithm: + +1. Encode character strings as UTF-8. + +2. Convert the bytes ``A-Z`` to lower-case. + + * In the case where a bridge must be able to distinguish two different users + with ids which differ only by case, escape upper-case characters by + prefixing with ``_`` before downcasing. For example, ``A`` becomes + ``_a``. Escape a real ``_`` with a second ``_``. + +3. Encode any remaining bytes outside the allowed character set, as well as + ``=``, as their hexadecimal value, prefixed with ``=``. For example, ``#`` + becomes ``=23``; ``รก`` becomes ``=c3=a1``. + +.. admonition:: Rationale + + The suggested mapping is an attempt to preserve human-readability of simple + ASCII identifiers (unlike, for example, base-32), whilst still allowing + representation of *any* character (unlike punycode, which provides no way to + encode ASCII punctuation). + + +Room IDs and Event IDs +++++++++++++++++++++++ + +A room has exactly one room ID. A room ID has the format:: + + !opaque_id:domain + +An event has exactly one event ID. An event ID has the format:: + + $opaque_id:domain + +The ``domain`` of a room/event ID is the `server name`_ of the homeserver which +created the room/event. The domain is used only for namespacing to avoid the +risk of clashes of identifiers between different homeservers. There is no +implication that the room or event in question is still available at the +corresponding homeserver. + +Event IDs and Room IDs are case-sensitive. They are not meant to be human +readable. + +.. TODO-spec + What is the grammar for the opaque part? https://matrix.org/jira/browse/SPEC-389 + +Room Aliases +++++++++++++ + +A room may have zero or more aliases. A room alias has the format:: + + #room_alias:domain + +The ``domain`` of a room alias is the `server name`_ of the homeserver which +created the alias. Other servers may contact this homeserver to look up the +alias. + +Room aliases MUST NOT exceed 255 bytes (including the ``#`` sigil and the +domain). + +.. TODO-spec + - Need to specify precise grammar for Room Aliases. https://matrix.org/jira/browse/SPEC-391 + + License -------