From e628edfdc6e69302ef98930f78bd5fbe14ce834c Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Wed, 13 Jul 2016 15:17:11 +0100 Subject: [PATCH 1/6] Define MXID grammar Quick! Write down the decisions on the grammar before we get a chance to change our minds! Also some placeholder sections for other types of identifier. --- specification/intro.rst | 253 +++++++++++++++++++++++++++++++++++----- 1 file changed, 225 insertions(+), 28 deletions(-) diff --git a/specification/intro.rst b/specification/intro.rst index 650f1fc5..d6b62522 100644 --- a/specification/intro.rst +++ b/specification/intro.rst @@ -94,11 +94,8 @@ instant messages, VoIP call setups, or any other objects that need to be reliably and persistently pushed from A to B in an inter-operable and federated manner. -Overview --------- - Architecture -~~~~~~~~~~~~ +------------ Matrix defines APIs for synchronising extensible JSON objects known as "events" between compatible clients, servers and services. Clients are @@ -142,7 +139,7 @@ a long-lived GET request. | V | V +------------------+ +------------------+ | |---------( HTTPS )--------->| | - | homeserver | | homeserver | + | homeserver | | homeserver | | |<--------( HTTPS )----------| | +------------------+ Server-Server API +------------------+ History Synchronisation @@ -150,22 +147,19 @@ a long-lived GET request. Users -+++++ +~~~~~ Each client is associated with a user account, which is identified in Matrix -using a unique "User ID". This ID is namespaced to the homeserver which -allocated the account and has the form:: +using a unique identifier, or "MXID". This ID is namespaced to the homeserver +which allocated the account and has the form:: @localpart:domain -The ``localpart`` of a user ID may be a user name, or an opaque ID identifying -this user. The ``domain`` of a user ID is the domain of the homeserver. - -.. 
TODO-spec - - Need to specify precise grammar for Matrix IDs +See the `Identifier Grammar`_ section for full details of the structure of +an MXID. Events -++++++ +~~~~~~ All data exchanged over Matrix is expressed as an "event". Typically each client action (e.g. sending a message) correlates with exactly one event. Each event @@ -180,7 +174,7 @@ of a "Room". .. _package naming conventions: https://en.wikipedia.org/wiki/Java_package#Package_naming_conventions Event Graphs -++++++++++++ +~~~~~~~~~~~~ .. _sect:event-graph: @@ -204,7 +198,7 @@ of its parents. The root event should have a depth of 1. Thus if one event is before another, then it must have a strictly smaller depth. Room structure -++++++++++++++ +~~~~~~~~~~~~~~ A room is a conceptual place where users can send and receive events. Events are sent to a room, and all participants in that room with sufficient access will @@ -215,8 +209,12 @@ which have the form:: There is exactly one room ID for each room. Whilst the room ID does contain a domain, it is simply for globally namespacing room IDs. The room does NOT -reside on the domain specified. Room IDs are not meant to be human readable. -They are case-sensitive. The following conceptual diagram shows an +reside on the domain specified. + +See the `Identifier Grammar`_ section for full details of the structure of +a room ID. + +The following conceptual diagram shows an ``m.room.message`` event being sent to the room ``!qporfwt:matrix.org``:: { @alice:matrix.org } { @bob:domain.com } @@ -229,7 +227,7 @@ They are case-sensitive. The following conceptual diagram shows an | | V | +------------------+ +------------------+ - | homeserver | | homeserver | + | homeserver | | homeserver | | matrix.org | | domain.com | +------------------+ +------------------+ | ^ @@ -283,23 +281,21 @@ from the other servers participating in a room. Room Aliases -^^^^^^^^^^^^ +++++++++++++ Each room can also have multiple "Room Aliases", which look like:: #room_alias:domain -.. 
TODO - - Need to specify precise grammar for Room Aliases +See the `Identifier Grammar`_ section for full details of the structure of +a room alias. A room alias "points" to a room ID and is the human-readable label by which rooms are publicised and discovered. The room ID the alias is pointing to can be obtained by visiting the domain specified. Note that the mapping from a room alias to a room ID is not fixed, and may change over time to point to a different room ID. For this reason, Clients SHOULD resolve the room alias to a -room ID once and then use that ID on subsequent requests. Room aliases MUST NOT -exceed 255 bytes (including the domain). - +room ID once and then use that ID on subsequent requests. When resolving a room alias the server will also respond with a list of servers that are in the room that can be used to join via. @@ -319,7 +315,7 @@ that are in the room that can be used to join via. |________________________________| Identity -++++++++ +~~~~~~~~ Users in Matrix are identified via their matrix user ID (MXID). However, existing 3rd party ID namespaces can also be used in order to identify Matrix @@ -339,7 +335,7 @@ user IDs using 3PIDs. Profiles -++++++++ +~~~~~~~~ Users may publish arbitrary key/value data associated with their account - such as a human readable display name, a profile photo URL, contact information @@ -350,7 +346,7 @@ as a human readable display name, a profile photo URL, contact information names allowed to be? Private User Data -+++++++++++++++++ +~~~~~~~~~~~~~~~~~ Users may also store arbitrary private key/value data in their account - such as client preferences, or server configuration settings which lack any other @@ -361,6 +357,207 @@ dedicated API. The API is symmetrical to managing Profile data. private user data, but with different ACLs? +Identifier Grammar +------------------ + +Server Name +~~~~~~~~~~~ + +A homeserver is uniquely identified by its server name. 
This value is used in a +number of identifiers, as described below. + +The server name represents the address at which the homeserver in question can +be reached by other homeservers. The complete grammar is:: + + server_name = dns_name [ ":" port] + dns_name = host + port = *DIGIT + +where ``host`` is as defined by `RFC3986, section 3.2.2 +`_. + +.. NOTE:: + + The RFC3986 specification of a "host", allows IPv4 literals (``1.2.3.4``), and + IPv6 literals (``[1234:5678::abcd]``), as well as registered domain + names. Similarly, all of these formats are valid in Matrix server names and + identifiers. + + +Common Identifier Format +~~~~~~~~~~~~~~~~~~~~~~~~ + +The Matrix protocol uses a common format to assign unique identifiers to a +number of entities, including users, events and rooms. Each identifier takes +the form:: + + &localpart:domain + +where ``&`` represents a 'sigil' character; ``domain`` is the server name of +the homeserver which allocated the identifier, and ``localpart`` is an +identifier allocated by that homeserver. + +The sigil characters are as follows: + +* ``@``: User ID (MXID) +* ``!``: Room ID +* ``$``: Event ID +* ``#``: Room alias + +In some cases (such as Room IDs and Event IDs), the ``domain`` is present only +for namespacing, to avoid clashes of identifiers between different +homeservers. In other cases (User IDs and Room aliases), it defines the +authoritative homeserver for contacting the user or room in question. + +The precise grammar defining the allowable format of an identifier depends on +the type of identifier. + +User Identifiers +++++++++++++++++ + +Users within Matrix are uniquely identified by their MXID. The MXID is +namespaced to the homeserver which allocated the account and has the form:: + + @localpart:domain + +The ``localpart`` of an MXID is an opaque identifier for that user. It MUST NOT +be empty, and MUST contain only the characters ``a-z``, ``0-9``, ``.``, ``_``, +``=``, and ``-``. 
+ +The ``domain`` of an MXID is the server name of the homeserver which allocated +the account. + +The length of an MXID, including the ``@`` sigil and the domain, MUST NOT +exceed 255 characters. + +The complete grammar for a legal MXID is:: + + mxid = "@" mxid_localpart ":" server_name + mxid_localpart = 1*mxid_char + mxid_char = DIGIT + / %x61-7A ; a-z + / "-" / "." / "=" / "_" + +.. admonition:: Rationale + + A number of factors were considered when defining the allowable characters + for an MXID. + + Firstly, we chose to exclude characters outside the basic US-ASCII character + set. MXIDs are primarily intended for use as an identifier at the protocol + level, and their use as a human-readable handle is of secondary + benefit. Furthermore, they are useful as a last-resort differentiator between + users with similar display names. Allowing the full unicode character set + would make very difficult for a human to distinguish two similar MXIDs. The + limited character set used has the advantage that even a user unfamiliar with + the Latin alphabet should be able to distinguish similar MXIDs manually, if + somewhat laboriously. + + We chose to disallow upper-case characters because we do not consider it + valid to have two MXIDs which differ only in case: indeed it should be + possible to reach ``@user:matrix.org`` as ``@USER:matrix.org``. However, + MXIDs are necessarily used in a number of situations which are inherently + case-sensitive (notably in the ``state_key`` of ``m.room.member`` + events). Forbidding upper-case characters (and requiring homeservers to + downcase usernames when creating MXIDs for new users) is a relatively simple + way to ensure that ``@USER:matrix.org`` cannot refer to a different user to + ``@user:matrix.org``. + + Finally, we decided to restrict the allowable punctuation to a very basic set + to ensure that the identifier can be used as-is in as wide a number of + situations as possible, without requiring escaping. 
For instance, allowing + "%" or "/" would make it harder to use an MXID in a URI. "*" is used as a + wildcard in some APIs (notably the filter API), so it also cannot be a legal + MXID character. + + The length restriction is derived from the limit on the length of the + ``sender`` key on events; since the MXID appears in every event sent by the + user, it is limited to ensure that the MXID does not dominate over the actual + content of the events. + +Historical MXIDs +<<<<<<<<<<<<<<<< + +Older versions of this specification were more tolerant of the characters +permitted in MXID localparts. There are currently active users whose MXIDs do +not conform to the permitted character set, and a number of rooms whose history +includes events with a ``sender`` which does not conform. In order to handle +these rooms successfully, clients and servers MUST accept MXIDs with localparts +from the expanded character set:: + + extended_mxid_char = %x21-7E + +Mapping from other character sets +<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< + +In certain circumstances it will be desirable to map from a wider character set +onto the limited character set allowed in an MXID localpart. Examples include a +homeserver creating an MXID for a new user based on their chosen login ID, or a +bridge mapping user ids from another protocol. + +Implmentations are free to do this mapping however they choose. Since the MXID +is opaque except to the implementation which created it, the only requirement +is that the implemention can perform the mapping consistently. However, we +suggest the following algorithm: + +1. Encode character strings as UTF-8. + +2. Convert the bytes ``A-Z`` to lower-case. + + * In the case where a bridge must be able to distinguish two different users + with ids which differ only by case, escape upper-case characters by + prefixing with ``_`` before downcasing. For example, ``A`` becomes + ``_a``. Escape a real ``_`` with a second ``_``. + +3. 
Encode any remaining bytes outside the allowed character set, as well + as ``=``, as their hexadecimal value, prefixed with ``=``. For + example, ``#`` becomes ``=23``; ``á`` becomes ``=c3=a1``. + +.. admonition:: Rationale + + The suggested mapping is an attempt to preserve human-readability of simple + ASCII identifiers (unlike, for example, base-32), whilst still allowing + representation of *any* character (unlike punycode, which provides no way to + encode ASCII punctuation). + + +Room IDs and Event IDs +++++++++++++++++++++++ + +A room has exactly one room ID. A room ID has the format:: + + !opaque_id:domain + +An event thas exactly one event ID. An event ID has the format:: + + $opaque_id:domain + +The ``domain`` of a room/event ID is the server name of the homeserver which created +the room/event. Note that the domain is used only for namespacing - there is no +implication that the room or event in question is still available at the +corresponding homeserver. + +Event IDs and Room IDs are case-sensitive. They are not mant to be human readable. + +.. TODO-spec + What is the grammar for the opaque part? https://matrix.org/jira/browse/SPEC-389 + +Room Aliases +++++++++++++ + +A room may have zero or more aliases. A room alias has the format:: + + #room_alias:domain + +The ``domain`` of a room alias is the server of the homeserver which created +the alias. Other servers may contact this homeserver to look up the alias. + +Room aliases MUST NOT exceed 255 bytes (including the ``#`` sigil and the domain). + +.. TODO-spec + - Need to specify precise grammar for Room Aliases. https://matrix.org/jira/browse/SPEC-391 + + License ------- From 001db4504688e94405bc87973da4396196db886e Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Wed, 13 Jul 2016 18:20:11 +0100 Subject: [PATCH 2/6] s/mxid/user id/, and other PR feedback. 
--- specification/intro.rst | 130 +++++++++++++++++++++------------------- 1 file changed, 67 insertions(+), 63 deletions(-) diff --git a/specification/intro.rst b/specification/intro.rst index d6b62522..ed046ace 100644 --- a/specification/intro.rst +++ b/specification/intro.rst @@ -150,13 +150,13 @@ Users ~~~~~ Each client is associated with a user account, which is identified in Matrix -using a unique identifier, or "MXID". This ID is namespaced to the homeserver +using a unique identifier, or "user ID". This ID is namespaced to the homeserver which allocated the account and has the form:: @localpart:domain See the `Identifier Grammar`_ section for full details of the structure of -an MXID. +user IDs. Events ~~~~~~ @@ -317,14 +317,14 @@ that are in the room that can be used to join via. Identity ~~~~~~~~ -Users in Matrix are identified via their matrix user ID (MXID). However, +Users in Matrix are identified via their Matrix user ID. However, existing 3rd party ID namespaces can also be used in order to identify Matrix users. A Matrix "Identity" describes both the user ID and any other existing IDs from third party namespaces *linked* to their account. Matrix users can *link* third-party IDs (3PIDs) such as email addresses, social network accounts and phone numbers to their user ID. Linking 3PIDs creates a mapping from a 3PID to a user ID. This mapping can then be used by Matrix -users in order to discover the MXIDs of their contacts. +users in order to discover the user IDs of their contacts. In order to ensure that the mapping from 3PID to user ID is genuine, a globally federated cluster of trusted "Identity Servers" (IS) are used to verify the 3PID and persist and replicate the mappings. @@ -376,12 +376,14 @@ be reached by other homeservers. The complete grammar is:: where ``host`` is as defined by `RFC3986, section 3.2.2 `_. -.. 
NOTE:: +Examples of valid server names are: - The RFC3986 specification of a "host", allows IPv4 literals (``1.2.3.4``), and - IPv6 literals (``[1234:5678::abcd]``), as well as registered domain - names. Similarly, all of these formats are valid in Matrix server names and - identifiers. +* ``matrix.org`` +* ``matrix.org:8888`` +* ``1.2.3.4`` (IPv4 literal) +* ``1.2.3.4:1234`` (IPv4 literal with explicit port) +* ``[1234:5678::abcd]`` (IPv6 literal) +* ``[1234:5678::abcd]:5678`` (IPv6 literal with explicit port) Common Identifier Format @@ -393,112 +395,110 @@ the form:: &localpart:domain -where ``&`` represents a 'sigil' character; ``domain`` is the server name of +where ``&`` represents a 'sigil' character; ``domain`` is the `server name`_ of the homeserver which allocated the identifier, and ``localpart`` is an identifier allocated by that homeserver. The sigil characters are as follows: -* ``@``: User ID (MXID) +* ``@``: User ID * ``!``: Room ID * ``$``: Event ID * ``#``: Room alias -In some cases (such as Room IDs and Event IDs), the ``domain`` is present only -for namespacing, to avoid clashes of identifiers between different -homeservers. In other cases (User IDs and Room aliases), it defines the -authoritative homeserver for contacting the user or room in question. - The precise grammar defining the allowable format of an identifier depends on the type of identifier. User Identifiers ++++++++++++++++ -Users within Matrix are uniquely identified by their MXID. The MXID is -namespaced to the homeserver which allocated the account and has the form:: +Users within Matrix are uniquely identified by their Matrix user ID. The user +ID is namespaced to the homeserver which allocated the account and has the +form:: @localpart:domain -The ``localpart`` of an MXID is an opaque identifier for that user. It MUST NOT -be empty, and MUST contain only the characters ``a-z``, ``0-9``, ``.``, ``_``, -``=``, and ``-``. 
+The ``localpart`` of a user ID is an opaque identifier for that user. It MUST +NOT be empty, and MUST contain only the characters ``a-z``, ``0-9``, ``.``, +``_``, ``=``, and ``-``. -The ``domain`` of an MXID is the server name of the homeserver which allocated -the account. +The ``domain`` of a user ID is the `server name`_ of the homeserver which +allocated the account. -The length of an MXID, including the ``@`` sigil and the domain, MUST NOT +The length of a user ID, including the ``@`` sigil and the domain, MUST NOT exceed 255 characters. -The complete grammar for a legal MXID is:: +The complete grammar for a legal user ID is:: - mxid = "@" mxid_localpart ":" server_name - mxid_localpart = 1*mxid_char - mxid_char = DIGIT - / %x61-7A ; a-z - / "-" / "." / "=" / "_" + user_id = "@" user_id_localpart ":" server_name + user_id_localpart = 1*user_id_char + user_id_char = DIGIT + / %x61-7A ; a-z + / "-" / "." / "=" / "_" .. admonition:: Rationale A number of factors were considered when defining the allowable characters - for an MXID. + for a user ID. Firstly, we chose to exclude characters outside the basic US-ASCII character - set. MXIDs are primarily intended for use as an identifier at the protocol + set. User IDs are primarily intended for use as an identifier at the protocol level, and their use as a human-readable handle is of secondary benefit. Furthermore, they are useful as a last-resort differentiator between users with similar display names. Allowing the full unicode character set - would make very difficult for a human to distinguish two similar MXIDs. The + would make very difficult for a human to distinguish two similar user IDs. The limited character set used has the advantage that even a user unfamiliar with - the Latin alphabet should be able to distinguish similar MXIDs manually, if + the Latin alphabet should be able to distinguish similar user IDs manually, if somewhat laboriously. 
We chose to disallow upper-case characters because we do not consider it - valid to have two MXIDs which differ only in case: indeed it should be + valid to have two user IDs which differ only in case: indeed it should be possible to reach ``@user:matrix.org`` as ``@USER:matrix.org``. However, - MXIDs are necessarily used in a number of situations which are inherently + user IDs are necessarily used in a number of situations which are inherently case-sensitive (notably in the ``state_key`` of ``m.room.member`` events). Forbidding upper-case characters (and requiring homeservers to - downcase usernames when creating MXIDs for new users) is a relatively simple + downcase usernames when creating user IDs for new users) is a relatively simple way to ensure that ``@USER:matrix.org`` cannot refer to a different user to ``@user:matrix.org``. Finally, we decided to restrict the allowable punctuation to a very basic set to ensure that the identifier can be used as-is in as wide a number of situations as possible, without requiring escaping. For instance, allowing - "%" or "/" would make it harder to use an MXID in a URI. "*" is used as a + "%" or "/" would make it harder to use a user ID in a URI. "*" is used as a wildcard in some APIs (notably the filter API), so it also cannot be a legal - MXID character. + user ID character. The length restriction is derived from the limit on the length of the - ``sender`` key on events; since the MXID appears in every event sent by the - user, it is limited to ensure that the MXID does not dominate over the actual + ``sender`` key on events; since the user ID appears in every event sent by the + user, it is limited to ensure that the user ID does not dominate over the actual content of the events. -Historical MXIDs -<<<<<<<<<<<<<<<< +Matrix user IDs are sometimes informally referred to as MXIDs. 
+ +Historical User IDs +<<<<<<<<<<<<<<<<<<< Older versions of this specification were more tolerant of the characters -permitted in MXID localparts. There are currently active users whose MXIDs do -not conform to the permitted character set, and a number of rooms whose history -includes events with a ``sender`` which does not conform. In order to handle -these rooms successfully, clients and servers MUST accept MXIDs with localparts -from the expanded character set:: +permitted in user ID localparts. There are currently active users whose user +IDs do not conform to the permitted character set, and a number of rooms whose +history includes events with a ``sender`` which does not conform. In order to +handle these rooms successfully, clients and servers MUST accept user IDs with +localparts from the expanded character set:: - extended_mxid_char = %x21-7E + extended_user_id_char = %x21-7E Mapping from other character sets <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< In certain circumstances it will be desirable to map from a wider character set -onto the limited character set allowed in an MXID localpart. Examples include a -homeserver creating an MXID for a new user based on their chosen login ID, or a -bridge mapping user ids from another protocol. +onto the limited character set allowed in a user ID localpart. Examples include +a homeserver creating a user ID for a new user based on their chosen login ID, +or a bridge mapping user ids from another protocol. -Implmentations are free to do this mapping however they choose. Since the MXID -is opaque except to the implementation which created it, the only requirement -is that the implemention can perform the mapping consistently. However, we -suggest the following algorithm: +Implmentations are free to do this mapping however they choose. Since the user +ID is opaque except to the implementation which created it, the only +requirement is that the implemention can perform the mapping +consistently. 
However, we suggest the following algorithm: 1. Encode character strings as UTF-8. @@ -528,16 +528,18 @@ A room has exactly one room ID. A room ID has the format:: !opaque_id:domain -An event thas exactly one event ID. An event ID has the format:: +An event has exactly one event ID. An event ID has the format:: $opaque_id:domain -The ``domain`` of a room/event ID is the server name of the homeserver which created -the room/event. Note that the domain is used only for namespacing - there is no +The ``domain`` of a room/event ID is the `server name`_ of the homeserver which +created the room/event. The domain is used only for namespacing to avoid the +risk of clashes of identifiers between different homeservers. There is no implication that the room or event in question is still available at the corresponding homeserver. -Event IDs and Room IDs are case-sensitive. They are not mant to be human readable. +Event IDs and Room IDs are case-sensitive. They are not meant to be human +readable. .. TODO-spec What is the grammar for the opaque part? https://matrix.org/jira/browse/SPEC-389 @@ -549,10 +551,12 @@ A room may have zero or more aliases. A room alias has the format:: #room_alias:domain -The ``domain`` of a room alias is the server of the homeserver which created -the alias. Other servers may contact this homeserver to look up the alias. +The ``domain`` of a room alias is the `server name`_ of the homeserver which +created the alias. Other servers may contact this homeserver to look up the +alias. -Room aliases MUST NOT exceed 255 bytes (including the ``#`` sigil and the domain). +Room aliases MUST NOT exceed 255 bytes (including the ``#`` sigil and the +domain). .. TODO-spec - Need to specify precise grammar for Room Aliases. 
https://matrix.org/jira/browse/SPEC-391 From f942b6e5c162828a7c541c6c390ab6b31a8321cf Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Wed, 13 Jul 2016 18:27:40 +0100 Subject: [PATCH 3/6] remove some redundant words --- specification/intro.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/specification/intro.rst b/specification/intro.rst index ed046ace..45916fd7 100644 --- a/specification/intro.rst +++ b/specification/intro.rst @@ -150,8 +150,8 @@ Users ~~~~~ Each client is associated with a user account, which is identified in Matrix -using a unique identifier, or "user ID". This ID is namespaced to the homeserver -which allocated the account and has the form:: +using a unique "user ID". This ID is namespaced to the homeserver which +allocated the account and has the form:: @localpart:domain From cdd19dca7ffeb2f1dad5b1e0d05c93125551173f Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Wed, 13 Jul 2016 18:32:29 +0100 Subject: [PATCH 4/6] fix typos --- specification/intro.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/specification/intro.rst b/specification/intro.rst index 45916fd7..702ee4df 100644 --- a/specification/intro.rst +++ b/specification/intro.rst @@ -495,7 +495,7 @@ onto the limited character set allowed in a user ID localpart. Examples include a homeserver creating a user ID for a new user based on their chosen login ID, or a bridge mapping user ids from another protocol. -Implmentations are free to do this mapping however they choose. Since the user +Implementations are free to do this mapping however they choose. Since the user ID is opaque except to the implementation which created it, the only requirement is that the implemention can perform the mapping consistently. However, we suggest the following algorithm: @@ -509,9 +509,9 @@ consistently. However, we suggest the following algorithm: prefixing with ``_`` before downcasing. For example, ``A`` becomes ``_a``. 
Escape a real ``_`` with a second ``_``. -3. Encode any remaining bytes outside the allowed character set, as well - as ``=``, as their hexadecimal value, prefixed with ``=``. For - example, ``#`` becomes ``=23``; ``á`` becomes ``=c3=a1``. +3. Encode any remaining bytes outside the allowed character set, as well as + ``=``, as their hexadecimal value, prefixed with ``=``. For example, ``#`` + becomes ``=23``; ``á`` becomes ``=c3=a1``. .. admonition:: Rationale From a2f1c6a7a6e0c1331afa8123832b8cce135097be Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Thu, 14 Jul 2016 14:37:42 +0100 Subject: [PATCH 5/6] Add a TODO about defining non-latin login creds --- specification/intro.rst | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/specification/intro.rst b/specification/intro.rst index 702ee4df..c6d5ec59 100644 --- a/specification/intro.rst +++ b/specification/intro.rst @@ -492,8 +492,13 @@ Mapping from other character sets In certain circumstances it will be desirable to map from a wider character set onto the limited character set allowed in a user ID localpart. Examples include -a homeserver creating a user ID for a new user based on their chosen login ID, -or a bridge mapping user ids from another protocol. +a homeserver creating a user ID for a new user based on the username passed to +``/register``, or a bridge mapping user ids from another protocol. + +.. TODO-spec + + We need to better define the mechanism by which homeservers can allow users + to have non-Latin usernames. Implementations are free to do this mapping however they choose. 
Since the user ID is opaque except to the implementation which created it, the only From 72449294bc107637f3a2a9bb39193b80074d76b1 Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Thu, 14 Jul 2016 15:04:16 +0100 Subject: [PATCH 6/6] Moar TODO --- specification/intro.rst | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/specification/intro.rst b/specification/intro.rst index c6d5ec59..959d5c0f 100644 --- a/specification/intro.rst +++ b/specification/intro.rst @@ -498,7 +498,11 @@ a homeserver creating a user ID for a new user based on the username passed to .. TODO-spec We need to better define the mechanism by which homeservers can allow users - to have non-Latin usernames. + to have non-Latin login credentials. The general idea is for clients to pass + the non-Latin value in the ``username`` field to ``/register`` and ``/login``, and + the HS then maps it onto the MXID space when turning it into the + fully-qualified ``user_id`` which is returned to the client and used in + events. Implementations are free to do this mapping however they choose. Since the user ID is opaque except to the implementation which created it, the only