Update 4021-archive-controls.md

pull/4021/head
Jonah Aragon 1 year ago committed by GitHub
parent acffccf36c
commit 144fff4030
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -13,16 +13,22 @@ visibility and search engine indexing, for example.
## Proposal
Add an `m.room.archive_controls` state event where you can specify information about if and how you would like your
room to be crawled. The room directory must relay this information to clients.
room to be crawled. The [/publicRooms API](https://spec.matrix.org/v1.7/client-server-api/#get_matrixclientv3publicrooms)
must relay this information to clients.
| key | type | value | description | required
|--|--|--|--|--
| `archive` | boolean | | Whether the room should be included in room directory listings which are intended to be viewed by the public |
| `robots` | [string] | Valid [robots meta rules](https://developers.google.com/search/docs/crawling-indexing/robots-meta-tag#directives) | A list of rules which should be included in a `robots` meta tag and/or [HTTP header](https://developers.google.com/search/docs/crawling-indexing/robots-meta-tag#xrobotstag-implementation) by public-facing clients. e.g. `["noarchive"]` or `["noindex", "nofollow"]`.
| `via` | string | Hostname | A hostname which should be set as the canonical archive URL. e.g. `"archive.matrix.org"`.
Public-facing clients like [matrix-public-archive](https://github.com/matrix-org/matrix-public-archive) should validate
these rules before returning them in a response.
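
As an illustration of the validation step above, here is a minimal sketch of how a public-facing client might filter the `robots` list before emitting it. The allowlist is an assumption drawn from the directives in Google's robots meta tag documentation (plus `noarchive`, which the proposal uses as an example); the function names `validate_robots` and `robots_header_value` are hypothetical and not part of the proposal.

```python
import re

# Assumption: a conservative allowlist of simple directives.
SIMPLE_RULES = frozenset({
    "all", "noindex", "nofollow", "none", "noarchive", "nosnippet",
    "notranslate", "noimageindex", "indexifembedded",
})

# Assumption: the parameterised directives accepted by this sketch.
PARAM_RULES = (
    re.compile(r"max-snippet:-?\d+"),
    re.compile(r"max-video-preview:-?\d+"),
    re.compile(r"max-image-preview:(none|standard|large)"),
)


def validate_robots(rules):
    """Return the allowlisted rules, lowercased and de-duplicated in order."""
    if not isinstance(rules, list):
        return []
    valid = []
    for rule in rules:
        if not isinstance(rule, str):
            continue  # state event content is untrusted; skip non-strings
        rule = rule.strip().lower()
        allowed = rule in SIMPLE_RULES or any(
            pattern.fullmatch(rule) for pattern in PARAM_RULES
        )
        if allowed and rule not in valid:
            valid.append(rule)
    return valid


def robots_header_value(rules):
    """Join the validated rules into a value for a robots meta tag or X-Robots-Tag header."""
    return ", ".join(validate_robots(rules))
```

Dropping unknown values rather than passing them through keeps untrusted room state from being echoed into HTML or HTTP headers.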
When `archive` is `false`, clients which display a room directory intended for public internet consumption (e.g.
matrix-public-archive or matrix-static) should exclude that room from being displayed. Clients which provide access
to native Matrix users (e.g. Element) should ignore this setting.
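
The `archive` rule above could be applied to a room directory response roughly as follows. This is a sketch under two assumptions that this excerpt does not specify: that each room entry carries the relayed event content under a key named `m.room.archive_controls`, and that rooms without the event (or with a non-boolean `archive` value) remain listed.

```python
# Hypothetical field name for the relayed event content in each room entry.
ARCHIVE_CONTROLS_KEY = "m.room.archive_controls"


def visible_in_public_directory(room):
    """Return False only when the room explicitly sets `archive` to false."""
    controls = room.get(ARCHIVE_CONTROLS_KEY)
    if not isinstance(controls, dict):
        return True  # assumption: no controls means the room stays listed
    return controls.get("archive") is not False


def filter_public_directory(rooms):
    """Drop opted-out rooms from a list of room directory entries."""
    return [room for room in rooms if visible_in_public_directory(room)]
```

A client serving native Matrix users would skip this filter entirely, as the paragraph above describes.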
When `via` is specified, the client should return a [rel=canonical link element](https://developers.google.com/search/docs/crawling-indexing/consolidate-duplicate-urls#rel-canonical-link-method)
and/or a [rel=canonical HTTP header](https://developers.google.com/search/docs/crawling-indexing/consolidate-duplicate-urls#rel-canonical-header-method)
with the response pointing to the archive URL on the specified hostname. This prevents the Matrix.org public archive
@ -34,3 +40,13 @@ https://archive.matrix.org/r/main:example.net/date/2023/05/28 should return this
```
Link: <https://archive.example.net/r/main:example.net/date/2023/05/28>; rel="canonical"
```
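
The header above can be derived from the request URL and the `via` value. The following sketch assumes the archive on the `via` hostname uses the same path layout and is served over HTTPS; the function name `canonical_link_header` and the hostname check are illustrative and not defined by the proposal.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: `via` must be a bare DNS hostname (no scheme, port, or path).
_LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"
_HOSTNAME = re.compile(rf"(?=.{{1,253}}$){_LABEL}(?:\.{_LABEL})*")


def canonical_link_header(request_url, via):
    """Return a Link header value pointing at `via`, or None if `via` is invalid."""
    if not isinstance(via, str):
        return None
    host = via.strip().lower()
    if not _HOSTNAME.fullmatch(host):
        return None
    parts = urlsplit(request_url)
    # Keep the path and query, swap the host, and drop any fragment.
    canonical = urlunsplit(parts._replace(scheme="https", netloc=host, fragment=""))
    return f'<{canonical}>; rel="canonical"'
```

Returning `None` for a malformed `via` lets the caller omit the header instead of emitting a link built from untrusted input.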
## Alternatives
- [MSC2291](https://github.com/matrix-org/matrix-spec-proposals/pull/2291) could provide an alternative method of
specifying this information. However, this proposal includes the web archive metadata in the room directory API,
in order to access this information efficiently (this is a [requirement](https://github.com/matrix-org/matrix-public-archive/issues/47#issuecomment-1536938601)
for the matrix-public-archive project, for example). This proposal also allows rooms to opt-out of publicly accessible
room directories without clients like matrix-public-archive needing to join the room to read the state, and should
be interpreted by any client built for public web crawler access rather than [specific bots/clients](https://github.com/matrix-org/matrix-spec-proposals/pull/2291/files#diff-2b62d9e1c5ef21f7e10959da64da4000a69069b4dfb5d436db30d12c6bd23cb7R21-R23).
