MSC3030: Jump to date API endpoint (#3030)

* Initial MSC draft for jump to date
* Update with alternate /timestamp_to_event endpoint
* Add origin_server_ts for quick remote to local comparison
  As discussed at https://github.com/matrix-org/synapse/pull/9445#discussion_r757098009
* Add origin_server_ts to client endpoint
* Wrap lines
* Use stable when discussing MSC and document unstable
* Describe the direction parameter
* Add server support detection
* Fix typos
* Explain what happens when an event can't be found
  Fix https://github.com/matrix-org/matrix-doc/pull/3030#discussion_r787002549
* Add context behind why we chose /timestamp_to_event vs alternatives
  Fix https://github.com/matrix-org/matrix-doc/pull/3030#discussion_r785425438
* Add comments about authentication and rate-limiting
  Fix https://github.com/matrix-org/matrix-doc/pull/3030#discussion_r786351083
* Return pagination token directly in future iteration
  See https://github.com/matrix-org/matrix-spec-proposals/pull/3030#discussion_r787297190
* Abuse /timestamp_to_event to get create event
  As suggested by @turt2live, https://github.com/matrix-org/matrix-spec-proposals/pull/3030#discussion_r846444317
* Unrenderable events
  As proposed by @turt2live, https://github.com/matrix-org/matrix-spec-proposals/pull/3030#discussion_r846447351
* Add some complication thoughts around alternatives
  Context: https://github.com/matrix-org/matrix-spec-proposals/pull/3030#discussion_r846449709
* Backfill event so we can get pagination token
  See https://github.com/matrix-org/matrix-spec-proposals/pull/3030#discussion_r846578171
* Heuristic for which server to try first
  See https://github.com/matrix-org/matrix-spec-proposals/pull/3030#discussion_r967574944
* Give a suggestion on where to backfill from
  See https://github.com/matrix-org/matrix-spec-proposals/pull/3030#discussion_r967574843
* Add alternative suggestion from @alphapapa
  See https://github.com/matrix-org/matrix-spec-proposals/pull/3030#discussion_r868478333
* Better wording and fix typo
  Co-authored-by: Travis Ralston <travisr@matrix.org>
* No difference in homeservers
  See https://github.com/matrix-org/matrix-spec-proposals/pull/3030#discussion_r992858188
* Fix typos
  Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
* Fix extra word typo
* Summarizing discussion around why `dir` instead of closest
  See https://github.com/matrix-org/matrix-spec-proposals/pull/3030#discussion_r849310414
* Adjust to just suggest the right way
  See https://github.com/matrix-org/matrix-spec-proposals/pull/3030#discussion_r999099294
* Great simplification with the same meaning 🌟
  Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
* Perfect is the enemy of good
  See https://github.com/matrix-org/matrix-spec-proposals/pull/3030#discussion_r1004651959

Co-authored-by: Travis Ralston <travisr@matrix.org>
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>

parent a47591bb9c
commit a96b752a4b

# MSC3030: Jump to date API endpoint

Add an API that makes it easy to find the closest messages for a given
timestamp.

The goal of this change is to let clients implement a jump-to-date feature so
users can see messages from a given point in time. Pick a date from a calendar
or heatmap, or paginate to the next/previous day, and view all of the messages
that were sent on that date.

Alongside the [roadmap of feature parity with
Gitter](https://github.com/vector-im/roadmap/issues/26), we're also interested
in using this for a new and better static Matrix archive. Our idea is to
server-side render [Hydrogen](https://github.com/vector-im/hydrogen-web), and
this new endpoint would allow us to jump back on the fly without having to
paginate and keep track of everything in order to display the selected date.

This is also useful for archiving and backup use cases: the new endpoint can be
used to slice the messages by day and persist them to files.

Related issue: [*URL for an arbitrary day of history and navigation for next and
previous days*
(vector-im/element-web#7677)](https://github.com/vector-im/element-web/issues/7677)

## Problem

These use cases are not supported by the current Matrix API because it has no
way to fetch or filter older messages other than brute-force pagination from the
most recent event in the room. Paginating is time-consuming, and processing every
event along the way is expensive (not practical for clients). Imagine wanting to
get a message from 3 years ago 😫

## Proposal

Add a new client API endpoint, `GET
/_matrix/client/v1/rooms/{roomId}/timestamp_to_event?ts=<timestamp>&dir=[f|b]`,
which fetches the closest `event_id` to the given timestamp `ts` query parameter
in the direction specified by the `dir` query parameter. The `dir` query
parameter accepts `f` for forward-in-time from the timestamp and `b` for
backward-in-time from the timestamp. This endpoint also returns
`origin_server_ts` to make it easy to do a quick comparison to see if the
`event_id` fetched is too far out of range to be useful for your use case.
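
A minimal Python sketch of the client side, assuming a homeserver base URL such as `https://matrix.example.org`: it builds the request URL (percent-encoding the room ID, which contains `!` and `:`) and applies the `origin_server_ts` comparison described above. The helper names and the one-day tolerance are illustrative assumptions, not part of the proposal, and no HTTP request is sent.

```python
from urllib.parse import quote, urlencode

ONE_DAY_MS = 24 * 60 * 60 * 1000


def timestamp_to_event_url(base_url: str, room_id: str, ts: int, direction: str) -> str:
    """Build the /timestamp_to_event request URL (hypothetical helper)."""
    if direction not in ("f", "b"):
        raise ValueError("dir must be 'f' (forward) or 'b' (backward)")
    # Room IDs such as "!abc:example.org" must be percent-encoded in the path.
    room = quote(room_id, safe="")
    query = urlencode({"ts": ts, "dir": direction})
    return f"{base_url}/_matrix/client/v1/rooms/{room}/timestamp_to_event?{query}"


def is_close_enough(target_ts: int, response: dict, max_distance_ms: int = ONE_DAY_MS) -> bool:
    """Compare the returned origin_server_ts with the requested timestamp."""
    return abs(response["origin_server_ts"] - target_ts) <= max_distance_ms
```

What counts as "close enough" depends on the use case; an archive showing a single day might use the day boundary instead of a fixed distance.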

When no event can be found in the given direction, the endpoint responds with a
404 and `"errcode": "M_NOT_FOUND"` (example error message:
`"error": "Unable to find event from 1672531200000 in direction f"`).
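
Since a 404 with `M_NOT_FOUND` only means "nothing in that direction", a client may want to treat it as an empty result rather than a failure. A sketch, assuming the caller already has the HTTP status and the raw JSON body; `parse_timestamp_to_event` is a hypothetical helper name:

```python
import json
from typing import Optional, Tuple


def parse_timestamp_to_event(status: int, body: str) -> Optional[Tuple[str, int]]:
    """Return (event_id, origin_server_ts), or None if no event exists in that direction."""
    payload = json.loads(body)
    if status == 404 and payload.get("errcode") == "M_NOT_FOUND":
        # e.g. dir=f with a timestamp after the newest event in the room
        return None
    if status != 200:
        # Anything else (rate limiting, auth errors, ...) is a real failure.
        raise RuntimeError(f"{status} {payload.get('errcode')}: {payload.get('error')}")
    return payload["event_id"], payload["origin_server_ts"]
```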

To solve the problem where a homeserver does not have all of the history in a
room and no suitably close event, we also add a server API endpoint, `GET
/_matrix/federation/v1/timestamp_to_event/{roomId}?ts=<timestamp>&dir=[f|b]`, which other
homeservers can use to ask about their closest `event_id` to the timestamp. This
endpoint also returns `origin_server_ts` to make it easy to do a quick comparison to see
if the remote `event_id` fetched is closer than the local one. After the local
homeserver receives a response from the federation endpoint, it should probably
try to backfill this event via the federation `/event/<event_id>` endpoint so that it's
available to query with `/context` from a client in order to get a pagination token.
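
The local-versus-remote comparison can be sketched as picking whichever candidate's `origin_server_ts` lies nearer the requested timestamp. `Candidate` and `closer_event` are hypothetical names; a real homeserver would likely also confirm that the remote event lies in the requested direction, and would then backfill it via `/event/<event_id>` before returning it to the client.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    event_id: str
    origin_server_ts: int


def closer_event(
    target_ts: int, local: Optional[Candidate], remote: Optional[Candidate]
) -> Optional[Candidate]:
    """Prefer whichever candidate is nearer the requested timestamp."""
    candidates = [c for c in (local, remote) if c is not None]
    if not candidates:
        return None
    # min() keeps the first candidate on ties, so the local event wins a draw
    # and no backfill is needed.
    return min(candidates, key=lambda c: abs(c.origin_server_ts - target_ts))
```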

The heuristics for deciding when to ask another homeserver for a closer event, if
your homeserver doesn't have something close, are left up to the homeserver
implementation. They will probably be based on whether the closest event is a
forward/backward extremity, which indicates that it's next to a gap of events
that are potentially closer.
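
One possible shape for such a heuristic is sketched below; it is an assumption, not something the proposal mandates. It queries other servers when there is no local event in that direction or when the closest local event is an extremity. The `max_gap_ms` fallback for a local event that is merely far away is an extra assumption of this sketch.

```python
from typing import Optional


def should_ask_other_servers(
    target_ts: int,
    local_ts: Optional[int],  # origin_server_ts of the closest local event, if any
    local_is_extremity: bool,  # is that event a forward/backward extremity?
    max_gap_ms: int,  # assumption of this sketch: "too far away" threshold
) -> bool:
    """Decide whether to query other homeservers (one possible heuristic)."""
    if local_ts is None:
        # Nothing locally in the requested direction; another server may have history.
        return True
    if local_is_extremity:
        # The event borders a gap in our copy of the room, so closer events may exist.
        return True
    # Extra assumption: also ask if the best local event is simply far away.
    return abs(local_ts - target_ts) > max_gap_ms
```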

A good heuristic for which servers to try first is to sort by servers that have
been in the room the longest, because they're most likely to have anything we ask
about.
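
A sketch of the "longest in the room first" ordering. It assumes the homeserver has already derived, for each server, the earliest join timestamp of any of its users from the room's membership events; how that map is built is left out and is an assumption here.

```python
from typing import Dict, List


def servers_by_tenure(first_join_ts: Dict[str, int], own_server: str) -> List[str]:
    """Order remote servers so those in the room the longest are tried first."""
    others = [server for server in first_join_ts if server != own_server]
    # Earliest join first; the server name breaks ties so the order is deterministic.
    return sorted(others, key=lambda server: (first_join_ts[server], server))
```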

These endpoints are authenticated and should be rate-limited like similar client
and federation endpoints to prevent resource exhaustion abuse.

Client API endpoint:

```
GET /_matrix/client/v1/rooms/<roomID>/timestamp_to_event?ts=<timestamp>&dir=<direction>
{
    "event_id": ...,
    "origin_server_ts": ...
}
```

Federation API endpoint:

```
GET /_matrix/federation/v1/timestamp_to_event/<roomID>?ts=<timestamp>&dir=<direction>
{
    "event_id": ...,
    "origin_server_ts": ...
}
```

---

In order to paginate `/messages`, we need a pagination token, which we can get
using `GET /_matrix/client/r0/rooms/{roomId}/context/{eventId}?limit=0` for the
`event_id` returned by `/timestamp_to_event`.
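
Putting the steps together, a client could go from a timestamp to a page of messages as sketched below. `get_json` is a hypothetical stand-in for an authenticated HTTP GET that returns parsed JSON. The sketch uses the current `v3` paths for `/context` and `/messages` (the `r0` path above is the older form of the same endpoint) and relies on `/context` returning the event itself in `event` and a forward-pagination token in `end`.

```python
from typing import Callable, List
from urllib.parse import quote, urlencode

# Hypothetical: performs an authenticated GET for a path and returns parsed JSON.
GetJson = Callable[[str], dict]


def messages_from_timestamp(get_json: GetJson, room_id: str, ts: int, limit: int = 50) -> List[dict]:
    """Jump to `ts`, then page forwards from the event that was found."""
    room = quote(room_id, safe="")
    found = get_json(
        f"/_matrix/client/v1/rooms/{room}/timestamp_to_event?" + urlencode({"ts": ts, "dir": "f"})
    )
    event_id = quote(found["event_id"], safe="")
    # limit=0: we only want the pagination tokens around the event, not its neighbours.
    context = get_json(f"/_matrix/client/v3/rooms/{room}/context/{event_id}?limit=0")
    # `end` paginates forwards; the found event itself is returned in `event`.
    page = get_json(
        f"/_matrix/client/v3/rooms/{room}/messages?"
        + urlencode({"from": context["end"], "dir": "f", "limit": limit})
    )
    return [context["event"]] + page["chunk"]
```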

We can always iterate on `/timestamp_to_event` later and return a pagination
token directly in another MSC ⏩

## Potential issues

### Receiving a rogue random delayed event ID

Since `origin_server_ts` is not enforceably accurate, we can only hope that an
event's `origin_server_ts` is consistent enough with its `prev_events` and
descendants.

If you ask for "the message with `origin_server_ts` closest to Jan 1st 2018", you
might actually get a rogue, randomly delayed one that was backfilled from a
federated server, but the user can usually notice that and try again with a
slightly different date.

Since there isn't a good or foolproof way to combat this, it's probably best to just go
with `origin_server_ts` and not let perfect be the enemy of good.

### Receiving an unrenderable event ID

Another issue is that clients could land on an event they can't/won't render,
such as a reaction, in which case they'll be forced to desperately seek around
the timeline until they find an event they can do something with.

For example:

- Client wants to jump to January 1st, 2022
- Server says there's an event on January 2nd, 2022 that is close enough
- Client finds out there's a ton of unrenderable events like memberships, poll responses, reactions, etc. at that time
- Client starts paginating forwards, finally finding an event on January 27th it can render
- Client wasn't aware that the actual nearest neighbouring event was backwards on December 28th, 2021 because it didn't paginate in that direction
- User is confused that they are a month past the target date when the message is *right there*.

Clients can be smarter here, though. As they paginate, clients can see when
events were sent, and if they notice they're going more than a couple of days
out, they can also try the other direction before going further and further away.
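
The "try the other direction" strategy can be sketched over a slice of the timeline that the client has already loaded and sorted by `origin_server_ts`. The event shape, the `renderable` predicate, and `max_drift_ms` (the "couple of days" cut-off) are assumptions of this sketch; a real client would paginate with `/messages` instead of indexing a list.

```python
from typing import Callable, List, Optional


def nearest_renderable(
    timeline: List[dict],  # already-loaded events, sorted by origin_server_ts
    start: int,  # index of the event /timestamp_to_event returned
    target_ts: int,
    renderable: Callable[[dict], bool],
    max_drift_ms: int,  # the "couple of days" cut-off
) -> Optional[dict]:
    """Scan both ways from `start` and return the renderable event nearest target_ts."""

    def scan(step: int) -> Optional[dict]:
        i = start
        while 0 <= i < len(timeline):
            event = timeline[i]
            if abs(event["origin_server_ts"] - target_ts) > max_drift_ms:
                return None  # drifted too far: give the other direction a chance
            if renderable(event):
                return event
            i += step
        return None

    found = [event for event in (scan(1), scan(-1)) if event is not None]
    if not found:
        return None
    return min(found, key=lambda event: abs(event["origin_server_ts"] - target_ts))
```

With the dates from the example above and a seven-day cut-off, the forward scan gives up before January 27th while the backward scan reaches the December 28th message, so that is the event the client would show.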

Clients can also just explain to the user what happened with a little toast: "We
were unable to find an event to display on January 1st, 2022. The closest event
after that date is on January 27th."

### Abusing the `/timestamp_to_event` API to get the `m.room.create` event

Although it's possible to jump to the start of the room and get the first event in the
room (`m.room.create`) with `/timestamp_to_event?dir=f&ts=0`, clients should still use
`GET /_matrix/client/v3/rooms/{roomId}/state/m.room.create/` to get the room creation
event.

In the future, with things like importing history via
[MSC2716](https://github.com/matrix-org/matrix-spec-proposals/pull/2716), the first
event you encounter with `/timestamp_to_event?dir=f&ts=0` could be an imported event before
the room was created.

## Alternatives

We chose the current `/timestamp_to_event` route because it sounded like the
easiest path forward to bring it to fruition and get some real-world experience.
It was also on our mind during the [initial discussion](https://docs.google.com/document/d/1KCEmpnGr4J-I8EeaVQ8QJZKBDu53ViI7V62y5BzfXr0/edit#bookmark=id.qu9k9wje9pxm) because there was some prior art with a [WIP
implementation](https://github.com/matrix-org/synapse/pull/9445/commits/91b1b3606c9fb9eede0a6963bc42dfb70635449f)
from @erikjohnston. The alternatives haven't been ruled out for any particular
reason, and we could still go down those routes depending on how people like the
current design.

### Paginate `/messages?around=<timestamp>` from timestamp

Add the `?around=<timestamp>` query parameter to the `GET
/_matrix/client/r0/rooms/{roomId}/messages` endpoint. This will start the
response at the message with `origin_server_ts` closest to the provided `around`
timestamp. The direction is determined by the existing `?dir` query parameter.

This would use topological ordering, just as Element does when you follow a permalink.

This alternative could be confusing to the end-user in how it interacts with
the existing query parameters
`/messages?from={paginationToken}&to={paginationToken}`, which also determine
what part of the timeline to query. Those parameters could be extended to accept
timestamps in addition to pagination tokens, but that could get confusing again
when you start mixing timestamps and pagination tokens. The homeserver would also
have to distinguish a pagination token from a Unix timestamp. Since pagination
tokens don't follow any particular convention, some homeserver implementations
may already be using arbitrary numeric tokens, which would be impossible to
distinguish from a timestamp.

A related alternative is to use `/messages` with `from_time`/`to_time` (or
`from_ts`/`to_ts`) query parameters that only accept timestamps, which solves the
confusion and disambiguation problem of trying to re-use the existing `from`/`to`
query parameters. Re-using `/messages` would reduce the number of round-trips, and
potentially simplify client-side implementations, for the use case where you want to
fetch a window of messages from a given time. However, it has the same round-trip
problem if you want to use the returned `event_id` with `/context` or another
endpoint instead.

### Filter by date in `RoomEventFilter`

Extend `RoomEventFilter` to be able to specify a timestamp or a date range. The
`RoomEventFilter` can be passed via the `?filter` query param on the `/messages`
endpoint.

This suffers from the same end-user confusion about how it interacts with
`/messages?from={paginationToken}&to={paginationToken}`, which also determines
what part of the timeline to query.

### Return the closest event in any direction

We considered omitting the `dir` parameter (or allowing `dir=c`) to have the server
return the closest event to the timestamp, regardless of direction. However, this seems
to offer little benefit.

Firstly, for some use cases (such as archive viewing, where we want to show all the
messages that happened on a particular day), an explicit direction is important, so this
would have to be optional behaviour.

For a regular messaging client, "directionless" search also offers little benefit: it is
easy for the client to repeat the request in the other direction if the returned event
is "too far away", and in any case it needs to manage an iterative search to handle
unrenderable events, as discussed above.

Implementing a directionless search on the server carries a performance overhead, since
it must search both forwards and backwards on every request. In short, there is little
reason to expect that a single `dir=c` request would be any more efficient than a pair of
requests with `dir=b` and `dir=f`.
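
A client that does want "closest in either direction" can build it from the two directional requests, as argued above. `lookup` is a hypothetical callable wrapping one `/timestamp_to_event` request that returns `(event_id, origin_server_ts)`, or `None` on `M_NOT_FOUND`:

```python
from typing import Callable, Optional, Tuple

# Hypothetical: one /timestamp_to_event request for (ts, dir).
Lookup = Callable[[int, str], Optional[Tuple[str, int]]]


def closest_in_either_direction(lookup: Lookup, ts: int) -> Optional[Tuple[str, int]]:
    """Emulate a directionless search with one dir=b and one dir=f request."""
    results = [r for r in (lookup(ts, "b"), lookup(ts, "f")) if r is not None]
    if not results:
        return None
    # On a tie, min() keeps the first result, i.e. the backward one.
    return min(results, key=lambda r: abs(r[1] - ts))
```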

### New `destination_server_ts` field

Add a new field and index on messages called `destination_server_ts`, which
indicates when the message was received from federation. This gives a more
"real" time for how someone would actually consume those messages.

The contract of the API is "show me messages my server received at time T"
rather than the messy confusion of showing a delayed message which happened to
originally be sent at time T.

We've decided against this approach because the backfill from federated servers
could be horribly late.

---

Related issue around `/sync` vs `/messages`:
https://github.com/matrix-org/synapse/issues/7164

> Sync returns things in the order they arrive at the server; backfill returns
> them in the order determined by the event graph.
>
> *-- @richvdh, https://github.com/matrix-org/synapse/issues/7164#issuecomment-605877176*

> The general idea is that, if you're following a room in real-time (ie,
> `/sync`), you probably want to see the messages as they arrive at your server,
> rather than skipping any that arrived late; whereas if you're looking at a
> historical section of timeline (ie, `/messages`), you want to see the best
> representation of the state of the room as others were seeing it at the time.
>
> *-- @richvdh, https://github.com/matrix-org/synapse/issues/7164#issuecomment-605953296*

## Security considerations

We're only going to expose messages according to the existing message history
setting in the room (`m.room.history_visibility`). No extra data is exposed,
just a new way to sort through it all.

## Unstable prefix

While this MSC is not considered stable, the endpoints are available at `/unstable/org.matrix.msc3030` instead of the `/v1` paths described above.

```
GET /_matrix/client/unstable/org.matrix.msc3030/rooms/<roomID>/timestamp_to_event?ts=<timestamp>&dir=<direction>
{
    "event_id": ...,
    "origin_server_ts": ...
}
```

```
GET /_matrix/federation/unstable/org.matrix.msc3030/timestamp_to_event/<roomID>?ts=<timestamp>&dir=<direction>
{
    "event_id": ...,
    "origin_server_ts": ...
}
```

Servers will indicate support for the new endpoint via a non-empty value for the feature flag
`org.matrix.msc3030` in `unstable_features` in the response to `GET
/_matrix/client/versions`.
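
A sketch of the feature check, assuming the raw JSON body of `GET /_matrix/client/versions` is available; the constant and function names are hypothetical. It only inspects the unstable flag, so a server that doesn't advertise it is treated as unsupported in this sketch.

```python
import json
from typing import Optional

UNSTABLE_PREFIX = "/_matrix/client/unstable/org.matrix.msc3030"


def msc3030_prefix(versions_body: str) -> Optional[str]:
    """Return the unstable client prefix if the server advertises MSC3030, else None."""
    versions = json.loads(versions_body)
    flags = versions.get("unstable_features") or {}
    # Support is signalled by a non-empty value, so any truthy value counts.
    if flags.get("org.matrix.msc3030"):
        return UNSTABLE_PREFIX
    return None
```

The returned prefix stands in for `/_matrix/client/v1` when building the request path, e.g. `UNSTABLE_PREFIX + "/rooms/<roomID>/timestamp_to_event"`.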