From e2df3fd1fb126da3745e5ca89dddd44b6ef9e16f Mon Sep 17 00:00:00 2001 From: Neil Alexander Date: Tue, 4 Jan 2022 17:04:39 +0000 Subject: [PATCH 1/7] Add proposal to simplify federation `/send` response --- proposals/xxxx-simplify-federation-send.md | 26 ++++++++++++++++++++++ 1 file changed, 26 insertions(+) create mode 100644 proposals/xxxx-simplify-federation-send.md diff --git a/proposals/xxxx-simplify-federation-send.md b/proposals/xxxx-simplify-federation-send.md new file mode 100644 index 000000000..6a38c0f85 --- /dev/null +++ b/proposals/xxxx-simplify-federation-send.md @@ -0,0 +1,26 @@ +# MSCXXXX: Simplify federation `/send` response + +## Overview + +Currently we specify that the federation `/send` endpoint returns a body of `pdus: { string: PDU Processing Result}`. In theory a homeserver can return information here on an event-by-event basis as to whether there was a problem processing events in the transaction or not. + +However, this does not really make much difference in practice — soft-fails are silent and rejected events may be too – and server implementations do not "cherry-pick" which events in a transaction to retry later. Since the presence of a `txnId` in the request implies that we should consider a transaction to be idempotent for a given `txnId`, we should therefore either accept that the entire transaction was accepted successfully by the remote side or we retry the entire transaction later. + +The worst case is that the homeserver is not able to process the transaction at all for some reason, i.e. due to the database being down or similar, in which case the server really should just return a HTTP 500 status code and this signals to the sender to retry later. + +## Proposal + +This MSC proposes that we remove the `pdus` section from the response body, so that we return only one of two conditions: + +* A HTTP 200 with a `{}` body to signal that the transaction was accepted; +* A HTTP 500 to signal that there was a problem with the transaction and to retry sending later. + +## Benefits + +A significant benefit is that homeserver implementations no longer need to block the `/send` request in order to wait for the events to be processed for their error results. This can potentially remove head-of-line blocking from `/send` altogether if homeservers can durably queue the incoming events before returning. + +Another benefit is that homeservers no longer need to parse the response body at all and can instead just determine whether the transaction was accepted successfully by the HTTP status code. + +## Potential issues + +Synapse appears to use the `"pdus"` key for logging (see [here](https://github.com/matrix-org/synapse/blob/b38bdae3a2e5b7cfe862580368b996b8d7dfa50f/synapse/federation/sender/transaction_manager.py#L160)). Dendrite, however, ignores the response body altogether. \ No newline at end of file From 924bded6716333014902c4204ad77bf97e5cafb4 Mon Sep 17 00:00:00 2001 From: Neil Alexander Date: Tue, 4 Jan 2022 17:05:27 +0000 Subject: [PATCH 2/7] Use MSC number 3618 from PR --- ...lify-federation-send.md => 3618-simplify-federation-send.md} | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename proposals/{xxxx-simplify-federation-send.md => 3618-simplify-federation-send.md} (97%) diff --git a/proposals/xxxx-simplify-federation-send.md b/proposals/3618-simplify-federation-send.md similarity index 97% rename from proposals/xxxx-simplify-federation-send.md rename to proposals/3618-simplify-federation-send.md index 6a38c0f85..e10075b12 100644 --- a/proposals/xxxx-simplify-federation-send.md +++ b/proposals/3618-simplify-federation-send.md @@ -1,4 +1,4 @@ -# MSCXXXX: Simplify federation `/send` response +# MSC3618: Simplify federation `/send` response ## Overview From 7f4b1cbc72ebbeabcb2f08c6032fa95105cba150 Mon Sep 17 00:00:00 2001 From: Neil Alexander Date: Tue, 4 Jan 2022 17:09:25 +0000 Subject: [PATCH 3/7] Wrapping --- proposals/3618-simplify-federation-send.md | 39 ++++++++++++++++------ 1 file changed, 29 insertions(+), 10 deletions(-) diff --git a/proposals/3618-simplify-federation-send.md b/proposals/3618-simplify-federation-send.md index e10075b12..3d0d782fd 100644 --- a/proposals/3618-simplify-federation-send.md +++ b/proposals/3618-simplify-federation-send.md @@ -2,25 +2,44 @@ ## Overview -Currently we specify that the federation `/send` endpoint returns a body of `pdus: { string: PDU Processing Result}`. In theory a homeserver can return information here on an event-by-event basis as to whether there was a problem processing events in the transaction or not. - -However, this does not really make much difference in practice — soft-fails are silent and rejected events may be too – and server implementations do not "cherry-pick" which events in a transaction to retry later. Since the presence of a `txnId` in the request implies that we should consider a transaction to be idempotent for a given `txnId`, we should therefore either accept that the entire transaction was accepted successfully by the remote side or we retry the entire transaction later. - -The worst case is that the homeserver is not able to process the transaction at all for some reason, i.e. due to the database being down or similar, in which case the server really should just return a HTTP 500 status code and this signals to the sender to retry later. +Currently we specify that the federation `/send` endpoint returns a body of +`pdus: { string: PDU Processing Result}`. In theory a homeserver can return +information here on an event-by-event basis as to whether there was a problem +processing events in the transaction or not. + +However, this does not really make much difference in practice — soft-fails +are silent and rejected events may be too – and server implementations do not +"cherry-pick" which events in a transaction to retry later. Since the presence +of a `txnId` in the request implies that we should consider a transaction to be +idempotent for a given `txnId`, we should therefore either accept that the +entire transaction was accepted successfully by the remote side or we should +retry the entire transaction. + +The worst case is that the homeserver is not able to process the transaction at +all for some reason, i.e. due to the database being down or similar, in which +case the server really should just return a HTTP 500 status code and this +signals to the sender to retry later. ## Proposal -This MSC proposes that we remove the `pdus` section from the response body, so that we return only one of two conditions: +This MSC proposes that we remove the `pdus` section from the response body, so +that we return only one of two conditions: * A HTTP 200 with a `{}` body to signal that the transaction was accepted; -* A HTTP 500 to signal that there was a problem with the transaction and to retry sending later. +* A HTTP 500 to signal that there was a problem with the transaction and to retry + sending later. ## Benefits -A significant benefit is that homeserver implementations no longer need to block the `/send` request in order to wait for the events to be processed for their error results. This can potentially remove head-of-line blocking from `/send` altogether if homeservers can durably queue the incoming events before returning. +A significant benefit is that homeserver implementations no longer need to block +the `/send` request in order to wait for the events to be processed for their error +results. This can potentially remove head-of-line blocking from `/send` altogether +if homeservers can durably queue the incoming events before returning. -Another benefit is that homeservers no longer need to parse the response body at all and can instead just determine whether the transaction was accepted successfully by the HTTP status code. +Another benefit is that homeservers no longer need to parse the response body at +all and can instead just determine whether the transaction was accepted successfully +by the HTTP status code. ## Potential issues -Synapse appears to use the `"pdus"` key for logging (see [here](https://github.com/matrix-org/synapse/blob/b38bdae3a2e5b7cfe862580368b996b8d7dfa50f/synapse/federation/sender/transaction_manager.py#L160)). Dendrite, however, ignores the response body altogether. \ No newline at end of file +Synapse appears to use the `"pdus"` key for logging (see [here](https://github.com/matrix-org/synapse/blob/b38bdae3a2e5b7cfe862580368b996b8d7dfa50f/synapse/federation/sender/transaction_manager.py#L160)). Dendrite, however, ignores the response body altogether. From e0999d6ec61cc83a4ba1eba4ba68c2cdb5e2c364 Mon Sep 17 00:00:00 2001 From: Neil Alexander Date: Wed, 5 Jan 2022 09:50:17 +0000 Subject: [PATCH 4/7] Clarifications on blocking and head-of-line issues --- proposals/3618-simplify-federation-send.md | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/proposals/3618-simplify-federation-send.md b/proposals/3618-simplify-federation-send.md index 3d0d782fd..cba69d08f 100644 --- a/proposals/3618-simplify-federation-send.md +++ b/proposals/3618-simplify-federation-send.md @@ -33,8 +33,25 @@ that we return only one of two conditions: A significant benefit is that homeserver implementations no longer need to block the `/send` request in order to wait for the events to be processed for their error -results. This can potentially remove head-of-line blocking from `/send` altogether -if homeservers can durably queue the incoming events before returning. +results. This can potentially allow homeserver implementations to remove head-of-line +blocking from `/send` by maintaining durable queues for incoming federation events and +processing them on a per-room basis. + +Given that it is possible for a transaction to contain events from multiple rooms, or +EDUs for unrelated purposes, it is bad that a single busy room can hold up incoming +transactions from a given server altogether. This means that new events for other +rooms may be held back unnecessarily by processing events for a single busy room, as +per the spec: + +> The sending server must wait and retry for a 200 OK response before sending a +> transaction with a different txnId to the receiving server. + +With this proposal, blocking becomes optional rather than required. Servers that do not +want to durably persist transactions before processing them can continue to perform all +work in-memory by continuing to block on `/send` as is done today. Additionally, a server +that is receiving too many transactions from a given homeserver may wish to block for +an arbitrary period of time for rate-limiting purposes, but this is not necessarily +required. Another benefit is that homeservers no longer need to parse the response body at all and can instead just determine whether the transaction was accepted successfully From d73e57058b88230f6fd0a8f6cd9f90ebf27f4c33 Mon Sep 17 00:00:00 2001 From: Neil Alexander Date: Wed, 5 Jan 2022 15:06:58 +0000 Subject: [PATCH 5/7] Update wording around "receiving homeservers" --- proposals/3618-simplify-federation-send.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/proposals/3618-simplify-federation-send.md b/proposals/3618-simplify-federation-send.md index cba69d08f..ab2641b61 100644 --- a/proposals/3618-simplify-federation-send.md +++ b/proposals/3618-simplify-federation-send.md @@ -46,16 +46,16 @@ per the spec: > The sending server must wait and retry for a 200 OK response before sending a > transaction with a different txnId to the receiving server. -With this proposal, blocking becomes optional rather than required. Servers that do not -want to durably persist transactions before processing them can continue to perform all -work in-memory by continuing to block on `/send` as is done today. Additionally, a server -that is receiving too many transactions from a given homeserver may wish to block for -an arbitrary period of time for rate-limiting purposes, but this is not necessarily -required. - -Another benefit is that homeservers no longer need to parse the response body at -all and can instead just determine whether the transaction was accepted successfully -by the HTTP status code. +With this proposal, blocking becomes optional rather than required. Receiving servers that +do not want to durably persist transactions before processing them can continue to perform +all work in-memory by continuing to block on `/send` as is done today. Additionally, a +receiving server that is receiving too many transactions from a remote homeserver may wish to +block for an arbitrary period of time for rate-limiting purposes, but this is an implementation +specific detail and not strictly required. + +Another benefit is that sending homeservers no longer need to parse the response body at +all and can instead just determine whether the transaction was accepted successfully by +observing the HTTP status code. ## Potential issues From 33bff5d0af9651aa2653c0045e5cfcc17c4e5c26 Mon Sep 17 00:00:00 2001 From: Neil Alexander Date: Mon, 10 Jan 2022 10:50:38 +0000 Subject: [PATCH 6/7] Address @richvdh review comment by removing HoL wording --- proposals/3618-simplify-federation-send.md | 23 +++++++++++----------- 1 file changed, 11 insertions(+), 12 deletions(-) diff --git a/proposals/3618-simplify-federation-send.md b/proposals/3618-simplify-federation-send.md index ab2641b61..a9dbaa260 100644 --- a/proposals/3618-simplify-federation-send.md +++ b/proposals/3618-simplify-federation-send.md @@ -31,27 +31,26 @@ that we return only one of two conditions: ## Benefits -A significant benefit is that homeserver implementations no longer need to block +A significant benefit is that the receiving homeserver no longer needs to block the `/send` request in order to wait for the events to be processed for their error -results. This can potentially allow homeserver implementations to remove head-of-line -blocking from `/send` by maintaining durable queues for incoming federation events and -processing them on a per-room basis. +results. Given that it is possible for a transaction to contain events from multiple rooms, or -EDUs for unrelated purposes, it is bad that a single busy room can hold up incoming -transactions from a given server altogether. This means that new events for other +EDUs for unrelated purposes, it is bad that a single busy room can lengthen the amount of +time to return the `/send` response to the caller. This means that new events for other rooms may be held back unnecessarily by processing events for a single busy room, as per the spec: > The sending server must wait and retry for a 200 OK response before sending a > transaction with a different txnId to the receiving server. -With this proposal, blocking becomes optional rather than required. Receiving servers that -do not want to durably persist transactions before processing them can continue to perform -all work in-memory by continuing to block on `/send` as is done today. Additionally, a -receiving server that is receiving too many transactions from a remote homeserver may wish to -block for an arbitrary period of time for rate-limiting purposes, but this is an implementation -specific detail and not strictly required. +With this proposal, the receiving server needing to block the `/send` response to wait for +`PDU Processing Result`s becomes optional rather than required. Receiving servers that do +not want to durably persist transactions before processing them can continue to perform all +work in-memory by continuing to block on `/send` as is done today. Additionally, a receiving +server that is receiving too many transactions from a remote homeserver may wish to block for +an arbitrary period of time for rate-limiting purposes, but this is an implementation specific +detail and not strictly required. Another benefit is that sending homeservers no longer need to parse the response body at all and can instead just determine whether the transaction was accepted successfully by From dad0c7dac286406b71c608658dd4a6c147bb5d08 Mon Sep 17 00:00:00 2001 From: Neil Alexander Date: Wed, 12 Jan 2022 17:29:55 +0000 Subject: [PATCH 7/7] Update 3618-simplify-federation-send.md --- proposals/3618-simplify-federation-send.md | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/proposals/3618-simplify-federation-send.md b/proposals/3618-simplify-federation-send.md index a9dbaa260..8bed8063f 100644 --- a/proposals/3618-simplify-federation-send.md +++ b/proposals/3618-simplify-federation-send.md @@ -31,9 +31,9 @@ that we return only one of two conditions: ## Benefits -A significant benefit is that the receiving homeserver no longer needs to block -the `/send` request in order to wait for the events to be processed for their error -results. +A significant benefit is that the receiving homeserver no longer needs to block the +the `/send` request in order to wait for the events to be processed for their `PDU Processing +Result`s. Given that it is possible for a transaction to contain events from multiple rooms, or EDUs for unrelated purposes, it is bad that a single busy room can lengthen the amount of @@ -44,10 +44,10 @@ per the spec: > The sending server must wait and retry for a 200 OK response before sending a > transaction with a different txnId to the receiving server. -With this proposal, the receiving server needing to block the `/send` response to wait for -`PDU Processing Result`s becomes optional rather than required. Receiving servers that do -not want to durably persist transactions before processing them can continue to perform all -work in-memory by continuing to block on `/send` as is done today. Additionally, a receiving +With this proposal, the receiving server no longer needs to wait for `PDU Processing Result`s +as this MSC does away with them. Receiving servers that do not want to durably persist transactions +before processing them can continue to perform all work in-memory by continuing to block the `/send` +request until all processing is completed, as may be done today. Additionally, a receiving server that is receiving too many transactions from a remote homeserver may wish to block for an arbitrary period of time for rate-limiting purposes, but this is an implementation specific detail and not strictly required. @@ -58,4 +58,6 @@ observing the HTTP status code. ## Potential issues -Synapse appears to use the `"pdus"` key for logging (see [here](https://github.com/matrix-org/synapse/blob/b38bdae3a2e5b7cfe862580368b996b8d7dfa50f/synapse/federation/sender/transaction_manager.py#L160)). Dendrite, however, ignores the response body altogether. +Synapse appears to use the `"pdus"` key for logging (see [here](https://github.com/matrix-org/synapse/blob/b38bdae3a2e5b7cfe862580368b996b8d7dfa50f/synapse/federation/sender/transaction_manager.py#L160)). +Conduit does the same and treats the response as an empty list if it is not present. Dendrite +ignores the response body altogether.