From 34897a5e6f17adab2643f9a67c815721f821e4dd Mon Sep 17 00:00:00 2001 From: Graham Leach-Krouse Date: Thu, 23 Dec 2021 16:39:49 -0600 Subject: [PATCH 01/10] Initial PDF markup proposal --- proposals/XXXX-pdf-markup.md | 110 +++++++++++++++++++++++++++++++++++ 1 file changed, 110 insertions(+) create mode 100644 proposals/XXXX-pdf-markup.md diff --git a/proposals/XXXX-pdf-markup.md b/proposals/XXXX-pdf-markup.md new file mode 100644 index 000000000..09d2c5722 --- /dev/null +++ b/proposals/XXXX-pdf-markup.md @@ -0,0 +1,110 @@ +# PDF annotation locations for markup + +[MSC3574](https://github.com/opentower/matrix-doc/blob/main/proposals/3574-resource-markup.md) +proposes a mechanism for marking up resources (webpages, documents, videos, and +other files) using Matrix. The proposed mechanism requires an +`m.markup.location` schema for representing the location of annotations within +different kinds of resources.MSC3574 punts on what standard location types +might be available, deferring that large family of questions to other MSCs. +This MSC aims to provide two basic location types for marking up PDFs. + +## Proposal + +Markup locations for PDFs should approximately follow the format of embedded +annotations provided in the PDF standard, for more straightforward integration +with PDF rendering and editing libraries that clients may wish to make use of. + +The PDF 1.4 standard includes 19 different kinds of annotations (see [p499 +here](https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/pdf_reference_archives/PDFReference.pdf)). +This proposal provides events for two of these: *Text Annotations*, which +represent "sticky notes" at a certain point in the text, and *Highlights*, +which represent a certain range of text that should be highlighted. + +PDF 1.4 annotations all accept a very large set of different attributes. 
Of +these, only two are mandatory: `Subtype` and `Rect`, where `Subtype` gives the +annotation type, and `Rect` gives the position of the annotation on the PDF +page as a rectangle represented by an array of the form + + [lower-left-x, lower-left-y, upper-right-x, upper-right-y] + +where each item is a number of "user space units" (72ths of an inch) from the +bottom left corner of the page, sometimes called *points*. + +This MSC does not propose to include any of the optional attributes. The +`Subtype` attribute will be indicated by a key of the `m.markup.location` +object. So only `Rect`, and the attributes specific to each annotation type, +need to be provided for. + +### Text Annotations + +Text annotations will be represented within an `m.markup.location` as follows: + +``` +m.markup.location: { + m.markup.pdf.text: { + rect: {left: ..., right: ..., top: ..., bottom: ...} + contents: ... + } + .. +} +``` + +The `contents` is a string indicating text for the text annotation. Precisely +how to set it will be left as an implementation detail for clients. + +Optionally, `m.markup.pdf.text` may also contain a `name` value, which should +be a string that names an icon to be used in displaying the annotation. +Standardly recognized values are: "Comment", "Key", "Note", "Help", +"NewParagraph", "Paragraph" and "Insert". + +### Highlight Annotations + +Highlight Annotations will be represented within an `m.markup.location` as +follows: + +``` +m.markup.location: { + m.markup.pdf.text: { + rect: {left: ..., right: ..., top: ..., bottom: ...} + contents: ... + quadPoints: [...] + } + .. +} +``` + +The `contents` are as above. `quadPoints` is an array of arrays of the form: + + [x_1,y_1,x_2,y_2,x_3,y_3,x_4,y_4] + +each of which represents the vertices (in counterclockwise order) of an +oriented quadrilateral region of the PDF page. Each quadrilateral is meant to +encompass a word or group of contiguous words in the highlighted text. 
+ +Optionally, the `m.markup.location` may also include a `textContent` value, +which should be a string containing the highlighted text. the `textContent` +value is not part of the PDF standard, but is included as a convenience for +clients. + +## Alternatives + +Rather than accepting this MSC, we could wait for a more comprehensive MSC that +tries to comprehensively specify a complete set of location types on PDFs. +However, it seems best to work iteratively, and start with the pdf location +types that can most easily be implemented, rather than waiting until something +truly comprehensive can be implemented. + +Rather than using userspace units, we could use some more fine-grained +coordinate system, for example milli-units. The PDF 1.4 standard lets units +take on "real number values" so precision greater than one unit is possible. +But since we can't have float values in matrix events, we can't capture this +greater precision on the present proposal. However, this would probably create +confusion, and precision greater than 1/72th of an inch is probably excessive. + +## Security considerations + +None. + +## Unstable prefix + +TBD From a6abbeba1e66e98b284d12611590f9cf3a2a562d Mon Sep 17 00:00:00 2001 From: gleachkr Date: Thu, 23 Dec 2021 16:56:56 -0600 Subject: [PATCH 02/10] Update with MSC number --- proposals/{XXXX-pdf-markup.md => 3592-pdf-markup.md} | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) rename proposals/{XXXX-pdf-markup.md => 3592-pdf-markup.md} (88%) diff --git a/proposals/XXXX-pdf-markup.md b/proposals/3592-pdf-markup.md similarity index 88% rename from proposals/XXXX-pdf-markup.md rename to proposals/3592-pdf-markup.md index 09d2c5722..f021e420d 100644 --- a/proposals/XXXX-pdf-markup.md +++ b/proposals/3592-pdf-markup.md @@ -64,7 +64,7 @@ follows: ``` m.markup.location: { - m.markup.pdf.text: { + m.markup.pdf.highlight: { rect: {left: ..., right: ..., top: ..., bottom: ...} contents: ... quadPoints: [...] @@ -107,4 +107,7 @@ None. 
## Unstable prefix -TBD +| Proposed Final Identifier | Purpose | Development Identifier | +| ------------------------- | ---------------------------------------------------------- | ----------------------------------------- | +| `m.markup.pdf.text` | key in `m.markup.location` | `com.open-tower.msc3592.markup.text` | +| `m.markup.pdf.highlight` | key in `m.markup.location` | `com.open-tower.msc3592.markup.highlight` | From 34a1ad0888df7a9ef82b9a7c6643951adc28bfcd Mon Sep 17 00:00:00 2001 From: gleachkr Date: Thu, 23 Dec 2021 17:01:04 -0600 Subject: [PATCH 03/10] Fix title --- proposals/3592-pdf-markup.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/3592-pdf-markup.md b/proposals/3592-pdf-markup.md index f021e420d..612733013 100644 --- a/proposals/3592-pdf-markup.md +++ b/proposals/3592-pdf-markup.md @@ -1,4 +1,4 @@ -# PDF annotation locations for markup +# Markup locations for PDF documents [MSC3574](https://github.com/opentower/matrix-doc/blob/main/proposals/3574-resource-markup.md) proposes a mechanism for marking up resources (webpages, documents, videos, and From 9459c6e4d2a6435f2441870f0494bc40340234de Mon Sep 17 00:00:00 2001 From: gleachkr Date: Thu, 23 Dec 2021 21:24:15 -0600 Subject: [PATCH 04/10] Fix typo --- proposals/3592-pdf-markup.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/3592-pdf-markup.md b/proposals/3592-pdf-markup.md index 612733013..960105898 100644 --- a/proposals/3592-pdf-markup.md +++ b/proposals/3592-pdf-markup.md @@ -81,7 +81,7 @@ each of which represents the vertices (in counterclockwise order) of an oriented quadrilateral region of the PDF page. Each quadrilateral is meant to encompass a word or group of contiguous words in the highlighted text. -Optionally, the `m.markup.location` may also include a `textContent` value, +Optionally, the `m.markup.pdf.highlight` may also include a `textContent` value, which should be a string containing the highlighted text. 
the `textContent` value is not part of the PDF standard, but is included as a convenience for clients. From 8c344654e7c261acb07064f111d6cd5900c15dc4 Mon Sep 17 00:00:00 2001 From: gleachkr Date: Mon, 27 Dec 2021 11:07:22 -0600 Subject: [PATCH 05/10] Snake case keys, add page index. --- proposals/3592-pdf-markup.md | 29 +++++++++++++++++++---------- 1 file changed, 19 insertions(+), 10 deletions(-) diff --git a/proposals/3592-pdf-markup.md b/proposals/3592-pdf-markup.md index 960105898..e2c8291c4 100644 --- a/proposals/3592-pdf-markup.md +++ b/proposals/3592-pdf-markup.md @@ -4,7 +4,7 @@ proposes a mechanism for marking up resources (webpages, documents, videos, and other files) using Matrix. The proposed mechanism requires an `m.markup.location` schema for representing the location of annotations within -different kinds of resources.MSC3574 punts on what standard location types +different kinds of resources. MSC3574 punts on what standard location types might be available, deferring that large family of questions to other MSCs. This MSC aims to provide two basic location types for marking up PDFs. @@ -14,13 +14,13 @@ Markup locations for PDFs should approximately follow the format of embedded annotations provided in the PDF standard, for more straightforward integration with PDF rendering and editing libraries that clients may wish to make use of. -The PDF 1.4 standard includes 19 different kinds of annotations (see [p499 -here](https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/pdf_reference_archives/PDFReference.pdf)). +The PDF standard includes 19 different kinds of annotations (see [p499 here] +(https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/pdf_reference_archives/PDFReference.pdf)). This proposal provides events for two of these: *Text Annotations*, which represent "sticky notes" at a certain point in the text, and *Highlights*, which represent a certain range of text that should be highlighted. 
-PDF 1.4 annotations all accept a very large set of different attributes. Of +PDF annotations all accept a very large set of different attributes. Of these, only two are mandatory: `Subtype` and `Rect`, where `Subtype` gives the annotation type, and `Rect` gives the position of the annotation on the PDF page as a rectangle represented by an array of the form @@ -35,6 +35,13 @@ This MSC does not propose to include any of the optional attributes. The object. So only `Rect`, and the attributes specific to each annotation type, need to be provided for. +Within a PDF, an annotation occurs as part of the content stream associated +with a particular page, so the page doesn't need to be indicated within the +annotation. Since this information is not automatically available Markup +locations will also require a *page index* field. The page index is a +non-negative integer, and is distinct from a *page label*, which is a string +(for example "iv" within the front-matter of a book). + ### Text Annotations Text annotations will be represented within an `m.markup.location` as follows: @@ -44,6 +51,7 @@ m.markup.location: { m.markup.pdf.text: { rect: {left: ..., right: ..., top: ..., bottom: ...} contents: ... + page_index: ... } .. } @@ -66,14 +74,15 @@ follows: m.markup.location: { m.markup.pdf.highlight: { rect: {left: ..., right: ..., top: ..., bottom: ...} - contents: ... - quadPoints: [...] + contents: ..., + quad_points: [...], + page_index: ... } .. } ``` -The `contents` are as above. `quadPoints` is an array of arrays of the form: +The `contents` are as above. `quad_points` is an array of arrays of the form: [x_1,y_1,x_2,y_2,x_3,y_3,x_4,y_4] @@ -81,8 +90,8 @@ each of which represents the vertices (in counterclockwise order) of an oriented quadrilateral region of the PDF page. Each quadrilateral is meant to encompass a word or group of contiguous words in the highlighted text. 
-Optionally, the `m.markup.pdf.highlight` may also include a `textContent` value, -which should be a string containing the highlighted text. the `textContent` +Optionally, the `m.markup.pdf.highlight` may also include a `text_content` value, +which should be a string containing the highlighted text. the `text_content` value is not part of the PDF standard, but is included as a convenience for clients. @@ -95,7 +104,7 @@ types that can most easily be implemented, rather than waiting until something truly comprehensive can be implemented. Rather than using userspace units, we could use some more fine-grained -coordinate system, for example milli-units. The PDF 1.4 standard lets units +coordinate system, for example milli-units. The PDF standard lets units take on "real number values" so precision greater than one unit is possible. But since we can't have float values in matrix events, we can't capture this greater precision on the present proposal. However, this would probably create From 4b80b8dd074d6c8ece37c0f8f82b84be95884da6 Mon Sep 17 00:00:00 2001 From: gleachkr Date: Mon, 27 Dec 2021 11:10:58 -0600 Subject: [PATCH 06/10] Update 3592-pdf-markup.md --- proposals/3592-pdf-markup.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/proposals/3592-pdf-markup.md b/proposals/3592-pdf-markup.md index e2c8291c4..20239a79f 100644 --- a/proposals/3592-pdf-markup.md +++ b/proposals/3592-pdf-markup.md @@ -36,11 +36,12 @@ object. So only `Rect`, and the attributes specific to each annotation type, need to be provided for. Within a PDF, an annotation occurs as part of the content stream associated -with a particular page, so the page doesn't need to be indicated within the -annotation. Since this information is not automatically available Markup -locations will also require a *page index* field. The page index is a -non-negative integer, and is distinct from a *page label*, which is a string -(for example "iv" within the front-matter of a book). 
+with a particular page, so the page number doesn't need to be represented as +an attribute of the annotation. Since this information is not automatically +available in the Matrix context, `m.markup` locations for PDFs will also +require a *page index* field. The page index is a non-negative integer, and +is distinct from a *page label*, which is a string (for example "iv" within +the front matter of a book). ### Text Annotations From 8872f1618fe0354b888158a71dd2ca56ffcc5d0d Mon Sep 17 00:00:00 2001 From: gleachkr Date: Mon, 27 Dec 2021 11:31:48 -0600 Subject: [PATCH 07/10] Tweak wording somewhat --- proposals/3592-pdf-markup.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/proposals/3592-pdf-markup.md b/proposals/3592-pdf-markup.md index 20239a79f..fa78f31b5 100644 --- a/proposals/3592-pdf-markup.md +++ b/proposals/3592-pdf-markup.md @@ -14,11 +14,12 @@ Markup locations for PDFs should approximately follow the format of embedded annotations provided in the PDF standard, for more straightforward integration with PDF rendering and editing libraries that clients may wish to make use of. -The PDF standard includes 19 different kinds of annotations (see [p499 here] -(https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/pdf_reference_archives/PDFReference.pdf)). -This proposal provides events for two of these: *Text Annotations*, which -represent "sticky notes" at a certain point in the text, and *Highlights*, -which represent a certain range of text that should be highlighted. +The PDF standard includes many different kinds of annotations: 19 in PDF 1.4 +(see [p499 here](https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/pdf_reference_archives/PDFReference.pdf)) +and 26 in PDF 1.7, (see [p390 here](https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf)). 
+This proposal introduces events for two of these kinds of annotations: *Text +Annotations*, which represent "sticky notes" at a certain point in the text, +and *Highlights*,which represent a certain range of text that should be highlighted. PDF annotations all accept a very large set of different attributes. Of these, only two are mandatory: `Subtype` and `Rect`, where `Subtype` gives the From 9cdbf3e401123b3069cbce3a174d0b227c62957a Mon Sep 17 00:00:00 2001 From: gleachkr Date: Fri, 7 Jan 2022 12:23:11 -0600 Subject: [PATCH 08/10] Fix development identifier typo --- proposals/3592-pdf-markup.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/proposals/3592-pdf-markup.md b/proposals/3592-pdf-markup.md index fa78f31b5..06ff4f49d 100644 --- a/proposals/3592-pdf-markup.md +++ b/proposals/3592-pdf-markup.md @@ -118,7 +118,7 @@ None. ## Unstable prefix -| Proposed Final Identifier | Purpose | Development Identifier | -| ------------------------- | ---------------------------------------------------------- | ----------------------------------------- | -| `m.markup.pdf.text` | key in `m.markup.location` | `com.open-tower.msc3592.markup.text` | -| `m.markup.pdf.highlight` | key in `m.markup.location` | `com.open-tower.msc3592.markup.highlight` | +| Proposed Final Identifier | Purpose | Development Identifier | +| ------------------------- | ---------------------------------------------------------- | --------------------------------------------- | +| `m.markup.pdf.text` | key in `m.markup.location` | `com.open-tower.msc3592.markup.pdf.text` | +| `m.markup.pdf.highlight` | key in `m.markup.location` | `com.open-tower.msc3592.markup.pdf.highlight` | From 1ccf88656bf2204f230e84bb8f85cfc375c7bc3a Mon Sep 17 00:00:00 2001 From: Graham Leach-Krouse Date: Mon, 14 Mar 2022 13:14:49 -0500 Subject: [PATCH 09/10] Use PR link, add security concern --- proposals/3592-pdf-markup.md | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git 
a/proposals/3592-pdf-markup.md b/proposals/3592-pdf-markup.md index 06ff4f49d..29470f7af 100644 --- a/proposals/3592-pdf-markup.md +++ b/proposals/3592-pdf-markup.md @@ -1,6 +1,6 @@ # Markup locations for PDF documents -[MSC3574](https://github.com/opentower/matrix-doc/blob/main/proposals/3574-resource-markup.md) +[MSC3574](https://github.com/matrix-org/matrix-spec-proposals/pull/3574) proposes a mechanism for marking up resources (webpages, documents, videos, and other files) using Matrix. The proposed mechanism requires an `m.markup.location` schema for representing the location of annotations within @@ -93,7 +93,7 @@ oriented quadrilateral region of the PDF page. Each quadrilateral is meant to encompass a word or group of contiguous words in the highlighted text. Optionally, the `m.markup.pdf.highlight` may also include a `text_content` value, -which should be a string containing the highlighted text. the `text_content` +which should be a string containing the highlighted text. The `text_content` value is not part of the PDF standard, but is included as a convenience for clients. @@ -114,7 +114,12 @@ confusion, and precision greater than 1/72th of an inch is probably excessive. ## Security considerations -None. +Because room state is unencrypted, `m.space.child` events conveying locations +via `m.markup.location.highlight` could leak information about an encrypted +resource text through the `text_content` field, or about the annotation itself +through the `contents` field. 
This is part of a more general problem with state +events potentially leaking information, and deserves a general resolution, a la +[MSC3414](https://github.com/matrix-org/matrix-spec-proposals/pull/3414) ## Unstable prefix From 153a665a349e3652e1b8d303f9080d2210eded64 Mon Sep 17 00:00:00 2001 From: Graham Leach-Krouse Date: Wed, 18 May 2022 15:17:23 -0500 Subject: [PATCH 10/10] Enquote JSON strings --- proposals/3592-pdf-markup.md | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/proposals/3592-pdf-markup.md b/proposals/3592-pdf-markup.md index 29470f7af..0c7cb9561 100644 --- a/proposals/3592-pdf-markup.md +++ b/proposals/3592-pdf-markup.md @@ -49,11 +49,11 @@ the front matter of a book). Text annotations will be represented within an `m.markup.location` as follows: ``` -m.markup.location: { - m.markup.pdf.text: { - rect: {left: ..., right: ..., top: ..., bottom: ...} - contents: ... - page_index: ... +"m.markup.location": { + "m.markup.pdf.text": { + "rect": {"left": ..., "right": ..., "top": ..., "bottom": ...} + "contents": ... + "page_index": ... } .. } @@ -73,12 +73,12 @@ Highlight Annotations will be represented within an `m.markup.location` as follows: ``` -m.markup.location: { - m.markup.pdf.highlight: { - rect: {left: ..., right: ..., top: ..., bottom: ...} - contents: ..., - quad_points: [...], - page_index: ... +"m.markup.location": { + "m.markup.pdf.highlight": { + "rect": {"left": ..., "right": ..., "top": ..., "bottom": ...} + "contents": ..., + "quad_points": [...], + "page_index": ... } .. }
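As a rough illustration of where the series ends up, the highlight location could be assembled along the following lines. This is only a sketch under stated assumptions: the helper names `rect_from_pdf` and `highlight_location` are hypothetical, and the mapping from the PDF `Rect` array to the `left`/`right`/`top`/`bottom` keys is inferred from the array order quoted in the proposal, not stated there. Coordinates are rounded to whole user space units because, as the proposal notes, Matrix events cannot carry float values.

```python
import json


def rect_from_pdf(rect):
    """Convert a PDF-style Rect array into the keyed `rect` object.

    Assumption: [lower-left-x, lower-left-y, upper-right-x, upper-right-y]
    maps to left, bottom, right, top. Values are rounded to integers
    because Matrix events cannot contain floats.
    """
    llx, lly, urx, ury = (round(v) for v in rect)
    return {"left": llx, "right": urx, "top": ury, "bottom": lly}


def highlight_location(rect, quad_points, page_index, contents, text_content=None):
    """Build an `m.markup.location` carrying an `m.markup.pdf.highlight`.

    Hypothetical helper: the key names follow the proposal, everything
    else (validation, rounding policy) is an illustrative choice.
    """
    # The page index is a non-negative integer, distinct from a page label.
    if isinstance(page_index, bool) or not isinstance(page_index, int) or page_index < 0:
        raise ValueError("page_index must be a non-negative integer")
    # Each quadrilateral is [x_1, y_1, ..., x_4, y_4]: eight numbers.
    for quad in quad_points:
        if len(quad) != 8:
            raise ValueError("each quadrilateral needs 8 coordinates")
    body = {
        "rect": rect_from_pdf(rect),
        "contents": contents,
        "quad_points": [[round(v) for v in quad] for quad in quad_points],
        "page_index": page_index,
    }
    # `text_content` is optional and not part of the PDF standard.
    if text_content is not None:
        body["text_content"] = text_content
    return {"m.markup.location": {"m.markup.pdf.highlight": body}}


example = highlight_location(
    rect=[72.0, 700.4, 300.0, 714.6],
    # One quadrilateral, vertices in counterclockwise order.
    quad_points=[[72, 700, 300, 700, 300, 715, 72, 715]],
    page_index=3,
    contents="Worth a second look",
    text_content="a highlighted phrase",
)
print(json.dumps(example, indent=2))
```

While the proposal is unstable, a client would presumably swap the `m.markup.pdf.highlight` key for the development identifier `com.open-tower.msc3592.markup.pdf.highlight` listed in the unstable prefix table. Note also that the serialized output, unlike the schematic snippets in the proposal, separates every member with a comma.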