# Markup locations for PDF documents

[MSC3574](https://github.com/matrix-org/matrix-spec-proposals/pull/3574)
proposes a mechanism for marking up resources (webpages, documents, videos, and
other files) using Matrix. The proposed mechanism requires an
`m.markup.location` schema for representing the location of annotations within
different kinds of resources. MSC3574 punts on what standard location types
might be available, deferring that large family of questions to other MSCs.
This MSC aims to provide two basic location types for marking up PDFs.

## Proposal

Markup locations for PDFs should approximately follow the format of embedded
annotations provided in the PDF standard, for more straightforward integration
with PDF rendering and editing libraries that clients may wish to make use of.

The PDF standard includes many different kinds of annotations: 19 in PDF 1.4
(see [p499 here](https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/pdf_reference_archives/PDFReference.pdf))
and 26 in PDF 1.7 (see [p390 here](https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf)).
This proposal introduces location types for two of these kinds of annotations:
*Text Annotations*, which represent "sticky notes" at a certain point in the
text, and *Highlights*, which represent a certain range of text that should be
highlighted.

PDF annotations all accept a very large set of different attributes. Of
these, only two are mandatory: `Subtype` and `Rect`, where `Subtype` gives the
annotation type, and `Rect` gives the position of the annotation on the PDF
page as a rectangle represented by an array of the form

    [lower-left-x, lower-left-y, upper-right-x, upper-right-y]

where each item is a distance from the bottom left corner of the page, measured
in "user space units" (by default 1/72 of an inch, sometimes called *points*).

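
As a rough illustration of how a client might turn a PDF `Rect` array into the
`rect` object used in the location types of this proposal, here is a minimal
Python sketch. The function name `rect_from_pdf` is hypothetical, and the
mapping of `left`/`bottom`/`right`/`top` onto the lower-left and upper-right
corners, as well as the rounding to whole user space units, are assumptions
rather than something this proposal spells out.

```python
def rect_from_pdf(pdf_rect):
    """Convert [llx, lly, urx, ury] into a left/right/top/bottom object.

    Assumption: y grows upward from the bottom left corner of the page,
    so "top" is the larger y value and "bottom" the smaller one.
    Assumption: values are rounded to whole user space units, because
    Matrix event content cannot carry floats.
    """
    x1, y1, x2, y2 = pdf_rect
    return {
        "left": round(min(x1, x2)),
        "right": round(max(x1, x2)),
        "top": round(max(y1, y2)),
        "bottom": round(min(y1, y2)),
    }


# A rectangle roughly one inch from the left edge and nine inches up the page.
example = rect_from_pdf([72.0, 648.2, 180.7, 684.0])
```

Taking the minimum and maximum of each coordinate pair means the sketch also
tolerates arrays whose corners are given in the opposite order.
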
This MSC does not propose to include any of the optional attributes. The
`Subtype` attribute will be indicated by a key of the `m.markup.location`
object. So only `Rect`, and the attributes specific to each annotation type,
need to be provided for.

Within a PDF, an annotation belongs to a particular page (it is listed in that
page's `Annots` array), so the page number doesn't need to be represented as
an attribute of the annotation. Since this information is not automatically
available in the Matrix context, `m.markup` locations for PDFs will also
require a *page index* field. The page index is a non-negative integer, and
is distinct from a *page label*, which is a string (for example "iv" within
the front matter of a book).

### Text Annotations

Text annotations will be represented within an `m.markup.location` as follows:

```
"m.markup.location": {
    "m.markup.pdf.text": {
        "rect": {"left": ..., "right": ..., "top": ..., "bottom": ...},
        "contents": ...,
        "page_index": ...
    }
    ...
}
```

`contents` is a string giving the text of the annotation. Precisely
how to set it will be left as an implementation detail for clients.

Optionally, `m.markup.pdf.text` may also contain a `name` value, which should
be a string that names an icon to be used in displaying the annotation.
Standardly recognized values are: "Comment", "Key", "Note", "Help",
"NewParagraph", "Paragraph" and "Insert".

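
The following Python sketch shows one way a client might assemble such a
location. The helper name `pdf_text_location` and the sample values are
hypothetical. The sketch only enforces the constraint stated above that the
page index is a non-negative integer, and it deliberately does not reject icon
names outside the standard list, since this proposal does not say that other
names are forbidden.

```python
def pdf_text_location(rect, contents, page_index, name=None):
    """Build an `m.markup.location` object holding a PDF text annotation."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(page_index, bool) or not isinstance(page_index, int):
        raise ValueError("page_index must be an integer")
    if page_index < 0:
        raise ValueError("page_index must be non-negative")

    body = {
        "rect": rect,
        "contents": contents,
        "page_index": page_index,
    }
    # `name` is optional, so only include it when the caller supplies one.
    if name is not None:
        body["name"] = name
    return {"m.markup.location": {"m.markup.pdf.text": body}}


note = pdf_text_location(
    {"left": 72, "right": 92, "top": 720, "bottom": 700},
    "Check this citation",
    3,
    name="Note",
)
```
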
### Highlight Annotations

Highlight annotations will be represented within an `m.markup.location` as
follows:

```
"m.markup.location": {
    "m.markup.pdf.highlight": {
        "rect": {"left": ..., "right": ..., "top": ..., "bottom": ...},
        "contents": ...,
        "quad_points": [...],
        "page_index": ...
    }
    ...
}
```

`contents` is as above. `quad_points` is an array of arrays, each of the form

    [x_1,y_1,x_2,y_2,x_3,y_3,x_4,y_4]

and each representing the vertices (in counterclockwise order) of an
oriented quadrilateral region of the PDF page. Each quadrilateral is meant to
encompass a word or group of contiguous words in the highlighted text.

Optionally, `m.markup.pdf.highlight` may also include a `text_content` value,
which should be a string containing the highlighted text. The `text_content`
value is not part of the PDF standard, but is included as a convenience for
clients.

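
A minimal Python sketch of building a highlight location from a list of
quadrilaterals might look like the following. The helper names
`bounding_rect` and `pdf_highlight_location` are hypothetical, and deriving
`rect` as the bounding box of all quadrilaterals is an assumption made for
illustration; this proposal does not itself state how `rect` relates to
`quad_points`.

```python
def bounding_rect(quad_points):
    """Smallest axis-aligned rectangle enclosing every quadrilateral."""
    # Even positions hold x coordinates, odd positions hold y coordinates.
    xs = [v for quad in quad_points for v in quad[0::2]]
    ys = [v for quad in quad_points for v in quad[1::2]]
    return {"left": min(xs), "right": max(xs), "top": max(ys), "bottom": min(ys)}


def pdf_highlight_location(quad_points, contents, page_index, text_content=None):
    """Build an `m.markup.location` object holding a PDF highlight."""
    for quad in quad_points:
        if len(quad) != 8:
            raise ValueError("each quadrilateral needs four (x, y) vertices")
    body = {
        "rect": bounding_rect(quad_points),  # assumption: rect is the bounding box
        "contents": contents,
        "quad_points": quad_points,
        "page_index": page_index,
    }
    # `text_content` is optional, so only include it when supplied.
    if text_content is not None:
        body["text_content"] = text_content
    return {"m.markup.location": {"m.markup.pdf.highlight": body}}


# Two lines of highlighted text, vertices listed counterclockwise
# starting from the bottom left corner of each quadrilateral.
quads = [
    [72, 700, 200, 700, 200, 712, 72, 712],
    [72, 686, 150, 686, 150, 698, 72, 698],
]
highlight = pdf_highlight_location(quads, "Key claim", 0, text_content="some highlighted words")
```
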
## Alternatives

Rather than accepting this MSC, we could wait for an MSC that tries to
comprehensively specify a complete set of location types on PDFs.
However, it seems best to work iteratively, and start with the PDF location
types that can most easily be implemented, rather than waiting until something
truly comprehensive can be implemented.

Rather than using user space units, we could use some more fine-grained
coordinate system, for example milli-units. The PDF standard lets coordinates
take on "real number values", so precision greater than one unit is possible.
But since we can't have float values in Matrix events, we can't capture this
greater precision under the present proposal. However, a finer-grained unit
would probably create confusion, and precision greater than 1/72 of an inch
(about 0.35 mm) is probably excessive.

## Security considerations

Because room state is unencrypted, `m.space.child` events conveying locations
via `m.markup.pdf.highlight` could leak information about the text of an
encrypted resource through the `text_content` field, or about the annotation
itself through the `contents` field. This is part of a more general problem
with state events potentially leaking information, and deserves a general
resolution, a la
[MSC3414](https://github.com/matrix-org/matrix-spec-proposals/pull/3414).

## Unstable prefix

| Proposed Final Identifier | Purpose                    | Development Identifier                        |
| ------------------------- | -------------------------- | --------------------------------------------- |
| `m.markup.pdf.text`       | key in `m.markup.location` | `com.open-tower.msc3592.markup.pdf.text`      |
| `m.markup.pdf.highlight`  | key in `m.markup.location` | `com.open-tower.msc3592.markup.pdf.highlight` |