From eca07f6a53b144af6f86b5db622d4d3b989c5e53 Mon Sep 17 00:00:00 2001
From: Graham Leach-Krouse
Date: Sun, 17 Apr 2022 11:26:52 -0500
Subject: [PATCH 1/5] Audiovisual media markup, initial commit

---
 .../XXXX-markup-locations-for-audiovisual-media.md | 10 ++++++++++
 1 file changed, 10 insertions(+)
 create mode 100644 proposals/XXXX-markup-locations-for-audiovisual-media.md

diff --git a/proposals/XXXX-markup-locations-for-audiovisual-media.md b/proposals/XXXX-markup-locations-for-audiovisual-media.md
new file mode 100644
index 000000000..8f58730f3
--- /dev/null
+++ b/proposals/XXXX-markup-locations-for-audiovisual-media.md
@@ -0,0 +1,10 @@
+# Markup Locations for Audiovisual Media
+
+[MSC3574](https://github.com/matrix-org/matrix-spec-proposals/pull/3574)
+proposes a mechanism for marking up resources (webpages, documents, videos, and
+other files) using Matrix. The proposed mechanism requires an
+`m.markup.location` schema for representing the location of annotations within
+different kinds of resources. MSC3574 punts on what standard location types
+might be available, deferring that large family of questions to other MSCs.
+This MSC aims to provide basic location types for marking up audiovisual media
+resources.
From 2922b22a347f303a8853e983a5e859f3db80d161 Mon Sep 17 00:00:00 2001
From: Graham Leach-Krouse
Date: Sun, 17 Apr 2022 11:29:57 -0500
Subject: [PATCH 2/5] update name with MSC number

---
 ...al-media.md => 3775-markup-locations-for-audiovisual-media.md} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename proposals/{XXXX-markup-locations-for-audiovisual-media.md => 3775-markup-locations-for-audiovisual-media.md} (100%)

diff --git a/proposals/XXXX-markup-locations-for-audiovisual-media.md b/proposals/3775-markup-locations-for-audiovisual-media.md
similarity index 100%
rename from proposals/XXXX-markup-locations-for-audiovisual-media.md
rename to proposals/3775-markup-locations-for-audiovisual-media.md

From 67b0f84c9bf569a04ddba6e652161b46882f22d7 Mon Sep 17 00:00:00 2001
From: Graham Leach-Krouse
Date: Mon, 25 Apr 2022 12:51:11 -0500
Subject: [PATCH 3/5] Write basic proposal

---
 ...-markup-locations-for-audiovisual-media.md | 30 +++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/proposals/3775-markup-locations-for-audiovisual-media.md b/proposals/3775-markup-locations-for-audiovisual-media.md
index 8f58730f3..2004bc86b 100644
--- a/proposals/3775-markup-locations-for-audiovisual-media.md
+++ b/proposals/3775-markup-locations-for-audiovisual-media.md
@@ -8,3 +8,33 @@ different kinds of resources. MSC3574 punts on what standard location types
 might be available, deferring that large family of questions to other MSCs.
 This MSC aims to provide basic location types for marking up audiovisual media
 resources.
+
+## Proposal
+
+Basic markup locations for audiovisual media should make use of the [Media
+Fragments URI specification](https://www.w3.org/TR/media-frags/). The media
+fragment specification is simple, and it yields annotations compatible
+with the W3C's [web annotation data
+model](https://www.w3.org/TR/annotation-model/).
+
+Basic markup locations for audiovisual media should be applicable to `audio/*`,
+`video/*`, and `image/*` media types.
+
+The basic media fragment specification addresses content along two dimensions:
+temporal (in the form of a time interval), and spatial (in the form of a
+rectangle of pixels in the original media). The specification also includes
+support for addressing media by track (in the case of audio with multiple
+parallel media streams, for example a secondary dubbed English-language audio
+stream). This MSC uses only the temporal and spatial dimensions.
+
+Temporal locations consist of half-open intervals, specifying the first moment
+included in the location and the first moment not included in the location. This
+MSC will use milliseconds to represent moments, whereas the Media Fragments URI
+specification uses seconds with a decimal part. Spatial locations consist of
+rectangular selections given by the x,y coordinates of the upper left corner
+and the width and height of the rectangle. The coordinates and dimensions of
+the rectangle can be indicated either with integers representing pixels (with
+0,0 representing the top left corner of the image), or with integers
+representing percentages of the width and height of the media. Temporal and
+spatial locations can be combined to select spatio-temporal segments of video
+recordings.
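
The half-open interval semantics introduced in patch 3 are easy to get wrong at the boundaries. The following minimal Python sketch shows how a client might test whether a playback time, or a pixel, falls inside a location. The names `TemporalLocation`, `PixelRect`, and `ms_to_media_frag_seconds` are hypothetical and are not part of any Matrix SDK. The patch also does not say which edges of the rectangle are included, so treating the rectangle as half-open (excluding its right and bottom edges) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TemporalLocation:
    """Half-open interval [start_ms, end_ms) in milliseconds (hypothetical helper).

    start_ms=None means "from the beginning"; end_ms=None means "to the end".
    """
    start_ms: Optional[int] = None
    end_ms: Optional[int] = None

    def contains(self, t_ms: int) -> bool:
        # start_ms is the first moment included; end_ms is the first moment excluded.
        start = 0 if self.start_ms is None else self.start_ms
        return t_ms >= start and (self.end_ms is None or t_ms < self.end_ms)


@dataclass(frozen=True)
class PixelRect:
    """Pixel rectangle with (0, 0) at the top-left corner (hypothetical helper)."""
    x: int
    y: int
    w: int
    h: int

    def contains(self, px: int, py: int) -> bool:
        # Assumption: right and bottom edges are excluded, mirroring the
        # half-open temporal interval, so adjacent rectangles never overlap.
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


def ms_to_media_frag_seconds(ms: int) -> str:
    """Render milliseconds as the decimal seconds used by Media Fragments URIs."""
    whole, frac = divmod(ms, 1000)
    if frac == 0:
        return str(whole)
    # Integer arithmetic keeps the value exact (no binary float rounding).
    return f"{whole}.{frac:03d}".rstrip("0")
```

For example, `TemporalLocation(1500, 4000).contains(4000)` is `False`, because the end moment is excluded. `ms_to_media_frag_seconds(1500)` returns `"1.5"`.
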
From 03b6ede8040eebb5d7d63965ad38c0677d137ab4 Mon Sep 17 00:00:00 2001
From: Graham Leach-Krouse
Date: Wed, 18 May 2022 14:54:04 -0500
Subject: [PATCH 4/5] Add location format

---
 ...-markup-locations-for-audiovisual-media.md | 69 +++++++++++++++++--
 1 file changed, 65 insertions(+), 4 deletions(-)

diff --git a/proposals/3775-markup-locations-for-audiovisual-media.md b/proposals/3775-markup-locations-for-audiovisual-media.md
index 2004bc86b..7432b129a 100644
--- a/proposals/3775-markup-locations-for-audiovisual-media.md
+++ b/proposals/3775-markup-locations-for-audiovisual-media.md
@@ -34,7 +34,68 @@ specification uses seconds with a decimal part. Spatial locations consist of
 rectangular selections given by the x,y coordinates of the upper left corner
 and the width and height of the rectangle. The coordinates and dimensions of
 the rectangle can be indicated either with integers representing pixels (with
-0,0 representing the top left corner of the image), or with integers
-representing percentages of the width and height of the media. Temporal and
-spatial locations can be combined to select spatio-temporal segments of video
-recordings.
+0,0 representing the top left corner of the image or video), or with integers
+representing percentages of the width and height of the media. This MSC will
+represent percentages with integers in [0,1000000] (1000000 being 100%), allowing
+four decimal places of precision. Temporal and spatial locations can be combined to select
+spatio-temporal segments of video recordings.
+
+### Media Fragments
+
+Media Fragments will be represented as follows:
+
+```
+m.markup.location: {
+  m.markup.media.fragment: {
+    start: ..
+    end: ..
+    x: ..
+    y: ..
+    w: ..
+    h: ..
+  }
+  ..
+}
+```
+
+or (when spatial dimensions are given in percentages) as
+
+```
+m.markup.location: {
+  m.markup.media.fragment: {
+    start: ..
+    end: ..
+    xp: ..
+    yp: ..
+    wp: ..
+    hp: ..
+  }
+  ..
+}
+```
+
+with all fields optional, but with the requirement that at least one field is
+present, and that if any of `x`, `y`, `w`, `h` (or of `xp`, `yp`, `wp`, `hp`) are present, then all four are.
+
+The `start` and `end` values should be non-negative integers with `start <
+end`, where `start` indicates the first millisecond of media included in the
+location, and `end` indicates the first millisecond of media not included. If
+`start` is omitted, the location begins at zero, and if `end` is omitted, the
+location extends to the end of the media.
+
+The `x`, `y`, `w`, and `h` fields should be non-negative integers describing a spatial region
+within the media in pixel coordinates as described above. That is, `x` and `y` should be
+smaller than the [intrinsic width and height of the
+video](https://html.spec.whatwg.org/multipage/media.html#concept-video-intrinsic-width)
+respectively, and `w` and `h` should be at most the difference between `x` and
+the intrinsic width, and the difference between `y` and the intrinsic height,
+respectively. If the expectation on `w` and `h` is violated, the region
+described should be clipped at the edges of the media. If the
+expectation on `x` and `y` is violated, the location should be ignored as invalid.
+
+The `xp`, `yp`, `wp`, and `hp` fields should be non-negative integers less than or
+equal to 1000000, giving a spatial region within the media in percentage
+coordinates as described above. If either `xp` + `wp` or `yp` + `hp` is greater
+than 1000000, then the location should be ignored as invalid.
+
+
From f460542daab05132d18317f30ba02513d867c9f2 Mon Sep 17 00:00:00 2001
From: Graham Leach-Krouse
Date: Wed, 18 May 2022 15:09:40 -0500
Subject: [PATCH 5/5] Add WADM serialization

---
 ...-markup-locations-for-audiovisual-media.md | 104 +++++++++++++++---
 1 file changed, 88 insertions(+), 16 deletions(-)

diff --git a/proposals/3775-markup-locations-for-audiovisual-media.md b/proposals/3775-markup-locations-for-audiovisual-media.md
index 7432b129a..5effba264 100644
--- a/proposals/3775-markup-locations-for-audiovisual-media.md
+++ b/proposals/3775-markup-locations-for-audiovisual-media.md
@@ -45,14 +45,14 @@ spatio-temporal segments of video recordings.
 Media Fragments will be represented as follows:
 
 ```
-m.markup.location: {
-  m.markup.media.fragment: {
-    start: ..
-    end: ..
-    x: ..
-    y: ..
-    w: ..
-    h: ..
+"m.markup.location": {
+  "m.markup.media.fragment": {
+    "start": ..
+    "end": ..
+    "x": ..
+    "y": ..
+    "w": ..
+    "h": ..
   }
   ..
 }
@@ -61,14 +61,14 @@ m.markup.location: {
 or (when spatial dimensions are given in percentages) as
 
 ```
-m.markup.location: {
-  m.markup.media.fragment: {
-    start: ..
-    end: ..
-    xp: ..
-    yp: ..
-    wp: ..
-    hp: ..
+"m.markup.location": {
+  "m.markup.media.fragment": {
+    "start": ..
+    "end": ..
+    "xp": ..
+    "yp": ..
+    "wp": ..
+    "hp": ..
   }
   ..
 }
@@ -98,4 +98,76 @@ equal to 1000000, giving a spatial region within the media in percentage
 coordinates as described above. If either `xp` + `wp` or `yp` + `hp` is greater
 than 1000000, then the location should be ignored as invalid.
 
+### Web Annotation Data Model Serialization
 
+[MSC3574](https://github.com/matrix-org/matrix-spec-proposals/pull/3574)
+includes a scheme for serializing Matrix markup events as web annotations in
+the web annotation data model. The scheme requires each markup location type to
+have a canonical serialization as [a web annotation
+selector](https://www.w3.org/TR/annotation-model/#selectors).
+In this section, we describe how to serialize `m.markup.media.fragment` as a WADM selector.
+
+We take advantage of the WADM's support for URI fragments as locations, using
+the [FragmentSelector](https://www.w3.org/TR/annotation-model/#fragment-selector)
+selector.
+
+This allows us to encode a location of the form
+
+```
+"m.markup.media.fragment": {
+  "start": $START
+  "end": $END
+  "x": $X
+  "y": $Y
+  "w": $W
+  "h": $H
+}
+```
+
+as a selector
+
+```
+{
+  "type": "FragmentSelector",
+  "conformsTo": "http://www.w3.org/TR/media-frags/",
+  "value": "t=($START/1000),($END/1000)&xywh=$X,$Y,$W,$H"
+}
+```
+
+and
+
+```
+"m.markup.media.fragment": {
+  "start": $START
+  "end": $END
+  "xp": $X
+  "yp": $Y
+  "wp": $W
+  "hp": $H
+}
+```
+
+as (dividing by 10000, since 1000000 represents 100%)
+
+```
+{
+  "type": "FragmentSelector",
+  "conformsTo": "http://www.w3.org/TR/media-frags/",
+  "value": "t=($START/1000),($END/1000)&xywh=percent:($X/10000),($Y/10000),($W/10000),($H/10000)"
+}
+```
+
+## Security considerations
+
+Because room state is unencrypted, `m.space.child` events conveying locations
+via `m.markup.media.fragment` could leak information about the duration and
+dimensions of a piece of media. This is part of a more general problem with
+state events potentially leaking information, and deserves a general
+resolution, à la [MSC3414](https://github.com/matrix-org/matrix-spec-proposals/pull/3414).
+
+
+## Unstable prefix
+
+| Proposed Final Identifier | Purpose                    | Development Identifier                         |
+| ------------------------- | -------------------------- | ---------------------------------------------- |
+| `m.markup.media.fragment` | key in `m.markup.location` | `com.open-tower.msc3775.markup.media.fragment` |
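
The validation rules from patch 4 and the FragmentSelector mapping from patch 5 can be combined into a single helper. The following is a minimal Python sketch, not code from MSC3574, MSC3775, or any SDK; `validate` and `to_fragment_selector` are hypothetical names. It makes these assumptions:

- Percentages are obtained by dividing the [0, 1000000] integers by 10000, so 1000000 maps to 100%.
- An omitted `start` is written as `t=,END` and an omitted `end` as `t=START`, using the open-ended Media Fragments forms.
- The pixel and percentage forms are mutually exclusive.

Checks that need the media's intrinsic size (rejecting out-of-range `x`/`y`, clipping `w`/`h`) are left to the caller. Before relying on the percentage output, check the Media Fragments grammar to confirm that it accepts fractional values such as `12.5`.

```python
from typing import Any, Dict, List

PIXEL_KEYS = ("x", "y", "w", "h")
PERCENT_KEYS = ("xp", "yp", "wp", "hp")
PERCENT_SCALE = 1_000_000  # 1000000 stands for 100%
MEDIA_FRAGS = "http://www.w3.org/TR/media-frags/"


def _fixed(value: int, digits: int) -> str:
    """Render value / 10**digits as an exact decimal string, without floats."""
    whole, frac = divmod(value, 10 ** digits)
    if frac == 0:
        return str(whole)
    return f"{whole}.{frac:0{digits}d}".rstrip("0")


def _is_nonneg_int(v: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, int) and not isinstance(v, bool) and v >= 0


def validate(frag: Dict[str, Any]) -> bool:
    """Checks from the proposal that do not need the media's intrinsic size."""
    known = ("start", "end") + PIXEL_KEYS + PERCENT_KEYS
    present = [k for k in known if k in frag]
    if not present or not all(_is_nonneg_int(frag[k]) for k in present):
        return False  # at least one field, and every field a non-negative integer
    if "start" in frag and "end" in frag and frag["start"] >= frag["end"]:
        return False  # start < end is required
    counts = [sum(k in frag for k in group) for group in (PIXEL_KEYS, PERCENT_KEYS)]
    if any(c not in (0, 4) for c in counts) or counts == [4, 4]:
        return False  # all four of a group or none; assumed: not both groups
    if counts[1] == 4:
        xp, yp, wp, hp = (frag[k] for k in PERCENT_KEYS)
        if xp + wp > PERCENT_SCALE or yp + hp > PERCENT_SCALE:
            return False
    return True


def to_fragment_selector(frag: Dict[str, Any]) -> Dict[str, str]:
    """Serialize an m.markup.media.fragment object as a WADM FragmentSelector."""
    if not validate(frag):
        raise ValueError("invalid m.markup.media.fragment")
    parts: List[str] = []
    if "start" in frag or "end" in frag:
        # Milliseconds -> seconds; "t=,END" leaves the start open, "t=START" the end.
        t = _fixed(frag["start"], 3) if "start" in frag else ""
        if "end" in frag:
            t += "," + _fixed(frag["end"], 3)
        parts.append("t=" + t)
    if all(k in frag for k in PIXEL_KEYS):
        parts.append("xywh=" + ",".join(str(frag[k]) for k in PIXEL_KEYS))
    elif all(k in frag for k in PERCENT_KEYS):
        # Scaled integers -> percentages (divide by 10000).
        parts.append("xywh=percent:" + ",".join(_fixed(frag[k], 4) for k in PERCENT_KEYS))
    return {"type": "FragmentSelector", "conformsTo": MEDIA_FRAGS, "value": "&".join(parts)}
```

For example, `{"start": 1500, "end": 4000, "x": 10, "y": 20, "w": 300, "h": 200}` serializes to the value `t=1.5,4&xywh=10,20,300,200`. The sketch uses integer arithmetic throughout so that millisecond and percentage values convert exactly.
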