Expressing Locator ranges for highlights #95
Comments
The electron/desktop navigator currently "augments" the Locator data structure, not by extending it (strictly-speaking), but via composition: The A word about the "visual mapping" technique: right now, in order to "turn to the correct page / scroll to the corresponding offset" when a |
Follow-up: because |
I've seen the Here is the relevant part in the doc where I mentioned this: https://docs.google.com/document/d/1W_kbSRve7c1ZtyYTZ-fbEOsFZPTqxN0IV7UdvH_h20U/edit#bookmark=id.jc195922qtyf If we decide to implement a per-format custom model like In my opinion, |
I agree with you :) |
Regarding RangeInfo: although I am personally satisfied with the ad-hoc JSON-friendly serialisation format used in the TypeScript code of the R2 desktop SDK (navigator), I would like to hear more from EvidentPoint / Juan, who have explored alternative DOMRange marshalling solutions in their GlueJS experiments. Perhaps there are certain pros/cons we should carefully examine before committing to a particular solution. |
Some additional thoughts on that:
This raises a concern. We want Locators to be useful on their own, if they ever become orphans (publication is updated or no longer available). We need to find the right balance between having an empty text highlight and respecting the restrictions expressed by rights holders. |
This sounds like the solution with the
Maybe a hard limit for the length of And/or that could be part of the LCP app compliance tests that the user can't export or copy the text of a highlight extracted from an LCP protected book. |
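To make the "hard limit" idea concrete, here is a minimal sketch of how an app might cap the text stored with a highlight. The function name, the `isProtected` flag and the limit value are all hypothetical; nothing in this thread fixes them, and emptying the text for protected books is only one of the options discussed.

```typescript
// Hypothetical policy, for illustration only: the thread does not define
// these names or any specific limit.
interface HighlightTextPolicy {
  isProtected: boolean; // e.g. the publication is LCP protected
  maxLength: number;    // hard limit, counted in UTF-16 code units
}

// Returns the text that may be stored with (or exported from) a highlight.
function limitHighlightText(text: string, policy: HighlightTextPolicy): string {
  // One option from the discussion: keep no text at all for protected books.
  if (policy.isProtected || policy.maxLength <= 0) {
    return "";
  }
  // Otherwise keep at most maxLength code units from the start of the text.
  return text.length <= policy.maxLength ? text : text.slice(0, policy.maxLength);
}
```

A softer variant would keep a short excerpt for protected books too, so that an orphaned Locator still carries some context.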
Fragments in the currently-proposed Locator model: W3C web annotations selectors: We know from implementation experience (currently in R2 SDK for desktop/Electron) that DOMRange can be reliably serialized into a text format / JSON-friendly data structure. This is a direct translation of DOMRange without the pitfalls of EPUB CFI (which can be seen as a useful interchange syntax, but not convenient / performant for the internals of R2 features). Basically, we cannot reference a DOMRange TextNode directly, so we encode the parent Element reference using a quasi-canonical / unique CSS Selector (as already used for Locators in R2 desktop/Electron), and we record the zero-based index of the child TextNode. Other than that, the data structure is exactly the same as the DOMRange object model (start, end, offset, etc.). So, one option is to "flatten" this tuple notation into a single string so that it fits within the current Locator Fragment proposal (array of strings, reflecting different resolutions applicable to different media types).
... instead of:
Thoughts? |
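As a rough illustration of the "flattening" option described above, the sketch below turns the start/end tuples (CSS Selector, child text node index, character offset) into a single string and back. The `domrange=` prefix, the `,` and `;` delimiters and all type names are assumptions made up for this example; they are not the format used in the R2 desktop code.

```typescript
// Hypothetical shapes: names are illustrative, not taken from the R2 SDK.
interface RangeBoundary {
  cssSelector: string;   // quasi-unique selector of the parent element
  textNodeIndex: number; // zero-based index of the child text node
  offset: number;        // character offset inside that text node
}

interface SerializedRange {
  start: RangeBoundary;
  end: RangeBoundary;
}

// Flattens one boundary; the selector is percent-encoded so that the
// made-up "," and ";" delimiters cannot appear inside it.
function flattenBoundary(b: RangeBoundary): string {
  return [encodeURIComponent(b.cssSelector), b.textNodeIndex, b.offset].join(",");
}

function flattenRange(range: SerializedRange): string {
  return "domrange=" + flattenBoundary(range.start) + ";" + flattenBoundary(range.end);
}

function parseBoundary(text: string): RangeBoundary | null {
  const match = /^([^,;]*),(\d+),(\d+)$/.exec(text);
  if (!match) {
    return null;
  }
  try {
    return {
      cssSelector: decodeURIComponent(match[1]),
      textNodeIndex: parseInt(match[2], 10),
      offset: parseInt(match[3], 10),
    };
  } catch (_err) {
    return null; // malformed percent-escape in the selector part
  }
}

// Returns null for anything that is not a well-formed flattened range.
function parseFlattenedRange(fragment: string): SerializedRange | null {
  const prefix = "domrange=";
  if (fragment.indexOf(prefix) !== 0) {
    return null;
  }
  const halves = fragment.slice(prefix.length).split(";");
  if (halves.length !== 2) {
    return null;
  }
  const start = parseBoundary(halves[0]);
  const end = parseBoundary(halves[1]);
  return start && end ? { start, end } : null;
}
```

The escaping step is exactly the kind of overhead that a structured (non-flattened) representation avoids.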
Then, we indeed have the problem of https://github.com/readium/architecture/blob/master/locators/README.md#the-locator-text-object |
It seems that we reached a consensus yesterday on the highlights model, but before we close this issue, I feel I must challenge one of its hypotheses: "a range can span two documents (e.g. FXL spreads) so we need two hrefs". The reason I'm challenging it is that, if it were not a requirement, we could extend the expressivity of some fragment expressions and get ranges in a simpler way. Already, CFIs, media fragments (especially time-based, but also rectangles in a sense) and, to a certain extent, HTML ids can represent a text range or an audiovisual segment. It would then suffice to extend CSS selection expressions and we would get ranges in a single Locator. It is true that Acrobat allows selecting text across PDF pages (but for Acrobat these may not be different "resources"). But I tried iBooks and it does not allow selecting text across resources. I tried Aldiko iOS and could not select text across CSS columns. So, are we sure we are not creating a complex model with no real requirement? |
I think for EPUB it might really not be needed, since content will rarely be split between two resources (maybe for some FXL laid out like a PDF with split paragraphs). But more importantly, I don't think we can make the native selection work properly (or at all) between two webviews or even two iframes. The only way to make this work would be to either build the selection from scratch with overlays or have a convoluted UX to merge two highlights. For PDF we don't need it either since a single PDF document is one Personally I'd be more comfortable having a single In this context, I suggest again my initial proposal which is to have
|
@mmenu-mantano I like this last proposal but in such a case, what to do with fragments which can already represent a range, like "t=1,10"? Do we state that they must not be used and that we consider only fragments which represent a single "point" (in space or time)? And how do we justify that we don't take the same approach as Web Annotations selectors, in which a single fragment can represent a range? Note that even a CSS id is mapped to a "segment", not a single point in the text. Same for xpointer, xywh, even page for PDF. |
Yes usually range fragments can also express a discrete location. I think it makes sense to ignore ranges in a single Also, this problem is there too if we use a containing structure with two Maybe it doesn't answer all formats but if a fragment is mapping to a rectangle (eg. a PDF page) then the top-left (depending on reading progression I guess) is the discrete location in this case. I'm not familiar with the Web Annotations selectors, but if they offer a better solution I'm all for it, all the current solutions feel awkward in the |
If we ignore the case of a range expressed over two different resources, then there's no need to extend the
IMO this should be the beginning of the range, since that's where you jump to when you want to present a range. |
The Web Annotation model is very flexible, IMO a little too much for its own good. As I said in my previous comment, it defaults to a single "selector" (equivalent of our locations) using either: fragment, XPath, CSS or text. If the range can't be easily defined that way, it uses a specific "range" selector. If we ported that to our model, it would look like that:
{
  "href": "http://example.com/track6",
  "type": "audio/ogg",
  "title": "Chapter 5",
  "locations": {
    "progression": 0.607379,
    "totalProgression": 0.50678,
    "range": {
      "start": {
        "fragments": ["t=389.84"]
      },
      "end": {
        "fragments": ["t=529.52"]
      }
    }
  }
}
I think this is not necessary in our case and I'd much rather rely on a single location to express range. |
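For time-based media, the start/end pair in that example can be folded into one temporal Media Fragment of the form `t=start,end` (the same shape as the "t=1,10" mentioned earlier in the thread). The following is a small sketch of that folding; the function names are made up, and it only handles plain seconds, not the clock-style time formats that Media Fragments also allow.

```typescript
// Parses a single point in time such as "t=389.84" (plain seconds only).
function parseTimePoint(fragment: string): number | null {
  const match = /^t=(\d+(?:\.\d+)?)$/.exec(fragment);
  return match ? parseFloat(match[1]) : null;
}

// Folds two point fragments into one interval fragment "t=start,end".
// Returns null when either input is not a plain time point, or when the
// end is not strictly after the start.
function toTimeInterval(start: string, end: string): string | null {
  const s = parseTimePoint(start);
  const e = parseTimePoint(end);
  if (s === null || e === null || e <= s) {
    return null;
  }
  return "t=" + s + "," + e;
}

// With the values from the example above:
// toTimeInterval("t=389.84", "t=529.52") yields "t=389.84,529.52"
```

The jump target for such a range is then simply its first value, which matches the remark that a range should be presented from its beginning.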
That's fine with me, so we just need a range fragment for EPUB that we can use in R2 (CFI is out by consensus). Do you have any opinion on Daniel's comment Hadrien? #95 (comment) If we want to keep the semantics without changing too much the JSON model, we could also use the PDF fragment approach with a query string: |
EPUB CFI is indeed fit for purpose when it comes to expressing DOM position and range, as a standardized interchange format across systems. However, in my opinion (and based on implementation experience) generating and parsing CFI references is error-prone in the edge cases, and it is also woefully inefficient.
Right now in the R2 desktop/Electron implementation an ad-hoc serialisation format is used to reliably and cheaply marshal DOMRange objects. Basically, a CSS Selector is combined with a child index and a character offset (this 3-tuple is used for both the start and end positions). This is as close as it can possibly get to a native DOMRange, resulting in extremely efficient processing. It would be a shame to flatten this simple JSON-compatible data structure into a string "fragment" inside the Locator payload. I would much prefer transporting this information across the reading system layers (typically: app <--> navigator) in its raw form, rather than unnecessarily massaging it just to fit into a prescribed Locator format (which would require awkward escaping rules in order to cater for the fragment's own syntax, delimiters etc.).
That is not to say that there is no value in the flattened "fragments", which are indeed naturally amenable to URI encoding (e.g. like the elegant Media Fragment t=0,99 for time ranges), and therefore desirable for interchange purposes. We can certainly produce interoperable position/range references a-la W3C selectors, whenever necessary (for example when such Locator payloads are exported into non-R2 systems, like an external annotation service). But for the subsystems involved in R2 reading systems (navigator, etc.) I believe that we should stay as close as possible to native web browser engine constructs like DOMRange and CSS Selector, which yield the best performance and readability (have you tried debugging CFI?).
As for XPath, I guess you mean XPointer scheme which would offer similar capabilities to CFI, but also similar headaches. Thoughts? |
I think it's slightly better than the Web Annotation approach, but I'm not a big fan of having a controlled vocabulary for fragments. IMO fragments should be registered alongside a media type (which is usually the case) and we should be able to express potentially any fragment, not just a list of well-known fragments. Keep in mind that I'm mostly discussing the core model and its JSON serialization, what Daniel suggested IMO makes perfect sense in the context of a specific implementation for example. |
@danielweck Are you mentioning this for your internal I think we should be designing the format of the fragment that will be stored in the Locator JSON here (https://github.com/readium/architecture/tree/master/locators). Each platform can then implement helpers around the |
I do not understand the nuance of this statement. But let's take platform "desktop / Electron" as an example (bearing in mind that the same reasoning applies to any coherent reading system platform, like an iOS or Android app, or even a web app involving its own client/server code). The key subsystems that exchange Locator data back-and-forth The Locator object (herein discussed) transports this information across the navigator / app boundary in a standardized fashion, not only syntax-wise (typically in a serialised JSON form, due to technical constraints), but also in terms of the processing model (e.g. fragments that express different resolutions, precedence order). This works great for things like progression, position, CSS Selectors, time Media Fragments, which are quite consensual. But here we are now discussing the more complex use-case of precise character-level document range, for which CFI was a semi-successful attempt to offer a reliable consistent model, and XPointer scheme never took off either.
In a platform-specific implementation context (e.g. the iOS, Android, Electron/desktop apps), why would a navigator ever bother creating and consuming CFI or other supplemental fragment types, when they have no concrete use for them? This is why in desktop/Electron we currently "extend" the Locator definition by means of composition (additional proprietary / ad-hoc data structure) in order to fill the gap where standard Locator is not satisfactory. Now, for "interchange" purposes across different system implementations (e.g. a streamer server and multiple platform-specific clients), the Locator payload obviously needs to be standardized, just like any other part of the architecture.
What I am suggesting is that in the real world, implementations will not bear the burden (performance impact, debuggability, readability, etc.) of creating processors and executing their code (for example, producing CFI fragments when the receiving end is known to ignore them completely) unless there is a need for generating and consuming the proposed multi-resolution/granularity, "flattened" Locator fragment syntax ... thereby favouring instead the extended / proprietary data structure. Surely, we want to mitigate this?
I can see the cons of incorporating a totally bespoke DOMRange structured serialisation format in the Locator model (which would be at odds with the dominant "linearized string" fragment approach), but I also want to make sure we examine the pros too. For me, the benefits in terms of performance, testability, readability etc. are not insignificant. I would hate to continue to use a totally different system alongside standard Locators (in the Electron/desktop implementation), when there is in fact an opportunity for us all to leverage the same structured syntax for selection/annotation ranges. Do you understand my viewpoint? (sorry for the long-winded explanation) |
@danielweck I think that we're having two separate discussions here:
IMO changing how fragments are represented has a lot of downsides and so far, only DOMRanges would benefit from this. DOMRanges can already be added to a As an alternative, we can also define an extensibility model for
{
  "href": "http://example.com/chapter1",
  "type": "text/html",
  "title": "Chapter 1",
  "locations": {
    "position": 4,
    "progression": 0.03401,
    "totalProgression": 0.01349,
    "domrange": {
      "selector": "div[3]",
      "index": 3,
      "offset": 25
    }
  }
}
I'm not buying the argument that CFI is completely going away. I don't think it's going to be that uncommon for organizations to run a mix of Web Apps that are not primarily developed by Readium (yet compatible with RWPM) with mobile/desktop apps that are built by Readium. |
Keeping the fragments as strings is really important in my opinion:
So adding a (standardized across R2 platforms) Locator extension for the DOM range could really be a solution. Especially since there's no standard fragment format to represent a DOM range and so this is likely a private Readium implementation. But keeping everything in Note that on mobile we will most likely not forward the full JSON locator to the JS layer, since it will only be interested in the fragments and/or the DOM range.
Definitely, especially to add additional data that should be encoded into the |
We really want to close this issue tomorrow, do you have any additional comment @danielweck ? |
I do actually :) In your example:
{
  "href": "http://example.com/chapter1",
  "type": "text/html",
  "title": "Chapter 1",
  "locations": {
    "position": 4,
    "progression": 0.03401,
    "totalProgression": 0.01349,
    "domrange": {
      "selector": "div[3]",
      "index": 3,
      "offset": 25
    }
  }
}
...I think we need to distinguish
{
  "href": "http://example.com/chapter1",
  "type": "text/html",
  "title": "Chapter 1",
  "locations": {
    "position": 4,
    "progression": 0.03401,
    "totalProgression": 0.01349,
    "domrange": {
      "start": {
        "selector": "div[3]",
        "index": 3,
        "offset": 25
      },
      "end": {
        "selector": "div[4]",
        "index": 2,
        "offset": 11
      }
    }
  }
} |
Also, as the proposed |
I filed a separate issue to express a more general concern about the concept of Locator "fragments": #98 |
Actually, this is not my personal preference nor the only solution that I've listed in my previous comment. Overall, I think that the current approach of an array of strings is perfectly fine and works for what you've requested. That said, we could use an extensibility model for both the top level of a Locator and for Overall, I'm not in favor of extending our current model with anything too specialized, I think the mix of
We had more specialized elements before (with dedicated elements for CFI and CSS Selectors among other things) and I think it would be a step back to go in that direction again. |
Progress update based on our latest conference call: Here let's discuss the specifics of Full example:
{
  "href": "http://example.com/chapter1",
  "type": "text/html",
  "title": "Chapter 1",
  "locations": {
    "progression": 0.8,
    "fragments": [
      "#paragraph-id",
      "t=15"
    ],
    "cssSelector": "body.rootClass div:nth-child(2) p#paragraph-id",
    "partialCfi": "/4/2/8/6[paragraph-id]",
    "domRange": {
      "start": {
        "cssSelector": "body.rootClass div:nth-child(2) p#paragraph-id",
        "textNodeIndex": 0,
        "offset": 11
      },
      "end": {
        "cssSelector": "body.rootClass div:nth-child(2) p#paragraph-id",
        "textNodeIndex": 0,
        "offset": 22
      }
    }
  }
}
Partial example (so we can focus on property naming, and meaning):
"domRange": {
  "start": {
    "cssSelector": "body.rootClass div:nth-child(2) p#paragraph-id",
    "textNodeIndex": 0,
    "offset": 11
  },
  "end": {
    "cssSelector": "body.rootClass div:nth-child(2) p#paragraph-id",
    "textNodeIndex": 0,
    "offset": 22
  }
} |
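To show how one boundary of such a domRange could be mapped back onto the DOM, here is a hedged sketch. It uses small structural interfaces instead of the real DOM types so that it can run outside a browser; real Document and Element objects are expected to satisfy them. It assumes that textNodeIndex indexes into all child nodes of the selected element and that the node found there must be a text node; the thread does not pin this down, and counting text nodes only would be an equally plausible reading.

```typescript
// Structural stand-ins for the few DOM members used below (assumption:
// browser Document / Element / Node objects are compatible with these).
interface NodeLike {
  nodeType: number;
  textContent: string | null;
}
interface ElementLike {
  childNodes: ArrayLike<NodeLike>;
}
interface QueryRoot {
  querySelector(selector: string): ElementLike | null;
}

const TEXT_NODE = 3; // same value as Node.TEXT_NODE in the DOM

interface DomRangeBoundary {
  cssSelector: string;
  textNodeIndex: number;
  offset: number;
}

interface ResolvedBoundary {
  node: NodeLike;
  offset: number;
}

// Resolves one boundary to a (text node, offset) pair, or null when the
// selector, the child index or the offset does not match the document.
function resolveBoundary(root: QueryRoot, b: DomRangeBoundary): ResolvedBoundary | null {
  const element = root.querySelector(b.cssSelector);
  if (!element) {
    return null;
  }
  // Assumption: the index counts every child node, not only text nodes.
  const node = element.childNodes[b.textNodeIndex];
  if (!node || node.nodeType !== TEXT_NODE) {
    return null;
  }
  const length = (node.textContent || "").length;
  if (b.offset < 0 || b.offset > length) {
    return null;
  }
  return { node, offset: b.offset };
}
```

In a browser, the two resolved boundaries could then be passed to `setStart()` and `setEnd()` on a Range obtained from `document.createRange()`. Returning null instead of throwing lets the caller fall back to a coarser location such as cssSelector or progression.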
|
Note that in Also note that the selection UX on mobile (iOS/Android) sometimes normalizes the original DOMRange start/end so that it never references DOM elements (only text nodes). This normalization step can also be explicitly added in desktop implementations (I am working on that, actually, based on the work of Apache annotator). See TypeScript prototype implementation which inspired this proposal: |
Thank you, Daniel. The naming is very clear. I have two questions/suggestions:
We really need to make sure that those locators are compatible when shared between R2 platforms. I think it's not such a problem if the output is different (not canonical) as long as when used from a different platform we end up at the exact same place in the DOM. |
|
Regarding highlights, we agreed on using a single "DOM-ranged" I suggest that we use it this way:
I don't see any use for keeping the raw text once the Additional processing could be done for
For non-protected books, I think truncating the |
|
Thinking about an alternative to avoid using
Make more logical sense, I think, and can still roundtrip from/to DOMRange without ambiguity. |
Yes, this would typically be a collapsed range. |
I have a preference for preserving "raw" text alongside "clean" text. This can be used to match actual text in the DOM (including line breaks, insignificant whitespaces, etc.), for example when reconstructing/correlating DOM text with spoken TTS utterances. |
Why? Could that not be useful for both the "collapse range" case (discrete position) and the "spanning start/end range" case? |
I'm talking specifically about the highlights for this case (eg. a bookmark would need to fill That being said, before/after could still be useful for highlights to add more context, especially if the highlighted part is only a few words.
There's nothing in the model right now that can hold raw text, but for specific use cases we could expose the raw text directly in the navigator API related to TTS. Also, different APIs can produce different However, I think consistency might be more important. Maybe
Agreed, I think your new proposal is much clearer. One reservation: is there a practical difference between |
For all intents and purposes, no. However I remember seeing DOMRange pointing to TextNode without character offset ... I cannot remember how I came across this edge case, but I allowed 'null' / |
I think this makes sense. The "clean/normalize text" utility function should also be standardized (in the context of the R2 Locator model). |
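As a starting point for such a shared utility, here is one possible minimal definition of "clean" text: collapse every run of whitespace into a single space and trim both ends. This is only an assumption about what the normalization could be; the thread does not specify the rules, and the function name is hypothetical.

```typescript
// Hypothetical normalization: the actual rules would need to be agreed on
// and specified alongside the R2 Locator model.
function cleanText(raw: string): string {
  // \s matches spaces, tabs, line breaks and other Unicode whitespace.
  return raw.replace(/\s+/g, " ").trim();
}

// Example: raw text as it might appear in the DOM, with line breaks and
// indentation that are not significant for display.
const rawExample = "  It was the best\n      of times,\tit was the worst of times. ";
const cleanExample = cleanText(rawExample);
// cleanExample is "It was the best of times, it was the worst of times."
```

Keeping the raw string next to the cleaned one, as suggested above, would still allow matching against the actual DOM text, since the cleaning step is not reversible.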
What about a collapsed DOMRange though? I would like to be able to use |
Yes, especially if we consider that the Locator is really used to locate a precise position and not as a substitute for a On the Swift native model we can easily add a dynamic property to get the
I was talking about the text selection for highlights specifically. If a DOM range is collapsed, then it is actually not a selection so this API would return But for other single-location cases (eg. mouse click) then a collapsed DOMRange makes sense, and having On a side note, I'm not sure we need a |
To summarize the current consensus on this discussion:
"locations": {
  "cssSelector": "body.rootClass div:nth-child(2) p#paragraph-id",
  "partialCfi": "/4/2/8/6[paragraph-id]",
  "domRange": {
    "start": {
      "cssSelector": "body.rootClass div:nth-child(2) p#paragraph-id",
      "textNodeIndex": 0,
      "charOffset": 11
    },
    "end": {
      "cssSelector": "body.rootClass div:nth-child(2) p#paragraph-id",
      "textNodeIndex": 0,
      "charOffset": 22
    }
  }
}
DOM Range
(@danielweck I removed the LocatorText
|
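The consensus shape above can be written down as types, together with the "collapsed range" check discussed earlier (a range whose start and end coincide, i.e. a discrete position rather than a selection). The type names below are illustrative, and treating `end` as always present is an assumption of this sketch.

```typescript
// Field names follow the consensus JSON above; the type names are made up.
interface DomRangePoint {
  cssSelector: string;
  textNodeIndex: number;
  charOffset: number;
}

interface DomRange {
  start: DomRangePoint;
  end: DomRangePoint; // assumed to be always present in this sketch
}

// A range is treated as collapsed when start and end are field-by-field
// identical. Selectors are compared as plain strings, so two different
// selectors that happen to target the same element are NOT detected.
function isCollapsed(range: DomRange): boolean {
  return (
    range.start.cssSelector === range.end.cssSelector &&
    range.start.textNodeIndex === range.end.textNodeIndex &&
    range.start.charOffset === range.end.charOffset
  );
}

const selection: DomRange = {
  start: { cssSelector: "body.rootClass div:nth-child(2) p#paragraph-id", textNodeIndex: 0, charOffset: 11 },
  end: { cssSelector: "body.rootClass div:nth-child(2) p#paragraph-id", textNodeIndex: 0, charOffset: 22 },
};
// isCollapsed(selection) is false: the offsets differ (11 vs 22).
```

A highlight API could use such a check to reject collapsed ranges, while a click or bookmark API could accept them as single positions.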
Specification moved to: https://github.com/readium/architecture/blob/master/models/locators/extensions/html.md |
In the context of building the Highlight Navigator API, we need a way to express a selection range with Locators.
We can't use a single Locator (as the model is now) because:
Using two Locators is not great because the selection text (LocatorText) is duplicated and it's not obvious which one should be used.
A potential solution would be to have an additional LocatorRange object that contains two Locators with empty .text and an additional "standalone" LocatorText that represents the content of the range.
The .text is optional because not necessary when passing the Highlight to the navigator, and for protected book it would be empty (to avoid bypassing the copy rights).
Any thoughts, alternative solutions?