Annotations and Annotation Servers

Cliff Wulfman edited this page Apr 26, 2022 · 1 revision

Automation can take us only so far in discovering under-represented names. Eventually, archivists will need to review the generated list of names (or of lexical items that spaCy thinks are names) and use their expertise to determine whom the names refer to and whether those people meet the criteria for inclusion in this name set. To do so, archivists must be able to see the names in context: they need to read the archival pages.

As we know, Figgy uses the IIIF Presentation API to model text, and we have been using that API to retrieve representations of archival materials: specifically, IIIF Manifests that represent archival containers with (digitized) pages in them. Although it is seldom used this way, the IIIF Presentation API was designed to support linked data. Indeed, its authors go to considerable lengths to explain their rationale for using JSON-LD as a serialization format, though I strongly suspect that none of the library applications that have adopted IIIF use it as anything other than a back end for image viewers. We, on the other hand, are exploiting IIIF’s semantic representation of resources by loading manifests directly into our graph and then associating the text strings identified as names with the canvases on which they appear.
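The association between names and canvases can be illustrated without a graph store. The following is a minimal sketch, not the project's actual loader: it assumes the manifest has already been parsed into a Python dictionary, and the function names, the example.org identifiers, and the `names_by_canvas` mapping are all hypothetical. It accepts both the Presentation API 3 layout (canvases in `items`) and the Presentation API 2 layout (canvases nested in `sequences`).

```python
from typing import Iterator


def canvas_ids(manifest: dict) -> Iterator[str]:
    """Yield the identifier of every canvas in a parsed IIIF manifest."""
    # Presentation API 3: canvases are direct "items" of the manifest.
    for item in manifest.get("items", []):
        if item.get("type") == "Canvas":
            yield item["id"]
    # Presentation API 2: canvases are nested inside sequences.
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            yield canvas["@id"]


def link_names_to_canvases(manifest: dict, names_by_canvas: dict) -> list:
    """Return (name, canvas id) pairs for the canvases in this manifest."""
    pairs = []
    for canvas in canvas_ids(manifest):
        for name in names_by_canvas.get(canvas, []):
            pairs.append((name, canvas))
    return pairs


# Hypothetical, hand-made manifest fragment (not real Figgy output).
manifest = {
    "type": "Manifest",
    "id": "https://example.org/manifest/1",
    "items": [
        {"type": "Canvas", "id": "https://example.org/canvas/1"},
        {"type": "Canvas", "id": "https://example.org/canvas/2"},
    ],
}

# Hypothetical NER output: the names found on each canvas.
names_by_canvas = {"https://example.org/canvas/2": ["Jane Doe"]}

pairs = link_names_to_canvases(manifest, names_by_canvas)
```

In a graph, each pair would presumably become an edge between a name node and a canvas node; the predicate used for that edge is a modelling decision this sketch leaves open.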

We have speculated that we could exploit IIIF’s support of the Web Annotation model to represent OCR results as IIIF Annotation objects that could be stored on an annotation server and then searched and browsed with a viewer that supports the IIIF Presentation API. A preliminary survey suggests, however, that current implementations do not support our use case very well.
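To make the idea concrete, here is a rough sketch of what one OCR result might look like as an annotation. It is an assumption about a plausible shape, not a format the project has adopted: the helper name, the example.org identifiers, and the bounding box are invented, and the choice of the IIIF `supplementing` motivation for transcribed text is a suggestion.

```python
import json


def ocr_annotation(anno_id, text, canvas_id, xywh=None):
    """Build a Web Annotation that attaches a recognized string to a canvas.

    xywh is an optional (x, y, width, height) region in canvas coordinates.
    """
    target = canvas_id
    if xywh is not None:
        x, y, w, h = xywh
        # A media fragment narrows the target to a region of the canvas.
        target = f"{canvas_id}#xywh={x},{y},{w},{h}"
    return {
        # The IIIF context is used because "supplementing" is an
        # IIIF-defined motivation (an assumption about how to model OCR).
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": anno_id,
        "type": "Annotation",
        "motivation": "supplementing",
        "body": {"type": "TextualBody", "value": text, "format": "text/plain"},
        "target": target,
    }


# Hypothetical identifiers and coordinates, for illustration only.
annotation = ocr_annotation(
    "https://example.org/annotation/1",
    "Jane Doe",
    "https://example.org/canvas/2",
    xywh=(120, 340, 410, 60),
)
print(json.dumps(annotation, indent=2))
```

An annotation server would store objects like this one, and a viewer would resolve each `target` back to a region of a page image.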

Our use case is very simple:

  • Given some symbolic content, show me all the inscriptions in which it appears.
  • Given an inscription, show me the page on which it appears.
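These two lookups can be sketched as a small in-memory index. This is only an illustration of the use case, under the assumption that "symbolic content" means a normalized name and that an "inscription" is one occurrence of a string on a particular canvas; the class and method names and the example.org identifiers are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Inscription:
    """One occurrence of a string on a page (canvas)."""
    text: str
    canvas_id: str


class InscriptionIndex:
    def __init__(self):
        # Maps symbolic content (e.g. a normalized name) to inscriptions.
        self._by_content = defaultdict(list)

    def add(self, content, inscription):
        self._by_content[content].append(inscription)

    def inscriptions_for(self, content):
        """Use case 1: all inscriptions in which the content appears."""
        return list(self._by_content.get(content, []))

    @staticmethod
    def page_for(inscription):
        """Use case 2: the page (canvas) on which an inscription appears."""
        return inscription.canvas_id


index = InscriptionIndex()
index.add("Jane Doe", Inscription("Jane Doe", "https://example.org/canvas/2"))
index.add("Jane Doe", Inscription("J. Doe", "https://example.org/canvas/5"))

hits = index.inscriptions_for("Jane Doe")
pages = [InscriptionIndex.page_for(hit) for hit in hits]
```

Neither lookup needs bounding-box editing, user accounts, or free-text notes, which is why a full interactive annotation server is more than the use case requires.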

Annotation Servers

Few annotation servers have been developed to date. The awesome-iiif list names six, but only one, Glen Robson’s Simple Annotation Server (SAS), seems to be in active development, and it is not a viable option for us: it is difficult to install, it is buggy, and it is not compatible with Mirador 3. More importantly, our use case does not require most of SAS’s functionality, which is focused on enabling open, interactive annotation of resources with bounding boxes and textual notes.

IIIF Content Search APIs

The IIIF Content Search API 1.0 (https://iiif.io/api/search/1.0/) has not been revised since 2016, and it is not compatible with the latest version of the IIIF Presentation API (see https://iiif.io/community/groups/content-search-tsg/charter/ and https://github.com/IIIF/api/issues?q=is%3Aissue+is%3Aopen+label%3Asearch), though it appears that a version 2 is under consideration.
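For orientation, a Content Search 1.0 request is an ordinary HTTP GET against a search service, with the search terms in a `q` parameter and an optional `motivation` filter. The sketch below only builds such a URL; the service address is a placeholder, since we have no search service deployed, and the helper name is hypothetical.

```python
from urllib.parse import urlencode


def search_url(service_base, query, motivation=None):
    """Build a Content Search 1.0 style request URL."""
    params = {"q": query}
    if motivation is not None:
        # Restricts results to annotations with the given motivation.
        params["motivation"] = motivation
    return f"{service_base}?{urlencode(params)}"


# Placeholder service address, for illustration only.
url = search_url(
    "https://example.org/search/manifest-1",
    "Jane Doe",
    motivation="painting",
)
```

A compliant service would answer with a list of annotations whose targets point at the matching canvases, which corresponds to the first half of our use case.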

Conclusions

Using IIIF protocols to find and display OCR results seems a promising approach, but there are no ready-to-hand implementations our project could exploit. Our work will provide use cases for future implementations.
