Meeting Notes

Kim Leaman edited this page Feb 8, 2022 · 10 revisions

2022-02-01

Cliff

  • Laying the groundwork with Python classes; will soon be able to process large quantities of pages to see what we get
  • Created GitHub project: https://github.com/pulibrary/finding_aid_enrichment
    • Information space for this project
    • Add standup notes here? Yes.
  • Models under each use case
    • Currently tracked as issues. Move into the wiki?
    • Actively engage with ADAPT to edit and grow to understand what this is for
  • First ticket in progress is plugging several existing technologies together.
    • spaCy: Where do strings occur and where do named entities occur?
      • Python data structures
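
A minimal sketch of the kind of Python data structure this could mean, not the project's actual code. It assumes entities arrive as objects with `text`, `label_`, `start_char`, and `end_char` attributes, which are the attributes spaCy exposes on entity spans. The `Mention` class and both helper functions are hypothetical names, and the sample sentence and stand-in entity are invented so the sketch runs without spaCy installed.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass(frozen=True)
class Mention:
    """One occurrence of a string on a page (hypothetical structure)."""
    text: str
    label: str  # entity label such as "PERSON"; empty for a plain string match
    page: int
    start: int  # character offset where the occurrence begins
    end: int    # character offset just past the occurrence


def find_string(page_text, needle, page):
    """Return every non-overlapping occurrence of needle in page_text."""
    if not needle:
        return []
    mentions = []
    pos = page_text.find(needle)
    while pos != -1:
        mentions.append(Mention(needle, "", page, pos, pos + len(needle)))
        pos = page_text.find(needle, pos + len(needle))
    return mentions


def from_entities(entities, page):
    """Convert spaCy-style entity spans into Mention records."""
    return [
        Mention(e.text, e.label_, page, e.start_char, e.end_char)
        for e in entities
    ]


# Invented sample text, plus a stand-in for what an NER pass might return.
text = "Woodrow Wilson wrote to Wilson's secretary."
fake_ents = [
    SimpleNamespace(text="Woodrow Wilson", label_="PERSON",
                    start_char=0, end_char=14)
]

string_hits = find_string(text, "Wilson", page=1)
entity_hits = from_entities(fake_ents, page=1)
```

Keeping string matches and entity matches in the same record type makes it possible to compare where a plain string occurs with where the recognizer tagged a name.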

Alexis

  • Adding Will Clements to this project
    • Will will be invited to future meetings
    • Both will review the GitHub repository, as Will will probably work on some of this and is interested in data problems
    • Alexis will talk to Will this week and loop him in next week. They will review these.

Esmé

  • Named entity recognition and library data: CDH folks are interested.
  • Esmé will schedule a meeting with the student to get a general sense of what they want.
    • Esmé: Needs to figure out how to extract a corpus from Figgy that they could work with: plain text, hOCR, etc.
    • Cliff: Would like to be a part of these meetings and potentially direct them to the Papers of Princeton or the Blue Mountain Project, if appropriate
    • Cliff: Would not direct them to the dirty OCR
    • Kim: Could a student internship program project review results or work with the rough OCR?
    • Alexis: Might be worth investigating

Questions

  • Esmé: Has questions about the lower set of tickets, but the upper two are fine.
    • Cliff: The lower set may well be out of scope or in need of editing.
  • Alexis: Is there anything she can read to better understand named entity recognition and what it can and cannot do?
    • Cliff will compile a reading list
  • Alexis: Is it OK to adjust or change things in the wiki/GitHub issues?
    • Cliff: Yes, absolutely; it is expected and very much appreciated. Dialogue between Alexis’s group and IT, as the tech side, to think this through is essential.

Action Items

  • Alexis inviting Will to future meetings
  • Esmé reaching out to the student and setting up a time for Student/Esmé/Cliff to meet (issue #4)
  • Cliff compiling a list of reading material for Alexis (issue #5)
  • Kim adding notes to GitHub project wiki (done)

2022-02-08

Present

  • Cliff Wulfman, Esmé Cowles, Alexis Antracoli, Will Clements, Kim Leaman (notes)

Esmé and Cliff

  • Met with the student and recommended the Daily Princetonian Archive for the student’s project.

Alexis

  • Added a hypothesis that is not entirely clear, but worth discussing
    • While doing this work and recording who is named (which names appear in each component), can that lead to manual work on connections between people, places, etc.?
      • Esmé: Yes, and of particular interest would be people who crosswalk between collections
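
One way to picture "people who crosswalk between collections" is an index from each name to the collections it appears in. The sketch below is purely illustrative: the function name, the collection identifiers, and the sample names are all invented, and it matches names by exact string only, so variant spellings of one person would need separate reconciliation.

```python
from collections import defaultdict


def names_across_collections(mentions):
    """Return names found in more than one collection.

    mentions: iterable of (name, collection_id) pairs.
    Result maps each such name to a sorted list of collection ids.
    """
    seen = defaultdict(set)
    for name, collection_id in mentions:
        seen[name].add(collection_id)
    return {name: sorted(ids) for name, ids in seen.items() if len(ids) > 1}


# Invented sample data: names and collection ids are placeholders.
sample = [
    ("Jane Doe", "C0001"),
    ("Jane Doe", "C0002"),
    ("John Roe", "C0001"),
    ("John Roe", "C0001"),
]

shared = names_across_collections(sample)
```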

Cliff

  • Started a page on named entity recognition, with links explaining what it is and how we may be using it for this particular project
  • Working on constructing a robust architecture for taking a IIIF manifest, pulling down the images associated with each page, cleaning them, running OCR on them, and making an initial named entity pass. Results are collected one page at a time and gathered into a collection
    • Wants to be able to track where names occur: what page, etc.
    • Researching how to integrate into IIIF… RDF? Annotations on a canvas?
    • Build up a graph of all the strings and locating where they are on each canvas.
    • Examine what those strings are, where they occur, etc…
    • Take a container/folder or two from the selected materials and see what is coming out of it.
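
The first and last steps of the pipeline described above can be sketched as follows. This is an illustration under stated assumptions, not the project's architecture: it assumes a IIIF Presentation 2.x manifest layout (`sequences` → `canvases` → `images` → `resource`), and it shows a W3C Web Annotation as one possible answer to the open "annotations on a canvas?" question. The function names, the example.org URLs, and the bounding box are hypothetical; image download, cleaning, and OCR are omitted.

```python
def canvas_images(manifest):
    """Yield (canvas_id, label, image_url) for each painted image.

    Expects a dict parsed from a IIIF Presentation 2.x manifest.
    """
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                yield (canvas["@id"], canvas.get("label", ""),
                       image["resource"]["@id"])


def entity_annotation(canvas_id, text, box):
    """Build a Web Annotation tagging a region of a canvas with a string.

    box is (x, y, width, height) in canvas coordinates.
    """
    x, y, w, h = box
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "tagging",
        "body": {"type": "TextualBody", "value": text},
        "target": f"{canvas_id}#xywh={x},{y},{w},{h}",
    }


# Tiny invented manifest; a real one would be fetched and parsed from JSON.
manifest = {
    "@id": "https://example.org/manifest",
    "@type": "sc:Manifest",
    "sequences": [{
        "canvases": [{
            "@id": "https://example.org/canvas/1",
            "@type": "sc:Canvas",
            "label": "1",
            "images": [{
                "@type": "oa:Annotation",
                "motivation": "sc:painting",
                "resource": {
                    "@id": "https://example.org/image/1/full/full/0/default.jpg"
                },
                "on": "https://example.org/canvas/1",
            }],
        }],
    }],
}

pages = list(canvas_images(manifest))
note = entity_annotation(pages[0][0], "Woodrow Wilson", (10, 20, 300, 40))
```

Targeting the canvas with an `xywh` fragment records where on the page a string was found, which is the information needed to build up a graph of strings and their locations.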

Esmé

  • Cliff, can you share your results with everyone on Slack before the next meeting? It would be a good step for folks to review and discuss next week
    • Yes

Action Items

  • Cliff: Will share results with folks in Slack to review before the meeting next Tuesday
  • Alexis and Will: Will take a look at the named entity link(s) Cliff has shared.
  • Alexis, Will, Esmé, Kim: Will take a look at what Cliff shares in Slack.
  • Kim: Add notes to GitHub

2022-02-15
