Meeting Notes
Kim Leaman edited this page Feb 8, 2022
·
10 revisions
Cliff
- Laying the groundwork with Python classes; will soon be able to process large quantities of pages to see what we get
- Created GitHub project: https://github.com/pulibrary/finding_aid_enrichment
- Information space for this project
- Add standup notes here? Yes.
- Models under each use case
- Currently in Issues; move into the wiki?
- Actively engage with ADAPT to edit and grow to understand what this is for
- First ticket in progress is plugging several existing technologies together.
- spaCy: Where do strings occur and where do named entities occur?
- Python data structures
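The spaCy question above (where strings occur and where named entities occur) could be recorded with plain Python data structures along these lines. This is only a sketch, not the project's code: `Mention`, `find_string`, and `mentions_by_label` are hypothetical names, the sample sentence is made up, and the entity offsets are hand-written stand-ins for what spaCy reports through `ent.text`, `ent.label_`, `ent.start_char`, and `ent.end_char`.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One occurrence of a string or named entity in a page of text (hypothetical)."""
    text: str
    label: str  # e.g. "PERSON", or "STRING" for a plain string match
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention


def find_string(page_text, needle):
    """Return a Mention for every literal occurrence of `needle` in `page_text`."""
    return [
        Mention(match.group(0), "STRING", match.start(), match.end())
        for match in re.finditer(re.escape(needle), page_text)
    ]


def mentions_by_label(mentions):
    """Group mentions by label so strings and entity types can be compared."""
    grouped = defaultdict(list)
    for mention in mentions:
        grouped[mention.label].append(mention)
    return dict(grouped)


# Made-up sample text, not taken from any finding aid.
page_text = "Letter from John Witherspoon to the trustees; Witherspoon replies in May."

string_hits = find_string(page_text, "Witherspoon")

# Assumed NER result, written by hand here; a real run would build this
# list from the entities spaCy returns for the page.
entity_hits = [Mention("John Witherspoon", "PERSON", 12, 28)]

grouped = mentions_by_label(string_hits + entity_hits)
```

Keeping character offsets on every mention means a plain string match and a named entity can be compared position by position on the same page.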
Alexis
- Adding Will Clements to this project
- Will will be invited to future meetings
- Both will review the GitHub repository, as Will will probably work on some of this and is interested in data problems
- Alexis will talk to Will this week and loop him in next week. They will review these.
Esmé
- Named entity recognition and library data: CDH folks are interested.
- Esmé will schedule a meeting with the student to get a general sense of what they want.
- Esmé: Needs to figure out how to extract a corpus from Figgy that they could work with. Plain text or hOCR, etc.
- Cliff: Would like to be a part of these meetings and potentially direct them to Papers of Princeton or the Blue Mountain Project if appropriate
- Cliff: Would not direct them to the dirty OCR
- Kim: Student internship program project to review results/work with rough OCR?
- Alexis: Might be worth investigating
Questions
- Esmé: Has questions about the lower set of tickets, but the upper two are fine.
- Cliff: The lower set may well be out of scope/in need of editing.
- Alexis: Is there anything she can read to better understand named entity recognition and what it can/cannot do?
- Cliff will compile a reading list
- Alexis: Is it OK to adjust or change things in the wiki/GitHub issues?
- Cliff: Yes, absolutely; it is expected and very much appreciated. Dialogue between Alexis’s group and IT to think through the technology is essential.
Action Items
- Alexis inviting Will to future meetings
- Esmé reaching out to the student and setting up a time for Student/Esmé/Cliff to meet (issue #4)
- Cliff compiling a list of reading material for Alexis (issue #5)
- Kim adding notes to GitHub project wiki (done)
Present
- Cliff Wulfman, Esmé Cowles, Alexis Antracoli, Will Clements, Kim Leaman (notes)
Esmé and Cliff
- Met with the student and recommended the Daily Princetonian Archive for the student’s project.
Alexis
- Added a hypothesis that is not entirely clear, but worth discussing
- While doing this work and recording who is named/what name is in each component, can that lead to manual work on links between people, places, etc.?
- Esmé: Yes, and of particular interest would be people who crosswalk between collections
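One way to picture "people who crosswalk between collections" is to group name sightings by collection and keep the names seen in more than one. The sketch below is illustrative only: the names, collection ids, and the `names_across_collections` helper are all hypothetical, and it assumes names have already been normalized so that the same person always appears under the same string.

```python
from collections import defaultdict

# Hypothetical (name, collection id) pairs, standing in for reviewed
# named entity output from several finding aids.
name_sightings = [
    ("Jane Doe", "collection_a"),
    ("Jane Doe", "collection_b"),
    ("John Roe", "collection_a"),
    ("John Roe", "collection_a"),
]


def names_across_collections(sightings):
    """Return names seen in more than one collection, with those collections."""
    collections_by_name = defaultdict(set)
    for name, collection_id in sightings:
        collections_by_name[name].add(collection_id)
    return {
        name: sorted(ids)
        for name, ids in collections_by_name.items()
        if len(ids) > 1
    }


crosswalks = names_across_collections(name_sightings)
```

Here only "Jane Doe" qualifies, since "John Roe" appears twice but within a single collection.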
Cliff
- Started a page on named entity recognition, with links on what it is and how we may be using it for this particular project
- Working on constructing a robust architecture for taking a IIIF manifest, pulling down the images associated with each page, cleaning them, running OCR, and doing an initial named entity pass. Collecting the results a page at a time into a collection
- Wants to be able to track where names occur: what page, etc.
- Researching how to integrate into IIIF….RDF - annotations on a canvas?
- Build up a graph of all the strings, locating where they are on each canvas.
- Examine what those strings are, where they occur, etc.
- Take a container/folder or two from the selected materials and see what is coming out of it.
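The architecture described above (manifest, then page images, then names tracked per canvas) might be sketched as follows. This is a rough illustration under stated assumptions, not the project's design: it assumes a IIIF Presentation 2.x manifest layout (`sequences`, `canvases`, `images`, `resource`), the `Page`, `pages_from_manifest`, and `index_names` names are hypothetical, the URLs are placeholders, and the OCR and named entity steps are replaced by hand-entered names.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Page:
    """One canvas of a manifest plus whatever names were found on it (hypothetical)."""
    canvas_id: str
    image_url: str
    names: list = field(default_factory=list)  # filled in after OCR and NER


def pages_from_manifest(manifest):
    """Walk a Presentation 2.x style manifest and list one Page per canvas image."""
    pages = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                pages.append(Page(canvas["@id"], image["resource"]["@id"]))
    return pages


def index_names(pages):
    """Map each name to the canvases it occurs on, in page order."""
    index = defaultdict(list)
    for page in pages:
        for name in page.names:
            index[name].append(page.canvas_id)
    return dict(index)


# Tiny hand-made manifest; every URL here is a placeholder.
manifest = {
    "@id": "https://example.org/manifest",
    "sequences": [{
        "canvases": [
            {"@id": "https://example.org/canvas/1",
             "images": [{"resource": {"@id": "https://example.org/image/1.jpg"}}]},
            {"@id": "https://example.org/canvas/2",
             "images": [{"resource": {"@id": "https://example.org/image/2.jpg"}}]},
        ],
    }],
}

pages = pages_from_manifest(manifest)

# Stand-in for the OCR and named entity pass on each downloaded image.
pages[0].names = ["Jane Doe"]
pages[1].names = ["Jane Doe", "John Roe"]

name_index = index_names(pages)
```

Keying the index by canvas id keeps each name tied to a specific page, which is the information an annotation on a canvas would need.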
Esmé
- Cliff, can you share your results with everyone on Slack before the next meeting? It would be a good step for folks to review and discuss next week
- Yes
Action Items
- Cliff: Will share results with folks in Slack to review before the meeting next Tuesday
- Alexis and Will: Will take a look at the named entity link(s) Cliff has shared.
- Alexis, Will, Esmé, Kim: Will take a look at what Cliff shares in Slack.
- Kim: Add notes to GitHub