
start work on goal 2: identify and consolidate re-used and re-usable metadata elements #4

Open
atn38 opened this issue Mar 29, 2022 · 0 comments

atn38 (Owner) commented Mar 29, 2022

Goal 2 of this project is to help people import their EML corpus into a relational database system. The output will likely work best with LTER-core-metabase, but ultimately the choice is the user's. To that end, re-used metadata elements need to be identified and consolidated into lookup tables for later import into a database.
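As a rough illustration of what consolidating into lookup tables could mean, here is a minimal sketch using Python's standard-library `sqlite3` module: each distinct person is stored once and referenced by an integer ID. The table names (`person`, `dataset_person`), the dataset IDs, and the names are all hypothetical; this is not the LTER-core-metabase schema.

```python
import sqlite3

# Hypothetical (dataset, person) pairs as they might be pulled out of several EML files.
occurrences = [
    ("example.1", "Jane Doe"),
    ("example.2", "Jane Doe"),
    ("example.2", "John Smith"),
]


def build_lookup(pairs):
    """Store each distinct person once and link datasets to them by integer ID."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (
            person_id INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL
        );
        CREATE TABLE dataset_person (
            dataset_id TEXT NOT NULL,
            person_id INTEGER NOT NULL REFERENCES person(person_id)
        );
        """
    )
    for dataset_id, name in pairs:
        # INSERT OR IGNORE skips names already in the lookup table (UNIQUE constraint).
        conn.execute("INSERT OR IGNORE INTO person (name) VALUES (?)", (name,))
        (person_id,) = conn.execute(
            "SELECT person_id FROM person WHERE name = ?", (name,)
        ).fetchone()
        # The link table records every occurrence, pointing at the shared row.
        conn.execute(
            "INSERT INTO dataset_person (dataset_id, person_id) VALUES (?, ?)",
            (dataset_id, person_id),
        )
    conn.commit()
    return conn


conn = build_lookup(occurrences)
print(conn.execute("SELECT person_id, name FROM person ORDER BY person_id").fetchall())
```

This only handles exact duplicates; near-duplicates would have to be reconciled before the rows are inserted.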

We will need to identify:

  • identical re-use, e.g. people whose names are consistent across EML files.
  • close but not identical re-use, e.g. the same person whose name differs slightly across EML files. Look into OpenRefine and taxonomyCleanr for possible matching solutions.
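The two cases above can be sketched in a few lines. This is not OpenRefine or taxonomyCleanr; it swaps in Python's standard-library `difflib.SequenceMatcher` purely for illustration. The `normalize` rules and the 0.85 similarity threshold are arbitrary assumptions, and the names are made up.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Lower-case, drop periods, and collapse whitespace so trivial differences vanish."""
    return " ".join(name.replace(".", " ").lower().split())


def exact_groups(names):
    """Identical re-use: group raw names that are equal after normalization."""
    groups = {}
    for name in names:
        groups.setdefault(normalize(name), set()).add(name)
    return groups


def near_matches(names, threshold=0.85):
    """Close re-use: pairs of distinct normalized names whose similarity meets the threshold."""
    keys = sorted(exact_groups(names))
    pairs = []
    for a, b in combinations(keys, 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs


names = ["Jane Doe", "jane  doe", "Jane M. Doe", "John Smith"]
print(exact_groups(names))
print(near_matches(names))
```

Flagged pairs are best treated as candidates for human review rather than automatic merges, since two different people can have similar names.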

In these EML elements:

  • missing-value codes and categorical codes
  • contributing parties: creator, associated parties, metadata providers, contact, and their IDs
  • geographic coverage or sites
  • keywords and keyword thesauri
  • protocols
  • taxa and taxa providers
  • publications
  • annotations
  • boilerplate elements: project, project personnel, license, funding info, etc.
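As an example of pulling one of these element types out of a corpus, here is a minimal sketch for keywords and keyword thesauri using Python's standard-library `xml.etree.ElementTree`. The EML fragment is a trimmed, hypothetical document, and the code assumes (as in the sample) that elements below the `eml:eml` root carry no namespace. Other elements in the list (parties, coverage, taxa) would need their own paths.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Trimmed, hypothetical EML fragment; real files carry much more metadata.
SAMPLE_EML = """<eml:eml xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0"
    packageId="example.1.1" system="example">
  <dataset>
    <title>Example dataset</title>
    <keywordSet>
      <keyword>salinity</keyword>
      <keyword>water temperature</keyword>
      <keywordThesaurus>LTER Controlled Vocabulary</keywordThesaurus>
    </keywordSet>
    <keywordSet>
      <keyword>estuary</keyword>
    </keywordSet>
  </dataset>
</eml:eml>"""


def keyword_pairs(eml_text):
    """Return (keyword, thesaurus) pairs from every keywordSet in one EML document."""
    root = ET.fromstring(eml_text)
    pairs = []
    # Assumes unprefixed child elements with no default namespace, as in the sample.
    for kw_set in root.iter("keywordSet"):
        thesaurus = kw_set.findtext("keywordThesaurus")  # None when the set has no thesaurus
        if thesaurus is not None:
            thesaurus = thesaurus.strip()
        for kw in kw_set.findall("keyword"):
            pairs.append(((kw.text or "").strip(), thesaurus))
    return pairs


def tally(documents):
    """Count how often each (keyword, thesaurus) pair is re-used across documents."""
    counts = Counter()
    for doc in documents:
        counts.update(keyword_pairs(doc))
    return counts


print(tally([SAMPLE_EML, SAMPLE_EML]).most_common())
```

Pairs that appear in many documents are natural candidates for a shared lookup table.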

We will also need to sort some of these into different priority levels.

atn38 moved this to Todo in pkEML goal 1 on Apr 7, 2022
atn38 moved this from Todo to In Progress in pkEML goal 1 on Apr 7, 2022