
dwc:GeologicalContext: Chronostratigraphy vocabulary - curation before uploading first vocabulary version #121

Open
CecSve opened this issue Jan 30, 2023 · 11 comments
Labels: content (Label for issue concerning vocabulary content), occurrence

Comments

@CecSve
Collaborator

CecSve commented Jan 30, 2023

          dwc:GeologicalContext terms: https://github.com/gbif/pipelines/issues/400

Originally posted by @CecSve in #120 (comment)

A Chronostratigraphy vocabulary would cover concepts across multiple terms in the dwc:GeologicalContext category (gbif/pipelines#400 (comment)):

earliestEonOrLowestEonothem
latestEonOrHighestEonothem
earliestEraOrLowestErathem
latestEraOrHighestErathem
earliestPeriodOrLowestSystem
latestPeriodOrHighestSystem
earliestEpochOrLowestSeries
latestEpochOrHighestSeries
earliestAgeOrLowestStage
latestAgeOrHighestStage

The vocabulary follows the one published by the CGI Geoscience Terminology Working Group hosted by the International Commission on Stratigraphy (ICS) (https://vocabs.ardc.edu.au/viewById/196, gbif/pipelines#400 (comment), https://github.com/CSIRO-enviro-informatics/interactive-geological-timescale/blob/master/src/assets/timeline_data.json, https://stratigraphy.org/timescale/).

Here is a file to edit: https://docs.google.com/spreadsheets/d/1k3YpAeRT3HxR9DBnkh0jkZZl12jimkHU3_H_pCPOUHc/edit?usp=sharing

https://docs.google.com/spreadsheets/d/1aHqhhtO93nooQ0o4AAVcSBVpyb-IGUXu9dZVTiN77TY/edit#gid=694447980 (updated version that supports numerical ranges for the time scales; this is the version to be implemented)

It contains:

  • Summary
  • Concepts: the list of existing concepts
  • Hidden values: a list of the values already mapped to the concepts (for now they are all in the Hidden sheet/tab)
  • Verbatim values: the GBIF verbatim values for this field that appear more than 10,000 times or in 5 or more datasets. For this vocabulary, the tab contains all verbatim values for the relevant terms from here: https://github.com/tdwg/dwc-qa/tree/master/data/GBIFDistinctValues/2022-03-08
  • Suggested definitions: definitions of concepts and source links

Please check the instructions here: #70

CecSve added the content label on Jan 30, 2023
@CecSve
Collaborator Author

CecSve commented Feb 3, 2023

  • Add English labels to the concepts.
  • When needed, move the mapped values from the Verbatim sheet to the Concept sheet as alternative labels, but do not add any new concepts.
  • Map as many verbatim values as possible.
  • Verbatim values that are identical to either concepts or alternative labels should be deleted.
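The last rule can be sketched in a few lines of Python. This is only an illustration, assuming the Concept sheet has been exported to a dict of concept name to alternative labels and that matching is exact and case-sensitive; the function name and the example labels are hypothetical.

```python
def remove_already_covered(verbatim_values, concepts):
    """Return the verbatim values that still need to be mapped.

    `concepts` maps each concept name to its list of alternative labels
    (a hypothetical export of the Concept sheet). A verbatim value is dropped
    when it exactly equals a concept name or one of its alternative labels.
    """
    covered = set()
    for name, alt_labels in concepts.items():
        covered.add(name)
        covered.update(alt_labels)
    # Keep the original order so the result can be pasted back into the sheet.
    return [value for value in verbatim_values if value not in covered]


# Hypothetical example data.
concepts = {
    "Jurassic": ["Jurassic System"],
    "Cretaceous": [],
}
verbatim = ["Jurassic", "Jurassic System", "Upper Cretaceous", "jurassic"]
print(remove_already_covered(verbatim, concepts))
```

Here "jurassic" survives because the match is case-sensitive; whether to fold case before comparing is a curation decision, not something this sketch settles.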

@CecSve
Collaborator Author

CecSve commented Mar 6, 2023

I will set up all the verbatim field tabs tomorrow and let you know when they are ready.

@CecSve
Collaborator Author

CecSve commented Mar 7, 2023

I have now set up 10 tabs, one for each field related to chronostratigraphy. Duplicate (identical) values have been removed, although the same value may still appear wrapped in e.g. "", () or similar. Please map these to concepts as well, even though they appear to be duplicates.

If a value does not belong to any of the concepts, please leave it unmapped.
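One way to spot the wrapped duplicates is to strip the wrapping characters before looking a value up. The sketch below assumes the wrappers are only quotes, brackets and whitespace at either end of the value; the lookup table and function names are hypothetical, and a value that still has no match comes back as None, i.e. it is left unmapped.

```python
# Characters assumed to wrap otherwise-identical values, e.g. "Jurassic" or (Jurassic).
WRAPPING_CHARS = " \t\n\"'()[]"


def normalise(value: str) -> str:
    """Strip wrapping quotes, brackets and whitespace from both ends."""
    return value.strip(WRAPPING_CHARS)


def suggest_concept(value, label_to_concept):
    """Return the concept for a verbatim value, or None to leave it unmapped."""
    return label_to_concept.get(normalise(value))


# Hypothetical label -> concept table built from concept names and alternative labels.
label_to_concept = {"Jurassic": "Jurassic", "Jurassic System": "Jurassic"}

for raw in ['"Jurassic"', "(Jurassic System)", "Upper Jurassic?"]:
    print(repr(raw), "->", suggest_concept(raw, label_to_concept))
```

Only the ends of the value are touched, so text inside the value (such as the question mark in "Upper Jurassic?") is kept and the value stays unmapped for a curator to review.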

You may also want to take a look at the Suggested definitions tab, where you can fill in definitions and descriptions for the concepts (including the time period) based on authoritative sources.

@ekrimmel - I have heard you also have a Slack channel assigned for this work. Feel free to add me if you find it useful for me to be part of it.

@CecSve
Collaborator Author

CecSve commented Apr 24, 2024

Following meetings with the Paleo Working Group in CPH this week, we have decided that we want one search term for chronostratigraphy (combining all 10 dwc fields), one search term for lithostratigraphy (combining 4 dwc fields) and one search term for biostratigraphy (combining 2 dwc fields).

So we will reduce 16 dwc fields to 3 in searches - see this issue: gbif/gbif-web#497.
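To make the reduction concrete, here is a hypothetical grouping of the 16 Darwin Core terms into the three search terms. The group names and the `search_values` helper are illustrative assumptions, not the implementation tracked in gbif/gbif-web#497; the lithostratigraphy and biostratigraphy lists use the Darwin Core GeologicalContext terms group, formation, member, bed, lowestBiostratigraphicZone and highestBiostratigraphicZone.

```python
# Hypothetical grouping of the 16 dwc GeologicalContext terms into 3 search terms.
CHRONOSTRATIGRAPHY = [
    "earliestEonOrLowestEonothem", "latestEonOrHighestEonothem",
    "earliestEraOrLowestErathem", "latestEraOrHighestErathem",
    "earliestPeriodOrLowestSystem", "latestPeriodOrHighestSystem",
    "earliestEpochOrLowestSeries", "latestEpochOrHighestSeries",
    "earliestAgeOrLowestStage", "latestAgeOrHighestStage",
]
LITHOSTRATIGRAPHY = ["group", "formation", "member", "bed"]
BIOSTRATIGRAPHY = ["lowestBiostratigraphicZone", "highestBiostratigraphicZone"]

SEARCH_GROUPS = {
    "chronostratigraphy": CHRONOSTRATIGRAPHY,
    "lithostratigraphy": LITHOSTRATIGRAPHY,
    "biostratigraphy": BIOSTRATIGRAPHY,
}


def search_values(record: dict) -> dict:
    """Collect a record's distinct, non-empty values for each search group."""
    return {
        group: sorted({record[field] for field in fields if record.get(field)})
        for group, fields in SEARCH_GROUPS.items()
    }


# Hypothetical occurrence record.
record = {
    "earliestPeriodOrLowestSystem": "Jurassic",
    "latestPeriodOrHighestSystem": "Cretaceous",
    "formation": "Example Formation",
}
print(search_values(record))
```

A search on one of the three terms would then only need to look at one combined list per record instead of up to 10 separate fields.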

Now, how should I set up the vocabular(y/ies) on the vocabulary server for this?

  • Would I still have to create 10 vocabularies, one for each of the 10 chronostratigraphy fields, so that the hidden values are mapped correctly?
  • Could I have one vocabulary with
  1. Concept
  2. Rank
  3. Range
    and keep the hidden value mapping somewhere else? As far as I understand, we would use the rank and range to assign the correct concept during interpretation, since the dwc fields are rank-specific.
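The single-vocabulary option can be sketched as follows, assuming each concept carries a rank and a numerical range and the hidden value mappings live in a separate lookup table. All names here (`Concept`, `FIELD_RANK`, `interpret`) are hypothetical, the boundary ages are illustrative and should be taken from the ICS chart, and this is not how the GBIF pipelines actually interpret these fields.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Concept:
    name: str
    rank: str        # "eon", "era", "period", "epoch" or "age"
    start_ma: float  # older boundary, millions of years ago
    end_ma: float    # younger boundary, millions of years ago


# Rank implied by each dwc field; the earliest/latest pair shares a rank.
FIELD_RANK = {
    "earliestEonOrLowestEonothem": "eon", "latestEonOrHighestEonothem": "eon",
    "earliestEraOrLowestErathem": "era", "latestEraOrHighestErathem": "era",
    "earliestPeriodOrLowestSystem": "period", "latestPeriodOrHighestSystem": "period",
    "earliestEpochOrLowestSeries": "epoch", "latestEpochOrHighestSeries": "epoch",
    "earliestAgeOrLowestStage": "age", "latestAgeOrHighestStage": "age",
}


def interpret(field: str, value: str, vocabulary: Dict[str, Concept],
              hidden_labels: Dict[str, str]) -> Optional[Concept]:
    """Map a verbatim value to a concept whose rank matches the field's rank."""
    name = hidden_labels.get(value, value)  # resolve hidden labels first
    concept = vocabulary.get(name)
    if concept is None or concept.rank != FIELD_RANK.get(field):
        return None  # unknown value, or a concept of the wrong rank
    return concept


# Illustrative data: one vocabulary for all ranks, hidden labels kept separately.
vocabulary = {
    "Mesozoic": Concept("Mesozoic", "era", 251.902, 66.0),
    "Jurassic": Concept("Jurassic", "period", 201.4, 145.0),
}
hidden_labels = {"Jura": "Jurassic"}

print(interpret("earliestPeriodOrLowestSystem", "Jura", vocabulary, hidden_labels))
print(interpret("earliestEraOrLowestErathem", "Jurassic", vocabulary, hidden_labels))
```

In this layout the field only contributes the expected rank, so a value such as "Jurassic" in an era field is rejected instead of being accepted silently.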

@RogerBurkhalter

Again, I strongly support this. Question: does "Range" refer to text or numeric values? Numeric values are more precise, but a moving target. If using IUGS values, use only the ratified values and not numbers (or text) harvested from issues of "Episodes", where values are not finalized. I've seen some wild ones recently.

@CecSve
Collaborator Author

CecSve commented Apr 24, 2024


The plan is to use the numerical ages from the most recent ICS source: https://stratigraphy.org/ICSchart/ChronostratChart2023-09.pdf. I do not see any mention of IUGS values, but I do see this specification:

"Numerical ages are subject to revision and do not define units in the Phanerozoic and the Ediacaran; only GSSPs do. For boundaries in the Phanerozoic without ratified GSSPs or without constrained numerical ages, an approximate numerical age (~) is provided."

Would you then advise GBIF not to use the uncertain ages (~)? @ekrimmel and others, we did not discuss this, but you may want to chime in.

Just to be clear: the numerical ages would be used in the back end to structure the data and enable more dynamic searches on paleo data. What users see and search for would most likely be the concepts themselves.
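As an illustration of how numerical ranges could support such searches, the sketch below turns a record's earliest/latest values into one interval and then finds the records that overlap a searched concept. It assumes ranges are stored as (start, end) in Ma with start the older boundary; the data and function names are hypothetical and the boundary ages are illustrative.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (start_ma, end_ma); start is the older boundary

# Illustrative boundary ages in Ma; real values would come from the ICS chart.
RANGES: Dict[str, Interval] = {
    "Triassic": (251.902, 201.4),
    "Jurassic": (201.4, 145.0),
    "Cretaceous": (145.0, 66.0),
}


def record_interval(earliest: str, latest: str) -> Interval:
    """Span from the start of the earliest unit to the end of the latest unit."""
    return RANGES[earliest][0], RANGES[latest][1]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share time; units that only touch do not overlap."""
    return a[0] > b[1] and b[0] > a[1]


def search(concept: str, records: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return ids of records whose interval overlaps the searched concept."""
    target = RANGES[concept]
    return [rid for rid, (earliest, latest) in records.items()
            if overlaps(record_interval(earliest, latest), target)]


# Hypothetical records: id -> (earliestPeriodOrLowestSystem, latestPeriodOrHighestSystem)
records = {
    "r1": ("Jurassic", "Jurassic"),
    "r2": ("Jurassic", "Cretaceous"),
    "r3": ("Triassic", "Triassic"),
}
print(search("Cretaceous", records))
```

Because the comparison is done on numbers rather than labels, the searched concept and the record could in principle use different ranks, as long as both have a numerical range.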

@CecSve
Collaborator Author

CecSve commented Jun 4, 2024

The vocabulary concepts are now uploaded to UAT and PROD.

@MortenHofft this was what you needed for the hosted portal, right?

Now we just need to add the hidden value mappings when they are ready.

@ekrimmel

ekrimmel commented Jun 4, 2024

We are working on this again! Sorry for the long delays between action :)

@CecSve
Collaborator Author

CecSve commented Jun 6, 2024

No worries - thank you for dealing with the mappings and let me know if you have any questions for the rest of them.

@CecSve
Collaborator Author

CecSve commented Jun 12, 2024

The tags were missing from the previous upload, so the vocabulary has now been uploaded again to UAT and PROD, and the age period of each concept is now shown in the tags (uncertain age periods are not included).
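The rule that uncertain age periods are left out of the tags could look like the sketch below. It assumes boundary ages are read as strings in the style of the ICS chart ("201.4 ± 0.2", or "~145.0" for approximate ages) and that a concept gets no tag when either boundary is approximate; the tag format and the function names are hypothetical.

```python
from typing import Optional


def parse_age(raw: str) -> Optional[float]:
    """Parse a boundary age in Ma; return None for approximate (~) ages.

    Accepts strings such as '201.4', '201.4 ± 0.2' or '~145.0' (assumed format).
    """
    text = raw.strip()
    if text.startswith("~"):
        return None
    # Drop the uncertainty after '±' and keep the central value.
    return float(text.split("±")[0])


def age_tag(start_raw: str, end_raw: str) -> Optional[str]:
    """Build an age-period tag, or None if either boundary is approximate."""
    start, end = parse_age(start_raw), parse_age(end_raw)
    if start is None or end is None:
        return None
    return f"{start}-{end} Ma"


print(age_tag("251.902 ± 0.024", "201.4 ± 0.2"))
print(age_tag("201.4 ± 0.2", "~145.0"))
```

Whether one approximate boundary should suppress the whole tag, or only mark it as uncertain, is a design choice this sketch makes arbitrarily.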

@CecSve
Collaborator Author

CecSve commented Aug 29, 2024

We now have the potential flags and issues included. They still require proper documentation.
