Relative Score Effect of Each Expected Component #6

Open
lomky opened this issue Jan 30, 2019 · 12 comments

@lomky
Collaborator

lomky commented Jan 30, 2019

This ticket is to talk through the individual cases to look for any exceptions to the general rule for how a missing component should affect the parent object's score.

By default, a missing 'required' component should be scored as 0 by the scoring script.
By default, a missing 'optional' component won't affect the parent's score.
Do we give a bump to an object that has an optional component, above and beyond the score of that component (that is, a bonus for including any additional provenance, even if it's low-scoring)?
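
A minimal sketch of the default rule above, assuming the parent's score is a plain average of its own score and its components' scores. The thread does not specify how scores are combined, and `score_parent` and its parameters are hypothetical names, not part of the scoring script.

```python
from statistics import mean
from typing import Dict, Iterable, Optional


def score_parent(
    own_score: float,
    components: Dict[str, Optional[float]],
    required: Iterable[str],
    optional: Iterable[str],
) -> float:
    """Combine an object's own score with its components' scores.

    `components` maps a component name to its score; a name that is
    absent (or mapped to None) is treated as a missing component.
    Assumption: scores are combined with an unweighted average.
    """
    scores = [own_score]
    for name in required:
        value = components.get(name)
        # A missing required component counts as 0.
        scores.append(0.0 if value is None else value)
    for name in optional:
        value = components.get(name)
        # A missing optional component is skipped, so it cannot
        # lower the parent's score.
        if value is not None:
            scores.append(value)
    return mean(scores)


# A figure with a report and a file but no image (illustrative numbers):
figure = {"report": 5.0, "file": 3.0}
print(score_parent(4.0, figure, ["report", "file", "image"], ["activity"]))  # 3.0
```

The missing image pulls the average down, while the missing activity has no effect.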


Note: Some things are common among all publication types, and we can likely come to a general conclusion for them first and check each for individual exceptions. Namely:

  • publication
    • optional
      • activity (rare on most)
      • contributor
      • GCMD keywords
      • Regions

  • article:
    • optional:
      • activity (rare)
      • contributor
    • required:
      • journal
  • book:
    • optional:
      • activity (rare)
      • contributor
  • chapter:
    • required:
      • file
    • optional:
      • figure
      • finding
      • table
      • activity (rare)
      • contributor
  • dataset:
    • optional:
      • activity (rare)
      • contributor
  • figure:
    • optional:
      • activity (common)
      • contributors
      • chapter
    • required:
      • report
      • file
      • image
      • Contributor: Point of Contact
  • finding:
    • optional:
      • activity (common)
      • contributor
      • chapter
    • required:
      • report
  • image:
    • optional:
      • activity (common)
      • contributor
    • required:
      • figure
      • file
  • journal:
    • optional:
      • activity (rare)
      • contributor
    • required:
      • Contributor: Publisher
  • report:
    • required:
      • file
    • optional:
      • figure
      • table
      • finding
      • chapter
      • activity (rare)
      • contributor
  • scenario:
    • optional:
      • activity (rare)
      • contributor
      • file
  • table:
    • optional:
      • activity (common)
      • contributor
      • chapter
    • required:
      • report
      • array
  • array:
    • optional:
      • activity (common)
  • webpage:
    • optional:
      • activity (rare)
      • contributor Host
      • contributor

Pass through:

  • contributor:*
    • optional:
      • person
      • organization
  • reference:
    • required:
      • (citing) publication(s)
      • (child) publication
@lomky
Collaborator Author

lomky commented Jan 30, 2019

Contributor Note:

For every publication type, we should determine what are the required, allowed, and disallowed role types and determine the scoring for each.

USGCRP/gcis-conventions#31

All Roles:

 author                   | 18426
 point_of_contact         |  1938
 editor                   |   876
 contributing_author      |   422
 publisher                |   390
 contributor              |   373
 lead_author              |   349
 funding_agency           |   347
 distributor              |   244
 host                     |   136
 data_archive             |   107
 convening_lead_author    |    99
 scientist                |    84
 advisor                  |    75
 coordinating_lead_author |    64
 data_producer            |    57
 contributing_agency      |    56
 coordinator              |    33
 primary_author           |    25
 analyst                  |    14
 graphic_artist           |    13
 lead_agency              |    11
 executive_editor         |    11
 principal_author         |     5
 engineer                 |     2
 manager                  |     1

Roles held by organizations without a person:

 role_type_identifier  | num
-----------------------+-----
 author                | 419
 publisher             | 390
 funding_agency        | 347
 distributor           | 244
 host                  | 134
 data_archive          | 107
 contributor           |  91
 contributing_agency   |  56
 data_producer         |  50
 editor                |  14
 lead_agency           |  11
 convening_lead_author |  11
 point_of_contact      |   4
 graphic_artist        |   4
 analyst               |   4
 engineer              |   2
 contributing_author   |   2
 coordinator           |   2
 lead_author           |   1
 manager               |   1
 scientist             |   1
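
As a rough aid for deciding which role types make sense for each publication type, the two tables above can be compared to see what share of each role is held by an organization alone. The sketch below uses a handful of rows copied from the tables; the variable names are hypothetical.

```python
# Counts copied from the "All Roles" table above (subset).
all_roles = {
    "author": 18426,
    "point_of_contact": 1938,
    "publisher": 390,
    "funding_agency": 347,
    "distributor": 244,
    "host": 136,
}

# Counts copied from the organization-without-person table above (same subset).
org_only = {
    "author": 419,
    "point_of_contact": 4,
    "publisher": 390,
    "funding_agency": 347,
    "distributor": 244,
    "host": 134,
}

# Fraction of each role's uses where an organization holds it with no person.
org_share = {role: org_only.get(role, 0) / total for role, total in all_roles.items()}

# Roles that, in this subset, are never attached to a person.
always_org = sorted(role for role, share in org_share.items() if share == 1.0)
print(always_org)  # ['distributor', 'funding_agency', 'publisher']
```

In this subset, publisher, funding_agency, and distributor are always organization-only, while author and point_of_contact almost always involve a person.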

@lomky
Collaborator Author

lomky commented Jan 30, 2019

By default, should an object that has optional components get a 'bonus' score on top of the averages it connects to?

  • we think this would be a nice option to add as a flag on the scoring run.
    • flag to say 'increase the score of the parent object if it has optional component x'
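
One way the flag idea could look, as a sketch only: a flat bonus added for each flagged optional component that is present. The function name, the `bonus` size, and the optional `max_score` cap are all assumptions; the thread does not settle any of them.

```python
from typing import Iterable, Optional


def apply_optional_bonus(
    parent_score: float,
    present_components: Iterable[str],
    bonus_for: Iterable[str],
    bonus: float = 0.5,
    max_score: Optional[float] = None,
) -> float:
    """Raise a parent's score when flagged optional components are present.

    `bonus_for` plays the role of the run flag: the names of optional
    components that should earn the parent a bonus. The bonus is flat,
    independent of how well the component itself scores.
    """
    hits = set(present_components) & set(bonus_for)
    boosted = parent_score + bonus * len(hits)
    # Optionally keep the result within the scoring scale.
    return boosted if max_score is None else min(boosted, max_score)


# Bonus requested for "activity" only; the contributor earns nothing extra.
print(apply_optional_bonus(3.0, ["activity", "contributor"], ["activity"]))  # 3.5
```

A command-line option on the scoring script could feed `bonus_for`, with an empty list reproducing the no-bonus default.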

@lomky
Collaborator Author

lomky commented Jan 30, 2019

The contributor object itself only serves to point through to Person and/or Org and combine their scores; it has no inherent score.
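
A small sketch of that pass-through, assuming "combine" means averaging whichever of the two scores exist. The thread does not say how they are combined, and `contributor_score` is a hypothetical name.

```python
from statistics import mean
from typing import Optional


def contributor_score(
    person: Optional[float],
    organization: Optional[float],
) -> Optional[float]:
    """Pass a contributor's score through from its person and/or organization.

    The contributor adds nothing of its own: the result is the average of
    the linked scores that exist, or None when neither is linked.
    """
    linked = [score for score in (person, organization) if score is not None]
    return mean(linked) if linked else None


print(contributor_score(4.0, 2.0))   # 3.0
print(contributor_score(None, 2.0))  # 2.0
```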

@lomky
Collaborator Author

lomky commented Jan 30, 2019

General Thoughts on Publication type:

  • publication
    • optional
      • activity (rare on most)
        • no effect on parent score
      • regions
        • no effect on parent score
      • contributor
        • cannot say at a general publication level
    • required
      • GCMD keywords
        • affects the parent.
        • weighted down to be less impactful than, say, a figure missing an image
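
To illustrate the weighting idea, here is a sketch using a weighted average in which a missing required component scores 0. The weights (3.0 for an image, 0.5 for GCMD keywords) and the function name are made up for the example; only the relative ordering comes from the comment above.

```python
from typing import Dict, Optional


def weighted_score(
    components: Dict[str, Optional[float]],
    weights: Dict[str, float],
) -> float:
    """Weighted average over required components.

    A component mapped to None is missing and contributes 0, but its
    weight still counts, so heavier components hurt more when absent.
    """
    total_weight = sum(weights[name] for name in components)
    earned = sum(weights[name] * (score or 0.0) for name, score in components.items())
    return earned / total_weight


# Hypothetical weights: keywords matter, but far less than the image.
weights = {"image": 3.0, "gcmd_keywords": 0.5}

missing_keywords = weighted_score({"image": 4.0, "gcmd_keywords": None}, weights)
missing_image = weighted_score({"image": None, "gcmd_keywords": 4.0}, weights)

# A figure without keywords still scores well above one without its image.
print(missing_keywords > missing_image)  # True
```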

@lomky
Collaborator Author

lomky commented Jan 30, 2019

  • article:
    • required:
      • journal
        • 0, high weight
      • contributor: author
        • 0, high weight
    • optional:
      • contributor: point of contact
        • no effect, not all have it
      • activity (rare)
        • no effect
  • book:
    • required:
      • contributor: Publisher
        • 0, high weight
    • optional:
      • contributor: Author
        • no effect on parent score
      • contributor: Editor
        • no effect on parent score
      • activity (rare)
        • no effect on parent score

@rasherman

rasherman commented Feb 6, 2019

  • chapter:

    • required:
      • files
      • report (might have to be linked to score in the opposite direction)
      • references
      • keywords
    • optional:
      • findings
      • figures
      • tables
      • contributors
      • activity
      • regions
  • dataset

    • required:
      • contributor
      • keywords
    • optional:
      • lexicon
      • regions
      • activity
  • figure

    • required:
      • image
      • file
      • contributor (Point of Contact)
      • Report (or indicator)
      • keywords
    • optional:
      • references
      • region
      • activity
      • chapter

@rasherman

  • finding
    • required:
      • report
      • references
      • keywords
    • optional:
      • figure
      • chapter
      • contributor
      • region
      • activity
  • image
    • required:
      • figure (?)
      • file
      • activity
    • optional:
      • contributor
      • keywords
      • regions
  • journal
    • required:
      • contributor (Publisher)
    • optional:
      • keywords
      • regions
      • contributor (other than Publisher)

@lomky
Collaborator Author

lomky commented Feb 7, 2019

Thank you all for moving forward! Could you explain the details on a few for me?

  • dataset

    • required:
      • contributor
        • which contributor type?
      • keywords
        • I don't understand this one. Are we giving datasets keywords? Should all datasets always have applicable gcmd keywords?
  • figure

    • optional:
      • references
        • are these really optional? Shouldn't every figure, in the best case, have its references?
  • finding

    • optional:
      • figure
        • I don't understand why a finding could have a figure?

@rasherman

  1. Dataset contributor: every dataset should have some person and/or organization that produced it, but I don't think we can restrict what type they would be. In some cases it would be a publisher, in some an author...
  2. Dataset keywords: we are not assigning keywords to datasets at this point, but they definitely COULD all have them (keywords were originally designed to be used to describe datasets). It should be a very low weight, but it's always possible in the future that we could have some ingest of keywords assigned to NASA or NOAA dataset catalogs and we should already have it baked in that doing something like that would improve a bunch of scores, because it is a positive change to our system.
  3. Figure references: We were waffling on this point. Probably every figure should have a reference, but I can definitely imagine cases where they wouldn't and it would be fine. For instance, a photograph of something might have an activity or some other sourcing of who took the photograph, but it might not have a citation to a previous publication.
  4. Finding figures: at this point no findings have ever had figures, but isn't it possible that one could? What if in the next report we assign the figure explaining likelihoods to findings, or something else like that?

@lomky
Collaborator Author

lomky commented Feb 7, 2019

Thanks! I agree wholeheartedly with 1, 2, and 3. For Finding figures, I can see that as possible, but I don't think I'd worry about future-proofing Findings in that way. The same could be said for Finding tables or Finding datasets. If the situation ever comes up, I'd want to update our rating then. Thoughts?

@R-Aniekwu

R-Aniekwu commented Feb 7, 2019 via email

@lomky
Collaborator Author

lomky commented Feb 13, 2019

  • Reports

    • Required
      • File
      • Contributor (Publisher or Distributor)
      • Contributor (Author or Editor)
    • Optional
      • gcmd_keywords
      • figure
      • table
      • finding
      • chapter
      • activity
      • contributors
      • regions
      • references
  • Scenario

    • Required
      • Contributor
    • Optional
      • file
  • Table

    • Required
      • array
      • report
      • keywords
    • Optional
      • Contributor
      • chapter
      • references
      • regions
      • activity
  • Array

    • required
      • activity
  • Webpage

    • Required
      • contributor Host
    • Optional
      • Contributor
      • activity
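
The Report entry above has two "either/or" contributor requirements. A sketch of how such a rule could be checked, assuming each requirement is a group of acceptable role identifiers and any one of them satisfies the group; the names `REPORT_REQUIRED_ROLES` and `missing_role_groups` are hypothetical.

```python
from typing import Iterable, List, Set

# Each set is one requirement; holding any role in the set satisfies it.
# Role identifiers are taken from the "All Roles" table earlier in the thread.
REPORT_REQUIRED_ROLES: List[Set[str]] = [
    {"publisher", "distributor"},
    {"author", "editor"},
]


def missing_role_groups(
    contributor_roles: Iterable[str],
    required_groups: List[Set[str]],
) -> List[Set[str]]:
    """Return the requirement groups that none of the contributors satisfy."""
    held = set(contributor_roles)
    return [group for group in required_groups if not (held & group)]


# A report with an author and a funding agency still lacks a
# publisher or distributor.
unmet = missing_role_groups(["author", "funding_agency"], REPORT_REQUIRED_ROLES)
print(len(unmet))  # 1
```

Each unmet group could then be scored as a missing required component.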

@lomky lomky removed their assignment Jun 25, 2019

4 participants