This repository has been archived by the owner on Nov 18, 2020. It is now read-only.

Additional or Quality Metadata for Google Scholar (aka, The Scholar) #1602

Open
mtribone opened this issue Jul 22, 2019 · 5 comments

Comments

@mtribone
Contributor

mtribone commented Jul 22, 2019

We need to rework the metadata to follow Google's recommendations for scholarly literature.

  • https://scholar.google.com/intl/en/scholar/inclusion.html#indexing
  • Place each article and each abstract in a separate HTML or PDF file
  • Use Dublin Core tags (e.g., DC.title) as a last resort
  • Only scholarly papers are appropriate for inclusion in Google Scholar; each paper needs to be listed on a separate URL; and at least the full author-written abstract must be clearly visible on the URL that you wish to be included in Google Scholar search results.

Example work that we would want indexed:
https://scholarsphere.psu.edu/concern/generic_works/xwd375x69k

Datasets should be found in Google Dataset Search.
https://developers.google.com/search/docs/data-types/dataset

@mtribone added this to the ScholarSphere 3.9 milestone Jul 22, 2019
@mtribone changed the title from "Additional or Quality Metadata for Google ScholarSphere" to "Additional or Quality Metadata for Google Scholar (aka, The Scholar)" Jul 22, 2019
@DanCoughlin
Contributor

There are a couple of things that would be very helpful for indexing if possible:

  • Adding the publication date to the citation_publication_date tags, so they aren't blank, e.g.
    A common repository indexing error is using the upload/online date; only the formal publication date should be entered in this field.
  • Adding only the title of the item, and not the file name, to the citation_title, e.g.
    view-source:https://scholarsphere.psu.edu/files/rf55z7756 looks great.

A file name with .csv at the end would be a red flag for the indexing system; the item would not be considered a publication.

It's better not to add empty metatags. If you don't have the publication date of an item, for example, it's best not to include the citation_publication_date tag.

Additionally, journal-specific information such as volume, issue, and page numbers can be left off for repository items. The really crucial items for ScholarSphere would be:

  • citation_title
  • citation_author
  • citation_publication_date
  • citation_pdf_url
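
To illustrate the "no empty metatags" advice, here is a minimal Python sketch, not ScholarSphere's actual code. The function name `scholar_meta_tags` and its parameters are hypothetical; it only shows the idea of emitting the four tags above and dropping any tag whose value is missing or blank.

```python
from html import escape


def scholar_meta_tags(title, authors, publication_date=None, pdf_url=None):
    """Build Google Scholar <meta> tags, skipping any field with no value.

    Hypothetical helper for illustration; the field names are the
    citation_* tags listed above.
    """
    pairs = [("citation_title", title)]
    # One citation_author tag per author.
    pairs += [("citation_author", author) for author in authors]
    pairs.append(("citation_publication_date", publication_date))
    pairs.append(("citation_pdf_url", pdf_url))

    tags = []
    for name, value in pairs:
        # Omit the tag entirely rather than emitting content="".
        if value is None or not str(value).strip():
            continue
        content = escape(str(value).strip(), quote=True)
        tags.append('<meta name="%s" content="%s">' % (name, content))
    return tags


if __name__ == "__main__":
    # Example values are made up; the blank date produces no tag at all.
    for tag in scholar_meta_tags(
        "An Example Title",
        ["Doe, Jane", "Roe, Richard"],
        publication_date="",
        pdf_url="https://example.org/paper.pdf",
    ):
        print(tag)
```

With a blank `publication_date`, the output contains the title, author, and PDF URL tags but no `citation_publication_date` tag.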

@mtribone
Contributor Author

mtribone commented Aug 1, 2019

Placeholder for questions to help solve the indexing issues

  1. Does each article require the abstract to be in a separate HTML or PDF file?
  2. What file formats are accepted? It seems that preservation formats are not the required file formats?

@mtribone
Contributor Author

Perhaps we should require the publication date if a work has the resource type of article, book, journal, part of book, or research paper. This would require a change to the new/edit work form to pull the publication date out of the additional metadata. Alternatively, we could omit the citation_publication_date tag from the Google Scholar meta tags when it is blank. It might be better to get a date.
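
A rough Python sketch of the first option, only to make the rule concrete. The set of resource types is taken from this comment; the function name `publication_date_missing` and the idea of passing resource types as a list of strings are assumptions, not how the work form is actually built.

```python
# Resource types for which a publication date would be required
# (taken from the comment above).
DATE_REQUIRED_TYPES = {
    "article",
    "book",
    "journal",
    "part of book",
    "research paper",
}


def publication_date_missing(resource_types, publication_date):
    """Return True when a date is required by resource type but not supplied.

    Hypothetical check for illustration only.
    """
    needs_date = any(
        rt.strip().lower() in DATE_REQUIRED_TYPES for rt in resource_types
    )
    has_date = bool(publication_date and publication_date.strip())
    return needs_date and not has_date


if __name__ == "__main__":
    print(publication_date_missing(["Article"], ""))        # date required, none given
    print(publication_date_missing(["Dataset"], ""))        # no date required
    print(publication_date_missing(["Book"], "2019/07/22")) # date required and present
```

A form validation could call something like this and reject the submission, while the fallback option would simply leave the tag out of the page when no date exists.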

@mtribone
Contributor Author

mtribone commented Aug 12, 2019

We will also need to redo Batch Create because the process uses each file name as the title of the work and does not remove the file extension from the title.
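
For illustration, a small Python sketch of stripping the extension when a title is derived from a file name. The helper name `title_from_filename` is hypothetical and this is not the Batch Create code; it only removes the final extension, so a name like `archive.tar.gz` would still become `archive.tar`.

```python
from pathlib import PurePath


def title_from_filename(filename):
    """Derive a default work title from a file name by dropping its extension.

    Hypothetical helper: only the last suffix is removed, and the original
    name is returned if nothing is left after stripping.
    """
    stem = PurePath(filename).stem
    return stem or filename


if __name__ == "__main__":
    # Made-up file names for demonstration.
    for name in ["survey_results.csv", "thesis_final.pdf", "README"]:
        print(name, "->", title_from_filename(name))
```

A depositor-supplied title would still be preferable for indexing; this only avoids the extension leaking into citation_title when no better title is available.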
