Commit
Merge pull request #59 from RobLBaker/master
addresses bugs in loading core metadata for power BI
Showing 9 changed files with 74 additions and 30 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,7 +5,7 @@ | |
#' | ||
#' #' @details The returned dataframe has three columns, EML_element, EML_data and EML_data2. EML_element describes the EML element that was extracted. EML_data and EML_data2 contain the data from that element. In the case of EML_elements with only one piece of data (e.g. the data package title), the data is repeated in the EML_data and EML_data2 columns. In cases where the element contains two related pieces of data (e.g. author), those items are held in EML_data (e.g. the author's name) and EML_data2 (e.g. the author's email address). | ||
#' | ||
#' Currently this function is under development and may have issues if an author has more than two givenNames (it will only use the first givenName), an author has not givenNames (only a surName) or an author is an organization and does not have any individualName. If you have a data package with these issues, please contact [[email protected]](mailto:[email protected]). | ||
#' Currently this function is under development and may have issues if an author is an organization. If you have a data package with these issues, please contact [[email protected]](mailto:[email protected]). | ||
#' | ||
#' The fields that should be returned in the dataframe include: title, publication date, authors (and emails), contacts (and emails), publisher, DOI, publisher city, publisher state, content begin date, content end date, the abstract, notes, "for or by NPS", the license name (e.g. "Public Domain", "CC0"), and a list of each data file in the data package by name. | ||
#' | ||
|
@@ -126,7 +126,7 @@ load_core_metadata <- function(ds_ref, path = paste0(getwd(), "/data")){ | |
#' | ||
#' @description `.get_authors()` extracts the "creators" element from EML metadata and returns it as a dataframe with three columns: first, a column indicating that each row is an author; second, a column with the author's name (first last); third, the author's email address. | ||
#' | ||
#' @details There are some known issues with this function; unfortunately at this time we do not have example data packages to test them. These include: authors without a givenName, authors with more than two givenNames (e.g. multiple middle names), organizations as authors where there is no individualName. | ||
#' @details There are some known issues with this function; unfortunately at this time we do not have example data packages to test them. These include: authors without a givenName and organizations as authors where there is no individualName. | ||
#' | ||
#' @param metadata an EML formatted R object | ||
#' | ||
|
@@ -144,29 +144,40 @@ load_core_metadata <- function(ds_ref, path = paste0(getwd(), "/data")){ | |
#set up empty dataframe to hold creator info: | ||
individual <- data.frame(author = as.character(), | ||
contact = as.character()) | ||
for(i in 1:length(seq_along(creators))){ | ||
|
||
#if single creator, nest it so that it behaves the same as when there are | ||
#multiple creators: | ||
if ("organizationName" %in% names(creators) | | ||
"individualName" %in% names(creators)) { | ||
creators <- list(creators) | ||
} | ||
|
||
for (i in 1:length(seq_along(creators))) { | ||
creator <- unlist(creators[[i]], recursive = FALSE) | ||
#if there is an individual name: | ||
if(!is.null(creator$individualName.surName)){ | ||
if (!is.null(creator$ | ||
individualName.surName)) { | ||
#if there is a given name: | ||
if(!is.null(creator$individualName.givenName)){ | ||
#if there are two given names (e.g. first and middle) | ||
if(length(seq_along(creator$individualName.givenName)) == 2){ | ||
given <- paste(creator$individualName.givenName[[1]], | ||
creator$individualName.givenName[[2]], | ||
sep = " ") | ||
#if there is only one given name (first) | ||
} else if(length(seq_along(creator$individualNAme.givenName)) == 1){ | ||
given <- creator$individualName.givenName | ||
} else { | ||
#More than 2 given names (e.g. first, middle, middle), use only the first given name: | ||
given <- creator$individualName.givenName[[1]] | ||
if (!is.null(creator$individualName.givenName)) { | ||
given <- NULL | ||
#use j so the outer creator index i is not overwritten: | ||
for (j in seq_along(creator$individualName.givenName)) { | ||
if (nchar(creator$individualName.givenName[[j]]) == 1) { | ||
given <- paste0(given, | ||
paste0(creator$individualName.givenName[[j]], | ||
". ")) | ||
} else { | ||
given <- paste0(given, | ||
paste0(creator$individualName.givenName[[j]], | ||
" ")) | ||
} | ||
} | ||
|
||
} else { | ||
#if there is no given name: | ||
given <- NA | ||
} | ||
#get rid of extra whitespaces and trailing whitespaces: | ||
given <- stringr::str_squish(given) | ||
|
||
#get last name | ||
sur <- creator$individualName.surName | ||
#generate full name as first (first) last | ||
|
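The two ideas in the last hunk (nesting a lone creator so it iterates like a list of creators, and joining any number of given names while adding a period after single-letter initials) can be sketched in isolation. This is a minimal sketch, not the package's code: `normalize_creators()` and `format_given_names()` are hypothetical helper names, and the input is assumed to be a plain nested list shaped like the `creator` element of an EML object.

```r
# Hypothetical helper (not part of the package): wrap a lone creator in a
# list so single and multiple creators can be iterated the same way.
normalize_creators <- function(creators) {
  # A lone creator is a named list; several creators form an unnamed list.
  if (any(c("organizationName", "individualName") %in% names(creators))) {
    creators <- list(creators)
  }
  creators
}

# Hypothetical helper: collapse given names into one string, adding a
# period after single-letter initials. Returns NA when there are none.
format_given_names <- function(given_names) {
  if (is.null(given_names) || length(given_names) == 0) {
    return(NA_character_)
  }
  parts <- vapply(given_names, function(g) {
    g <- as.character(g)
    if (nchar(g) == 1) paste0(g, ".") else g
  }, character(1), USE.NAMES = FALSE)
  paste(parts, collapse = " ")
}

# Usage: a single creator with a first name and a middle initial.
single <- list(individualName = list(givenName = list("Jane", "Q"),
                                     surName = "Doe"))
creators <- normalize_creators(single)
for (k in seq_along(creators)) {
  person <- creators[[k]]$individualName
  print(paste(format_given_names(person$givenName), person$surName))
}
```

Building the parts with `vapply()` and joining them with `paste(collapse = " ")` avoids the trailing space that the diff's version cleans up afterwards with `stringr::str_squish()`, and it sidesteps reusing the outer loop index.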