From 53dc79cbba7554914a3e92132849177e8efc9afd Mon Sep 17 00:00:00 2001 From: Shiro Kuriwaki Date: Fri, 19 Feb 2021 22:21:08 -0500 Subject: [PATCH 01/16] "Rmd2" are the ones that appear to be the real Rmd, keep those and delete the R Rmd --- vignettes/A-introduction.Rmd | 9 +- vignettes/A-introduction.Rmd2 | 118 -------------------- vignettes/B-search.Rmd | 94 ++-------------- vignettes/B-search.Rmd2 | 47 -------- vignettes/C-retrieval.Rmd | 195 ++++------------------------------ vignettes/C-retrieval.Rmd2 | 111 ------------------- vignettes/D-archiving.Rmd | 5 +- vignettes/D-archiving.Rmd2 | 81 -------------- 8 files changed, 33 insertions(+), 627 deletions(-) delete mode 100644 vignettes/A-introduction.Rmd2 delete mode 100644 vignettes/B-search.Rmd2 delete mode 100644 vignettes/C-retrieval.Rmd2 delete mode 100644 vignettes/D-archiving.Rmd2 diff --git a/vignettes/A-introduction.Rmd b/vignettes/A-introduction.Rmd index e6c6bab..63f630d 100644 --- a/vignettes/A-introduction.Rmd +++ b/vignettes/A-introduction.Rmd @@ -1,6 +1,6 @@ --- title: "Introduction to Dataverse" -date: "2017-06-13" +date: "`r Sys.Date()`" output: html_document: fig_caption: false @@ -41,8 +41,7 @@ library("dataverse") Dataverse has some terminology that is worth quickly reviewing before showing how to work with Dataverse in R. Dataverse is an application that can be installed in many places. As a result, **dataverse** can work with any installation but you need to specify which installation you want to work with.
This can be set by default with an environment variable, `DATAVERSE_SERVER`: - -```r +```{r} library("dataverse") Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu") ``` @@ -76,7 +75,7 @@ With that set, you can easily create a new dataverse, create a dataset within th ```R # create a dataverse -dat <- create_dataverse("mydataverse") +dat <- create_daverse("mydataverse") # create a list of metadata metadat <- list(title = "My Study", @@ -100,7 +99,7 @@ Your data are now publicly accessible. ## Appendix: dvn to dataverse Crosswalk -The original Dataverse client for R was called [dvn](https://CRAN.R-project.org/package=dvn); it worked with Dataverse versions <= 3 and was removed from CRAN in favor of [dataverse](https://CRAN.R-project.org/package=dataverse) in 2018. dvn provided functionality for searching, retrieving, and depositing data. Here is a cross-walk of functionality in case you were already familiar with the dvn package: +The original [dvn](https://cran.r-project.org/?package=dvn) package, which worked with Dataverse versions <= 3, provided functionality for searching, retrieving, and depositing data. Here is a cross-walk of functionality in case you were already familiar with the dvn package: | API Category | **dataverse** functions | **dvn** functions | | ------------ | ----------------------- | ----------------- | diff --git a/vignettes/A-introduction.Rmd2 b/vignettes/A-introduction.Rmd2 deleted file mode 100644 index 63f630d..0000000 --- a/vignettes/A-introduction.Rmd2 +++ /dev/null @@ -1,118 +0,0 @@ ---- -title: "Introduction to Dataverse" -date: "`r Sys.Date()`" -output: - html_document: - fig_caption: false - toc: true - toc_float: - collapsed: false - smooth_scroll: false - toc_depth: 2 -vignette: > - %\VignetteIndexEntry{1. Introduction to Dataverse} - %\VignetteEngine{knitr::rmarkdown} - %\VignetteEncoding{UTF-8} ---- - -The **dataverse** package is the official R client for [Dataverse 4](https://dataverse.org/) data repositories. 
The package enables data search, retrieval, and deposit with any Dataverse installation, thus allowing R users to integrate public data sharing into the reproducible research workflow. - -In addition to this introduction, the package contains three additional vignettes covering: - - * ["Data Search and Discovery"](B-search.html) - * ["Data Retrieval and Reuse"](C-retrieval.html) - * ["Data Archiving"](D-archiving.html) - -They can be accessed from [CRAN](https://cran.r-project.org/package=dataverse) or from within R using `vignette(package = "dataverse")`. - -The dataverse client package can be installed from [CRAN](https://cran.r-project.org/package=dataverse), and you can find the latest development version and report any issues on GitHub: - -```R -if (!require("remotes")) { - install.packages("remotes") -} -remotes::install_github("iqss/dataverse-client-r") -library("dataverse") -``` - -(Note: **dataverse** is the next-generation iteration of [the **dvn** package](https://cran.r-project.org/package=dvn), which works with Dataverse 3 ("Dataverse Network") applications. See the appendix of this vignette for a cross-walk of functionality between **dvn** and **dataverse**.) - -## Quick Start - -Dataverse has some terminology that is worth quickly reviewing before showing how to work with Dataverse in R. Dataverse is an application that can be installed in many places. As a result, **dataverse** can work with any installation but you need to specify which installation you want to work with. This can be set by default with an environment variable, `DATAVERSE_SERVER`: - -```{r} -library("dataverse") -Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu") -``` - -This should be the Dataverse server, without the "https" prefix or the "/api" URL path, etc. The package attempts to compensate for any malformed values, though. - -Within a given Dataverse installation, organizations or individuals can create objects that are also called "Dataverses".
These Dataverses can then contain other *dataverses*, which can contain other *dataverses*, and so on. They can also contain *datasets* which in turn contain files. You can think of Harvard's Dataverse as a top-level installation, where an institution might have a *dataverse* that contains a subsidiary *dataverse* for each researcher at the organization, who in turn publishes all files relevant to a given study as a *dataset*. - -You can search for and retrieve data without a Dataverse account for a specific Dataverse installation. For example, to search for data files or datasets that mention "ecological inference", we can just do: - -```R -dataverse_search("ecological inference")[c("name", "type", "description")] -``` - -The [search vignette](B-search.html) describes this functionality in more detail. To retrieve a data file, we need to investigate the dataset being returned and look at what files it contains using a variety of functions, the last of which - `get_file()` - can retrieve the files as raw vectors: - -```R -get_dataset() -dataset_files() -get_file_metadata() -get_file() -``` - -For "native" Dataverse features (such as user account controls) or to create and publish a dataset, you will need an API key linked to a Dataverse installation account. Instructions for obtaining an account and setting up an API key are available in the [Dataverse User Guide](https://guides.dataverse.org/en/latest/user/account.html). (Note: if your key is compromised, it can be regenerated to preserve security.) Once you have an API key, this should be stored as an environment variable called `DATAVERSE_KEY`.
It can be set within R using: - -```R -Sys.setenv("DATAVERSE_KEY" = "examplekey12345") -``` - -With that set, you can easily create a new dataverse, create a dataset within that dataverse, push files to the dataset, and release it: - -```R -# create a dataverse -dat <- create_daverse("mydataverse") - -# create a list of metadata -metadat <- list(title = "My Study", - creator = "Doe, John", - description = "An example study") - -# create the dataset -dat <- initiate_dataset("mydataverse", body = metadat) - -# add files to dataset -tmp <- tempfile() -write.csv(iris, file = tmp) -f <- add_file(dat, file = tmp) - -# publish new dataset -publish_dataset(dat) -``` - -Your data are now publicly accessible. - - -## Appendix: dvn to dataverse Crosswalk - -The original [dvn](https://cran.r-project.org/?package=dvn) package, which worked with Dataverse versions <= 3, provided functionality for searching, retrieving, and depositing data. Here is a cross-walk of functionality in case you were already familiar with the dvn package: - -| API Category | **dataverse** functions | **dvn** functions | -| ------------ | ----------------------- | ----------------- | -| Data Search | `dataverse_search()` | `dvSearch()` | -| Data Retrieval | `get_file_metadata()` | `dvMetadata()` | -|| `get_file()` | | -| Data Deposit | `create_dataverse()` | | -|| `initiate_dataset()` | `dvCreateStudy()` | -|| `update_dataset()` | `dvEditStudy()` | -|| `add_file()` | `addFile()` | -|| `delete_file()` | `dvDeleteFile()` | -|| `publish_sword_dataset()` | `dvReleaseStudy()` | -|| `delete_sword_dataset()` | | -|| `service_document()` | `dvServiceDoc()` | -|| `dataset_statement()` | `dvStudyStatement()` | -|| `list_datasets()` | `dvUserStudies()` | diff --git a/vignettes/B-search.Rmd b/vignettes/B-search.Rmd index 1c3b8f7..bdfbfb7 100644 --- a/vignettes/B-search.Rmd +++ b/vignettes/B-search.Rmd @@ -1,6 +1,6 @@ --- title: "Data Search and Discovery" -date: "2017-06-15" +date: "`r Sys.Date()`" output: 
html_document: fig_caption: false @@ -15,111 +15,33 @@ vignette: > %\VignetteEncoding{UTF-8} --- - +```{r knitr_options, echo=FALSE, results="hide"} +options(width = 120) +knitr::opts_chunk$set(results = "hold") +``` Searching for data within Dataverse is quite easy using the `dataverse_search()` function. The simplest searches simply consist of a query string: - -```r +```{r} library("dataverse") Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu") dataverse_search("Gary King")[c("name")] ``` -``` -## 10 of 1043 results retrieved -``` - -``` -## name -## 1 00698McArthur-King-BoxCoverSheets.pdf -## 2 00698McArthur-King-MemoOfAgreement.pdf -## 3 00698McArthur-King-StudyDescription.pdf -## 4 077_mod1_s2m.tab -## 5 10 Million International Dyadic Events -## 6 1998 Jewish Community Study of the Coachella Valley, California -## 7 2002 State Legislative Survey -## 8 A Comparative Study between Gurukul System and Western System of Education -## 9 A Demographic and Attitudinal Study of the Jewish Community of St. Louis -## 10 A Demographic Study of the Jewish Community of Atlantic County, 1985 -``` - The results are paginated, so users can rely upon the `per_page` and `start` arguments to request subsequent pages of results. We'll start at 6 to show that we retrieve the last five results from the previous query plus 15 more (due to `per_page = 20`): - -```r +```{r} dataverse_search("Gary King", start = 6, per_page = 20)[c("name")] ``` -``` -## 20 of 1043 results retrieved -``` - -``` -## name -## 1 2002 State Legislative Survey -## 2 A Comparative Study between Gurukul System and Western System of Education -## 3 A Demographic and Attitudinal Study of the Jewish Community of St.
Louis -## 4 A Demographic Study of the Jewish Community of Atlantic County, 1985 -## 5 A Demographic Study of the Jewish Community of Greater Kansas City -## 6 A Demographic Study of the Jewish Community of Greater Washington, 1983 -## 7 A Lexicial Index of Electoral Democracy -## 8 A Population Study of the Jewish Community of Metrowest, New Jersey -## 9 A Population Study of the Jewish Community of Rochester, 1986 -## 10 A Population Study of the Jewish Community of Worcester -## 11 A Study of Jewish Culture in the Bay Area -## 12 A Unified Model of Cabinet Dissolution in Parliamentary Democracies -## 13 ABC News / The Washington Post Poll: January, 1988 -## 14 ABC News / The Washington Post poll # 7925: Social Security/1984 Election -## 15 ABC News / The Washington Post Poll: December, 1987 -## 16 ABC News Gary Hart Poll, December 1987 -## 17 ABC News Gary Hart Poll, December 1987 -## 18 ABC News Iraq Poll, August 1990 -## 19 ABC News Kosovo Peace Poll #1, June 1999 -## 20 ABC News New Hampshire Primary Voter Poll, January 2000 -``` - More complicated searches can specify metadata fields like `title` and restrict results to a specific `type` of Dataverse object (a "dataverse", "dataset", or "file"): - -```r +```{r} ei <- dataverse_search(author = "Gary King", title = "Ecological Inference", type = "dataset", per_page = 20) -``` - -``` -## 20 of 867 results retrieved -``` - -```r # fields returned names(ei) # names of datasets ei$name ``` -``` -## [1] "name" "type" "url" "global_id" "description" "published_at" "citationHtml" -## [8] "citation" "authors" -## [1] "10 Million International Dyadic Events" -## [2] "3D Dust map from Green et al. 
(2015)" -## [3] "[KRISNA02]³ New Religious Movements : Case of ISKCON" -## [4] "A Comparative Study between Gurukul System and Western System of Education" -## [5] "A Lexicial Index of Electoral Democracy" -## [6] "A Statistical Inference Engine for Small, Dependent Samples [Version 2.310]" -## [7] "A Unified Model of Cabinet Dissolution in Parliamentary Democracies" -## [8] "ABC News / The Washington Post poll # 7925: Social Security/1984 Election" -## [9] "ABC News Iraq Poll, August 1990" -## [10] "ABC News/The Washington Post Poll: Los Angeles Race Riots" -## [11] "ABC News/The Washington Post Poll: Race Relations" -## [12] "ABC News/Washington Post Los Angeles Beating Poll, April 1992" -## [13] "ABC News/Washington Post Poll #1, September 1990" -## [14] "ABC News/Washington Post Race Relations Poll, May 1992" -## [15] "ABC News/Washington Post Reagan 100 Days Poll, April 1981" -## [16] "Afrobarometer Round 3: The Quality of Democracy and Governance in 18 African Countries, 2005-2006" -## [17] "Afrobarometer Round 3: The Quality of Democracy and Governance in Benin, 2005" -## [18] "Afrobarometer Round 3: The Quality of Democracy and Governance in Botswana, 2005" -## [19] "Afrobarometer Round 3: The Quality of Democracy and Governance in Cape Verde, 2005" -## [20] "Afrobarometer Round 3: The Quality of Democracy and Governance in Ghana, 2005" -``` - Once datasets and files are identified, it is easy to download and use them directly in R. See the ["Data Retrieval" vignette](C-retrieval.html) for details. diff --git a/vignettes/B-search.Rmd2 b/vignettes/B-search.Rmd2 deleted file mode 100644 index bdfbfb7..0000000 --- a/vignettes/B-search.Rmd2 +++ /dev/null @@ -1,47 +0,0 @@ ---- -title: "Data Search and Discovery" -date: "`r Sys.Date()`" -output: - html_document: - fig_caption: false - toc: true - toc_float: - collapsed: false - smooth_scroll: false - toc_depth: 2 -vignette: > - %\VignetteIndexEntry{2. 
Data Search and Discovery} - %\VignetteEngine{knitr::rmarkdown} - %\VignetteEncoding{UTF-8} ---- - -```{r knitr_options, echo=FALSE, results="hide"} -options(width = 120) -knitr::opts_chunk$set(results = "hold") -``` - -Searching for data within Dataverse is quite easy using the `dataverse_search()` function. The simplest searches simply consist of a query string: - -```{r} -library("dataverse") -Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu") -dataverse_search("Gary King")[c("name")] -``` - -The results are paginated, so users can rely upon the `per_page` and `start` arguments to request subsequent pages of results. We'll start at 6 to show that we retrieve the last five results from the previous query plus 15 more (due to `per_page = 20`): - -```{r} -dataverse_search("Gary King", start = 6, per_page = 20)[c("name")] -``` - -More complicated searches can specify metadata fields like `title` and restrict results to a specific `type` of Dataverse object (a "dataverse", "dataset", or "file"): - -```{r} -ei <- dataverse_search(author = "Gary King", title = "Ecological Inference", type = "dataset", per_page = 20) -# fields returned -names(ei) -# names of datasets -ei$name -``` - -Once datasets and files are identified, it is easy to download and use them directly in R. See the ["Data Retrieval" vignette](C-retrieval.html) for details. diff --git a/vignettes/C-retrieval.Rmd b/vignettes/C-retrieval.Rmd index ffe6db9..493b2cd 100644 --- a/vignettes/C-retrieval.Rmd +++ b/vignettes/C-retrieval.Rmd @@ -1,6 +1,6 @@ --- title: "Data Retrieval and Reuse" -date: "2017-06-15" +date: "`r Sys.Date()`" output: html_document: fig_caption: false @@ -15,7 +15,10 @@ vignette: > %\VignetteEncoding{UTF-8} --- - +```{r knitr_options, echo=FALSE, results="hide"} +options(width = 120) +knitr::opts_chunk$set(results = "hold") +``` This vignette shows how to download data from Dataverse using the dataverse package.
We'll focus on a Dataverse repository that contains supplemental files for [*Political Analysis Using R*](https://www.springer.com/gb/book/9783319234458), which is stored at Harvard University's [IQSS Dataverse Network](https://dataverse.harvard.edu/): @@ -29,54 +32,12 @@ If you don't already know what datasets and files you want to use from Dataverse We will download these files and examine them directly in R using the **dataverse** package. To begin, we need to loading the package and using the `get_dataset()` function to retrieve some basic metadata about the dataset: - -```r +```{r} library("dataverse") Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu") (dataset <- get_dataset("doi:10.7910/DVN/ARKOTI")) ``` -``` -## Dataset (75170): -## Version: 1.0, RELEASED -## Release Date: 2015-07-07T02:57:02Z -## License: CC0 -## 17 Files: -## label version id contentType -## 1 alpl2013.tab 2 2692294 text/tab-separated-values -## 2 BPchap7.tab 2 2692295 text/tab-separated-values -## 3 chapter01.R 2 2692202 text/plain; charset=US-ASCII -## 4 chapter02.R 2 2692206 text/plain; charset=US-ASCII -## 5 chapter03.R 2 2692210 text/plain; charset=US-ASCII -## 6 chapter04.R 2 2692204 text/plain; charset=US-ASCII -## 7 chapter05.R 2 2692205 text/plain; charset=US-ASCII -## 8 chapter06.R 2 2692212 text/plain; charset=US-ASCII -## 9 chapter07.R 2 2692209 text/plain; charset=US-ASCII -## 10 chapter08.R 2 2692208 text/plain; charset=US-ASCII -## 11 chapter09.R 2 2692211 text/plain; charset=US-ASCII -## 12 chapter10.R 1 2692203 text/plain; charset=US-ASCII -## 13 chapter11.R 1 2692207 text/plain; charset=US-ASCII -## 14 comprehensiveJapanEnergy.tab 2 2692296 text/tab-separated-values -## 15 constructionData.tab 2 2692293 text/tab-separated-values -## 16 drugCoverage.csv 1 2692233 text/plain; charset=US-ASCII -## 17 hanmerKalkanANES.tab 2 2692290 text/tab-separated-values -## 18 hmnrghts.tab 2 2692298 text/tab-separated-values -## 19 hmnrghts.txt 1 2692238 text/plain -## 20 
levant.tab 2 2692289 text/tab-separated-values -## 21 LL.csv 1 2692228 text/plain; charset=US-ASCII -## 22 moneyDem.tab 2 2692292 text/tab-separated-values -## 23 owsiakJOP2013.tab 2 2692297 text/tab-separated-values -## 24 PESenergy.csv 1 2692230 text/plain; charset=US-ASCII -## 25 pts1994.csv 1 2692229 text/plain; charset=US-ASCII -## 26 pts1995.csv 1 2692231 text/plain; charset=US-ASCII -## 27 sen113kh.ord 1 2692239 text/plain; charset=US-ASCII -## 28 SinghEJPR.tab 2 2692299 text/tab-separated-values -## 29 SinghJTP.tab 2 2692288 text/tab-separated-values -## 30 stdSingh.tab 2 2692291 text/tab-separated-values -## 31 UN.csv 1 2692232 text/plain; charset=US-ASCII -## 32 war1800.tab 2 2692300 text/tab-separated-values -``` - The output prints some basic metadata and then the `str()` of the `files` data frame returned by the call. This lists all of the files in the dataset along with a considerable amount of metadata about each. We can see a quick glance at these files using: ``` @@ -88,33 +49,24 @@ This shows that there are indeed 32 files, a mix of .R code files and tab- and c You can also retrieve more extensive metadata using `dataset_metadata()`: - -```r +```{r} str(dataset_metadata("doi:10.7910/DVN/ARKOTI"), 1) ``` -``` -## List of 2 -## $ displayName: chr "Citation Metadata" -## $ fields :'data.frame': 7 obs. of 4 variables: -``` - We'll focus here on the code and data files for Chapter 2 from the book. ## Retrieving Files Let's start by grabbing the code using `get_file()` (note that this always returns a raw vector): - -```r +```{r} code3 <- get_file("chapter03.R", "doi:10.7910/DVN/ARKOTI") writeBin(code3, "chapter03.R") ``` Now we'll get the corresponding data and save it locally. 
For this code we need two data files: - -```r +```{r} writeBin(get_file("constructionData.tab", "doi:10.7910/DVN/ARKOTI"), "constructionData.dta") writeBin(get_file("PESenergy.csv", "doi:10.7910/DVN/ARKOTI"), @@ -123,146 +75,37 @@ writeBin(get_file("PESenergy.csv", "doi:10.7910/DVN/ARKOTI"), To confirm that the data look the way we want, we can also (perhaps alternatively) load it directly into R: - -```r +```{r} constructionData <- foreign::read.dta("constructionData.dta") str(constructionData) PESenergy <- utils::read.table("PESenergy.csv") str(PESenergy) ``` -``` -## 'data.frame': 50 obs. of 55 variables: -## $ year : int 1997 1997 1997 1997 1997 1997 1997 1997 1997 1997 ... -## $ stno : int 1 2 3 4 5 6 7 8 9 10 ... -## $ totalreg : int 329 500 314 963 2106 643 634 239 1996 880 ... -## $ totalhealth : int 300 424 263 834 1859 554 501 204 1640 732 ... -## $ raneyfolded97 : num 0.58 0.69 0.85 0.63 0.5 ... -## $ healthagenda97 : int 49 180 137 220 1409 153 324 40 408 157 ... -## $ predictedtotalig : num 51.8 99 81.8 111.2 224.1 ... -## $ supplytotalhealth : int 1168 6991 4666 9194 70014 8847 7845 1438 35363 13471 ... -## $ totalhealthsupplysq : int 136 4887 2177 8453 490196 7827 6154 207 125054 18147 ... -## $ partratetotalhealth : num 2.48 1.09 1.09 1.4 0.35 ... -## $ ighealthcare : int 29 76 51 129 247 89 133 35 356 148 ... -## $ supplydirectpatientcare : int 1137 6687 4458 8785 66960 8320 7439 1365 33793 12760 ... -## $ dpcsupplysq : int 129 4472 1987 7718 448364 6922 5534 186 114197 16282 ... -## $ partratedpc : num 1.14 0.51 0.43 0.68 0.17 ... -## $ igdpcare : int 13 34 19 60 112 40 67 12 212 74 ... -## $ supplypharmprod : int 0 174 78 229 2288 340 202 36 962 360 ... -## $ pharmsupplysq : int 0 30276 6084 52441 5234944 115600 40804 1296 925444 129600 ... -## $ partratepharmprod : num 0 10.34 19.23 5.24 2.05 ... -## $ igpharmprod : int 4 18 15 12 47 23 22 12 46 32 ... -## $ supplybusiness : int 0 51 28 93 315 55 36 14 317 78 ... 
-## $ businesssupplysq : int 0 2601 784 8649 99225 3025 1296 196 100489 6084 ... -## $ partratebusness : num 0 1.96 14.29 15.05 6.03 ... -## $ igbusiness : int 2 1 4 14 19 5 4 2 25 6 ... -## $ supplygovt : int 14 26 80 23 70 71 105 2 67 176 ... -## $ govsupplysq : num 0.02 0.07 0.64 0.05 0.49 ... -## $ partrategov : num 0 38.5 2.5 30.4 10 ... -## $ iggovt : int 0 10 2 7 7 1 8 0 12 2 ... -## $ supplyadvocacy : int 16 37 14 57 344 54 51 18 206 76 ... -## $ advossq : int 256 1369 196 3249 118336 2916 2601 324 42436 5776 ... -## $ partrateadvo : num 31.25 16.22 28.57 31.58 8.72 ... -## $ ig97advoc : int 5 6 4 18 30 7 9 4 26 17 ... -## $ rnmedschools : int 1 16 8 7 37 7 12 3 18 21 ... -## $ rnmedschoolssq : int 1 256 64 49 1369 49 144 9 324 441 ... -## $ rnmedschoolpartrate : num 100 0 12.5 28.57 5.41 ... -## $ rnmedschooligs : int 1 0 1 2 2 0 1 0 6 1 ... -## $ healthprofessionals : int 12890 128980 82140 122760 749620 111550 121110 22740 471270 215670 ... -## $ healthprofessionalssquared: int 16615 1663584 674698 1507002 56193014 1244340 1466763 51711 22209541 4651355 ... -## $ partrateprofessionals : num 0.03 0.01 0.01 0.01 0 ... -## $ ighealthprofessionals : int 4 7 6 16 30 13 22 5 29 16 ... -## $ predictdpcpartrate : num 1.175 0.915 1.016 0.826 0.348 ... -## $ predictdpcig : num 23.1 49.7 39.4 58.8 103.5 ... -## $ predictprofpartrate : num 0.02475 0.01383 0.01788 0.01434 0.00579 ... -## $ predictprofig : num 7.59 12.58 10.69 12.34 22.47 ... -## $ predictmedschoolparttrate : num 17.39 8.08 12.3 12.95 5.02 ... -## $ predictmedschoolig : num 0.355 1.269 0.774 0.713 2.65 ... -## $ predictadvopartrate : num 31.9 26.4 32.5 21.6 13 ... -## $ predictadvoig : num 5.96 7.98 5.76 9.83 28.53 ... -## $ predictbuspartrate : num 25.78 18.08 21.33 13.1 7.27 ... -## $ predictbusig : num 2.58 7.96 5.66 11.66 20.04 ... -## $ predictpharmpartrate : num 21.38 15.22 18.52 13.44 4.14 ... -## $ predictpharmig : num 11.3 18.1 14.4 20.1 45.1 ... 
-## $ predictgovpartrate : num 14.41 12.61 5.84 13.03 6.93 ... -## $ predictgovig : num 2.06 2.43 3.78 2.35 3.57 ... -## $ predicttotalpartrate : num 2.41 1.823 2.047 1.623 0.752 ... -## $ predicttotalig : num 54.2 99.2 81.9 114.8 228.3 ... -## - attr(*, "datalabel")= chr "" -## - attr(*, "time.stamp")= chr " 1 Jun 2013 16:59" -## - attr(*, "formats")= chr "%8.0g" "%8.0g" "%8.0g" "%8.0g" ... -## - attr(*, "types")= int 252 251 252 252 254 252 254 253 253 254 ... -## - attr(*, "val.labels")= chr "" "" "" "" ... -## - attr(*, "var.labels")= chr "Year" "StNo." "97 TotalReg" "97Total-Health" ... -## - attr(*, "version")= int 12 -## 'data.frame': 181 obs. of 1 variable: -## $ V1: Factor w/ 181 levels "Apr-69,5,3.4,60,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,39.2",..: 31 62 47 107 1 122 92 77 16 167 ... -``` - In addition to visual inspection, we can compare the UNF signatures for each dataset against what is reported by Dataverse to confirm that we received the correct files: - -```r +```{r} library("UNF") unf(constructionData) unf(PESenergy) dataset$files[c("label", "UNF")] ``` -``` -## UNF6:+4pc5114xS0ryr1sSvdX6g== -## UNF6:TD7TEMZyrX4iGTlTsUKQDg== -## label UNF -## 1 alpl2013.tab UNF:6:d9ZNXvmiPfiunSAiXRpVfg== -## 2 BPchap7.tab UNF:6:B3/HJbnzktaX5eEJA2ItiA== -## 3 chapter01.R -## 4 chapter02.R -## 5 chapter03.R -## 6 chapter04.R -## 7 chapter05.R -## 8 chapter06.R -## 9 chapter07.R -## 10 chapter08.R -## 11 chapter09.R -## 12 chapter10.R -## 13 chapter11.R -## 14 comprehensiveJapanEnergy.tab UNF:6:Vhb3oZb9m4Nk9N7s6UAHGg== -## 15 constructionData.tab UNF:6:+4pc5114xS0ryr1sSvdX6g== -## 16 drugCoverage.csv -## 17 hanmerKalkanANES.tab UNF:6:lrQrhDAXFc8lSRP9muJslw== -## 18 hmnrghts.tab UNF:6:uEg24jBA2ht0P4WeNLjI+w== -## 19 hmnrghts.txt -## 20 levant.tab UNF:6:zlgG7+JXsIZYvS383eQOvA== -## 21 LL.csv -## 22 moneyDem.tab UNF:6:7M/QM5i6IM/VUM94UJjJUQ== -## 23 owsiakJOP2013.tab UNF:6:0ZEvCFuUQms2zYD57hmwNQ== -## 24 PESenergy.csv -## 25 pts1994.csv -## 26 pts1995.csv -## 27 sen113kh.ord 
-## 28 SinghEJPR.tab UNF:6:iDGp9dXOl4SiR+rCBWo8Tw== -## 29 SinghJTP.tab UNF:6:lDCyZ7YQF5O++SRsxh2kGA== -## 30 stdSingh.tab UNF:6:A5gwtn5q/ewkTMpcQEQ73w== -## 31 UN.csv -## 32 war1800.tab UNF:6:jJ++mepKcv9JbJOOPLMf2Q== -``` - ## Reusing Files and Reproducing Analysis -To reproduce the analysis, we can simply run the code file either as a `system()` call or directly in R using `source()` (note this particular file begins with an `rm()` call so you may want to run it in a [new environment](https://stat.ethz.ch/R-manual/R-devel/library/base/html/environment.html)): +To reproduce the analysis, we can simply run the code file either as a `system()` call or directly in R using `source()` (note this particular file begins with an `rm()` call so you may want to run it in a clean R session): ```R -# Option 1 system("Rscript chapter03.R") - -# Option 2 -source("chapter03.R", local=new.env()) +source("chapter03.R") ``` -Any well-produced set of analysis reproduction files, like this one, should run without error once the data and code are in-hand. Troubleshooting analysis files is beyond the scope of this vignette, but common sources are +Any well-produced set of analysis reproduction files, like this one, will easily run without error once the data and code are in-hand. -1. The working directory is not set the same as the author intended. This could affect code files not finding the relative position of datasets or of other code files. -1. Your local machine hasn't downloaded or installed all the necessary datasets and packages. -1. The functions called in the code have changed since the script was developed. +```{r, echo = FALSE, results = "hide"} +unlink("constructionData.dta") +unlink("PESenergy.csv") +unlink("chapter03.R") +``` To archive your own reproducible analyses using Dataverse, see the ["Archiving Data" vignette](D-archiving.html). 
diff --git a/vignettes/C-retrieval.Rmd2 b/vignettes/C-retrieval.Rmd2 deleted file mode 100644 index 493b2cd..0000000 --- a/vignettes/C-retrieval.Rmd2 +++ /dev/null @@ -1,111 +0,0 @@ ---- -title: "Data Retrieval and Reuse" -date: "`r Sys.Date()`" -output: - html_document: - fig_caption: false - toc: true - toc_float: - collapsed: false - smooth_scroll: false - toc_depth: 2 -vignette: > - %\VignetteIndexEntry{3. Data Retrieval and Reuse} - %\VignetteEngine{knitr::rmarkdown} - %\VignetteEncoding{UTF-8} ---- - -```{r knitr_options, echo=FALSE, results="hide"} -options(width = 120) -knitr::opts_chunk$set(results = "hold") -``` - -This vignette shows how to download data from Dataverse using the dataverse package. We'll focus on a Dataverse repository that contains supplemental files for [*Political Analysis Using R*](https://www.springer.com/gb/book/9783319234458), which is stored at Harvard University's [IQSS Dataverse Network](https://dataverse.harvard.edu/): - -> Monogan, Jamie, 2015, "Political Analysis Using R: Example Code and Data, Plus Data for Practice Problems", [doi:10.7910/DVN/ARKOTI](https://doi.org/10.7910/DVN/ARKOTI), Harvard Dataverse, V1, UNF:6:+itU9hcUJ8I9E0Kqv8HWHg== - -This study is persistently retrievable by a "[Digital Object Identifier (DOI)](https://www.doi.org/)": https://doi.org/10.7910/DVN/ARKOTI and the citation above (taken from the Dataverse page) includes a "[Universal Numeric Fingerprint (UNF)](https://guides.dataverse.org/en/latest/developers/unf/index.html)": `UNF:6:+itU9hcUJ8I9E0Kqv8HWHg==`, which provides a versioned, multi-file hash for the entire study, which contains 32 files. - -If you don't already know what datasets and files you want to use from Dataverse, see the ["Data Search" vignette](B-search.html) for guidance on data search and discovery. - -## Retrieving Dataset and File Metadata - -We will download these files and examine them directly in R using the **dataverse** package. 
To begin, we load the package and use the `get_dataset()` function to retrieve some basic metadata about the dataset: - -```{r} -library("dataverse") -Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu") -(dataset <- get_dataset("doi:10.7910/DVN/ARKOTI")) -``` - -The output prints some basic metadata and then the `str()` of the `files` data frame returned by the call. This lists all of the files in the dataset along with a considerable amount of metadata about each. We can get a quick glance at these files using: - -``` -dataset$files[c("filename", "contentType")] -``` - -This shows that there are indeed 32 files, a mix of .R code files and tab- and comma-separated data files. - - -You can also retrieve more extensive metadata using `dataset_metadata()`: - -```{r} -str(dataset_metadata("doi:10.7910/DVN/ARKOTI"), 1) -``` - -We'll focus here on the code and data files for Chapter 3 from the book. - -## Retrieving Files - -Let's start by grabbing the code using `get_file()` (note that this always returns a raw vector): - -```{r} -code3 <- get_file("chapter03.R", "doi:10.7910/DVN/ARKOTI") -writeBin(code3, "chapter03.R") -``` - -Now we'll get the corresponding data and save it locally.
For this code we need two data files: - -```{r} -writeBin(get_file("constructionData.tab", "doi:10.7910/DVN/ARKOTI"), - "constructionData.dta") -writeBin(get_file("PESenergy.csv", "doi:10.7910/DVN/ARKOTI"), - "PESenergy.csv") -``` - -To confirm that the data look the way we want, we can also (perhaps alternatively) load it directly into R: - -```{r} -constructionData <- foreign::read.dta("constructionData.dta") -str(constructionData) -PESenergy <- utils::read.table("PESenergy.csv") -str(PESenergy) -``` - -In addition to visual inspection, we can compare the UNF signatures for each dataset against what is reported by Dataverse to confirm that we received the correct files: - -```{r} -library("UNF") -unf(constructionData) -unf(PESenergy) -dataset$files[c("label", "UNF")] -``` - -## Reusing Files and Reproducing Analysis - -To reproduce the analysis, we can simply run the code file either as a `system()` call or directly in R using `source()` (note this particular file begins with an `rm()` call so you may want to run it in a clean R session): - -```R -system("Rscript chapter03.R") -source("chapter03.R") -``` - -Any well-produced set of analysis reproduction files, like this one, will easily run without error once the data and code are in-hand. - -```{r, echo = FALSE, results = "hide"} -unlink("constructionData.dta") -unlink("PESenergy.csv") -unlink("chapter03.R") -``` - -To archive your own reproducible analyses using Dataverse, see the ["Archiving Data" vignette](D-archiving.html). diff --git a/vignettes/D-archiving.Rmd b/vignettes/D-archiving.Rmd index ac093e1..e60d968 100644 --- a/vignettes/D-archiving.Rmd +++ b/vignettes/D-archiving.Rmd @@ -1,6 +1,6 @@ --- title: "Data Archiving" -date: "2017-06-15" +date: "`r Sys.Date()`" output: html_document: fig_caption: false @@ -17,8 +17,7 @@ vignette: > This vignette describes how to archive data into Dataverse directly from R. 
- -```r +```{r} library("dataverse") Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu") ``` diff --git a/vignettes/D-archiving.Rmd2 b/vignettes/D-archiving.Rmd2 deleted file mode 100644 index e60d968..0000000 --- a/vignettes/D-archiving.Rmd2 +++ /dev/null @@ -1,81 +0,0 @@ ---- -title: "Data Archiving" -date: "`r Sys.Date()`" -output: - html_document: - fig_caption: false - toc: true - toc_float: - collapsed: false - smooth_scroll: false - toc_depth: 2 -vignette: > - %\VignetteIndexEntry{4. Data Archiving} - %\VignetteEngine{knitr::rmarkdown} - %\VignetteEncoding{UTF-8} ---- - -This vignette describes how to archive data into Dataverse directly from R. - -```{r} -library("dataverse") -Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu") -``` - -## SWORD-based Data Archiving - -The main data archiving (or "deposit") workflow for Dataverse is built on [SWORD v2.0](http://swordapp.org/sword-v2/). This means that to create a new dataset listing, you will have to first initialize a dataset entry with some metadata, add one or more files to the dataset, and then publish it. This looks something like the following: - -```R -# retrieve your service document -d <- service_document() - -# list current datasets in a dataverse -list_datasets("mydataverse") - -# create a new dataset -## create a list of metadata -metadat <- list(title = "My Study", - creator = "Doe, John", - description = "An example study") -## initiate the dataset -dat <- initiate_sword_dataset("mydataverse", body = metadat) -``` - -Once the dataset is initiated, it is possible to add and delete files: - -```R -tmp <- tempfile() -write.csv(iris, file = tmp) -f <- add_file(dat, file = tmp) -``` - -The `add_file()` function accepts, as its first argument, a character vector of file names, a data.frame, or a list of R objects. Files can be deleted using `delete_file()`. 
Once the dataset is finalized, it can be published using `publish_dataset()`: - -```R -publish_dataset(dat) -``` - -And it will then show up in the list of published datasets returned by `list_datasets(dat)`. - -## Native API - -Dataverse also implements a second way to release datasets, called the "native" API. It is similar to to the SWORD API: - -```R -# create the dataset -ds <- create_dataset("mydataverse") - -# add files -tmp <- tempfile() -write.csv(iris, file = tmp) -f <- add_dataset_file(file = tmp, dataset = ds) - -# publish dataset -publish_dataset(ds) - -# dataset will now be published -get_dataverse("mydataverse") -``` - - From b8f02aea97fc02325283bb94c53129f08323ae71 Mon Sep 17 00:00:00 2001 From: Shiro Kuriwaki Date: Tue, 23 Feb 2021 17:47:28 -0500 Subject: [PATCH 02/16] Modernize format of chunks --- vignettes/A-introduction.Rmd | 7 +------ vignettes/B-search.Rmd | 7 +------ vignettes/C-retrieval.Rmd | 9 ++------- vignettes/D-archiving.Rmd | 15 +++++---------- 4 files changed, 9 insertions(+), 29 deletions(-) diff --git a/vignettes/A-introduction.Rmd b/vignettes/A-introduction.Rmd index 63f630d..65567db 100644 --- a/vignettes/A-introduction.Rmd +++ b/vignettes/A-introduction.Rmd @@ -2,13 +2,8 @@ title: "Introduction to Dataverse" date: "`r Sys.Date()`" output: - html_document: + html_vignette: fig_caption: false - toc: true - toc_float: - collapsed: false - smooth_scroll: false - toc_depth: 2 vignette: > %\VignetteIndexEntry{1. Introduction to Dataverse} %\VignetteEngine{knitr::rmarkdown} diff --git a/vignettes/B-search.Rmd b/vignettes/B-search.Rmd index bdfbfb7..42cfe14 100644 --- a/vignettes/B-search.Rmd +++ b/vignettes/B-search.Rmd @@ -2,13 +2,8 @@ title: "Data Search and Discovery" date: "`r Sys.Date()`" output: - html_document: + html_vignette: fig_caption: false - toc: true - toc_float: - collapsed: false - smooth_scroll: false - toc_depth: 2 vignette: > %\VignetteIndexEntry{2. 
Data Search and Discovery} %\VignetteEngine{knitr::rmarkdown} diff --git a/vignettes/C-retrieval.Rmd b/vignettes/C-retrieval.Rmd index 493b2cd..87ec7be 100644 --- a/vignettes/C-retrieval.Rmd +++ b/vignettes/C-retrieval.Rmd @@ -2,13 +2,8 @@ title: "Data Retrieval and Reuse" date: "`r Sys.Date()`" output: - html_document: + html_vignette: fig_caption: false - toc: true - toc_float: - collapsed: false - smooth_scroll: false - toc_depth: 2 vignette: > %\VignetteIndexEntry{3. Data Retrieval and Reuse} %\VignetteEngine{knitr::rmarkdown} @@ -95,7 +90,7 @@ dataset$files[c("label", "UNF")] To reproduce the analysis, we can simply run the code file either as a `system()` call or directly in R using `source()` (note this particular file begins with an `rm()` call so you may want to run it in a clean R session): -```R +```{r, eval = FALSE} system("Rscript chapter03.R") source("chapter03.R") ``` diff --git a/vignettes/D-archiving.Rmd b/vignettes/D-archiving.Rmd index e60d968..66d6ee2 100644 --- a/vignettes/D-archiving.Rmd +++ b/vignettes/D-archiving.Rmd @@ -2,13 +2,8 @@ title: "Data Archiving" date: "`r Sys.Date()`" output: - html_document: + html_vignette: fig_caption: false - toc: true - toc_float: - collapsed: false - smooth_scroll: false - toc_depth: 2 vignette: > %\VignetteIndexEntry{4. Data Archiving} %\VignetteEngine{knitr::rmarkdown} @@ -26,7 +21,7 @@ Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu") The main data archiving (or "deposit") workflow for Dataverse is built on [SWORD v2.0](http://swordapp.org/sword-v2/). This means that to create a new dataset listing, you will have to first initialize a dataset entry with some metadata, add one or more files to the dataset, and then publish it. 
This looks something like the following: -```R +```{r, eval = FALSE} # retrieve your service document d <- service_document() @@ -44,7 +39,7 @@ dat <- initiate_sword_dataset("mydataverse", body = metadat) Once the dataset is initiated, it is possible to add and delete files: -```R +```{r, eval = FALSE} tmp <- tempfile() write.csv(iris, file = tmp) f <- add_file(dat, file = tmp) @@ -52,7 +47,7 @@ f <- add_file(dat, file = tmp) The `add_file()` function accepts, as its first argument, a character vector of file names, a data.frame, or a list of R objects. Files can be deleted using `delete_file()`. Once the dataset is finalized, it can be published using `publish_dataset()`: -```R +```{r, eval = FALSE} publish_dataset(dat) ``` @@ -62,7 +57,7 @@ And it will then show up in the list of published datasets returned by `list_dat Dataverse also implements a second way to release datasets, called the "native" API. It is similar to to the SWORD API: -```R +```{r, eval = FALSE} # create the dataset ds <- create_dataset("mydataverse") From e6c96366ddf89e71ca1485c0ebd6d50d0e847c6a Mon Sep 17 00:00:00 2001 From: Shiro Kuriwaki Date: Thu, 25 Feb 2021 21:22:44 -0500 Subject: [PATCH 03/16] Make title consistent with index --- vignettes/A-introduction.Rmd | 2 +- vignettes/B-search.Rmd | 4 ++-- vignettes/C-retrieval.Rmd | 2 +- vignettes/D-archiving.Rmd | 2 +- 4 files changed, 5 insertions(+), 5 deletions(-) diff --git a/vignettes/A-introduction.Rmd b/vignettes/A-introduction.Rmd index 65567db..9c2685c 100644 --- a/vignettes/A-introduction.Rmd +++ b/vignettes/A-introduction.Rmd @@ -1,5 +1,5 @@ --- -title: "Introduction to Dataverse" +title: "1. Introduction to Dataverse" date: "`r Sys.Date()`" output: html_vignette: diff --git a/vignettes/B-search.Rmd b/vignettes/B-search.Rmd index 42cfe14..906e3e7 100644 --- a/vignettes/B-search.Rmd +++ b/vignettes/B-search.Rmd @@ -1,11 +1,11 @@ --- -title: "Data Search and Discovery" +title: "2. 
Data Search and Discovery" date: "`r Sys.Date()`" output: html_vignette: fig_caption: false vignette: > - %\VignetteIndexEntry{2. Data Search and Discovery} + %\VignetteIndexEntry{3. Data Search and Discovery} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- diff --git a/vignettes/C-retrieval.Rmd b/vignettes/C-retrieval.Rmd index 87ec7be..fadcc13 100644 --- a/vignettes/C-retrieval.Rmd +++ b/vignettes/C-retrieval.Rmd @@ -1,5 +1,5 @@ --- -title: "Data Retrieval and Reuse" +title: "3. Data Retrieval and Reuse" date: "`r Sys.Date()`" output: html_vignette: diff --git a/vignettes/D-archiving.Rmd b/vignettes/D-archiving.Rmd index 66d6ee2..3bc87e6 100644 --- a/vignettes/D-archiving.Rmd +++ b/vignettes/D-archiving.Rmd @@ -1,5 +1,5 @@ --- -title: "Data Archiving" +title: "4. Data Archiving" date: "`r Sys.Date()`" output: html_vignette: From c705692bf85dcec960237e58121fd98ca5e079a3 Mon Sep 17 00:00:00 2001 From: Shiro Kuriwaki Date: Thu, 25 Feb 2021 21:23:38 -0500 Subject: [PATCH 04/16] Delete archiving/upload part for now --- vignettes/D-archiving.Rmd | 76 --------------------------------------- 1 file changed, 76 deletions(-) delete mode 100644 vignettes/D-archiving.Rmd diff --git a/vignettes/D-archiving.Rmd b/vignettes/D-archiving.Rmd deleted file mode 100644 index 3bc87e6..0000000 --- a/vignettes/D-archiving.Rmd +++ /dev/null @@ -1,76 +0,0 @@ ---- -title: "4. Data Archiving" -date: "`r Sys.Date()`" -output: - html_vignette: - fig_caption: false -vignette: > - %\VignetteIndexEntry{4. Data Archiving} - %\VignetteEngine{knitr::rmarkdown} - %\VignetteEncoding{UTF-8} ---- - -This vignette describes how to archive data into Dataverse directly from R. - -```{r} -library("dataverse") -Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu") -``` - -## SWORD-based Data Archiving - -The main data archiving (or "deposit") workflow for Dataverse is built on [SWORD v2.0](http://swordapp.org/sword-v2/). 
This means that to create a new dataset listing, you will have to first initialize a dataset entry with some metadata, add one or more files to the dataset, and then publish it. This looks something like the following: - -```{r, eval = FALSE} -# retrieve your service document -d <- service_document() - -# list current datasets in a dataverse -list_datasets("mydataverse") - -# create a new dataset -## create a list of metadata -metadat <- list(title = "My Study", - creator = "Doe, John", - description = "An example study") -## initiate the dataset -dat <- initiate_sword_dataset("mydataverse", body = metadat) -``` - -Once the dataset is initiated, it is possible to add and delete files: - -```{r, eval = FALSE} -tmp <- tempfile() -write.csv(iris, file = tmp) -f <- add_file(dat, file = tmp) -``` - -The `add_file()` function accepts, as its first argument, a character vector of file names, a data.frame, or a list of R objects. Files can be deleted using `delete_file()`. Once the dataset is finalized, it can be published using `publish_dataset()`: - -```{r, eval = FALSE} -publish_dataset(dat) -``` - -And it will then show up in the list of published datasets returned by `list_datasets(dat)`. - -## Native API - -Dataverse also implements a second way to release datasets, called the "native" API. 
It is similar to to the SWORD API: - -```{r, eval = FALSE} -# create the dataset -ds <- create_dataset("mydataverse") - -# add files -tmp <- tempfile() -write.csv(iris, file = tmp) -f <- add_dataset_file(file = tmp, dataset = ds) - -# publish dataset -publish_dataset(ds) - -# dataset will now be published -get_dataverse("mydataverse") -``` - - From 6098fd72c57e0dfb35908cb333591f4f3b18671d Mon Sep 17 00:00:00 2001 From: Shiro Kuriwaki Date: Thu, 25 Feb 2021 21:36:37 -0500 Subject: [PATCH 05/16] Slim down intro, add sections --- vignettes/A-introduction.Rmd | 78 ++++++++++++++---------------------- 1 file changed, 29 insertions(+), 49 deletions(-) diff --git a/vignettes/A-introduction.Rmd b/vignettes/A-introduction.Rmd index 9c2685c..3ed0d11 100644 --- a/vignettes/A-introduction.Rmd +++ b/vignettes/A-introduction.Rmd @@ -12,27 +12,24 @@ vignette: > The **dataverse** package is the official R client for [Dataverse 4](https://dataverse.org/) data repositories. The package enables data search, retrieval, and deposit with any Dataverse installation, thus allowing R users to integrate public data sharing into the reproducible research workflow. -In addition to this introduction, the package contains three additional vignettes covering: +In addition to this introduction, the package contains additional vignettes covering: * ["Data Search and Discovery"](B-search.html) * ["Data Retrieval and Reuse"](C-retrieval.html) - * ["Data Archiving"](D-archiving.html) They can be accessed from [CRAN](https://cran.r-project.org/package=dataverse) or from within R using `vignettes(package = "dataverse")`. 
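Locally, the same listing can be pulled up with the (singular) `vignette()` function; a small sketch, assuming the installed vignettes keep their `A-introduction`-style file names:

```R
# list the vignettes that ship with the installed package
vignette(package = "dataverse")

# open a specific vignette by its file name
vignette("A-introduction", package = "dataverse")
```
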
+ +## Installation + The dataverse client package can be installed from [CRAN](https://cran.r-project.org/package=dataverse), and you can find the latest development version and report any issues on GitHub: ```R -if (!require("remotes")) { - install.packages("remotes") -} remotes::install_github("iqss/dataverse-client-r") library("dataverse") ``` -(Note: **dataverse** is the next-generation iteration of [the **dvn** package](https://cran.r-project.org/package=dvn), which works with Dataverse 3 ("Dataverse Network") applications. See the appendix of this vignette for a cross-walk of functionality between **dvn** and **dataverse**.) - -## Quick Start +## Terminology Dataverse has some terminology that is worth quickly reviewing before showing how to work with Dataverse in R. Dataverse is an application that can be installed in many places. As a result, **dataverse** can work with any installation but you need to specify which installation you want to work with. This can be set by default with an environment variable, `DATAVERSE_SERVER`: @@ -45,69 +42,52 @@ This should be the Dataverse server, without the "https" prefix or the "/api" UR Within a given Dataverse installation, organizations or individuals can create objects that are also called "Dataverses". These Dataverses can then contain other *dataverses*, which can contain other *dataverses*, and so on. They can also contain *datasets* which in turn contain files. You can think of Harvard's Dataverse as a top-level installation, where an institution might have a *dataverse* that contains a subsidiary *dataverse* for each researcher at the organization, who in turn publishes all files relevant to a given study as a *dataset*. +## Search + You can search for and retrieve data without a Dataverse account for that a specific Dataverse installation. 
For example, to search for data files or datasets that mention "ecological inference", we can just do: ```R dataverse_search("ecological inference")[c("name", "type", "description")] ``` -The [search vignette](B-search.html) describes this functionality in more detail. To retrieve a data file, we need to investigate the dataset being returned and look at what files it contains using a variety of functions, the last of which - `get_file()` - can retrieve the files as raw vectors: +The [search vignette](B-search.html) describes this functionality in more detail. + +## Get + +To retrieve a data file, we need to investigate the dataset being returned and look at what files it contains using a variety of functions: ```R get_dataset() dataset_files() get_file_metadata() get_file() +get_dataframe_by_name() ``` -For "native" Dataverse features (such as user account controls) or to create and publish a dataset, you will need an API key linked to a Dataverse installation account. Instructions for obtaining an account and setting up an API key are available in the [Dataverse User Guide](https://guides.dataverse.org/en/latest/user/account.html). (Note: if your key is compromised, it can be regenerated to preserve security.) Once you have an API key, this should be stored as an environment variable called `DATAVERSE_KEY`. It can be set within R using: - -```R -Sys.setenv("DATAVERSE_KEY" = "examplekey12345") -``` - -With that set, you can easily create a new dataverse, create a dataset within that dataverse, push files to the dataset, and release it: +The most practical of these is likely `get_dataframe_by_name()` which imports the object directly as a dataframe. `get_file()` is more primitive, and calls a raw vector. 
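As a sketch of that contrast, using the *Political Analysis Using R* study discussed in the retrieval vignette (both calls query the Harvard Dataverse, so they need network access):

```R
# one step: import the file as a data frame
energy <- get_dataframe_by_name(
  "comprehensiveJapanEnergy.tab",
  "doi:10.7910/DVN/ARKOTI")

# more primitive: fetch the raw bytes, then write them out yourself
energy_raw <- get_file(
  "comprehensiveJapanEnergy.tab",
  "doi:10.7910/DVN/ARKOTI")
writeBin(energy_raw, "comprehensiveJapanEnergy.tab")
```
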
-```R -# create a dataverse -dat <- create_daverse("mydataverse") +Recall that, because _datasets_ in Dataverse are a collection of files rather than a single csv file, for example, the `get_dataset()` funtion does not return data but rather information about a Dataverse dataset. -# create a list of metadata -metadat <- list(title = "My Study", - creator = "Doe, John", - description = "An example study") +The [retrieval vignette](C-retrieval.html) describes this functionality in more detail. -# create the dataset -dat <- initiate_dataset("mydataverse", body = metadat) +## Upload and Maintain -# add files to dataset -tmp <- tempfile() -write.csv(iris, file = tmp) -f <- add_file(dat, file = tmp) +For "native" Dataverse features (such as user account controls) or to create and publish a dataset, you will need an API key linked to a Dataverse installation account. Instructions for obtaining an account and setting up an API key are available in the [Dataverse User Guide](https://guides.dataverse.org/en/latest/user/account.html). (Note: if your key is compromised, it can be regenerated to preserve security.) Once you have an API key, this should be stored as an environment variable called `DATAVERSE_KEY`. It can be set within R using: -# publish new dataset -publish_dataset(dat) +```R +Sys.setenv("DATAVERSE_KEY" = "examplekey12345") ``` -Your data are now publicly accessible. +where `examplekey12345` should be replace with your own key. +With that set, you can easily create a new dataverse, create a dataset within that dataverse, push files to the dataset, and release it, using functions such as -## Appendix: dvn to dataverse Crosswalk +```R +create_daverse() +initiate_dataset() +add_file() +publish_dataset() +``` -The original [dvn](https://cran.r-project.org/?package=dvn) package, which worked with Dataverse versions <= 3, provided functionality for searching, retrieving, and depositing data. 
Here is a cross-walk of functionality in case you were already familiar with the dvn package: +As of `dataverse` version 0.3.0, we recommended the Python client (`https://github.com/gdcc/pyDataverse`) for these upload and maintenance functions. -| API Category | **dataverse** functions | **dvn** functions | -| ------------ | ----------------------- | ----------------- | -| Data Search | `dataverse_search()` | `dvSearch()` | -| Data Retrieval | `get_file_metadata()` | `dvMetadata()` | -|| `get_file()` | | -| Data Deposit | `create_dataverse()` | | -|| `initiate_dataset()` | `dvCreateStudy()` | -|| `update_dataset()` | `dvEditStudy()` | -|| `add_file()` | `addFile()` | -|| `delete_file()` | `dvDeleteFile()` | -|| `publish_sword_dataset()` | `dvReleaseStudy()` | -|| `delete_sword_dataset()` | | -|| `service_document()` | `dvServiceDoc()` | -|| `dataset_statement()` | `dvStudyStatement()` | -|| `list_datasets()` | `dvUserStudies()` | From d1fdc72a76a3df07fd5e06420092673c37c15045 Mon Sep 17 00:00:00 2001 From: Shiro Kuriwaki Date: Thu, 25 Feb 2021 21:36:48 -0500 Subject: [PATCH 06/16] Make D the other packages section --- vignettes/D-related.Rmd | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) create mode 100644 vignettes/D-related.Rmd diff --git a/vignettes/D-related.Rmd b/vignettes/D-related.Rmd new file mode 100644 index 0000000..a553b27 --- /dev/null +++ b/vignettes/D-related.Rmd @@ -0,0 +1,36 @@ +--- +title: "Appendix: Comparisons with other clients" +date: "`r Sys.Date()`" +output: + html_vignette: + fig_caption: false +vignette: > + %\VignetteIndexEntry{Appendix: Comparisons with other clients} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r setup, include=FALSE} +knitr::opts_chunk$set(echo = TRUE) +``` + +## dvn + +The original [dvn](https://cran.r-project.org/?package=dvn) package, which worked with Dataverse versions <= 3, provided functionality for searching, retrieving, and depositing data. 
Here is a cross-walk of functionality in case you were already familiar with the dvn package: + +| API Category | **dataverse** functions | **dvn** functions | +| ------------ | ----------------------- | ----------------- | +| Data Search | `dataverse_search()` | `dvSearch()` | +| Data Retrieval | `get_file_metadata()` | `dvMetadata()` | +|| `get_file()` | | +| Data Deposit | `create_dataverse()` | | +|| `initiate_dataset()` | `dvCreateStudy()` | +|| `update_dataset()` | `dvEditStudy()` | +|| `add_file()` | `addFile()` | +|| `delete_file()` | `dvDeleteFile()` | +|| `publish_sword_dataset()` | `dvReleaseStudy()` | +|| `delete_sword_dataset()` | | +|| `service_document()` | `dvServiceDoc()` | +|| `dataset_statement()` | `dvStudyStatement()` | +|| `list_datasets()` | `dvUserStudies()` | + From 7a47975fca92cbc557a70bfa57fc3cccd726607e Mon Sep 17 00:00:00 2001 From: Shiro Kuriwaki Date: Sat, 27 Feb 2021 11:06:51 -0500 Subject: [PATCH 07/16] No need to talk about reuse --- vignettes/A-introduction.Rmd | 2 +- vignettes/C-retrieval.Rmd | 23 ++--------------------- 2 files changed, 3 insertions(+), 22 deletions(-) diff --git a/vignettes/A-introduction.Rmd b/vignettes/A-introduction.Rmd index 3ed0d11..78182cb 100644 --- a/vignettes/A-introduction.Rmd +++ b/vignettes/A-introduction.Rmd @@ -15,7 +15,7 @@ The **dataverse** package is the official R client for [Dataverse 4](https://dat In addition to this introduction, the package contains additional vignettes covering: * ["Data Search and Discovery"](B-search.html) - * ["Data Retrieval and Reuse"](C-retrieval.html) + * ["Data Retrieval"](C-retrieval.html) They can be accessed from [CRAN](https://cran.r-project.org/package=dataverse) or from within R using `vignettes(package = "dataverse")`. diff --git a/vignettes/C-retrieval.Rmd b/vignettes/C-retrieval.Rmd index fadcc13..a08e478 100644 --- a/vignettes/C-retrieval.Rmd +++ b/vignettes/C-retrieval.Rmd @@ -1,11 +1,11 @@ --- -title: "3. Data Retrieval and Reuse" +title: "3. 
Data Retrieval" date: "`r Sys.Date()`" output: html_vignette: fig_caption: false vignette: > - %\VignetteIndexEntry{3. Data Retrieval and Reuse} + %\VignetteIndexEntry{3. Data Retrieval} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- @@ -85,22 +85,3 @@ unf(constructionData) unf(PESenergy) dataset$files[c("label", "UNF")] ``` - -## Reusing Files and Reproducing Analysis - -To reproduce the analysis, we can simply run the code file either as a `system()` call or directly in R using `source()` (note this particular file begins with an `rm()` call so you may want to run it in a clean R session): - -```{r, eval = FALSE} -system("Rscript chapter03.R") -source("chapter03.R") -``` - -Any well-produced set of analysis reproduction files, like this one, will easily run without error once the data and code are in-hand. - -```{r, echo = FALSE, results = "hide"} -unlink("constructionData.dta") -unlink("PESenergy.csv") -unlink("chapter03.R") -``` - -To archive your own reproducible analyses using Dataverse, see the ["Archiving Data" vignette](D-archiving.html). From 36fdb3719520d6520846517313eb49034690c609 Mon Sep 17 00:00:00 2001 From: Shiro Kuriwaki Date: Sat, 27 Feb 2021 12:14:45 -0500 Subject: [PATCH 08/16] Write out vignette for get_dataframe_* Focus on the dataframe side and cut tangential material --- vignettes/C-retrieval.Rmd | 104 ++++++++++++++++++++++++++------------ 1 file changed, 73 insertions(+), 31 deletions(-) diff --git a/vignettes/C-retrieval.Rmd b/vignettes/C-retrieval.Rmd index a08e478..79ba39a 100644 --- a/vignettes/C-retrieval.Rmd +++ b/vignettes/C-retrieval.Rmd @@ -15,73 +15,115 @@ options(width = 120) knitr::opts_chunk$set(results = "hold") ``` -This vignette shows how to download data from Dataverse using the dataverse package. 
We'll focus on a Dataverse repository that contains supplemental files for [*Political Analysis Using R*](https://www.springer.com/gb/book/9783319234458), which is stored on Harvard's Dataverse server.

The Dataverse entry for this study is persistently retrievable by a "Digital Object Identifier (DOI)": https://doi.org/10.7910/DVN/ARKOTI. The citation on the Dataverse page includes a "[Universal Numeric Fingerprint (UNF)](https://guides.dataverse.org/en/latest/developers/unf/index.html)": `UNF:6:+itU9hcUJ8I9E0Kqv8HWHg==`, which provides a versioned, multi-file hash for the entire 32-file study.


## Retrieving Metadata

We will download these files and examine them directly in R using the **dataverse** package. 
To begin, we load the package and use the `get_dataset()` function to retrieve some basic metadata about the dataset:

```{r}
library("dataverse")
Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")
```

The `get_dataset()` function lists all of the files in the dataset along with a considerable amount of metadata about each. (Recall that in Dataverse, a `dataset` is a collection of files, not a single file.) We can take a quick glance at these files using:

```r
dataset <- get_dataset("doi:10.7910/DVN/ARKOTI")
dataset$files[c("filename", "contentType")]
```

This shows that there are indeed 32 files, a mix of .R code files and tab- and comma-separated data files.

You can also retrieve more extensive metadata using `dataset_metadata()`:

```{r}
str(dataset_metadata("doi:10.7910/DVN/ARKOTI"), 2)
```



## Retrieving Plain-Text Data

Now we'll get the corresponding data files. First, we retrieve a plain-text file, such as this dataset on electricity consumption by [Wakiyama et al. (2014)](https://doi.org/10.7910/DVN/ARKOTI/GN1MRT). Taking the file name and dataset DOI from this entry:

```{r, message=FALSE}
energy <- get_dataframe_by_name(
  "comprehensiveJapanEnergy.tab",
  "doi:10.7910/DVN/ARKOTI")

head(energy)
```

These `get_dataframe_*` functions, introduced in v0.3.0, read the data directly into an R environment using whatever R function is supplied to `.f`. 
The default for the `get_dataframe_*` functions is to read such data in with `readr::read_tsv`. The function passed to `.f` can be changed to adjust the read-in settings. For example, the following is a base-R equivalent way to read in the raw data:

```{r}
energy <- get_dataframe_by_name(
  "comprehensiveJapanEnergy.tab",
  "doi:10.7910/DVN/ARKOTI",
  .f = function(x) read.delim(x, sep = "\t"))

head(energy)
```



## Retrieving Custom Data Formats (RDS, Stata, SPSS)


If a file is displayed on Dataverse as a `.tab` file, like the survey data by [Alvarez et al. (2013)](https://doi.org/10.7910/DVN/ARKOTI/A8YRMP), it is likely that Dataverse [ingested](https://guides.dataverse.org/en/latest/user/tabulardataingest/index.html) the file into a plain-text, tab-delimited format.

```{r, message=FALSE}
argentina_tab <- get_dataframe_by_name(
  "alpl2013.tab",
  "doi:10.7910/DVN/ARKOTI")
```


However, an ingested file may not retain the dataset attributes one wants. For example, Stata and SPSS datasets attach value labels to numeric values, and these labels are critical for understanding the data; factor variables in R data frames encode levels, not only labels. A plain-text ingested file discards such information. For example, the `polling_place` variable in this data is given only by numbers, although the original data labelled these numbers with informative values. 
```{r}
str(argentina_tab$polling_place)
```

For such files, Dataverse retains an `original` version, which keeps the original labels but may not be readable on some platforms. The `get_dataframe_*` functions have an argument that can be set to `original = TRUE`. In this case, we know that `alpl2013.tab` was originally a Stata `.dta` file, so we can run:

```{r}
argentina_dta <- get_dataframe_by_name(
  "alpl2013.tab",
  "doi:10.7910/DVN/ARKOTI",
  original = TRUE,
  .f = haven::read_dta)
```

Now we see that the labels are read in through `haven`'s labelled variable class:

```{r}
str(argentina_dta$polling_place)
```



Users should pick `.f` and `original` based on their existing knowledge of the file. If the original file is a `.sav` SPSS file, `.f` can be `haven::read_sav`. If it is a `.Rds` file, use `readRDS` or `readr::read_rds`. In fact, because the raw data is read in as binary, the dataverse package places no limitation on the file types `get_dataframe_*` can read.

There are two other ways to read in a dataframe besides `get_dataframe_by_name()`. `get_dataframe_by_doi()` takes a file-specific DOI if Dataverse contains one such as . This removes the need for users to set the `dataset` argument. 
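For instance, the file-level DOI attached above to the Wakiyama et al. energy data could be passed directly; a sketch (the DOI to use is the one displayed on the file's own Dataverse page):

```R
# no dataset argument needed: the DOI identifies the file itself
energy <- get_dataframe_by_doi("10.7910/DVN/ARKOTI/GN1MRT")
```
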
`get_dataframe_by_id()` takes a numeric Dataverse identification number. This identifier is an internal number and is not prominently featured in the interface. + + +In addition to visual inspection, we can compare the UNF signatures for each dataset against what is reported by Dataverse to confirm that we received the correct files. (See the [UNF package](https://cran.r-project.org/package=UNF) and `unf` function). + + + +## Retrieving Scripts and other files + +If the file you want to retrieve is not data, you may want to use the more primitive function, `get_file`, which gets the file data as a raw binary file. See the help page examples of `get_file` that use the `base::writeBin()` function for details on how to write and read these binary files instead. + +```{r, eval = FALSE} +code3 <- get_file("chapter03.R", "doi:10.7910/DVN/ARKOTI") +writeBin(code3, "chapter03.R") ``` From 99eb02de2ef561137c34e3dd1ddf0c43e12563da Mon Sep 17 00:00:00 2001 From: Shiro Kuriwaki Date: Sat, 27 Feb 2021 12:16:24 -0500 Subject: [PATCH 09/16] Delete related article. We can later add it back with more text on pyDataverse --- vignettes/D-related.Rmd | 36 ------------------------------------ 1 file changed, 36 deletions(-) delete mode 100644 vignettes/D-related.Rmd diff --git a/vignettes/D-related.Rmd b/vignettes/D-related.Rmd deleted file mode 100644 index a553b27..0000000 --- a/vignettes/D-related.Rmd +++ /dev/null @@ -1,36 +0,0 @@ ---- -title: "Appendix: Comparisons with other clients" -date: "`r Sys.Date()`" -output: - html_vignette: - fig_caption: false -vignette: > - %\VignetteIndexEntry{Appendix: Comparisons with other clients} - %\VignetteEngine{knitr::rmarkdown} - %\VignetteEncoding{UTF-8} ---- - -```{r setup, include=FALSE} -knitr::opts_chunk$set(echo = TRUE) -``` - -## dvn - -The original [dvn](https://cran.r-project.org/?package=dvn) package, which worked with Dataverse versions <= 3, provided functionality for searching, retrieving, and depositing data. 
Here is a cross-walk of functionality in case you were already familiar with the dvn package: - -| API Category | **dataverse** functions | **dvn** functions | -| ------------ | ----------------------- | ----------------- | -| Data Search | `dataverse_search()` | `dvSearch()` | -| Data Retrieval | `get_file_metadata()` | `dvMetadata()` | -|| `get_file()` | | -| Data Deposit | `create_dataverse()` | | -|| `initiate_dataset()` | `dvCreateStudy()` | -|| `update_dataset()` | `dvEditStudy()` | -|| `add_file()` | `addFile()` | -|| `delete_file()` | `dvDeleteFile()` | -|| `publish_sword_dataset()` | `dvReleaseStudy()` | -|| `delete_sword_dataset()` | | -|| `service_document()` | `dvServiceDoc()` | -|| `dataset_statement()` | `dvStudyStatement()` | -|| `list_datasets()` | `dvUserStudies()` | - From 42d715a4ac99fc1e8a4c0c1f0e755b937b0e7ff6 Mon Sep 17 00:00:00 2001 From: Shiro Kuriwaki Date: Sat, 27 Feb 2021 12:22:05 -0500 Subject: [PATCH 10/16] Link to pyDataverse, cut material redundant to vignette --- README.Rmd | 57 ++++++---------------------------------- README.md | 76 +++++------------------------------------------------- 2 files changed, 15 insertions(+), 118 deletions(-) diff --git a/README.Rmd b/README.Rmd index f6e59b5..04ca327 100644 --- a/README.Rmd +++ b/README.Rmd @@ -15,6 +15,7 @@ Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu") The **dataverse** package provides access to [Dataverse](https://dataverse.org/) APIs (versions 4+), enabling data search, retrieval, and deposit, thus allowing R users to integrate public data sharing into the reproducible research workflow. **dataverse** is the next-generation iteration of [the **dvn** package](https://cran.r-project.org/package=dvn), which works with Dataverse 3 ("Dataverse Network") applications. 
**dataverse** includes numerous improvements for data search, retrieval, and deposit, including use of the (currently in development) **sword** package for data deposit and the **UNF** package for data fingerprinting.
+
 ### Getting Started
 
 You can find a stable release on [CRAN](https://cran.r-project.org/package=dataverse), or install the latest development version from [GitHub](https://github.com/iqss/dataverse-client-r/):
 
@@ -25,7 +26,7 @@ You can find a stable release on [CRAN](https://cran.r-project.org/package=datav
 install.packages("dataverse")
 
 # Install from GitHub
-if (!require("remotes")) install.packages("remotes")
+# install.packages("remotes")
 remotes::install_github("iqss/dataverse-client-r")
 ```
 
@@ -41,6 +42,9 @@ Some features of the Dataverse API are public and require no authentication. Thi
 Sys.setenv("DATAVERSE_KEY" = "examplekey12345")
 ```
 
+where `examplekey12345` should be replaced with your own key.
+
+
 #### Server
 
 Because [there are many Dataverse installations](https://dataverse.org/), all functions in the R client require specifying what server installation you are interacting with. This can be set by default with an environment variable, `DATAVERSE_SERVER`. This should be the Dataverse server, without the "https" prefix or the "/api" URL path, etc. For example, the Harvard Dataverse can be used by setting:
 
@@ -53,6 +57,7 @@ Note: The package attempts to compensate for any malformed values, though.
 
 Currently, the package wraps the data management features of the Dataverse API. Functions for other API features - related to user management and permissions - are not currently exported in the package (but are drafted in the [source code](https://github.com/IQSS/dataverse-client-r)).
 
+
 ### Data and Metadata Retrieval
 
 The dataverse package provides multiple interfaces to obtain data into R. Users can supply a file DOI, a dataset DOI combined with a filename, or a dataverse object.
They can read in the file as a raw binary or a dataset read in with the appropriate R function. @@ -117,54 +122,6 @@ attr(nlsw_original$race, "labels") # original dta has value labels - -#### Reading a dataset as a binary file. - -In some cases, you may not want to read in the data in your environment, perhaps because that is not possible (e.g. for a `.docx` file), and you want to simply write these files your local disk. To do this, use the more primitive `get_file_*` commands. The arguments are equivalent, except we no longer need an `.f` argument - -```{r get_file_by_name} -nlsw_raw <- - get_file_by_name( - filename = "nlsw88.tab", - dataset = "10.70122/FK2/PPIAXE", - server = "demo.dataverse.org" - ) -class(nlsw_raw) -``` - -#### Reading file metadata - -The function `get_file_metadata()` can also be used similarly. This will return a metadata format for ingested tabular files in the `ddi` format. The function `get_dataset()` will retrieve the list of files in a dataset. - -```{r, get_dataset} -get_dataset( - dataset = "10.70122/FK2/PPIAXE", - server = "demo.dataverse.org" -) -``` - -### Data Discovery - -Dataverse supplies a robust search API to discover Dataverses, datasets, and files. The simplest searches simply consist of a query string: - -```{r search1, eval = FALSE} -dataverse_search("Gary King") -``` - -More complicated searches might specify metadata fields: - -```{r search2, eval = FALSE} -dataverse_search(author = "Gary King", title = "Ecological Inference") -``` - -And searches can be restricted to specific types of objects (Dataverse, dataset, or file): - -```{r search3, eval = FALSE} -dataverse_search(author = "Gary King", type = "dataset") -``` - -The results are paginated using `per_page` argument. To retrieve subsequent pages, specify `start`. - ### Data Archiving Dataverse provides two - basically unrelated - workflows for managing (adding, documenting, and publishing) datasets. 
The first is built on [SWORD v2.0](http://swordapp.org/sword-v2/). This means that to create a new dataset listing, you will have to first initialize a dataset entry with some metadata, add one or more files to the dataset, and then publish it. This looks something like the following:

@@ -218,4 +175,6 @@

### Other Installations

+Other dataverse clients include [pyDataverse](https://pydataverse.readthedocs.io/en/latest/) for python and the [Java client](https://github.com/IQSS/dataverse-client-java).
+
Users interested in downloading metadata from archives other than Dataverse may be interested in Kurt Hornik's [OAIHarvester](https://cran.r-project.org/package=OAIHarvester) and Scott Chamberlain's [oai](https://cran.r-project.org/package=oai), which offer metadata download from any web repository that is compliant with the [Open Archives Initiative](http://www.openarchives.org/) standards. Additionally, [rdryad](https://cran.r-project.org/package=rdryad) uses OAIHarvester to interface with [Dryad](https://datadryad.org/stash). The [rfigshare](https://cran.r-project.org/package=rfigshare) package works in a similar spirit to **dataverse** with .

diff --git a/README.md b/README.md
index e87a769..0fd0323 100644
--- a/README.md
+++ b/README.md
@@ -34,7 +34,7 @@ latest development version from
 install.packages("dataverse")
 
 # Install from GitHub
-if (!require("remotes")) install.packages("remotes")
+# install.packages("remotes")
 remotes::install_github("iqss/dataverse-client-r")
 ```
 
@@ -56,6 +56,8 @@ variable called `DATAVERSE_KEY`. It can be set within R using:
 Sys.setenv("DATAVERSE_KEY" = "examplekey12345")
 ```
 
+where `examplekey12345` should be replaced with your own key.
+
 #### Server
 
 Because [there are many Dataverse
@@ -193,74 +195,6 @@ attr(nlsw_original$race, "labels") # original dta has value labels
 
 ## white black other
 ##     1     2     3
 
-#### Reading a dataset as a binary file.
- -In some cases, you may not want to read in the data in your environment, -perhaps because that is not possible (e.g. for a `.docx` file), and you -want to simply write these files your local disk. To do this, use the -more primitive `get_file_*` commands. The arguments are equivalent, -except we no longer need an `.f` argument - -``` r -nlsw_raw <- - get_file_by_name( - filename = "nlsw88.tab", - dataset = "10.70122/FK2/PPIAXE", - server = "demo.dataverse.org" - ) -class(nlsw_raw) -``` - - ## [1] "raw" - -#### Reading file metadata - -The function `get_file_metadata()` can also be used similarly. This will -return a metadata format for ingested tabular files in the `ddi` format. -The function `get_dataset()` will retrieve the list of files in a -dataset. - -``` r -get_dataset( - dataset = "10.70122/FK2/PPIAXE", - server = "demo.dataverse.org" -) -``` - - ## Dataset (182162): - ## Version: 1.1, RELEASED - ## Release Date: 2020-12-30T00:00:24Z - ## License: CC0 - ## 22 Files: - ## label version id contentType - ## 1 nlsw88_rds-export.rds 1 1734016 application/octet-stream - ## 2 nlsw88.tab 3 1734017 text/tab-separated-values - -### Data Discovery - -Dataverse supplies a robust search API to discover Dataverses, datasets, -and files. The simplest searches simply consist of a query string: - -``` r -dataverse_search("Gary King") -``` - -More complicated searches might specify metadata fields: - -``` r -dataverse_search(author = "Gary King", title = "Ecological Inference") -``` - -And searches can be restricted to specific types of objects (Dataverse, -dataset, or file): - -``` r -dataverse_search(author = "Gary King", type = "dataset") -``` - -The results are paginated using `per_page` argument. To retrieve -subsequent pages, specify `start`. 
- ### Data Archiving Dataverse provides two - basically unrelated - workflows for managing @@ -323,6 +257,10 @@ its metadata with `update_dataset()` or file contents using ### Other Installations +Other dataverse clients include +[pyDataverse](https://pydataverse.readthedocs.io/en/latest/) for python +and the [Java client](https://github.com/IQSS/dataverse-client-java). + Users interested in downloading metadata from archives other than Dataverse may be interested in Kurt Hornik’s [OAIHarvester](https://cran.r-project.org/package=OAIHarvester) and From 41334ed9bc44116f1a67c9d222777afc70f1d63a Mon Sep 17 00:00:00 2001 From: Shiro Kuriwaki Date: Sat, 27 Feb 2021 12:24:28 -0500 Subject: [PATCH 11/16] Remove UNF dependency As it is no longer used in vignette C, and keeping it causes a NOTE in check(). --- DESCRIPTION | 1 - 1 file changed, 1 deletion(-) diff --git a/DESCRIPTION b/DESCRIPTION index 1d9c3dd..4c7e157 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -51,7 +51,6 @@ Suggests: purrr, rmarkdown, testthat, - UNF, yaml Description: Provides access to Dataverse APIs (versions 4-5), enabling data search, retrieval, and deposit. 
For Dataverse versions <= 3.0, From 4308f5aa31f2cce9deac5b0c3c264569bd2b1261 Mon Sep 17 00:00:00 2001 From: Shiro Kuriwaki Date: Sat, 27 Feb 2021 12:32:20 -0500 Subject: [PATCH 12/16] Rename retrieval to download (seems more intuitive) --- R/dataverse-package.R | 2 +- README.Rmd | 4 ++-- vignettes/A-introduction.Rmd | 2 +- vignettes/B-search.Rmd | 2 +- vignettes/{C-retrieval.Rmd => C-download.Rmd} | 4 ++-- 5 files changed, 7 insertions(+), 7 deletions(-) rename vignettes/{C-retrieval.Rmd => C-download.Rmd} (98%) diff --git a/R/dataverse-package.R b/R/dataverse-package.R index 2adcf71..b409735 100644 --- a/R/dataverse-package.R +++ b/R/dataverse-package.R @@ -10,7 +10,7 @@ #' #' \itemize{ #' \item Search: \code{\link{dataverse_search}} -#' \item Data retrieval: \code{\link{get_dataverse}}, \code{\link{dataverse_contents}}, \code{\link{get_dataset}}, \code{\link{dataset_metadata}}, \code{\link{get_file}} +#' \item Data download: \code{\link{get_dataframe_by_name}}, \code{\link{get_dataverse}}, \code{\link{dataverse_contents}}, \code{\link{get_dataset}}, \code{\link{dataset_metadata}}, \code{\link{get_file}} #' \item Data archiving (SWORD API): \code{\link{service_document}}, \code{\link{list_datasets}}, \code{\link{initiate_sword_dataset}}, \code{\link{delete_sword_dataset}}, \code{\link{publish_sword_dataset}}, \code{\link{add_file}}, \code{\link{delete_file}} #' \item Dataverse management \dQuote{native} API: \code{\link{create_dataverse}}, \code{\link{publish_dataverse}}, \code{\link{delete_dataverse}} #' \item Dataset management \dQuote{native} API: \code{\link{create_dataset}}, \code{\link{update_dataset}}, \code{\link{publish_dataset}}, \code{\link{delete_dataset}}, \code{\link{dataset_files}}, \code{\link{dataset_versions}} diff --git a/README.Rmd b/README.Rmd index 04ca327..0534f4d 100644 --- a/README.Rmd +++ b/README.Rmd @@ -13,7 +13,7 @@ Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu") [![Dataverse Project 
logo](https://dataverse.org/files/dataverseorg/files/dataverse_project_logo-hp.png)](https://dataverse.org) -The **dataverse** package provides access to [Dataverse](https://dataverse.org/) APIs (versions 4+), enabling data search, retrieval, and deposit, thus allowing R users to integrate public data sharing into the reproducible research workflow. **dataverse** is the next-generation iteration of [the **dvn** package](https://cran.r-project.org/package=dvn), which works with Dataverse 3 ("Dataverse Network") applications. **dataverse** includes numerous improvements for data search, retrieval, and deposit, including use of the (currently in development) **sword** package for data deposit and the **UNF** package for data fingerprinting. +The **dataverse** package provides access to [Dataverse](https://dataverse.org/) APIs (versions 4+), enabling data search, retrieval, and deposit, thus allowing R users to integrate public data sharing into the reproducible research workflow. **dataverse** is the next-generation iteration of [the **dvn** package](https://cran.r-project.org/package=dvn), which works with Dataverse 3 ("Dataverse Network") applications. **dataverse** includes numerous improvements for data search, download, and deposit, including use of the (currently in development) **sword** package for data deposit and the **UNF** package for data fingerprinting. ### Getting Started @@ -58,7 +58,7 @@ Note: The package attempts to compensate for any malformed values, though. Currently, the package wraps the data management features of the Dataverse API. Functions for other API features - related to user management and permissions - are not currently exported in the package (but are drafted in the [source code](https://github.com/IQSS/dataverse-client-r)). -### Data and Metadata Retrieval +### Data Download The dataverse package provides multiple interfaces to obtain data into R. 
Users can supply a file DOI, a dataset DOI combined with a filename, or a dataverse object. They can read in the file as a raw binary or a dataset read in with the appropriate R function.

diff --git a/vignettes/A-introduction.Rmd b/vignettes/A-introduction.Rmd
index 78182cb..fae70e6 100644
--- a/vignettes/A-introduction.Rmd
+++ b/vignettes/A-introduction.Rmd
@@ -68,7 +68,7 @@ The most practical of these is likely `get_dataframe_by_name()` which imports th
 
 Recall that, because _datasets_ in Dataverse are a collection of files rather than a single csv file, for example, the `get_dataset()` function does not return data but rather information about a Dataverse dataset.
 
-The [retrieval vignette](C-retrieval.html) describes this functionality in more detail.
+The [download vignette](C-download.html) describes this functionality in more detail.
 
 ## Upload and Maintain
 
diff --git a/vignettes/B-search.Rmd b/vignettes/B-search.Rmd
index 906e3e7..be5d630 100644
--- a/vignettes/B-search.Rmd
+++ b/vignettes/B-search.Rmd
@@ -39,4 +39,4 @@ names(ei)
 ei$name
 ```
 
-Once datasets and files are identified, it is easy to download and use them directly in R. See the ["Data Retrieval" vignette](C-retrieval.html) for details.
+Once datasets and files are identified, it is easy to download and use them directly in R. See the ["Data Download" vignette](C-download.html) for details.
diff --git a/vignettes/C-retrieval.Rmd b/vignettes/C-download.Rmd
similarity index 98%
rename from vignettes/C-retrieval.Rmd
rename to vignettes/C-download.Rmd
index 79ba39a..5c3f2f3 100644
--- a/vignettes/C-retrieval.Rmd
+++ b/vignettes/C-download.Rmd
@@ -1,11 +1,11 @@
---
-title: "3. Data Retrieval"
+title: "3. Data Download"
 date: "`r Sys.Date()`"
 output:
   html_vignette:
     fig_caption: false
 vignette: >
-  %\VignetteIndexEntry{3. Data Retrieval}
+  %\VignetteIndexEntry{3. 
Data Download} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- From 98c6b2e4b0bbb1f00309d10039b96a966812423a Mon Sep 17 00:00:00 2001 From: Shiro Kuriwaki Date: Sat, 27 Feb 2021 12:33:15 -0500 Subject: [PATCH 13/16] Remove mentions of sword and UNF. Can't find sword package anymore, and UNF is not used together and has not been updated recently. --- README.Rmd | 2 +- README.md | 6 ++---- 2 files changed, 3 insertions(+), 5 deletions(-) diff --git a/README.Rmd b/README.Rmd index 0534f4d..4ea3c86 100644 --- a/README.Rmd +++ b/README.Rmd @@ -13,7 +13,7 @@ Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu") [![Dataverse Project logo](https://dataverse.org/files/dataverseorg/files/dataverse_project_logo-hp.png)](https://dataverse.org) -The **dataverse** package provides access to [Dataverse](https://dataverse.org/) APIs (versions 4+), enabling data search, retrieval, and deposit, thus allowing R users to integrate public data sharing into the reproducible research workflow. **dataverse** is the next-generation iteration of [the **dvn** package](https://cran.r-project.org/package=dvn), which works with Dataverse 3 ("Dataverse Network") applications. **dataverse** includes numerous improvements for data search, download, and deposit, including use of the (currently in development) **sword** package for data deposit and the **UNF** package for data fingerprinting. +The **dataverse** package provides access to [Dataverse](https://dataverse.org/) APIs (versions 4+), enabling data search, retrieval, and deposit, thus allowing R users to integrate public data sharing into the reproducible research workflow. **dataverse** is the next-generation iteration of [the **dvn** package](https://cran.r-project.org/package=dvn), which works with Dataverse 3 ("Dataverse Network") applications. **dataverse** includes numerous improvements for data search, download, and deposit. 
### Getting Started diff --git a/README.md b/README.md index 0fd0323..d8b9a3b 100644 --- a/README.md +++ b/README.md @@ -18,9 +18,7 @@ public data sharing into the reproducible research workflow. **dataverse** is the next-generation iteration of [the **dvn** package](https://cran.r-project.org/package=dvn), which works with Dataverse 3 (“Dataverse Network”) applications. **dataverse** includes -numerous improvements for data search, retrieval, and deposit, including -use of the (currently in development) **sword** package for data deposit -and the **UNF** package for data fingerprinting. +numerous improvements for data search, download, and deposit. ### Getting Started @@ -81,7 +79,7 @@ management and permissions - are not currently exported in the package (but are drafted in the [source code](https://github.com/IQSS/dataverse-client-r)). -### Data and Metadata Retrieval +### Data Download The dataverse package provides multiple interfaces to obtain data into R. Users can supply a file DOI, a dataset DOI combined with a filename, From 5c9c209f086fb2064aff61c087aa78182286d48e Mon Sep 17 00:00:00 2001 From: Shiro Kuriwaki Date: Sat, 27 Feb 2021 12:40:00 -0500 Subject: [PATCH 14/16] Forgot to document this change --- man/dataverse.Rd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/man/dataverse.Rd b/man/dataverse.Rd index 324421a..16303e4 100644 --- a/man/dataverse.Rd +++ b/man/dataverse.Rd @@ -16,7 +16,7 @@ This package provides five main sets of functions to interact with Dataverse: \itemize{ \item Search: \code{\link{dataverse_search}} -\item Data retrieval: \code{\link{get_dataverse}}, \code{\link{dataverse_contents}}, \code{\link{get_dataset}}, \code{\link{dataset_metadata}}, \code{\link{get_file}} +\item Data download: \code{\link{get_dataframe_by_name}}, \code{\link{get_dataverse}}, \code{\link{dataverse_contents}}, \code{\link{get_dataset}}, \code{\link{dataset_metadata}}, \code{\link{get_file}} \item Data archiving (SWORD API): 
\code{\link{service_document}}, \code{\link{list_datasets}}, \code{\link{initiate_sword_dataset}}, \code{\link{delete_sword_dataset}}, \code{\link{publish_sword_dataset}}, \code{\link{add_file}}, \code{\link{delete_file}} \item Dataverse management \dQuote{native} API: \code{\link{create_dataverse}}, \code{\link{publish_dataverse}}, \code{\link{delete_dataverse}} \item Dataset management \dQuote{native} API: \code{\link{create_dataset}}, \code{\link{update_dataset}}, \code{\link{publish_dataset}}, \code{\link{delete_dataset}}, \code{\link{dataset_files}}, \code{\link{dataset_versions}} From bf3d7a011e5f40eac8e07fcfeaad060e6c5bb15a Mon Sep 17 00:00:00 2001 From: Shiro Kuriwaki Date: Sat, 27 Feb 2021 12:52:17 -0500 Subject: [PATCH 15/16] Update pkgdown header --- _pkgdown.yml | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/_pkgdown.yml b/_pkgdown.yml index ff9f748..63c28a7 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -27,8 +27,7 @@ articles: contents: - 'A-introduction' - 'B-search' - - 'C-retrieval' - - 'D-archiving' + - 'C-download' reference: - title: "Retrieve" From a2483e7a26770afc5f87833f15cd0972445ab571 Mon Sep 17 00:00:00 2001 From: Shiro Kuriwaki Date: Tue, 2 Mar 2021 12:12:32 -0500 Subject: [PATCH 16/16] Use example from @Danny-dK https://github.com/IQSS/dataverse-client-r/issues/82#issuecomment-788738907 --- README.Rmd | 31 ++++++++++++++++++++----------- README.md | 31 ++++++++++++++++++++----------- 2 files changed, 40 insertions(+), 22 deletions(-) diff --git a/README.Rmd b/README.Rmd index 4ea3c86..8a0b42d 100644 --- a/README.Rmd +++ b/README.Rmd @@ -127,30 +127,36 @@ attr(nlsw_original$race, "labels") # original dta has value labels Dataverse provides two - basically unrelated - workflows for managing (adding, documenting, and publishing) datasets. The first is built on [SWORD v2.0](http://swordapp.org/sword-v2/). 
This means that to create a new dataset listing, you will have to first initialize a dataset entry with some metadata, add one or more files to the dataset, and then publish it. This looks something like the following:

``` r
-# retrieve your service document
+# After setting the appropriate dataverse server and environment, obtain the
+# SWORD service document
 d <- service_document()
 
-# create a list of metadata
+# create a list of metadata for a file
 metadat <- list(
-  title = "My Study",
+  title = paste0("My-Study_", format(Sys.time(), '%Y-%m-%d_%H:%M')),
   creator = "Doe, John",
   description = "An example study"
 )
 
-# create the dataset
-ds <- initiate_sword_dataset("mydataverse", body = metadat)
+# create the dataset, where "mydataverse" is to be replaced by the name
+# of the already-created dataverse as shown in the URL
+ds <- initiate_sword_dataset("", body = metadat)
 
 # add files to dataset
-tmp <- tempfile()
-write.csv(iris, file = tmp)
-f <- add_file(ds, file = tmp)
+readr::write_csv(iris, file = "iris.csv")
+
+# identify the initiated dataset by its DOI, with the draft version appended
+mydoi <- "doi:10.70122/FK2/BMZPJZ&version=DRAFT"
+
+# add the file to the dataset
+add_dataset_file(file = "iris.csv", dataset = mydoi)
 
 # publish new dataset
 publish_sword_dataset(ds)
 
 # dataset will now be published
-list_datasets("mydataverse")
+list_datasets("")
 ```
 
The second workflow is called the "native" API and is similar but uses slightly different functions:
@@ -173,8 +179,11 @@ get_dataverse("mydataverse")
 
Through the native API it is possible to update a dataset by modifying its metadata with `update_dataset()` or file contents using `update_dataset_file()` and then republish a new version using `publish_dataset()`.
 
-### Other Installations
+For more extensive features of updating and maintaining data, see [pyDataverse](https://pydataverse.readthedocs.io/en/latest/). 
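By way of comparison, a minimal sketch of the native-API counterpart, reusing the `metadat` list and `iris.csv` file from the SWORD example above and assuming a dataverse named "mydataverse" that you administer (argument details may differ across package versions):

```r
# create a draft dataset inside an existing dataverse via the native API
ds <- create_dataset("mydataverse", body = metadat)

# attach the file, then publish the first version
add_dataset_file(file = "iris.csv", dataset = ds)
publish_dataset(ds)

# later: revise metadata or file contents, then republish a new version
update_dataset(ds, body = metadat)
publish_dataset(ds)
```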
+ + +### Related Software -Other dataverse clients include [pyDataverse](https://pydataverse.readthedocs.io/en/latest/) for python and the [Java client](https://github.com/IQSS/dataverse-client-java). +Other dataverse clients include [pyDataverse](https://pydataverse.readthedocs.io/en/latest/) for Python and the [Java client](https://github.com/IQSS/dataverse-client-java). Users interested in downloading metadata from archives other than Dataverse may be interested in Kurt Hornik's [OAIHarvester](https://cran.r-project.org/package=OAIHarvester) and Scott Chamberlain's [oai](https://cran.r-project.org/package=oai), which offer metadata download from any web repository that is compliant with the [Open Archives Initiative](http://www.openarchives.org/) standards. Additionally, [rdryad](https://cran.r-project.org/package=rdryad) uses OAIHarvester to interface with [Dryad](https://datadryad.org/stash). The [rfigshare](https://cran.r-project.org/package=rfigshare) package works in a similar spirit to **dataverse** with . diff --git a/README.md b/README.md index d8b9a3b..43285f2 100644 --- a/README.md +++ b/README.md @@ -203,30 +203,36 @@ with some metadata, add one or more files to the dataset, and then publish it. 
This looks something like the following:

``` r
-# retrieve your service document
+# After setting the appropriate dataverse server and environment, obtain the
+# SWORD service document
 d <- service_document()
 
-# create a list of metadata
+# create a list of metadata for a file
 metadat <- list(
-  title = "My Study",
+  title = paste0("My-Study_", format(Sys.time(), '%Y-%m-%d_%H:%M')),
   creator = "Doe, John",
   description = "An example study"
 )
 
-# create the dataset
-ds <- initiate_sword_dataset("mydataverse", body = metadat)
+# create the dataset, where "mydataverse" is to be replaced by the name
+# of the already-created dataverse as shown in the URL
+ds <- initiate_sword_dataset("", body = metadat)
 
 # add files to dataset
-tmp <- tempfile()
-write.csv(iris, file = tmp)
-f <- add_file(ds, file = tmp)
+readr::write_csv(iris, file = "iris.csv")
+
+# identify the initiated dataset by its DOI, with the draft version appended
+mydoi <- "doi:10.70122/FK2/BMZPJZ&version=DRAFT"
+
+# add the file to the dataset
+add_dataset_file(file = "iris.csv", dataset = mydoi)
 
 # publish new dataset
 publish_sword_dataset(ds)
 
 # dataset will now be published
-list_datasets("mydataverse")
+list_datasets("")
 ```
 
The second workflow is called the “native” API and is similar but uses
@@ -253,10 +259,13 @@ its metadata with `update_dataset()` or file contents using
 `update_dataset_file()` and then republish a new version using
 `publish_dataset()`.
 
-### Other Installations
+For more extensive features of updating and maintaining data, see
+[pyDataverse](https://pydataverse.readthedocs.io/en/latest/).
+
+### Related Software
 
 Other dataverse clients include
-[pyDataverse](https://pydataverse.readthedocs.io/en/latest/) for python
+[pyDataverse](https://pydataverse.readthedocs.io/en/latest/) for Python
 and the [Java client](https://github.com/IQSS/dataverse-client-java).
 
 Users interested in downloading metadata from archives other than