Simplify and update vignettes #90

Merged
merged 18 commits into from
Mar 3, 2021
1 change: 0 additions & 1 deletion DESCRIPTION
@@ -54,7 +54,6 @@ Suggests:
purrr,
rmarkdown,
testthat,
-UNF,
yaml
Description: Provides access to Dataverse APIs <https://dataverse.org/> (versions 4-5),
enabling data search, retrieval, and deposit. For Dataverse versions <= 3.0,
2 changes: 1 addition & 1 deletion R/dataverse-package.R
@@ -10,7 +10,7 @@
#'
#' \itemize{
#' \item Search: \code{\link{dataverse_search}}
-#' \item Data retrieval: \code{\link{get_dataverse}}, \code{\link{dataverse_contents}}, \code{\link{get_dataset}}, \code{\link{dataset_metadata}}, \code{\link{get_file}}
+#' \item Data download: \code{\link{get_dataframe_by_name}}, \code{\link{get_dataverse}}, \code{\link{dataverse_contents}}, \code{\link{get_dataset}}, \code{\link{dataset_metadata}}, \code{\link{get_file}}
#' \item Data archiving (SWORD API): \code{\link{service_document}}, \code{\link{list_datasets}}, \code{\link{initiate_sword_dataset}}, \code{\link{delete_sword_dataset}}, \code{\link{publish_sword_dataset}}, \code{\link{add_file}}, \code{\link{delete_file}}
#' \item Dataverse management \dQuote{native} API: \code{\link{create_dataverse}}, \code{\link{publish_dataverse}}, \code{\link{delete_dataverse}}
#' \item Dataset management \dQuote{native} API: \code{\link{create_dataset}}, \code{\link{update_dataset}}, \code{\link{publish_dataset}}, \code{\link{delete_dataset}}, \code{\link{dataset_files}}, \code{\link{dataset_versions}}
90 changes: 29 additions & 61 deletions README.Rmd
@@ -13,7 +13,8 @@ Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")

[![Dataverse Project logo](https://dataverse.org/files/dataverseorg/files/dataverse_project_logo-hp.png)](https://dataverse.org)

-The **dataverse** package provides access to [Dataverse](https://dataverse.org/) APIs (versions 4+), enabling data search, retrieval, and deposit, thus allowing R users to integrate public data sharing into the reproducible research workflow. **dataverse** is the next-generation iteration of [the **dvn** package](https://cran.r-project.org/package=dvn), which works with Dataverse 3 ("Dataverse Network") applications. **dataverse** includes numerous improvements for data search, retrieval, and deposit, including use of the (currently in development) **sword** package for data deposit and the **UNF** package for data fingerprinting.
+The **dataverse** package provides access to [Dataverse](https://dataverse.org/) APIs (versions 4+), enabling data search, retrieval, and deposit, thus allowing R users to integrate public data sharing into the reproducible research workflow. **dataverse** is the next-generation iteration of [the **dvn** package](https://cran.r-project.org/package=dvn), which works with Dataverse 3 ("Dataverse Network") applications. **dataverse** includes numerous improvements for data search, download, and deposit.


### Getting Started

@@ -25,7 +26,7 @@ You can find a stable release on [CRAN](https://cran.r-project.org/package=datav
install.packages("dataverse")

# Install from GitHub
-if (!require("remotes")) install.packages("remotes")
+# install.packages("remotes")
remotes::install_github("iqss/dataverse-client-r")
```

@@ -41,6 +42,9 @@ Some features of the Dataverse API are public and require no authentication. Thi
Sys.setenv("DATAVERSE_KEY" = "examplekey12345")
```

+where `examplekey12345` should be replaced with your own key.


#### Server

Because [there are many Dataverse installations](https://dataverse.org/), all functions in the R client require specifying what server installation you are interacting with. This can be set by default with an environment variable, `DATAVERSE_SERVER`. This should be the Dataverse server, without the "https" prefix or the "/api" URL path, etc. For example, the Harvard Dataverse can be used by setting:
@@ -53,7 +57,8 @@ Note: The package attempts to compensate for any malformed values, though.

Currently, the package wraps the data management features of the Dataverse API. Functions for other API features - related to user management and permissions - are not currently exported in the package (but are drafted in the [source code](https://github.com/IQSS/dataverse-client-r)).

-### Data and Metadata Retrieval

+### Data Download

The dataverse package provides multiple interfaces to obtain data into R. Users can supply a file DOI, a dataset DOI combined with a filename, or a dataverse object. They can read in the file as a raw binary or a dataset read in with the appropriate R function.

@@ -117,83 +122,41 @@ attr(nlsw_original$race, "labels") # original dta has value labels




-#### Reading a dataset as a binary file.

-In some cases, you may not want to read in the data in your environment, perhaps because that is not possible (e.g. for a `.docx` file), and you want to simply write these files your local disk. To do this, use the more primitive `get_file_*` commands. The arguments are equivalent, except we no longer need an `.f` argument

-```{r get_file_by_name}
-nlsw_raw <-
-get_file_by_name(
-filename = "nlsw88.tab",
-dataset = "10.70122/FK2/PPIAXE",
-server = "demo.dataverse.org"
-)
-class(nlsw_raw)
-```

-#### Reading file metadata

-The function `get_file_metadata()` can also be used similarly. This will return a metadata format for ingested tabular files in the `ddi` format. The function `get_dataset()` will retrieve the list of files in a dataset.

-```{r, get_dataset}
-get_dataset(
-dataset = "10.70122/FK2/PPIAXE",
-server = "demo.dataverse.org"
-)
-```

-### Data Discovery

-Dataverse supplies a robust search API to discover Dataverses, datasets, and files. The simplest searches simply consist of a query string:

-```{r search1, eval = FALSE}
-dataverse_search("Gary King")
-```

-More complicated searches might specify metadata fields:

-```{r search2, eval = FALSE}
-dataverse_search(author = "Gary King", title = "Ecological Inference")
-```

-And searches can be restricted to specific types of objects (Dataverse, dataset, or file):

-```{r search3, eval = FALSE}
-dataverse_search(author = "Gary King", type = "dataset")
-```

-The results are paginated using `per_page` argument. To retrieve subsequent pages, specify `start`.

### Data Archiving

Dataverse provides two - basically unrelated - workflows for managing (adding, documenting, and publishing) datasets. The first is built on [SWORD v2.0](http://swordapp.org/sword-v2/). This means that to create a new dataset listing, you will have to first initialize a dataset entry with some metadata, add one or more files to the dataset, and then publish it. This looks something like the following:

``` r
-# retrieve your service document
+# After setting appropriate dataverse server and environment, obtain SWORD
+# service doc
d <- service_document()

-# create a list of metadata
+# create a list of metadata for a file
metadat <-
list(
-title = "My Study",
+title = paste0("My-Study_", format(Sys.time(), '%Y-%m-%d_%H:%M')),
creator = "Doe, John",
description = "An example study"
)

-# create the dataset
-ds <- initiate_sword_dataset("mydataverse", body = metadat)
+# create the dataset, where "mydataverse" is to be replaced by the name
+# of the already-created dataverse as shown in the URL
+ds <- initiate_sword_dataset("<mydataverse>", body = metadat)

# add files to dataset
-tmp <- tempfile()
-write.csv(iris, file = tmp)
-f <- add_file(ds, file = tmp)
+readr::write_csv(iris, file = "iris.csv")

+# Search the initiated dataset and give a DOI and version of the dataverse as an identifier
+mydoi <- "doi:10.70122/FK2/BMZPJZ&version=DRAFT"

+# add dataset
+add_dataset_file(file = "iris.csv", dataset = mydoi)

# publish new dataset
publish_sword_dataset(ds)

# dataset will now be published
-list_datasets("mydataverse")
+list_datasets("<mydataverse>")
```

The second workflow is called the "native" API and is similar but uses slightly different functions:
@@ -216,6 +179,11 @@ get_dataverse("mydataverse")

Through the native API it is possible to update a dataset by modifying its metadata with `update_dataset()` or file contents using `update_dataset_file()` and then republish a new version using `publish_dataset()`.

-### Other Installations
+For more extensive features of updating and maintaining data, see [pyDataverse](https://pydataverse.readthedocs.io/en/latest/).


+### Related Software

+Other dataverse clients include [pyDataverse](https://pydataverse.readthedocs.io/en/latest/) for Python and the [Java client](https://github.com/IQSS/dataverse-client-java).

Users interested in downloading metadata from archives other than Dataverse may be interested in Kurt Hornik's [OAIHarvester](https://cran.r-project.org/package=OAIHarvester) and Scott Chamberlain's [oai](https://cran.r-project.org/package=oai), which offer metadata download from any web repository that is compliant with the [Open Archives Initiative](http://www.openarchives.org/) standards. Additionally, [rdryad](https://cran.r-project.org/package=rdryad) uses OAIHarvester to interface with [Dryad](https://datadryad.org/stash). The [rfigshare](https://cran.r-project.org/package=rfigshare) package works in a similar spirit to **dataverse** with <https://figshare.com/>.
111 changes: 28 additions & 83 deletions README.md
@@ -18,9 +18,7 @@ public data sharing into the reproducible research workflow.
**dataverse** is the next-generation iteration of [the **dvn**
package](https://cran.r-project.org/package=dvn), which works with
Dataverse 3 (“Dataverse Network”) applications. **dataverse** includes
-numerous improvements for data search, retrieval, and deposit, including
-use of the (currently in development) **sword** package for data deposit
-and the **UNF** package for data fingerprinting.
+numerous improvements for data search, download, and deposit.

### Getting Started

@@ -34,7 +32,7 @@ latest development version from
install.packages("dataverse")

# Install from GitHub
-if (!require("remotes")) install.packages("remotes")
+# install.packages("remotes")
remotes::install_github("iqss/dataverse-client-r")
```

@@ -56,6 +54,8 @@ variable called `DATAVERSE_KEY`. It can be set within R using:
Sys.setenv("DATAVERSE_KEY" = "examplekey12345")
```

+where `examplekey12345` should be replaced with your own key.

#### Server

Because [there are many Dataverse
@@ -79,7 +79,7 @@ management and permissions - are not currently exported in the package
(but are drafted in the [source
code](https://github.com/IQSS/dataverse-client-r)).

-### Data and Metadata Retrieval
+### Data Download

The dataverse package provides multiple interfaces to obtain data into
R. Users can supply a file DOI, a dataset DOI combined with a filename,
@@ -193,74 +193,6 @@ attr(nlsw_original$race, "labels") # original dta has value labels
## white black other
## 1 2 3

-#### Reading a dataset as a binary file.

-In some cases, you may not want to read in the data in your environment,
-perhaps because that is not possible (e.g. for a `.docx` file), and you
-want to simply write these files your local disk. To do this, use the
-more primitive `get_file_*` commands. The arguments are equivalent,
-except we no longer need an `.f` argument

-``` r
-nlsw_raw <-
-get_file_by_name(
-filename = "nlsw88.tab",
-dataset = "10.70122/FK2/PPIAXE",
-server = "demo.dataverse.org"
-)
-class(nlsw_raw)
-```

-## [1] "raw"

-#### Reading file metadata

-The function `get_file_metadata()` can also be used similarly. This will
-return a metadata format for ingested tabular files in the `ddi` format.
-The function `get_dataset()` will retrieve the list of files in a
-dataset.

-``` r
-get_dataset(
-dataset = "10.70122/FK2/PPIAXE",
-server = "demo.dataverse.org"
-)
-```

-## Dataset (182162):
-## Version: 1.1, RELEASED
-## Release Date: 2020-12-30T00:00:24Z
-## License: CC0
-## 22 Files:
-## label version id contentType
-## 1 nlsw88_rds-export.rds 1 1734016 application/octet-stream
-## 2 nlsw88.tab 3 1734017 text/tab-separated-values

-### Data Discovery

-Dataverse supplies a robust search API to discover Dataverses, datasets,
-and files. The simplest searches simply consist of a query string:

-``` r
-dataverse_search("Gary King")
-```

-More complicated searches might specify metadata fields:

-``` r
-dataverse_search(author = "Gary King", title = "Ecological Inference")
-```

-And searches can be restricted to specific types of objects (Dataverse,
-dataset, or file):

-``` r
-dataverse_search(author = "Gary King", type = "dataset")
-```

-The results are paginated using `per_page` argument. To retrieve
-subsequent pages, specify `start`.

### Data Archiving

Dataverse provides two - basically unrelated - workflows for managing
@@ -271,30 +203,36 @@ with some metadata, add one or more files to the dataset, and then
publish it. This looks something like the following:

``` r
-# retrieve your service document
+# After setting appropriate dataverse server and environment, obtain SWORD
+# service doc
d <- service_document()

-# create a list of metadata
+# create a list of metadata for a file
metadat <-
list(
-title = "My Study",
+title = paste0("My-Study_", format(Sys.time(), '%Y-%m-%d_%H:%M')),
creator = "Doe, John",
description = "An example study"
)

-# create the dataset
-ds <- initiate_sword_dataset("mydataverse", body = metadat)
+# create the dataset, where "mydataverse" is to be replaced by the name
+# of the already-created dataverse as shown in the URL
+ds <- initiate_sword_dataset("<mydataverse>", body = metadat)

# add files to dataset
-tmp <- tempfile()
-write.csv(iris, file = tmp)
-f <- add_file(ds, file = tmp)
+readr::write_csv(iris, file = "iris.csv")

+# Search the initiated dataset and give a DOI and version of the dataverse as an identifier
+mydoi <- "doi:10.70122/FK2/BMZPJZ&version=DRAFT"

+# add dataset
+add_dataset_file(file = "iris.csv", dataset = mydoi)

# publish new dataset
publish_sword_dataset(ds)

# dataset will now be published
-list_datasets("mydataverse")
+list_datasets("<mydataverse>")
```

The second workflow is called the “native” API and is similar but uses
@@ -321,7 +259,14 @@ its metadata with `update_dataset()` or file contents using
`update_dataset_file()` and then republish a new version using
`publish_dataset()`.

-### Other Installations
+For more extensive features of updating and maintaining data, see
+[pyDataverse](https://pydataverse.readthedocs.io/en/latest/).

+### Related Software

+Other dataverse clients include
+[pyDataverse](https://pydataverse.readthedocs.io/en/latest/) for Python
+and the [Java client](https://github.com/IQSS/dataverse-client-java).

Users interested in downloading metadata from archives other than
Dataverse may be interested in Kurt Hornik’s
3 changes: 1 addition & 2 deletions _pkgdown.yml
@@ -27,8 +27,7 @@ articles:
contents:
- 'A-introduction'
- 'B-search'
-- 'C-retrieval'
-- 'D-archiving'
+- 'C-download'

reference:
- title: "Retrieve"
2 changes: 1 addition & 1 deletion man/dataverse.Rd
