Support vars option in get_file/get_dataframe #79

Open
kuriwaki opened this issue Jan 29, 2021 · 2 comments
@kuriwaki
Member

kuriwaki commented Jan 29, 2021

vars should be an argument that subsets the columns of the dataset to pull. However, it currently has no effect: the call returns the whole dataset.

library(dataverse)

df_tab_all <-
  get_file_by_name(
    filename = "roster-bulls-1996.tab",
    dataset  = "doi:10.70122/FK2/HXJVJU",
    server   = "demo.dataverse.org"
  )

df_tab_vars <-
  get_file_by_name(
    filename = "roster-bulls-1996.tab",
    dataset  = "doi:10.70122/FK2/HXJVJU",
    server   = "demo.dataverse.org",
    vars = c("number", "player") # only two columns
  )

# first data should be larger (more data)
stopifnot(object.size(df_tab_all) > object.size(df_tab_vars))
#> Error: object.size(df_tab_all) > object.size(df_tab_vars) is not TRUE


# does it work on get_dataframe?
df_tab_vars <-
  get_dataframe_by_name(
    filename = "roster-bulls-1996.tab",
    dataset  = "doi:10.70122/FK2/HXJVJU",
    server   = "demo.dataverse.org",
    vars = c("number", "player") # only two columns
  )
#> Downloading ingested version of data with readr::read_tsv. To download the original version and remove this message, set original = TRUE.
#> Rows: 15 Columns: 9
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (6): player, position, height, dob, country_birth, college
#> dbl (3): number, weight, experience_years
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
ncol(df_tab_vars)
#> [1] 9

Created on 2022-01-12 by the reprex package (v2.0.1)

EDITED 2022-01-12 with the new version of dataverse, which now avoids errors and fixes the reprex.

@kuriwaki kuriwaki added bug data-download Functions that are about downloading, not uploading, data labels Jan 29, 2021
@kuriwaki kuriwaki added this to the CRAN 0.4.0 milestone Jan 13, 2022
@kuriwaki
Member Author

While this is open, we should just delete the vars argument to avoid confusion.

@kuriwaki
Member Author

I tried to obtain the first column of https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.70122/FK2/PPIAXE by

get_file_by_name(
    filename = "nlsw88.tab",
    dataset  = "10.70122/FK2/PPIAXE",
    vars = 1, original = FALSE,
    server   = "demo.dataverse.org", return_url = TRUE)

which gives https://demo.dataverse.org/api/access/datafile/1734017?vars=1
So the vars argument does seem to be attached properly, following the format example here.

However, using that URL still gives me the whole dataset, instead of only the first column.
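
A more direct check than comparing object.size() is to compare the returned column names with the requested ones. The sketch below is only an illustration: vars_respected() is a hypothetical helper, not part of the dataverse package, and the idea that vars could be either names or positions is an assumption. The stand-in data frame reuses the nine column names printed in the reprex above, in an arbitrary order.

```r
# Hypothetical helper (not part of the dataverse package): TRUE if `df`
# contains exactly the requested columns and nothing else.
# `vars` may be column names (character) or positions (numeric); positions
# are resolved against `all_names`, the column names of the full file.
vars_respected <- function(df, vars, all_names = names(df)) {
  expected <- if (is.character(vars)) vars else all_names[vars]
  setequal(names(df), expected)
}

# Stand-in for the roster file: only the column names matter here.
# The column order is illustrative, not taken from the actual file.
roster_cols <- c("number", "player", "position", "height", "weight",
                 "dob", "country_birth", "experience_years", "college")
full <- as.data.frame(matrix(NA, nrow = 1, ncol = length(roster_cols),
                             dimnames = list(NULL, roster_cols)))

# Whole dataset returned although two columns were requested
vars_respected(full, c("number", "player"))
#> [1] FALSE

# What a working `vars` would be expected to return
vars_respected(full[, c("number", "player")], c("number", "player"))
#> [1] TRUE
```

Applied to the reprex, vars_respected(df_tab_vars, c("number", "player")) would be expected to return FALSE as long as ncol(df_tab_vars) is still 9.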
