Support `vars` option in get_file/get_dataframe #79

kuriwaki · 2021-01-29T18:24:18Z

`vars` should be an argument that subsets the columns of the dataset to pull. However, it seems to have no effect: the whole dataset is returned.

library(dataverse)

df_tab_all <-
  get_file_by_name(
    filename = "roster-bulls-1996.tab",
    dataset  = "doi:10.70122/FK2/HXJVJU",
    server   = "demo.dataverse.org"
  )

df_tab_vars <-
  get_file_by_name(
    filename = "roster-bulls-1996.tab",
    dataset  = "doi:10.70122/FK2/HXJVJU",
    server   = "demo.dataverse.org",
    vars = c("number", "player") # only two columns
  )

# first data should be larger (more data)
stopifnot(object.size(df_tab_all) > object.size(df_tab_vars))
#> Error: object.size(df_tab_all) > object.size(df_tab_vars) is not TRUE


# does it work on get_dataframe?
df_tab_vars <-
  get_dataframe_by_name(
    filename = "roster-bulls-1996.tab",
    dataset  = "doi:10.70122/FK2/HXJVJU",
    server   = "demo.dataverse.org",
    vars = c("number", "player") # only two columns
  )
#> Downloading ingested version of data with readr::read_tsv. To download the original version and remove this message, set original = TRUE.
#> Rows: 15 Columns: 9
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (6): player, position, height, dob, country_birth, college
#> dbl (3): number, weight, experience_years
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
ncol(df_tab_vars)
#> [1] 9

Created on 2022-01-12 by the reprex package (v2.0.1)

EDITED 2022-01-12 with the new version of dataverse, which avoids the earlier errors and fixes the reprex.

kuriwaki · 2024-10-16T19:27:25Z

While this is open, we should just delete the `vars` argument to avoid confusion.

kuriwaki · 2024-10-16T23:54:41Z

I tried to obtain the first column of https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.70122/FK2/PPIAXE with

get_file_by_name(
    filename = "nlsw88.tab",
    dataset  = "10.70122/FK2/PPIAXE",
    vars = 1, original = FALSE,
    server   = "demo.dataverse.org", return_url = TRUE)

which gives https://demo.dataverse.org/api/access/datafile/1734017?vars=1
So the `vars` argument does seem to be attached to the URL properly, following the format example here.

However, using that URL still gives me the whole dataset, instead of only the first column.
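
As a stopgap while server-side subsetting has no effect, the columns could be subset on the client after the full download. A minimal sketch, assuming `vars` is a character vector of column names; `subset_vars()` is a hypothetical helper, not part of the dataverse package, and the inline tab-separated text is placeholder data standing in for a downloaded file:

```r
# Hypothetical helper (not part of the dataverse package): keep only the
# requested columns of a data frame that has already been fully downloaded.
subset_vars <- function(df, vars) {
  missing_vars <- setdiff(vars, names(df))
  if (length(missing_vars) > 0) {
    stop("Columns not found: ", paste(missing_vars, collapse = ", "))
  }
  df[, vars, drop = FALSE]
}

# Placeholder data standing in for a downloaded .tab file (values are made up)
tsv_text <- "number\tplayer\tposition\n1\tA\tG\n2\tB\tF\n"
df_all <- read.delim(text = tsv_text, stringsAsFactors = FALSE)

# Keep only two columns, mirroring vars = c("number", "player")
df_vars <- subset_vars(df_all, c("number", "player"))
ncol(df_vars)
```

This does not reduce the amount of data transferred, so it only addresses the shape of the result, not the download size.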

kuriwaki added the bug and data-download (Functions that are about downloading, not uploading, data) labels Jan 29, 2021

kuriwaki added this to the CRAN 0.4.0 milestone Jan 13, 2022
