create_dataset and add_file error out #82
I am able to get it working by converting the curl command (https://guides.dataverse.org/en/5.3/api/native-api.html#create-dataset-command) to R code with this online curl-to-R converter: https://curl.trillworks.com/#r. The curl command, as displayed in the "Add a File to a Dataset" section of https://guides.dataverse.org/en/5.3/api/native-api.html#create-dataset-command:
becomes R:
The R code now works using the httr library. Why would it not work with the dataverse package? EDIT::
Neither add_file nor add_dataset_file worked. However, changing the call to include the full doi does work:
but this only works for add_dataset_file, not for add_file. The create_dataset() function still produces the same error. So for now I can create a dataset with the initiate_sword_dataset() function and add files to it with add_dataset_file(), as long as I give the full doi of that dataset (referencing it with just the name of the created dataset doesn't seem to work).
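The workaround above depends on passing a full persistent identifier rather than a bare dataset name. A minimal sketch of a guard for that, assuming the identifier has the usual `doi:10.<prefix>/<suffix>` shape; the helper name `is_full_doi` and the DOI shown are placeholders, not part of the dataverse package or this thread:

```r
# Hypothetical helper: does a dataset reference look like a full DOI
# ("doi:10.<prefix>/<suffix>") rather than a bare dataset name?
is_full_doi <- function(x) {
  grepl("^doi:10\\.[0-9]+/.+", x)
}

is_full_doi("doi:10.5072/FK2/ABC123")  # placeholder DOI, full form
is_full_doi("my-dataset-name")         # bare name, not a DOI

# Not run: needs a live server and a valid API token. The argument names
# below follow the dataverse package documentation as I understand it.
# dataverse::add_dataset_file(
#   file    = "data.csv",
#   dataset = "doi:10.5072/FK2/ABC123",
#   key     = Sys.getenv("DATAVERSE_KEY"),
#   server  = "demo.dataverse.nl"
# )
```

Checking the identifier before the upload call gives a clearer failure than an HTTP error from the server.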
As per request.
What doesn't seem to work is creating a dataset using create_dataset instead of initiate_sword_dataset() before or after retrieving the service document:
Which produces:
And also the add_file produces an error when used instead of add_dataset_file():
Producing the error:
I resolved the previous error in the original post by first searching for the complete draft doi of the created dataset package, but it is still producing a 400 error. Note in the second post that the curl commands work as intended.
Thanks. Managed to reproduce this on my end.
@kuriwaki Need a bit of further help. I tried to get the native API curl example mentioned in https://guides.dataverse.org/en/5.3/api/native-api.html at the topic "Create a Dataset in a Dataverse" working in R. They mention that you "must" supply a json file containing the metadata fields to create a dataset in a dataverse. An example of a json file required is provided by them at https://guides.dataverse.org/en/5.3/_downloads/dataset-create-new-all-default-fields.json. The curl code is:
Using https://curl.trillworks.com/#r and some fiddling about (not really understanding what I'm doing), I eventually got to a working one in R:
My question is 2-fold:
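The httr snippet from the comment above was not preserved here. For readers following along, the curl call in the guide (POST a JSON file to `/api/dataverses/<alias>/datasets` with an `X-Dataverse-key` header) maps onto httr roughly as below. This is a minimal sketch, not the poster's code; the helper names `dataset_endpoint` and `create_dataset_from_json` are made up for illustration.

```r
# Sketch only: these helpers are hypothetical, not part of the dataverse package.

# Build the native API endpoint for creating a dataset in a dataverse.
dataset_endpoint <- function(server, dataverse) {
  paste0("https://", server, "/api/dataverses/", dataverse, "/datasets")
}

# POST a JSON metadata file to that endpoint.
# Requires the httr package; `key` is a Dataverse API token.
create_dataset_from_json <- function(server, dataverse, json_path, key) {
  stopifnot(file.exists(json_path))
  httr::POST(
    url = dataset_endpoint(server, dataverse),
    httr::add_headers("X-Dataverse-key" = key),
    body = httr::upload_file(json_path, type = "application/json")
  )
}

# Not run: needs a live server and a valid token.
# resp <- create_dataset_from_json("demo.dataverse.nl", "ddktestdeleteme",
#                                  "dataset-finch1.json",
#                                  Sys.getenv("DATAVERSE_KEY"))
# httr::status_code(resp)
```

`httr::upload_file()` sends the file contents as the request body, which corresponds to curl's `--upload-file`.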
Unfortunately, I'm not the best person to ask httr/curl questions (don't know enough). But I can help point to others.
Regarding 1, pyDataverse may have thought about this issue more. I asked at gdcc/pyDataverse#116.
Regarding 2, well,
library(dataverse)
server <- "demo.dataverse.nl"
dataverse <- "ddktestdeleteme"
identical(paste0(dataverse:::api_url(server), "dataverses/", dataverse, "/datasets/"), # taken from create_dataset()
          "https://demo.dataverse.nl/api/dataverses/ddktestdeleteme/datasets") # given in working code
#> [1] FALSE
# but
identical(paste0(dataverse:::api_url(server), "dataverses/", dataverse, "/datasets"), # remove last slash
          "https://demo.dataverse.nl/api/dataverses/ddktestdeleteme/datasets")
#> [1] TRUE
Created on 2021-03-03 by the reprex package (v0.3.0)
So in create_dataset() there's only that (a) trailing slash and (b) Having no easy reprex is inherent to these upload examples, so thanks for your patience.
@kuriwaki No worries, I appreciate all the help I'm getting! I'll start with the positive. The create_dataset() function of the dataverse package actually works as long as you use upload_file() of the httr package along with the json file. Don't know why I missed this. So this works:
I tried changing the functions to add or remove the slash and encode = "json" both in the httr code and in a new function resembling the create_dataset() function. For simplicity's sake, I'll just show the create_dataset(), but the exact same messages occur with adjusting the httr code.
The trailing slash doesn't seem to matter. Adding or removing it does not seem to change the outcome.
Thus, it all works when using httr::upload_file(somejson.json) in either the raw httr code or in the dataverse function create_dataset(). The pyDataverse documentation also seems to say that a json must be specified. Looking at https://pydataverse.readthedocs.io/en/latest/ at the create dataset section, they specify a json as ds.json(). So I'm assuming that for create_dataset() to work as intended, there needs to be a way to give the "body" option content in the form of a json object or file. My current way of listing the 'metadata' just doesn't seem to be recognized because of the format and/or the items specified, and I don't know how to specify a json file. The examples in the dataverse help documentation do show what a json file looks like (https://guides.dataverse.org/en/5.3/_downloads/dataset-finch1.json), but I don't know how to do this efficiently in R. Adding just an empty json as per example https://stackoverflow.com/a/20109959 and write it to local disk as per example https://rfaqs.com/reading-and-writing-json-files-in-r/, and then use ::EDIT:: build_metadata: https://rdrr.io/cran/dataverse/src/R/build_metadata.R
Well...it's dumb as hell and ain't pretty, but it works. Using the minimal fields required for the json:
Note that you can actually create a dataset with an empty json file as mentioned at IQSS/dataverse#6752, which I managed to do, but then the dataset cannot be opened or viewed within dataverse. |
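As an alternative to pasting JSON strings together, the same minimal metadata can be built as a nested R list and serialized with jsonlite. This is a sketch under assumptions: the five fields and their layout are modelled on the dataset-finch1.json example in the Dataverse guide, the helper names are hypothetical, and the exact set of required fields may differ per installation.

```r
library(jsonlite)

# Hypothetical helper: one primitive metadata field.
primitive_field <- function(type_name, value) {
  list(typeName = type_name, typeClass = "primitive",
       multiple = FALSE, value = value)
}

# Hypothetical helper: a compound field holding one group of primitive sub-fields.
compound_field <- function(type_name, subfields) {
  children <- lapply(names(subfields),
                     function(n) primitive_field(n, subfields[[n]]))
  names(children) <- names(subfields)
  list(typeName = type_name, typeClass = "compound",
       multiple = TRUE, value = list(children))
}

# Assemble the citation block and serialize it to a JSON string.
minimal_dataset_json <- function(title, author, contact_email,
                                 description, subject) {
  fields <- list(
    primitive_field("title", title),
    compound_field("author", list(authorName = author)),
    compound_field("datasetContact",
                   list(datasetContactEmail = contact_email)),
    compound_field("dsDescription",
                   list(dsDescriptionValue = description)),
    list(typeName = "subject", typeClass = "controlledVocabulary",
         multiple = TRUE, value = list(subject))
  )
  body <- list(datasetVersion = list(metadataBlocks = list(
    citation = list(fields = fields, displayName = "Citation Metadata")
  )))
  toJSON(body, auto_unbox = TRUE, pretty = TRUE)
}

json <- minimal_dataset_json(
  title = "My title", author = "Doe, Jane",
  contact_email = "jane@example.org",
  description = "A test dataset.", subject = "Other"
)

# Write to a temporary file so it can be passed to an upload call.
json_path <- file.path(tempdir(), "dataset-minimal.json")
writeLines(json, json_path)
```

Wrapping `subject` in `list()` keeps it a JSON array even though `auto_unbox = TRUE` would otherwise turn a length-one vector into a scalar.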
Thank you @Danny-dK for the investigation! I am still learning the create* parts of the package, but it seems like your post highlights 2 concrete and relatively straightforward things:
Would you be comfortable or interested in starting a fork/branch to make and test these fixes? In the mid/long-term, I am trying to figure out if we really need to maintain a native vs. sword API, two parallel things, to do the same thing. |
@kuriwaki Well, I don't think I'm versed enough in R to try these out. Someone with more R experience should probably look into it. For now I can work around it, and with my very, very ugly paste code anyone can create the required metadata without needing httr::upload_file(). The function jsonmetaverse in my post just pastes the different json metadata parts one after another, leaving some keywords to be filled in by the user, and its output is accepted as is by create_dataset(). It's the very minimal required info for creating a dataset. If others want more or other metadata present, just look at all the dataverse metadata so far https://guides.dataverse.org/en/latest/_downloads/dataset-create-new-all-default-fields.json and add them to the very, very ugly paste code. One could of course just adjust the json file and then use that file as the body within the create dataset call. You do raise an interesting point about whether to keep both native and sword api functions. If there really isn't any advantage of one over the other, I would think to just keep using the sword api to create the dataset, as it seems it already has helper functions to create the metadata. Or perhaps simpler would be to explain in the help file what the contents of the body should be in create_dataset() (e.g. The body should consist of allowed dataverse metadata in json format, either as a json file (see https://guides.dataverse.org/en/latest/_downloads/dataset-create-new-all-default-fields.json) or as an R object that adheres to the json structure of the aforementioned link.
Currently, no helper functions exist to create metadata in a json format (see initiate_sword_dataset() for another way of creating a dataset that has metadata helper functions built in to create the associated xml metadata used in the sword api.)) For now I'll close this issue, as the cause of the issue is identified and 2 workarounds are available (1: create the dataset using the sword api; 2: create the dataset using the native api, supplying the body as a json file or json-structured R object). If you want to keep it open, re-open it at will.
I'll reopen since it still identifies a bug in master and it seems fixable. I might not be able to fix it this month, but I'll see if I can make a branch - and PRs by others are welcome. One question about what you wrote: why can't you use or tweak |
Several reasons, of which the main one is that I don't know how. Other reasons are that build_metadata only seems to accept dc terms and that its output is in an xml format. If one could find out how to refer to the json metadata format https://guides.dataverse.org/en/latest/_downloads/dataset-create-new-all-default-fields.json and extract only the relevant blocks that occur in a list of keywords provided by a user, then I guess one could do it. The build_metadata function called in initiate_sword_dataset() https://rdrr.io/cran/dataverse/src/R/build_metadata.R:
Yes, the JSON can be created and validated with pyDataverse, but only for the default metadatablock (not for customized ones, but they are rarely in use as far as I know). The workflow:
You can find this in more detail in our user guides: |
So that still means a csv template is required. There is no helper function like the one in initiate_sword_dataset() https://rdrr.io/cran/dataverse/src/R/build_metadata.R, in which it suffices to just specify approved keywords corresponding to metadata terms and the function builds the required metadata object for you:
That's a bit of a shame. ::EDIT:: Maybe this might give people more ideas on how to possibly go forward. This is as far as I could get. The horrible paste code posted before could also be accomplished with:
It uses dataset-finch2 - Copy.json.txt (just remove the .txt), which is the minimal metadata required imo. Hope it helps in getting ideas (I can imagine it to be a hell of a job doing this for all allowed dataverse json metadata entries). |
Last post from my side. Another option could be to create separate json files of all the entries in https://guides.dataverse.org/en/latest/_downloads/dataset-create-new-all-default-fields.json and call on them if the user specifies them in a list.
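One way to sketch that idea without separate files on disk is a named list of per-field templates, from which only the entries the user names are built. Everything here is hypothetical (the names `field_templates` and `build_fields`, and the choice of three example fields); it only illustrates the lookup approach, not an existing API of the dataverse package.

```r
# Hypothetical registry: one builder function per supported metadata field.
field_templates <- list(
  title = function(v) {
    list(typeName = "title", typeClass = "primitive",
         multiple = FALSE, value = v)
  },
  subtitle = function(v) {
    list(typeName = "subtitle", typeClass = "primitive",
         multiple = FALSE, value = v)
  },
  subject = function(v) {
    list(typeName = "subject", typeClass = "controlledVocabulary",
         multiple = TRUE, value = as.list(v))
  }
)

# Build only the fields the user asked for; fail loudly on unknown keywords.
build_fields <- function(metadata) {
  unknown <- setdiff(names(metadata), names(field_templates))
  if (length(unknown) > 0) {
    stop("No template for: ", paste(unknown, collapse = ", "))
  }
  lapply(names(metadata),
         function(n) field_templates[[n]](metadata[[n]]))
}

fields <- build_fields(list(
  title   = "My title",
  subject = c("Other", "Social Sciences")
))
```

The resulting list of fields could then be placed in the citation block and serialized to JSON. Supporting a new field means adding one entry to the registry rather than editing a long paste call.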
@kuriwaki No. the |
I'm new to the dataverse package. I can't seem to get things working and I don't understand any of the errors.
The following functions are working:
But adding data files to a newly created dataset fails (I'm following the examples given here https://www.rdocumentation.org/packages/dataverse/versions/0.3.0):
The add_file() fails with the error:
The following also does not work
The other example at the previously mentioned link using the native API throws an error at the very first line and thus can't even create the dataset:
Can anyone tell me why it is not working to add files? I'm not particularly adept at error handling, so if you need more info please instruct me on how to receive that info.
R version 4.0.3
Rstudio 1.4.1103
Windows 1909 build 18363.1256
demo.dataverse.nl v. 4.18.1
dataverse package v 0.3.0