Encoding issue using create_dataset() #55
Thanks for your issue! Can you please try this again with the latest commit from the develop branch (see here)? Here are the develop branch and the docs for it. The next release, with a lot of changes, will be out by the end of the year, and hopefully the problem is solved by those changes. If not, I will try to reproduce and fix it. Please also test where the encoding conversions/problems start to appear: is the imported JSON string already wrong, or does it happen after the …
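To answer the question above — whether the imported JSON string is already wrong before it reaches `create_dataset()` — here is a minimal sketch. The file name and its contents are placeholders; the block simulates a metadata file saved with a Windows codec rather than UTF-8:

```python
import json

# Hypothetical sample: write bytes that are NOT valid UTF-8, simulating
# a metadata file that was saved with a Windows codec (cp1252).
with open("dataset-finch1.json", "wb") as f:
    f.write('{"title": "Fünf Vögel"}'.encode("cp1252"))

# Read it back, decoding explicitly as UTF-8 with errors="replace"
# so any corruption becomes visible instead of raising an exception.
with open("dataset-finch1.json", "r", encoding="utf-8", errors="replace") as f:
    dsmd = f.read()

# U+FFFD only appears when bytes could not be decoded as UTF-8, i.e.
# the string is already wrong before it ever reaches create_dataset().
if "\ufffd" in dsmd:
    print("corrupted before upload")
else:
    json.loads(dsmd)  # parses cleanly; the problem happens later
    print("clean after reading")
```

If `dsmd` contains `\ufffd` right after reading, the problem is on the client side (file encoding), not in the library or in Dataverse.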
Beware: this might be related to IQSS/dataverse#6675 and some other issues (older and newer) I linked to from there. I dug a bit, and the plot thickens around encoding issues in Jersey. Still, it would be good to have verification that it's not the library that's causing the problems.
I cloned this repository and tried rerunning the script by importing … The high number of related open issues over at IQSS/dataverse suggests it has to do with some aspect of Dataverse itself, but when I create a dataset through the user interface, I can enter the characters that get lost via the API upload.
I'm now creating datasets directly with curl and the instructions from here, and have not come across any encoding issues.
Please send me the dataset JSON file (contact information here: stefankasberger.at).
@sindribaldur I have tested your script with the latest develop version and a local Dataverse Docker instance (4.18.1) and got the same problem as you. The dataset is created, but with many special characters missing (as you can see from the screenshot). The uploaded string is formatted the same as the string inside your script. When I then request the dataset with pyDataverse (… @pdurbin Do you know of this problem? It seems to be on the Dataverse side. And @sindribaldur: I have found one error in your JSON string in your …
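One workaround worth trying while the root cause is unclear: serialize the metadata with `ensure_ascii=True` (the `json` module's default), so the request body is pure ASCII and survives any broken charset handling in transport. A minimal sketch, with hypothetical metadata standing in for `dataset-finch1.json`:

```python
import json

# Hypothetical minimal metadata standing in for dataset-finch1.json.
metadata = {"title": "Darwin's Fünches", "author": "Müller, Jürgen"}

# ensure_ascii=True escapes every character above U+007F as \uXXXX,
# so the payload contains no raw non-ASCII bytes that a server or
# client with a wrong charset assumption could mangle.
payload = json.dumps(metadata, ensure_ascii=True)
assert payload.isascii()

# The escapes round-trip losslessly on the receiving side.
assert json.loads(payload) == metadata
```

If the special characters survive with an ASCII-escaped payload but not with raw UTF-8, that would point at charset handling in the HTTP layer rather than at the JSON itself.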
@skasberger hi, plenty of encoding problems have surfaced over the years, but I'm not aware of this being a problem on the Dataverse side. It sounds like @sindribaldur got it working with curl. @sindribaldur would you be interested in trying it from https://github.com/IQSS/dataverse-client-javascript or https://github.com/IQSS/dataverse-client-r ? Those are the other two libraries that are quite active.
@skasberger Thanks for getting back - I hope the example helps. @pdurbin Thank you, I got everything that I needed working directly with curl, and I guess I will continue using it like that if needed in the future. I had tried one of those packages before and also hit a wall.
As discussed during the 2024-02-14 meeting of the pyDataverse working group, we are closing old milestones in favor of a new project board at https://github.com/orgs/gdcc/projects/1 and removing issues (like this one) from those old milestones. Please feel free to join the working group! You can find us at https://py.gdcc.io and https://dataverse.zulipchat.com/#narrow/stream/377090-python
Using a Python script with `create_dataset()`, I created a new dataset on demo.dataverse.org (and one more Dataverse server), where `dsmd` is the content of `dataset-finch1.json` as a string (and a slightly modified version of it for my last test), linked in the documentation. Everything seems to work fine, but non-ASCII characters are not displayed (they are replaced with `�`) when I open the dataverse through the browser, nor when I download it back with `get_dataset()`. I'm on Windows 10 with Python 3.6.4 and pyDataverse 0.2.1. I tried to run it as a script from the command line and in Spyder, with the same result.
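One likely culprit on Windows: `open()` without an explicit `encoding` argument uses the locale codec (typically cp1252 there), so a UTF-8 JSON file can be silently decoded with the wrong codec before pyDataverse ever sees the string. A minimal sketch of the failure mode and the fix (the file name is a placeholder):

```python
# "Vögel" encoded as UTF-8 is six bytes; decoding those bytes with the
# wrong codec does not raise an error, it just produces mojibake.
raw = "Vögel".encode("utf-8")
assert raw.decode("utf-8") == "Vögel"    # correct codec
assert raw.decode("cp1252") == "VÃ¶gel"  # Windows locale default: mojibake

# The fix: always pass encoding="utf-8" explicitly when reading the
# metadata file, instead of relying on the platform default.
with open("dataset-finch1.json", "w", encoding="utf-8") as f:  # placeholder
    f.write('{"title": "Vögel"}')
with open("dataset-finch1.json", "r", encoding="utf-8") as f:
    dsmd = f.read()
assert "Vögel" in dsmd
```

If the mojibake looks different (`�` rather than `Ã¶`-style sequences), the corruption may instead happen server-side, but ruling out the client read path first narrows things down considerably.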