Address RDA Working Group on Dynamic Data Citation (WGDC) recommendations #266
Comments
These are really good ideas! Landing pages are good, but we should request them as a feature from the PxWeb developers, because they are not part of the API. We want to avoid handling information about individual APIs. Long term, I think we should probably remove the API catalogue and just refer to the pxweb list of available APIs.
By "pxweb list of available APIs", do you mean this list: https://www.scb.se/en/services/statistical-programs-for-px-files/px-web/pxweb-examples/ ? As I mentioned in #254, some of the APIs listed there are broken (Taiwan, Örebro kommun) and there are several APIs that are not listed there, so SCB's list does not seem to be definitive.

I compared the same .px and .json files downloaded from the stat.fi example page and noticed that the .px files actually include more metadata than the .json files. One example is the statistics homepage:
and a note that may or may not be of interest to the data user:
which is essentially the same information that is displayed in the PxWeb database web interface.

The .px file format seems relatively simple and probably easy to parse, especially if it is only used to extract the kinds of metadata that are not included in the .json files. While this has traditionally been outside the scope of this package, I think the option to download additional metadata as .px files would be useful. Additionally, there are some reports of JSON-stat / JSON-stat 2 output being erroneous compared to .px output (statisticssweden/PxWeb#387).

The JSON-stat format allows an extension property that can hold anything, and interestingly the stat.fi JSON file at least has several extension properties. It could also be used to store the statistics documentation (landing page) and any notes related to the dataset.

EDIT: Actually, it seems that PxApi 2.0 is coming out (at least for beta testing) in autumn 2023, so maybe some of these changes will be implemented then: https://www.scb.se/en/services/open-data-api/pxapi-2.0/
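To illustrate the point about .px files being easy to read for metadata only, here is a minimal sketch in Python (not the package's own language, and not part of its API). It assumes the usual `KEYWORD="value";` layout of PX files, with long texts split into consecutive quoted chunks. The sample text is made up, the function name `read_px_metadata` is hypothetical, and language-specific keywords such as `NOTE[fi]` are deliberately not handled.

```python
import re

# Made-up sample in the PX "KEYWORD=value;" style; not copied from a real file.
SAMPLE_PX = '''CHARSET="ANSI";
TITLE="Example table";
NOTE="First line of a note "
"continued on a second line";
STUB="Year";
VALUES("Year")="2020","2021";
DATA=
1 2;
'''

# KEYWORD, an optional ("subkey"), "=", then everything up to the first
# semicolon that is not inside double quotes.
ENTRY = re.compile(r'([A-Z0-9-]+)(?:\(([^)]*)\))?=((?:"[^"]*"|[^;"])*);', re.S)


def read_px_metadata(text, wanted=("TITLE", "NOTE")):
    """Return the wanted free-text keywords as a dict (hypothetical helper)."""
    meta = {}
    for keyword, _subkey, raw in ENTRY.findall(text):
        if keyword in wanted:
            # Consecutive quoted chunks are treated as one continued string.
            meta[keyword] = "".join(re.findall(r'"([^"]*)"', raw))
    return meta
```

Joining the quoted chunks is only appropriate for free-text keywords. List-like keywords such as VALUES would need to keep their items separate, which is why they are left out of `wanted` here.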
Executive summary:
Background information:
The Research Data Alliance Data Citation WG has listed 14 recommendations on reproducibly subsetting datasets and on how to cite, share and re-use these subsets:
While data retrieved from PxWeb APIs may not be as dynamic as other kinds of data, it still changes occasionally (see the stat.fi news page), and there are some nice recommendations that could at least be acknowledged and, if possible, implemented.
Here is a list of the recommendations:
Recommendations are grouped as follows: R1-3 "Preparing the Data and the Query Store", R4-10 "Persistently Identifying Specific Data Sets", R11-12 "Resolving PIDs and Retrieving the Data" and R13-14 "Upon modifications to the Data Infrastructure".
Especially interesting, in my opinion, would be to integrate the calculation of query and downloaded dataset hashes (R4, R6) and to store them somewhere alongside the other citation data.
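As a rough idea of what storing hashes alongside citation data could look like, here is a minimal Python sketch (not the package's own language or API). The function names, the record fields and the choice of SHA-256 are all assumptions made for illustration; md5 would work the same way. The query is serialised with sorted keys so that the same query always produces the same hash regardless of key order.

```python
import hashlib
import json
from datetime import datetime, timezone


def query_hash(query):
    """Hash a query dict after serialising it in a canonical form."""
    # sort_keys fixes the order of object keys; the order of list items
    # (for example selected values) is kept and therefore affects the hash.
    canonical = json.dumps(query, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def data_hash(payload):
    """Hash the downloaded response exactly as received (bytes)."""
    return hashlib.sha256(payload).hexdigest()


def citation_record(url, query, payload):
    """Bundle both hashes with the URL and a retrieval time (hypothetical layout)."""
    return {
        "url": url,
        "query_sha256": query_hash(query),
        "data_sha256": data_hash(payload),
        "retrieved": datetime.now(timezone.utc).isoformat(),
    }
```

A record like this could be kept next to the generated citation, so that a later download with the same query can be checked against the stored data hash.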
Additionally, R12 could be partly achieved by changing the URL in the following citation
to simply https://stat.fi/en/statistics/ava, which is the closest equivalent to a landing page. I'm not sure whether this URL is accessible from the API, but it is at least listed in a separate CSV file: https://statfin.stat.fi/database/StatFin/StatFin_rap.csv
R4 and R5 are more or less covered if you use pxweb_interactive(), as the order in which items are printed is deterministic. If the order of the query printout or of the dataset items changes in any way, the md5 hashes change as well.

The different recommendations are, I think, most useful for PxWeb database maintainers and PxWeb developers at SCB, but we could do our part by thinking about solutions to the proposed recommendations.
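The remark above about md5 hashes changing with item order can be shown with a small Python sketch (not the package's own language). The row data and both function names are made up for illustration. Sorting the rows before hashing is one possible way to make the hash independent of the order in which rows arrive, under the assumption that row order carries no meaning.

```python
import hashlib


def md5_of_rows(rows):
    """md5 of the rows in the order given, one comma-separated line per row."""
    text = "\n".join(",".join(map(str, row)) for row in rows)
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def md5_of_sorted_rows(rows):
    """md5 after sorting the rows, so arrival order no longer matters."""
    return md5_of_rows(sorted(rows))


# Same made-up observations, delivered in two different orders.
rows_a = [("2020", "A", 1), ("2021", "A", 2)]
rows_b = [("2021", "A", 2), ("2020", "A", 1)]
```

Here `md5_of_rows` gives different hashes for `rows_a` and `rows_b`, while `md5_of_sorted_rows` gives the same hash for both.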