find publications that cite your data #13
I might condense the first couple of approaches -- which it seems like you've rejected as impractical anyway -- but, yes, I think this would be useful. |
@twhiteaker we've focused on this issue a lot, and have several complementary approaches to gathering data citations together into a common resource, which we compile at the DataONE citation and metrics service, and send to the DataCite EventData service. The DataONE metrics service can be used to get the information we have compiled and scraped from these varied sources. For example, here's a screenshot of the Arctic Data Center portal showing, for each publication, which data sets they cite (sometimes more than one): We also provide mechanisms for researchers to notify us that they cited a dataset, and we query a number of sources regularly. Althea Marks worked last year for us on evaluating completeness (or lack thereof) of these citation compilation efforts, and I can share her AGU poster on the topic if you have interest. Let me know if you'd like to chat further about this -- I'd love to see cross-community approaches that make this easier for everyone. |
Oh, and if you want to see an LTER-related example of this, check out the Toolik Lake portal: https://search.dataone.org/portals/toolik/Metrics |
@mbjones It seems like leveraging the DataONE metrics service would be the way to go for LTER sites. Can we access it directly or do we need some magic DataONE member mojo? Is version 0.0.2 the latest documentation? Access to the service looks more complicated than just formulating a URL. But if you can enable/teach me to use the service for this LTER use case, then I'm willing to write it up for this IM manual. I think what we'd want for a given LTER site is a table of publications that cite their data. There would be columns for the publication and columns for the cited dataset. The columns could simply be publication DOI and data DOI, but author, title, and year for each would also probably be helpful. I think in many cases you could filter by package identifier with the site acronym, e.g., knb-lter-ble, but that may not always be the case. BLE uses EDI's dataset landing pages to add publications that cite data packages. When we do that, does that information make it into the DataONE metrics? |
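As a rough illustration of the table described above, here is a minimal Python sketch of one row per (dataset, citing publication) pair, with a filter on the package scope such as `knb-lter-ble`. The names `CitationRow`, `package_scope`, and `rows_for_site` are hypothetical, the DOIs in the usage lines are placeholders, and the sketch assumes package identifiers follow the `scope.identifier.revision` pattern, which, as noted above, may not hold for every dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CitationRow:
    """One row of the proposed table: a dataset and a publication citing it."""
    package_id: str              # assumed form: scope.identifier.revision
    data_doi: str
    publication_doi: str
    publication_author: str = ""
    publication_title: str = ""
    publication_year: str = ""


def package_scope(package_id: str) -> str:
    """Return the scope part of a 'scope.identifier.revision' package ID."""
    return package_id.rsplit(".", 2)[0]


def rows_for_site(rows, scope):
    """Keep only the rows whose package ID belongs to the given scope."""
    return [row for row in rows if package_scope(row.package_id) == scope]


# Placeholder DOIs, for illustration only.
example_rows = [
    CitationRow("knb-lter-ble.1.7", "10.0000/data-1", "10.0000/paper-1"),
    CitationRow("knb-lter-arc.10.2", "10.0000/data-2", "10.0000/paper-2"),
]
ble_rows = rows_for_site(example_rows, "knb-lter-ble")
```

Filtering on the scope only works for packages that carry the site acronym in their identifier; datasets archived elsewhere would need a different key.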
Hey @twhiteaker -- I think the most recent version of our docs is here: https://app.swaggerhub.com/apis/nenuji/data-metrics/1.0.0.5, but we haven't documented the metrics service as fully as our other services, so things could be incomplete. I will cc Rushiraj @nenuji, who is the main author of the docs, to see if he has any comments. DataONE draws from a number of sources, but we both pull from the DataCite EventData service and push reliable citations back to EventData. So, if EDI is publishing their citations to EventData as well, I think they should show up in the DataONE service, probably with a lag. You can see an example metrics query by inspecting the network calls made by the DataONE portal service in your browser. Here's an example request made for the Toolik portal, which is what is used to construct the screenshot I included earlier (and other stuff on that page):
That gets all of the datasets associated with a portal, but you can also use The request is specified in JSON, as outlined in the SWAGGER docs linked above. We're happy to chat about the details over in the DataONE slack (https://slack.dataone.org) in the |
EDI has recently started to report citations to DataCite. So, if DataONE reports there too, it would probably be easier to query their API. But what is the goal for the IM manual? Aren't you trying to find more citations, i.e., in addition to the ones that are already linked through efforts by EDI and DataONE? At least that was my impression when reading your BLE document. |
@cgries The goal is to find all data use citations for a given LTER site. For sites archiving solely with DataONE member nodes, utilizing the metrics service is the most practical solution. I don't think IMs have time to go searching for citations themselves. If they do, and if they find a scriptable solution that gets citations not in the metrics service, then they can share their script with DataONE which helps everyone. If a site archives data outside of a DataONE member node, then I guess they're on their own, though the manual could still provide some guidance to at least get started. |
It sounds like DataONE pushes results to the DataCite EventData service. To support cases when a dataset isn't in DataONE, maybe it makes more sense for me to query DataCite. Sound right? |
Yes, you could query EventData directly (and Note that, in theory, DataCite only supports citations to DOI-bearing objects, whereas DataONE can store citations to objects with any identifier type. We had a TODO item in Make Data Count to support other identifier types as well, but that has not yet materialized as far as I know. I do think it is still on DataCite's radar though for the new open citation service they recently announced. |
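A minimal sketch of asking DataCite which DOIs cite a dataset DOI is shown below. It rests on assumptions that should be checked against the DataCite API documentation: that `https://api.datacite.org/dois/{doi}` returns a JSON:API record, and that citing works are listed under `data.relationships.citations.data` as objects with an `id` field. The function names are hypothetical. Consistent with the limitation mentioned above, this only covers datasets that have a DOI.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the DataCite REST API DOI endpoint.
DATACITE_DOIS = "https://api.datacite.org/dois/"


def datacite_url(doi: str) -> str:
    """Build the record URL for a DOI, keeping '/' unescaped."""
    return DATACITE_DOIS + quote(doi, safe="/")


def citing_dois(record: dict) -> list:
    """Extract citing DOIs from a DataCite record (assumed JSON:API shape)."""
    relationships = record.get("data", {}).get("relationships", {})
    items = relationships.get("citations", {}).get("data") or []
    return [item["id"] for item in items if item.get("id")]


def fetch_citing_dois(doi: str) -> list:
    """Network call: download the record and return the DOIs that cite it."""
    with urlopen(datacite_url(doi), timeout=30) as response:
        return citing_dois(json.load(response))
```

Keeping the parsing in `citing_dois` separate from the network call makes it easy to check the assumed response shape against a saved response before running it over many datasets.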
I made an example Python project which gets citations using DataCite. The demo code produces a CSV file of all data citations for datasets under a given LTER site (via the scope). It wound up being more complicated than I thought. I'm querying EDI's PASTA to get the DOIs of all of BLE LTER's datasets, then DataCite to get the citations, which just gives me a DOI for each, then Crossref to get metadata (e.g., author, title) on the citations. What are the chances that (a) EDI will add a reporting feature where I can get citations for all datasets for a given LTER site (via scope, I presume), or (b) DataONE will add similar functionality, perhaps based on the LTER site's name or some other filter if we can't filter by scope? If chances are slim, then perhaps the code I wrote can be referenced in the new section of the IM manual about this. |
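The last step of that chain (citing DOI to Crossref metadata to CSV) can be sketched as follows. This is not the code from the project mentioned above, only an illustration under stated assumptions: the Crossref REST API serves a work at `https://api.crossref.org/works/{doi}` with the record under the `message` key, where `title` is a list, `author` is a list of objects with `given` and `family`, and `issued` holds `date-parts`. The function names and column names are hypothetical.

```python
import csv
import io
import json
from urllib.parse import quote
from urllib.request import urlopen

FIELDS = ["data_doi", "publication_doi", "author", "title", "year"]


def crossref_message(doi: str) -> dict:
    """Network call: fetch the Crossref record ('message') for a DOI."""
    url = "https://api.crossref.org/works/" + quote(doi, safe="/")
    with urlopen(url, timeout=30) as response:
        return json.load(response)["message"]


def summarize_crossref(message: dict) -> dict:
    """Reduce a Crossref 'message' to author, title, and year."""
    authors = "; ".join(
        " ".join(part for part in (a.get("given"), a.get("family")) if part)
        for a in message.get("author", [])
    )
    title = (message.get("title") or [""])[0]
    date_parts = message.get("issued", {}).get("date-parts") or [[None]]
    year = date_parts[0][0] if date_parts[0] else None
    return {"author": authors, "title": title, "year": year}


def citations_csv(pairs, lookup=crossref_message) -> str:
    """Render (data_doi, publication_doi) pairs as CSV text.

    `lookup` maps a publication DOI to a Crossref-style message dict; pass
    a stub to run without network access.
    """
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for data_doi, publication_doi in pairs:
        row = {"data_doi": data_doi, "publication_doi": publication_doi}
        row.update(summarize_crossref(lookup(publication_doi)))
        writer.writerow(row)
    return out.getvalue()
```

The `pairs` argument would come from the earlier steps (dataset DOIs for a scope, then their citing DOIs). Publications whose DOIs are not registered with Crossref would need a different metadata source.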
@twhiteaker are you envisioning more than the Journal citation services in the PASTA API? Specifically the List Data Package Citations, which can be run for a scope? |
Does it work for scope? This gave me no results. If it works, that would be very convenient. We'd still have to worry about datasets that aren't in EDI, which would be an advantage of going straight to DataCite. |
@cgries Also this one includes three citations for preprints. The screenshot above shows that there are only three entries. Ah, I'm recalling now that maybe you added the other citations, and so that's why I don't see them and can't edit them out. |
Ah, I now see the benefit of leaving in preprints. For counting citations for NSF reports, I think I'd still leave the preprints out. For the IM Manual, I'm thinking of suggesting folks use DataCite, and for an example see this Python implementation. |
On the publications page, we could add a section on how to discover publications that cite an LTER site's data. I don't know what the best way to do this is. Here's what BLE has documented, though we don't actively follow these ideas:
https://github.com/BLE-LTER/ble-handbook/blob/main/handbook.md#data-citations