Make mbon_stats response easier to work with #79

Open
MathewBiddle opened this issue May 29, 2024 · 6 comments
@MathewBiddle
Contributor

@ocefpaf, @laurabrenskelle and I added an mbon_stats function during the code sprint to fetch statistics about dataset usage from two APIs. The response is a large data frame with some nested dictionaries. I think we're collecting all the data we need; now it's a matter of being able to parse and use the response.

I'm curious if you could take a look at the function and see what we can do to make the data easier to work with. As it stands, we have to do row-wise iteration to split out the GBIF and OBIS download information, and that seems burdensome.
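One way to avoid the row-wise iteration is to flatten the nested-dict columns up front. A sketch, assuming the response looks roughly like the stand-in below; the column and key names are made up for illustration:

```python
import pandas as pd

# Hypothetical stand-in for the mbon_stats response: the real column and
# key names may differ; this only illustrates the flattening pattern.
df = pd.DataFrame(
    {
        "dataset": ["ds1", "ds2"],
        "gbif": [{"downloads": 10, "records": 100}, {"downloads": 5, "records": 50}],
        "obis": [{"downloads": 3, "records": 30}, {"downloads": 7, "records": 70}],
    }
)

# Expand each nested-dict column into flat, prefixed columns in one shot,
# so later analysis can use plain column selection instead of row loops.
flat = pd.concat(
    [
        df.drop(columns=["gbif", "obis"]),
        pd.json_normalize(df["gbif"].tolist()).add_prefix("gbif_"),
        pd.json_normalize(df["obis"].tolist()).add_prefix("obis_"),
    ],
    axis=1,
)
```

With the prefixed layout, downstream code can select all GBIF or all OBIS columns at once rather than unpacking dicts row by row.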

A more general question I have is: how much should the function do versus how much data wrangling should be done in the use case notebook?

Any advice is appreciated!

@ocefpaf
Member

ocefpaf commented May 30, 2024

As it stands, we have to do row-wise iteration to split out the GBIF and OBIS download information, and that seems burdensome.

Do you mean after the table is created, or to create the table? If the former, I believe the GBIF and OBIS data are in different columns, so no looping over rows is necessary. If I'm mistaken, maybe the function should create two tables instead of one.
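If the two sources do live in separate, prefixed columns, splitting them into two tables is a plain column selection, with no row loops. A sketch with made-up column names:

```python
import pandas as pd

# Hypothetical flat table with per-source prefixed columns (assumed layout).
stats = pd.DataFrame(
    {
        "dataset": ["ds1", "ds2"],
        "gbif_downloads": [10, 5],
        "obis_downloads": [3, 7],
    }
).set_index("dataset")

# Vectorized column selection by prefix -- one table per source.
gbif = stats.filter(like="gbif_")
obis = stats.filter(like="obis_")
```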

A more general question I have is: how much should the function do versus how much data wrangling should be done in the use case notebook?

That is a good question! It's hard to answer without knowing what people will be doing with that table. I like to keep functions doing the bare minimum and leave the data wrangling to the end user, because that part will change more often. To implement that, we need to identify the minimal table that would empower users to get what they want more easily. (That may lead to the creation of multiple, easier-to-maintain functions, like fetch_obis, fetch_gbif, merge_tables, summary_table, etc.)
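A minimal sketch of that split, using the function names mentioned above; the signatures and returned columns are assumptions, and the API calls are stubbed out with static data rather than real GBIF/OBIS requests:

```python
import pandas as pd


def fetch_gbif(dataset_ids):
    """Return a minimal table of GBIF stats, one row per dataset.

    Stubbed: a real version would query the GBIF API here.
    """
    return pd.DataFrame(
        {"dataset": dataset_ids, "gbif_downloads": [0] * len(dataset_ids)}
    )


def fetch_obis(dataset_ids):
    """Return a minimal table of OBIS stats, one row per dataset.

    Stubbed: a real version would query the OBIS API here.
    """
    return pd.DataFrame(
        {"dataset": dataset_ids, "obis_downloads": [0] * len(dataset_ids)}
    )


def merge_tables(gbif, obis):
    """Join the two minimal tables on the dataset identifier."""
    return gbif.merge(obis, on="dataset", how="outer")


merged = merge_tables(fetch_gbif(["ds1", "ds2"]), fetch_obis(["ds1", "ds2"]))
```

Keeping each fetcher independent also makes the pieces testable in isolation, which matches the "easier to maintain" point above.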

@MathewBiddle
Contributor Author

We also have institution identifiers for some of the RAs. It might be nice to do something additional for these, or to be able to query for them as well.

@MathewBiddle
Contributor Author

NERACOOS and SECOORA have OceanExpert IDs but no OBIS institute pages. Hopefully @sformel-usgs can help sort that out.

@sformel-usgs

Answer from Pieter:

For the institution landing page to work, dataset contacts need to be matched to the respective OceanExpert institutions (unless the exact same contact has been matched before). Unfortunately, the tool to do that is not functioning at the moment and needs some work. I'm afraid all I can do right now is manually link contacts to institutions if someone provides me a list of datasets.

@MathewBiddle
Contributor Author

@sformel-usgs thanks for digging into this! Is the tool something open source that we can help fix? I don't know what I'm looking for when I browse to that link and log in with OceanExpert.

@sformel-usgs

I'm not sure either. I use that tool to update the US node info, but I don't know what the back end looks like. Judging from my own permissions, I'm guessing we would need additional permissions to manage specific organizations and datasets. I can bring this up when I'm in Belgium, since I'll be in the building with the OBIS and OceanExpert people.
