Polar Cluster #57

Open
3 of 11 tasks
sunnydean opened this issue Jan 29, 2025 · 7 comments

Comments

sunnydean commented Jan 29, 2025

Project Information

Contact Person:

Contact Email:
[Not provided]

Project Website:
ESA Polar Cluster

ESA TO:
[email protected]


Description

Project Short Description:
The ESA Polar Cluster encompasses multiple projects focusing on polar research, leveraging Earth Observation data to study polar processes and climate impacts. Several initiatives are already listed in the Open Science Catalog, such as “4D Antarctica” and others, reflecting diverse scientific goals and methodologies.

Theme:
Polar Region Earth Observation

Project Duration:
[Not provided]


Motivation and Connections

Motivation for choosing EarthCODE:
[Not provided]

Working with NoR:
[Not provided]


Data and Platforms

Platforms:

  • DeepESDL
  • PolarTEP

Critical Datasets:
[Not provided]

Data Sources:
[Not provided]

Recommended EO Metadata Standards:
[Not provided]

Data Access Restrictions:
[Not provided]

Authorization Required:
[Not provided]

Data Format:
[Not provided]


Reusability and Challenges

Reusing Workflows or Data:
[Not provided]


Additional Comments

  • The Polar Cluster is already listed on the Open Science Catalog.

  • Action Items:

      1. Ewelina to speak with Artemis in person.

      2. Dean to prepare and send an email to Martin (CC Artemis, Anca, etc.).

  • Need to sync schedules with DeepESDL, Chandra and Ewelina on Wednesday; try to book a slot for the 13th of Feb.

[Comments]

Key Insights from Interview:

  • Five new projects have recently started and are required to make their data open on the Open Science Catalogue. They would be a great fit for interviewing as we can tailor EarthCODE and future guides towards their needs.
  • They are already asking for guidelines on how to meet the requirement to share data and workflows. EarthCODE should provide examples of standardized publishing to ease onboarding of the new projects. The status of previous projects varies: some have uploaded their data, others have not, and data formats are inconsistent.
  • Key Motivators:
  • Improved discoverability of datasets, i.e. better SEO for the Open Science Catalogue, is important for them. They want to appear on the first page of a Google search, as NASA does, and not on the 2nd or 3rd.
  • The Open Science Catalogue should serve as a tracking tool to highlight ESA project contributions. It should also strengthen science-to-policy communication, which is one of the key goals outlined in the ESA 2024 strategy. Visualization tools could help here as well, as a straightforward means of sharing crucial insights with just a link.
  • Could improve the interoperability of their datasets – use old research data for new research – move to more cloud-native formats
  • They are unsure as to where to go next with projects that already have the data in the DeepESDL and how to link them with the Open Science Catalog to publish
  • Most, if not all, projects in the Polar Cluster run workflows on their institutions' local HPC infrastructure, and the data is sometimes local as well (e.g. some have cloned massive chunks of Sentinel data locally)
  • A wide range of formats is in use (NetCDF, GeoTIFF, shapefiles, CSV, etc.). Datasets are typically between 1 and 10 GB in size
  • Some projects from the Cryosphere theme, which are now also part of the Polar Science Cluster, have already converted their datasets to .zarr and published them in DeepESDL (also in the Viewer). Should we also make them visible in the Open Science Data Catalogue? Older datasets from already completed projects are mostly non-interoperable and in non-cloud-native formats. As a cluster, they would like to make their data more interoperable (e.g. by adopting Zarr and other cloud-native formats) so that datasets are accessible and usable across projects.
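
A possible first step towards the interoperability goal above is a quick inventory of which formats a project currently holds. The sketch below is only an illustration: the extension-to-format mapping, the cloud-native flags and the example paths are assumptions, not EarthCODE or DeepESDL rules. Note that a file extension cannot distinguish a plain GeoTIFF from a Cloud-Optimized GeoTIFF, so GeoTIFF is flagged conservatively.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical mapping: file extension -> (format name, treated as cloud-native?).
# The formats are those named in the interview notes; the boolean flag is an
# illustrative assumption only.
FORMATS = {
    ".nc": ("NetCDF", False),
    ".tif": ("GeoTIFF", False),
    ".tiff": ("GeoTIFF", False),
    ".shp": ("Shapefile", False),
    ".csv": ("CSV", False),
    ".zarr": ("Zarr", True),
}


def classify(path: str) -> tuple[str, bool]:
    """Return (format name, cloud-native flag) based on the path's extension."""
    suffix = PurePosixPath(path).suffix.lower()
    return FORMATS.get(suffix, ("unknown", False))


def summarize(paths: list[str]) -> dict[str, int]:
    """Count how many paths fall into each format."""
    return dict(Counter(classify(p)[0] for p in paths))


def conversion_candidates(paths: list[str]) -> list[str]:
    """List paths whose format is not flagged as cloud-native (incl. unknown)."""
    return [p for p in paths if not classify(p)[1]]


if __name__ == "__main__":
    # Hypothetical dataset listing, for illustration only.
    listing = [
        "project_a/ice_velocity.nc",
        "project_a/grounding_line.shp",
        "project_b/sea_ice_cube.zarr",
        "project_b/stations.csv",
    ]
    print(summarize(listing))
    print(conversion_candidates(listing))
```

The output of such an inventory could feed the list of completed projects and datasets mentioned in the actions below, for example to pick a dataset that is already in Zarr as the publishing template.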

Actions:

  • Send a follow-up email, open a Discourse topic for the Polar Cluster, and share it with them.
  • Identify 5 recently started Polar Cluster projects (AI4IS, FRESH4BIO, ARCTEX, GRMISS, SO-SIMBA, other?). We are interested in engaging with them in the form of a 45 min interview so that we can better understand their needs with respect to EarthCODE.
  • Prepare a list of completed projects and identify data which you would like to publish to the EarthCODE catalog. Ideally, from these, we can pick one which already has data on DeepESDL and use it as a case study/template for the new projects for publishing data.
  • Create an issue for the new datasets to be published in the Open Science Data Catalogue: https://github.com/orgs/ESA-EarthCODE/projects/2
  • Create a PR for the products that are already available in the Open Science Data Catalogue but need some metadata updates (directly through GitHub).
  • If training is needed, the DeepESDL team will be present in ESRIN (on-site) from 3-6 May 2025. We can organize a hands-on in-person session on DeepESDL covering how to build data cubes and make them available through the EarthCODE catalog.
  • Clarify how datasets from the Polar Cluster (for the ESSI 2023) that were converted to data cubes and hosted on DeepESDL with the viewer can be, or already are, linked to the Open Science Data Catalogue.
  • Start preparing guidelines on the requirements that may apply to projects at different stages (starting, ongoing, completed): how to publish the data + code and how to make their research FAIR.
@sunnydean

Sent:

Dear Martin and Artemis,
It's great to meet you. I'm Dean, and I'm working on the community aspects of the European Space Agency's EarthCODE initiative, which Anca Anghelea leads as TO of the project. I wanted to follow up on your recent conversation with Anca about the possibility of sharing data from Polar Cluster projects through EarthCODE and integrating DeepESDL there.

What would be the best time to schedule a call so that we can provide you with more details and learn about your activities and projects?

In the meantime, if you are not entirely familiar with EarthCODE, feel free to have a look at the first release of our portal (https://earthcode.esa.int/) to explore the potential of the platform once it becomes operational.

Let me know what you think and what date would work best for you.

@sunnydean

From Ewelina:
"So I asked Artemis, and she is quite flexible, but she liked the idea of having a dedicated meeting on this. It would be more about data sharing, but of course I think it would be beneficial to introduce them to EarthCODE in general and what they can get from it."

@edobrowolska

To do before the meeting:

  • Identify the possibilities and a strategy for easily bringing results from different projects to be shared via EarthCODE and the Open Science Catalogue

@sunnydean

To do before the meeting:

  • Identify the possibilities and a strategy for easily bringing results from different projects to be shared via EarthCODE and the Open Science Catalogue

@edobrowolska do you know if they're currently storing or cataloging their data somewhere?

@edobrowolska

Some examples are already in the Open Science Catalogue, and many projects maintain their datasets in persistent repositories. You can find some examples below:

@edobrowolska

Projects identified - to be onboarded:
For each of these projects we need to know what stage it is at and whether it can provide a workflow/use case example, either for successful user stories or to be used as an early adopter:

  • AI4IS: AI FORECASTING FOR ICE SHELF CALVING
  • FRESH4BIO
  • ARCTEX
  • GRMISS – DEVELOPMENT OF A GNSS-R MODULE FOR ICE AND SNOW IN SMRT (Open Call)
  • SO-SIMBA
  • GLACIERS MASS BALANCE INTERCOMPARISON EXERCISE (GLAMBIE)

@sunnydean

Hi Martin, Artemis,

Thank you for your time and invaluable feedback. It was great to learn more about the Polar Cluster and the various projects there.

Following our interview, I wanted to pass on our key takeaways from the discussion and continue with some next steps. We also invite you to sign up to the EarthCODE forum so that we can keep the conversation going: https://discourse-earthcode.eox.at/

Please see the key insights at the bottom of this email and feel free to make some corrections or suggest something we missed or did not discuss.

Some actions we discussed:

  • Identify 5 recently started Polar Cluster projects (AI4IS, FRESH4BIO, ARCTEX, GRMISS, SO-SIMBA, other?). We are interested in engaging with them in the form of a 45 min interview so that we can better understand their needs with respect to EarthCODE.
  • Prepare a list of completed projects and identify data which you would like to publish to the EarthCODE catalog. Ideally, from these, we can pick one which already has data on DeepESDL and use it as a case study/template for the new projects for publishing data.
  • Create an issue for the new datasets to be published in the Open Science Data Catalogue: https://github.com/orgs/ESA-EarthCODE/projects/2
  • Create a PR for the products that are already available in the Open Science Data Catalogue but need some metadata updates (directly through GitHub).
  • If training is needed, the DeepESDL team will be present in ESRIN (on-site) from 3-4 May 2025. We can organize hands-on in-person sessions on DeepESDL covering how to build data cubes and make them available through the EarthCODE catalog.
  • Clarify how datasets from polar projects that have been made available through the xcube viewer and DeepESDL are linked to the Open Science Data Catalogue.
  • Start preparing guidelines on the requirements that may apply to projects at different stages (starting, ongoing, completed): how to publish the data + code and how to make their research FAIR.

The slides from the meeting are attached to this email: EarthCODE_PolarCluster.pptx

Best Regards,

Key Insights:

Five new projects have recently started and are required to make their data open on the Open Science Catalogue. They would be a great fit for interviewing as we can tailor EarthCODE and future guides towards their needs.

Guidelines on how to meet the requirement to share data and workflows are needed. EarthCODE should provide examples of standardized publishing to ease onboarding of the new projects. The status of previous projects varies: some have uploaded their data, others have not, and data formats are inconsistent.

Key Motivators for using EarthCODE:

Improved discoverability of datasets, i.e. better SEO for the Open Science Catalogue, is important for them. They want to appear on the first page of a Google search, as NASA does, and not on the 2nd or 3rd.

The Open Science Catalogue should serve as a tracking tool to highlight ESA project contributions. It should also strengthen science-to-policy communication, which is one of the key goals outlined in the ESA 2024 strategy. Visualization tools could help here as well, as a straightforward means of sharing crucial insights with just a link.

Could improve the interoperability of their datasets – use old research data for new research – move to more cloud-native formats

Guidance is needed on where to go next with projects that already have their data in DeepESDL and on how to link them with the Open Science Catalogue to publish the datasets.

Most, if not all, projects in the Polar Cluster run workflows on their institutions' local HPC infrastructure, and the data is sometimes local as well (e.g. some have cloned massive chunks of Sentinel data locally).

A wide range of formats is in use (NetCDF, GeoTIFF, shapefiles, CSV, etc.). Datasets are typically between 1 and 10 GB in size.

Some projects from the Cryosphere theme, which are now also part of the Polar Science Cluster, have already converted their datasets to .zarr and published them in DeepESDL (also in the xcube viewer). We should also make them visible in the Open Science Data Catalogue. Older datasets from already completed projects are mostly non-interoperable and in non-cloud-native formats. There is a strong need to make this data more interoperable (e.g. by adopting Zarr and other cloud-native formats) so that datasets are accessible and usable across projects.
