Create and Manage Plan to Restore Lost EarthCODE Data #55

GarinSmith · 2025-01-07T19:30:48Z

EarthCODE Data Restore Plan

Data (S3 Object Store) - Ewelina to Lead
Assets and Catalogs

What is backed up?
Backups are stored on local drives, a VM, and external sources. We are hopeful that we have most of the lost data.
What is the priority?
Probably references from any external sources
What is not backed up?
We will confirm this as part of 1)

MetaData - Garin to Lead
GitHub/EarthCODE Catalogue

Confirm no metadata is lost and we can re-use this? Yes - Done (but we need to change the path)
There is currently no reason to suspect this is an issue.
This does assume that we use the same data location at CloudFerro? Yes - Done (but we need to change the path)
We have asked CloudFerro whether they can provide the same S3 instance location. They cannot.
EOX have indicated that we can do global replace on GitHub to change
https://s3.waw2-1.cloudferro.com/swift/v1/AUTH_3f7e5dd853f54cebb046a29a69f1bba6/OSCAssets
to
https://s3.waw4-1.cloudferro.com/swift/v1/EarthCODE/OSCAssets
and
https://s3.waw2-1.cloudferro.com/swift/v1/AUTH_3f7e5dd853f54cebb046a29a69f1bba6/Catalogs
to
https://s3.waw4-1.cloudferro.com/swift/v1/EarthCODE/Catalogs

E.g. for https://s3.waw2-1.cloudferro.com/swift/v1/AUTH_3f7e5dd853f54cebb046a29a69f1bba6/OSCAssets/seasfire/seasfire-cube/SeasFireCube_v3.zarr
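The global replace described above can be sketched in Python as a plain prefix substitution. This is a minimal illustration only: the two prefix pairs are taken from the plan, while the names `OLD_TO_NEW` and `rewrite_url` are hypothetical and not part of any EarthCODE or EOX tooling.

```python
# Old -> new URL prefixes, copied from the plan above.
OLD_TO_NEW = {
    "https://s3.waw2-1.cloudferro.com/swift/v1/AUTH_3f7e5dd853f54cebb046a29a69f1bba6/OSCAssets":
        "https://s3.waw4-1.cloudferro.com/swift/v1/EarthCODE/OSCAssets",
    "https://s3.waw2-1.cloudferro.com/swift/v1/AUTH_3f7e5dd853f54cebb046a29a69f1bba6/Catalogs":
        "https://s3.waw4-1.cloudferro.com/swift/v1/EarthCODE/Catalogs",
}


def rewrite_url(text: str) -> str:
    """Replace every old prefix found in `text` with its new counterpart."""
    for old, new in OLD_TO_NEW.items():
        text = text.replace(old, new)
    return text


example = (
    "https://s3.waw2-1.cloudferro.com/swift/v1/"
    "AUTH_3f7e5dd853f54cebb046a29a69f1bba6/OSCAssets/"
    "seasfire/seasfire-cube/SeasFireCube_v3.zarr"
)
print(rewrite_url(example))
```

Text that contains neither prefix is returned unchanged, so the function can be applied to whole file contents as well as to single URLs.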

Scripts/Process - Ewelina to Lead

Do we have scripts to move data to S3 or is this done manually?
Yes, we have scripts with some manual effort
Confirm that we can just move data again without changing the existing metadata?
Yes we think so if CloudFerro can help above.
Assume that PRR script will be used later?
Yes, we have suggested some PRR scenarios to support this.

Environment - Garin to Lead

Can we use CloudFerro? - Yes (Done)
Assumes yes subject to clarification of operational procedures.
Can we use PRR? - Yes as an environment (Done)
Yes in parallel to S3. We have more info from Salvatore.
One possible bonus is that we can deploy the above products to PRR when the new script is ready.
CloudFerro alternate S3. Not required (Done)
Not currently planned unless there are problems with 1)

Operational Stability - Garin to Lead

Review/Confirm CloudFerro Operational Procedures - Done (we know that there is no backup service)
See Meta Data point 2)
ESA PRR
When is PRR prototype app package available?
When is PRR production environment available?
What is the PRR SLA?

GarinSmith · 2025-01-07T19:37:10Z

Hi @edobrowolska,
I assigned this task to both of us, because it seemed easier and more flexible.
I used the core plan we worked on together earlier and I suggested the tasks that we each lead on.
I will arrange a catch-up on Thursday.

edobrowolska · 2025-01-09T11:04:17Z

I created a simple Excel file listing the datasets to be restored (attached here). Column F indicates the priority of the data to be restored. In two cases we are missing a backup of the data itself; I will contact the data providers to update us on access to those assets. In the next step, the catalog.json collection files will need to be restored or re-created. I will check this as my next step. Missing-data-list.xlsx

GarinSmith · 2025-01-16T21:15:28Z

Hi @edobrowolska,
I have used the new ESA PRR API to add an asset to the PRR and register that asset in the PRR, so that it can be discovered in the PRR catalogue. To use the PRR we need:

i) A unique ID for a product asset like
https://s3.waw2-1.cloudferro.com/swift/v1/AUTH_3f7e5dd853f54cebb046a29a69f1bba6/OSCAssets/seasfire/seasfire-cube/SeasFireCube_v3.zarr ?
or
https://s3.waw2-1.cloudferro.com/swift/v1/AUTH_3f7e5dd853f54cebb046a29a69f1bba6/OSCAssets/Hydrocoastal/L2E/cs2_full/amur/HCA_L2E_CS_OFFL_SIR1SAR_FR_20191231T122515_20191231T122518_D001.nc ?

ii) A unique ID for a product collection like
https://s3.waw2-1.cloudferro.com/swift/v1/AUTH_3f7e5dd853f54cebb046a29a69f1bba6/Catalogs/seasfire/seasfire-cube/catalog.json

iii) To understand how we will use the references in say the catalog.json above.
We can discuss i) and ii) later, but I am wondering if you have an example catalog.json please so that I can look at the structure it uses and the way it references assets?
I am wondering if references are hardcoded or relative so that it might be re-used without change by the PRR.
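One way to answer the hardcoded-versus-relative question for a given catalog.json is to inspect the `href` of each link. The sketch below assumes the file follows the usual STAC layout with a top-level `links` array; the function name `classify_links` and the sample catalog are hypothetical and are not one of the lost files.

```python
from urllib.parse import urlparse


def classify_links(catalog):
    """Group a STAC catalog's links by whether the href is absolute or relative."""
    groups = {"absolute": [], "relative": []}
    for link in catalog.get("links", []):
        href = link.get("href", "")
        # An href with a URL scheme (https, s3, ...) is treated as hardcoded.
        kind = "absolute" if urlparse(href).scheme else "relative"
        groups[kind].append((link.get("rel"), href))
    return groups


# Hypothetical catalog used only to illustrate the check.
sample_catalog = {
    "type": "Catalog",
    "id": "sample",
    "links": [
        {"rel": "self", "href": "https://example.org/Catalogs/sample/catalog.json"},
        {"rel": "root", "href": "./catalog.json"},
        {"rel": "item", "href": "./items/item-001.json"},
    ],
}

for kind, links in classify_links(sample_catalog).items():
    print(kind, links)
```

Links reported as relative should survive a move to a new location unchanged, while absolute ones would need rewriting.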

edobrowolska · 2025-01-17T14:20:44Z

Hi @GarinSmith,
Thanks for taking action on that. Regarding point i), I am not sure I follow. Those IDs are the reference links that point to the assets themselves, and in this case they are two separate, different products, so both are correct. The structure of one differs from the other, but it should be maintained this way.
ii) This ID (or rather, the path to the product) points to the catalog.json that describes this asset. So yes, in this case it stays as it is.
iii) The catalog.json for the SeasFire cube specifically has been lost, since I did not keep a copy of it, but it can be reproduced from the .zarr data by re-creating the file according to the documentation, using the stactools package: stac datacube create-item s3://OSCAssets/seasfire/seasfire-cube/SeasFireCube_v3.zarr/ item.json --use-driver ZARR
The source of that file is here: https://zenodo.org/records/8055879
You can find instructions here: https://github.com/ESA-EarthCODE/open-science-catalog-metadata/wiki/User-Guide%E2%80%90v.1.0.0

edobrowolska · 2025-01-17T15:01:00Z

Also, regarding the example of the catalog: a simple item.json from the Hydrology dataset collection is attached.

HCA_L2E_S3A_SR_1_SRA_A__20160417T091130_20160417T100159_20180203T044515_3029_003_121______LR1_R_NT_003.json

edobrowolska · 2025-01-17T15:02:55Z

The catalog.json can also be recreated by using the stac add command to add items. The catalog.json we've been using looks like this one:
{
  "type": "Catalog",
  "id": "examples",
  "title": "Example catalog",
  "stac_version": "1.0.0",
  "description": "This catalog is a simple demonstration of an example catalog that is used to organize STAC Items",
  "links": [
    {
      "rel": "self",
      "href": "https://raw.githubusercontent.com/radiantearth/stac-spec/v1.1.0/examples/catalog.json",
      "type": "application/json"
    },
    {
      "rel": "root",
      "href": "./catalog.json",
      "type": "application/json",
      "title": "Example catalog"
    }
  ]
}

Add STAC Items to a common catalog.json by applying the 'stac add' command:
for item_file in item_files/item*.json; do stac add "$item_file" catalog.json; done;
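To show what the resulting catalog.json roughly looks like after items are added, here is a minimal Python sketch that appends one `item` link per item file. It is not how `stac add` is implemented and it does not touch the item files themselves; the function name `add_item_links` and the file names are hypothetical.

```python
def add_item_links(catalog, item_hrefs):
    """Return a copy of the catalog with one 'item' link per href, skipping duplicates."""
    links = list(catalog.get("links", []))
    existing = {link.get("href") for link in links if link.get("rel") == "item"}
    for href in item_hrefs:
        if href not in existing:
            links.append({"rel": "item", "href": href, "type": "application/geo+json"})
            existing.add(href)
    return {**catalog, "links": links}


# Hypothetical starting catalog and item paths, for illustration only.
catalog = {
    "type": "Catalog",
    "id": "examples",
    "links": [{"rel": "root", "href": "./catalog.json", "type": "application/json"}],
}
updated = add_item_links(catalog, ["./item1/item1.json", "./item2/item2.json"])
for link in updated["links"]:
    print(link["rel"], link["href"])
```

Because the links here are relative, a catalog built this way would not need editing if the whole folder moves to a different S3 location.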

edobrowolska · 2025-01-20T09:07:13Z

Hi @GarinSmith, I also have another example of a catalog.json from a dataset to be restored (this one uses references to externally stored assets). Find it attached.

catalog-example.json.txt

GarinSmith · 2025-01-21T18:55:34Z

CloudFerro have confirmed that we should have access to create S3 Object Storage.
I need to create this S3 Object Storage (not done yet).
CloudFerro have stated that we need to use the s3.waw4-1 endpoint.
E.g. https://s3.waw4-1.cloudferro.com/swift/v1/AUTH_3f7e5dd853f54cebb046a29a69f1bba6/OSCAssets/seasfire/seasfire-cube/SeasFireCube_v3.zarr
instead of
https://s3.waw2-1.cloudferro.com/swift/v1/AUTH_3f7e5dd853f54cebb046a29a69f1bba6/OSCAssets/seasfire/seasfire-cube/SeasFireCube_v3.zarr
They cannot forward requests from https://s3.waw4-1 to https://s3.waw2-1.

Hi @edobrowolska and @Schpidi
We have a lot of GitHub references using s3.waw2-1. Do you know if there is a way to update them in bulk with find and replace, changing them all to s3.waw4-1 (once we have moved the assets to CloudFerro)?
I have seen a few examples that discuss this, but maybe we have done this before?
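A bulk find and replace over a local checkout of the repository could look like the following sketch. It is an assumption-laden illustration rather than an established EarthCODE procedure: the function name `rewrite_tree`, the `*.json` file pattern, and the dry-run default are all hypothetical choices, and the old and new strings are passed in so that a full URL prefix can be used instead of the host name alone if the path also changes.

```python
from pathlib import Path


def rewrite_tree(root, old, new, dry_run=True, pattern="*.json"):
    """Find files under `root` containing `old`; rewrite them unless dry_run is set.

    Returns the list of files that contain (or contained) the old string.
    """
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        if old not in text:
            continue
        changed.append(path)
        if not dry_run:
            path.write_text(text.replace(old, new), encoding="utf-8")
    return changed
```

Calling `rewrite_tree(".", "s3.waw2-1.cloudferro.com", "s3.waw4-1.cloudferro.com")` first only lists the affected files; passing `dry_run=False` applies the change, after which the result can be reviewed with `git diff` before committing.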

GarinSmith · 2025-01-21T19:26:29Z

Note that I have used the ESA PRR API to deploy a test asset to the EarthCODE Test Collection - https://eoresults.esa.int/stac/collections/EARTHCODE_TEST/items
I had to tweak the PRR instructions to get it to work, but that may have been because of my environment.

edobrowolska · 2025-01-22T08:10:56Z

Hi @GarinSmith, thanks a lot for your work! I would suggest uploading the assets and creating the STAC catalogs first. Updating the reference link in 25 items is not a lot of work, even if we have to do it manually. This link is only referenced in the osc-metadata/products catalog, so it should not be a problem. I am not aware of an automated way of doing that, but maybe @Schpidi has a solution. As I said, the first action would be to upload the datasets to the new S3 bucket; updating the reference link is just the final step.

GarinSmith · 2025-01-22T15:23:01Z

Hi @edobrowolska, I agree with the approach that you suggest. Thanks.

GarinSmith · 2025-01-23T18:41:07Z

I have created a bucket and the folders in the new CloudFerro S3 Object Store. These are
OSCAssets
Catalogs
as before

I can access a test file using
https://s3.waw4-1.cloudferro.com/swift/v1/EarthCODE/OSCAssets/garintest.tgz

GarinSmith · 2025-01-23T18:42:57Z

I have made progress on the PRR with regard to
PRR Integration Plan
PRR technical progress update – Garin/All
PRR/Cloud Ferro Publish Epic - All
PRR/Cloud Ferro Access Epic - All
PRR ESA Operational Implementation Strategy – ESA
PRR Technical Integration to EarthCODE - Garin/All
Please see https://esait.sharepoint.com/:p:/r/sites/EarthCODE/Shared%20Documents/General/Communication/Scrum%20of%20Scrums/EarthCODE%20PRR%20Notes.pptx?d=wd832840b01704d34a0f0a05ce38964d1&csf=1&web=1&e=FTzedI

edobrowolska · 2025-01-24T08:16:46Z

> I have created a bucket and the folders in the new CloudFerro S3 Object Store. These are OSCAssets Catalogs as before
>
> I can access a test file using https://s3.waw4-1.cloudferro.com/swift/v1/EarthCODE/OSCAssets/garintest.tgz

Hi Garin,
do you think it would make sense to have a quick catch-up today to discuss this? It looks good to me, and I think we can go ahead and move the products there as soon as possible. Let me know.

GarinSmith · 2025-01-24T10:16:18Z

Hi @edobrowolska,
Glad it looks OK.
Sounds good. Are you free after the PRR meeting (1600 CET )?
We could just carry on after if OK.

edobrowolska · 2025-01-24T12:35:37Z

Hi Garin, unfortunately I need to stop working at 4 pm today. On Fridays we work only until 16:00. Let's catch up on Monday morning instead (before our 11 am meeting?).

GarinSmith · 2025-01-29T15:10:12Z

I am now able to access CloudFerro remotely using S3.
The following commands now work:
s3cmd ls
s3cmd ls s3://EarthCODE
s3cmd ls s3://EarthCODE/OSCAssets/
s3cmd put garintest2.tgz s3://EarthCODE/OSCAssets/garintest2.tgz
I have sent the access key and secret key to Stephan and Ewelina.
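For reference, an `s3://` path used with s3cmd above can be mapped to the HTTPS form used for the earlier test file. The sketch below assumes that the object is readable at the same `swift/v1/<bucket>/<key>` pattern as that test file; the helper name `public_url` is hypothetical.

```python
from urllib.parse import quote

# Base taken from the test-file URL earlier in this thread.
PUBLIC_BASE = "https://s3.waw4-1.cloudferro.com/swift/v1"


def public_url(s3_uri: str, base: str = PUBLIC_BASE) -> str:
    """Convert 's3://bucket/key' into '<base>/bucket/key'."""
    prefix = "s3://"
    if not s3_uri.startswith(prefix):
        raise ValueError(f"not an s3:// URI: {s3_uri}")
    bucket, _, key = s3_uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"expected s3://bucket/key, got: {s3_uri}")
    # quote() leaves '/' intact and escapes characters such as spaces.
    return f"{base}/{bucket}/{quote(key)}"


print(public_url("s3://EarthCODE/OSCAssets/garintest2.tgz"))
```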

GarinSmith · 2025-02-04T22:56:37Z

This now works end to end from a technical perspective. E.g.
https://opensciencedata.esa.int/stac-browser/#/external/https://s3.waw4-1.cloudferro.com/EarthCODE/Catalogs/extrAIM/catalog.json

edobrowolska · 2025-02-05T08:20:55Z

Data are now available in the S3 bucket (OSCAssets). Only a few datasets are missing, because their upload was not successful. We still need to put back some remaining catalog.json files for these datasets (in progress).

GarinSmith assigned GarinSmith and edobrowolska Jan 7, 2025
