S3 data cleanup #53

Closed
6 tasks done
leothomas opened this issue Aug 20, 2020 · 3 comments


@leothomas
Contributor

leothomas commented Aug 20, 2020

PR #46 moved certain data files in order to maintain a 1-to-1 mapping between S3 folders and datasets (i.e., each dataset maps to exactly one folder and each folder contains exactly one dataset). This lets the dynamic domain extraction query the keys of a given dataset directly, without having to filter out keys that belong to a different dataset in the same bucket.
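As a minimal sketch of why the 1-to-1 layout helps: once each dataset has its own folder, selecting a dataset's keys is a plain prefix match with no extra filtering. The function name and the example keys below are hypothetical and only illustrate the idea; against a real bucket, the same selection would typically be done by listing objects with a key prefix.

```python
def keys_for_dataset(keys, dataset_folder):
    """Return the keys stored under one dataset folder (prefix match only)."""
    # The trailing slash keeps "xco2-mean" from matching "xco2-mean-other/...".
    prefix = dataset_folder.rstrip("/") + "/"
    return [key for key in keys if key.startswith(prefix)]


# Hypothetical keys, for illustration only.
example_keys = [
    "xco2-mean/example-a.tif",
    "xco2-diff/example-b.tif",
    "detections-plane/example-c.geojson",
]
print(keys_for_dataset(example_keys, "xco2-mean"))
```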

These modifications include:

  • s3://covid-eo-data/detections/plane --> s3://covid-eo-data/detections-plane
  • s3://covid-eo-data/detections/ship --> s3://covid-eo-data/detections-ship
  • s3://covid-eo-data/xco2/*mean* --> s3://covid-eo-data/xco2-mean
  • s3://covid-eo-data/xco2/*diff* --> s3://covid-eo-data/xco2-diff
  • All files in s3://covid-eo-data/BM_500M_DAILY with full-text spotlight labels ("Beijing", "NewYork", etc.) have been copied to files that use spotlight identifiers ("be", "ny", etc.). Files with the EUPorts label were omitted from this operation because EUPorts maps to two spotlights (du and gh).
  • s3://covid-eo-data/agriculture/CropMonitor* --> s3://covid-eo-data/agriculture-cropmonitor
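The relocations listed above can be sketched as a key-rewriting function. This is an illustration, not the code used in PR #46: the function name, the assumption that file names are kept unchanged under the new folder, and the example keys are all hypothetical. The BM_500M_DAILY renaming is left out because the issue does not spell out the file naming scheme.

```python
from fnmatch import fnmatchcase
from posixpath import basename

# Folder-to-folder moves: everything under the old prefix keeps its relative path.
PREFIX_MOVES = [
    ("detections/plane/", "detections-plane/"),
    ("detections/ship/", "detections-ship/"),
]

# Pattern-based moves: matching files are placed under the new folder.
# Assumption: the file name itself stays unchanged.
PATTERN_MOVES = [
    ("xco2/*mean*", "xco2-mean/"),
    ("xco2/*diff*", "xco2-diff/"),
    ("agriculture/CropMonitor*", "agriculture-cropmonitor/"),
]


def new_key_for(old_key):
    """Return the relocated key for old_key, or None if it is not affected."""
    for old_prefix, new_prefix in PREFIX_MOVES:
        if old_key.startswith(old_prefix):
            return new_prefix + old_key[len(old_prefix):]
    for pattern, new_folder in PATTERN_MOVES:
        if fnmatchcase(old_key, pattern):
            return new_folder + basename(old_key)
    return None


# Hypothetical keys, for illustration only.
for key in ["detections/ship/example.geojson", "xco2/example_mean.tif", "no2/example.tif"]:
    print(key, "->", new_key_for(key))
```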

TODO:

Once the /datasets endpoint has been validated and any other code changes have been made to ensure that these relocations do not break anything, the following steps should be taken to avoid data duplication in S3:

  • delete s3://covid-eo-data/detections/plane
  • delete s3://covid-eo-data/detections/ship
  • delete s3://covid-eo-data/xco2/*mean*
  • delete s3://covid-eo-data/xco2/*diff*
  • delete the files with full-text spotlight labels from s3://covid-eo-data/BM_500M_DAILY/ (except the EUPorts files, which were not copied)
  • delete s3://covid-eo-data/agriculture/CropMonitor*
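Before deleting anything, the legacy keys can be selected and reviewed as a dry run. The sketch below is one possible approach under stated assumptions: the helper names are hypothetical, the CropMonitor pattern uses the agriculture/ location given in the list of modifications, and the BM_500M_DAILY files are left out because their naming scheme is not given in this issue. The batch size of 1000 reflects the per-request key limit of the S3 DeleteObjects operation.

```python
from fnmatch import fnmatchcase

# Old locations that were superseded by the new per-dataset folders.
LEGACY_PATTERNS = [
    "detections/plane/*",
    "detections/ship/*",
    "xco2/*mean*",
    "xco2/*diff*",
    "agriculture/CropMonitor*",
]


def select_legacy_keys(keys):
    """Return the keys that match any legacy location (dry run, nothing is deleted)."""
    return [
        key
        for key in keys
        if any(fnmatchcase(key, pattern) for pattern in LEGACY_PATTERNS)
    ]


def batches(items, size=1000):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical keys, for illustration only.
listing = [
    "detections/plane/example.geojson",
    "detections-plane/example.geojson",
    "xco2/example_mean.tif",
    "xco2-mean/example_mean.tif",
]
for batch in batches(select_legacy_keys(listing)):
    # Each batch could then be passed to a bulk delete call after review.
    print(batch)
```

Keeping selection separate from deletion makes it possible to check that every legacy key has a counterpart in the new location before removing it.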
@olafveerman
Contributor

This and more was cleaned up.

@Schpidi

Schpidi commented Nov 11, 2020

@olafveerman following from the sidelines, is there any change needed for the trilateral dashboard?

@lubojr

lubojr commented Nov 13, 2020

Our observations:
We have updated the CropMonitor URLs, which had stopped working on the trilateral dashboard; they now work again. However, the NO2 data for 2020-10-01 render as transparent, while the NO2 diff layer has data.
